it’s really a miracle how all of this is held together tbh while being so cross-platform
the core engine of torch, which contains things like vector calculus (automatic differentiation), some tensor operations, data preprocessing, data de/serialisation, et cetera, is written in regular C++, so it basically runs on anything that a C++ compiler could target, which is basically everything
the problem starts when you want to add gpu acceleration in order to speed up things like matrix multiplication (which is typically the most computationally expensive part of the machine learning pipeline)
when torch (and other ml libs) started out, cuda was basically the most advanced, easiest to use lib for gpu compute (probably still is), and nvidia gpus were far superior to anything the competition could offer, and ml on mobile devices wasn’t a thing, so everyone went for it and for a long time ml existed almost solely on devices with nvidia graphics cards that could support cuda
then amd and arm started to catch up, and things like amd rocm was added to support amd gpus, vulkan was added to support both gpus on mobile devices and also nvidia and amd gpus, and at the moment all of this exists in this kind of mess, where practically all functionality is supported if you use cuda, but for rocm and vulkan a lot of things don’t work, and you often have to compile everything from scratch for things to be supported
and now all of this mess is wrapped in python to simplify the api, which was a big mistake in my opinion, bc not only is the api simplification unnecessary, but now if you want to target any specific architecture, it must be supported by the core torch engine, some version of a gpu compute lib (unless you want to do inference on the cpu, which you prolly don’t), and the python wrapper
so now, bc you want everything to work out of the box, all of these things are put into a binary, which results in this huge file size, and i imagine the maintenance of torch is pretty hard at least partially as a result of this
if you were building something like torch today, things are a lot more simple, bc you could just write the core engine in smth like C++, and then use smth like vulkan kompute, which is the name of a wrapper api around regular vulkan, but massively simpler and more user-friendly, and supports every gpu under the sun, and boom, you have an much more concise and easily maintainable library
it’s really a miracle how all of this is held together tbh while being so cross-platform
the core engine of torch, which contains things like vector calculus (automatic differentiation), some tensor operations, data preprocessing, data de/serialisation, et cetera, is written in regular C++, so it basically runs on anything that a C++ compiler could target, which is basically everything
the problem starts when you want to add gpu acceleration in order to speed up things like matrix multiplication (which is typically the most computationally expensive part of the machine learning pipeline)
when torch (and other ml libs) started out, cuda was basically the most advanced, easiest to use lib for gpu compute (probably still is), and nvidia gpus were far superior to anything the competition could offer, and ml on mobile devices wasn’t a thing, so everyone went for it and for a long time ml existed almost solely on devices with nvidia graphics cards that could support cuda
then amd and arm started to catch up, and things like amd rocm was added to support amd gpus, vulkan was added to support both gpus on mobile devices and also nvidia and amd gpus, and at the moment all of this exists in this kind of mess, where practically all functionality is supported if you use cuda, but for rocm and vulkan a lot of things don’t work, and you often have to compile everything from scratch for things to be supported
and now all of this mess is wrapped in python to simplify the api, which was a big mistake in my opinion, bc not only is the api simplification unnecessary, but now if you want to target any specific architecture, it must be supported by the core torch engine, some version of a gpu compute lib (unless you want to do inference on the cpu, which you prolly don’t), and the python wrapper
so now, bc you want everything to work out of the box, all of these things are put into a binary, which results in this huge file size, and i imagine the maintenance of torch is pretty hard at least partially as a result of this
if you were building something like torch today, things are a lot more simple, bc you could just write the core engine in smth like C++, and then use smth like vulkan kompute, which is the name of a wrapper api around regular vulkan, but massively simpler and more user-friendly, and supports every gpu under the sun, and boom, you have an much more concise and easily maintainable library