Valtonen says that this has made the CPU the weakest link in computing in recent years.
This is contrary to everything I know as a programmer currently. CPU is fast and excess cores still go underutilized because efficient paralell programming is a capital H Hard problem.
The weakest link in computing is RAM, which is why CPUs have 3 layers of caches, to try and optimize the most use out of the bottleneck memory BUS. Whole software architectures are modeled around optimizing cache efficiency.
I’m not sure I understand how just adding a more cores as a coprocesssor (not even a floating-point optimized unit which GPUs already are) will boost performance so much. Unless the thing can magically schedule single-threaded apps as parallel.
Even then, it feels like market momentum is already behind TPUs and “ai-enhancement” boards as the next required daughter boards after GPUs.
For example: memcpy, which is one of their claimed 100x performance tasks, can be IO-bound on systems, where the CPU doesn’t have many memory channels. But with a well optimized architecture, e.g. modern server CPUs with a lot more memory channels available, it’s actually pretty hard to saturate the memory bandwidth completely.
This is contrary to everything I know as a programmer currently. CPU is fast and excess cores still go underutilized because efficient paralell programming is a capital H Hard problem.
The weakest link in computing is RAM, which is why CPUs have 3 layers of caches, to try and optimize the most use out of the bottleneck memory BUS. Whole software architectures are modeled around optimizing cache efficiency.
I’m not sure I understand how just adding a more cores as a coprocesssor (not even a floating-point optimized unit which GPUs already are) will boost performance so much. Unless the thing can magically schedule single-threaded apps as parallel.
Even then, it feels like market momentum is already behind TPUs and “ai-enhancement” boards as the next required daughter boards after GPUs.
Eh, as always: It depends.
For example: memcpy, which is one of their claimed 100x performance tasks, can be IO-bound on systems, where the CPU doesn’t have many memory channels. But with a well optimized architecture, e.g. modern server CPUs with a lot more memory channels available, it’s actually pretty hard to saturate the memory bandwidth completely.