At this point I think speculation attacks are almost being accepted as the price of having high-performance processors. It’s almost impossible to rewind all non-architectural (microarchitectural) state when you hit a mis-speculated branch.
The mitigations just have bugs, and bugs can be fixed.
I’m not convinced it won’t be a thing of the past after some time.
I’m afraid that as long as you have shared architecture you will always have side-channel data leaks. The only true mitigation is dedicated resources per compute item: dedicated cores, dedicated cache, etc.
CPUs have so many cores these days that this seems like a perfectly reasonable option. Declare a process ‘security sensitive,’ give it its own core & memory, then wipe it when done.
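For what it’s worth, the “give it its own core” half of this can be roughly sketched on Linux today. This is only a minimal sketch, assuming Linux and Python’s `os.sched_setaffinity`; `pin_to_single_core` is a hypothetical helper name. Pinning only restricts where the process is scheduled. It does not keep other processes off that core, and it does nothing about shared caches or TLBs.

```python
import os


def pin_to_single_core(pid=0):
    """Restrict `pid` (0 = the calling process) to one core it may already use.

    Hypothetical helper for illustration. Linux-only: os.sched_setaffinity
    is not available on every platform.
    """
    allowed = os.sched_getaffinity(pid)   # set of CPU ids the process may run on
    core = min(allowed)                   # arbitrary choice: lowest-numbered allowed core
    os.sched_setaffinity(pid, {core})     # restrict scheduling to that single core
    return core


if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    core = pin_to_single_core()
    print(f"pinned to core {core}; affinity is now {os.sched_getaffinity(0)}")
    os.sched_setaffinity(0, before)       # restore the original affinity
```

Keeping everything else off that core, and wiping its state afterwards, would need extra work at the kernel level that this sketch does not attempt.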
I wish it were that easy, but there’s a lot of shared architecture in CPU design. If there are cache lines that are shared, for example, those have to be disabled.
Architecturally, maybe memory tagging for cache lines that, in addition to looking at the TLB and physical addresses, also looks at memory spaces. So if you’re addressing something that’s in the cache, even for another complete processor, you have to take the full hit of going out to main memory.
But even then it’s not perfect, because if you’re invalidating the cache of another core there is going to be some memory penalty. It’s probably infinitesimal compared to going to main memory, but it might be measurable. I’m almost certain it would be measurable. So it’s still a side-channel attack.
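To put a rough shape on “infinitesimal but measurable”: a tiny average delay hidden in noise becomes detectable once an attacker collects enough samples. Here is a minimal sketch using Welch’s t-statistic. All the numbers are synthetic assumptions (a 0.5 ns average penalty against ±5 ns of jitter), not measurements from any real CPU, and `make_samples` is a hypothetical helper that fakes latencies deterministically.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / std_err


def make_samples(base_ns, n):
    """Deterministic fake latencies: a base value plus a repeating jitter pattern."""
    jitter = [-5, -3, -1, 0, 1, 3, 5]
    return [base_ns + jitter[i % len(jitter)] for i in range(n)]


if __name__ == "__main__":
    for n in (70, 7000, 70000):
        fast = make_samples(100.0, n)   # assumed: no cross-core invalidation
        slow = make_samples(100.5, n)   # assumed: 0.5 ns slower on average
        print(n, round(welch_t(fast, slow), 2))
```

The statistic grows roughly with the square root of the sample count, so a difference that is invisible in a few dozen timings stands out clearly after thousands of them.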
One mitigation that does come to mind is running each program in a virtual machine, so that it’s guaranteed to have a completely different physical address space. This is really heavy-handed, and I have seen some papers about side-channel attacks leaking information from co-resident guest VMs in AWS. But it certainly reduces the risk surface.
The trouble is that “core” is just that. The heart of the processor. There’s a lot of shared state in the caches and the TLBs which is all common to multiple cores.
The only way to do that is to completely disable out-of-order execution to begin with and disable any shared caches, which would completely neuter modern CPUs. And not by a little bit: you’d be left with around 30% of the prior performance. Not a 30% loss, a 70% loss…
From ChatGPT- (query: How much performance would a modern Zen 5 or Intel Alder Lake CPU lose if you completely stripped out/disabled SMT, Out of Order Execution and shared caches - operating in-order and only using dedicated (non-shared) caches?)
Stripping out or disabling key performance-enhancing features like Simultaneous Multithreading (SMT), Out-of-Order Execution (OoOE), and shared caches from a modern CPU based on architectures like AMD’s Zen 5 or Intel’s Alder Lake would result in a significant performance loss. Here’s an overview of the potential impact from disabling each feature:
- Simultaneous Multithreading (SMT)
Impact: SMT allows a single core to execute multiple threads simultaneously, improving CPU throughput, especially in multi-threaded applications. Disabling SMT would reduce the ability to handle multiple threads per core, decreasing performance for multi-threaded workloads.
Expected Loss: Performance drop can be around 20-30% in workloads like video encoding, rendering, and heavily threaded applications. However, single-threaded performance would remain relatively unaffected.
- Out-of-Order Execution (OoOE)
Impact: OoOE allows the CPU to execute instructions as resources become available, rather than in strict program order, maximizing utilization of execution units. Disabling OoOE forces the CPU to operate in-order, meaning that it would stall frequently when waiting for data dependencies or slower operations, like memory access.
Expected Loss: This could lead to performance drops of 50% or more in general-purpose workloads because modern software is optimized for OoOE processors. Tasks like complex branching, memory latency hiding, and speculative execution would suffer greatly.
- Shared Caches (L2, L3)
Impact: Shared caches (particularly L3 caches) help reduce memory latency by sharing frequently accessed data among multiple cores. Disabling shared caches would increase memory access latency, causing more frequent trips to slower main memory.
Expected Loss: Performance could drop by 15-30% depending on the workload, especially for applications that benefit from high cache locality, such as database operations, scientific simulations, and gaming.
- Operating In-Order Only with Dedicated Caches
Overall Impact: Without OoOE and SMT, and with only in-order execution and dedicated caches, the CPU would be much less efficient at handling multiple tasks and hiding latency. Modern CPUs rely heavily on OoOE to keep execution units busy while waiting for slow memory operations, so forcing in-order execution would significantly stall the CPU.
Expected Loss: Depending on the workload, the overall performance degradation could be upwards of 70-80%. Some specialized applications that rely on high parallelism and efficient cache usage might perform even worse.
Summary of Overall Performance Impact:
Single-threaded tasks: May see performance drop by 50-70% depending on reliance on OoOE and cache efficiency.
Multi-threaded tasks: Could experience a combined drop of 70-80%, as the lack of SMT, OoOE, and shared caches compound the inefficiencies.
This hypothetical CPU configuration would essentially mimic designs seen in early microprocessors or microcontrollers, sacrificing the massive parallelism, latency hiding, and overall efficiency that modern architectures provide. The performance would be more in line with processors from a couple of decades ago, despite the higher clock speeds and core counts.
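As a sanity check on how the per-feature figures quoted above could stack up to a 70-80% combined drop, here is a minimal arithmetic sketch. It assumes each loss applies independently and multiplicatively to whatever performance is left. That is a simplification and not something the quoted answer states, and the inputs are just rough midpoints of the quoted ranges. `combined_remaining` is a hypothetical helper name.

```python
def combined_remaining(losses):
    """Fraction of performance left if each loss applies independently to what remains."""
    remaining = 1.0
    for loss in losses:
        remaining *= 1.0 - loss
    return remaining


if __name__ == "__main__":
    # Rough midpoints of the quoted ranges: SMT 20-30%, OoOE 50% or more,
    # shared caches 15-30%. Assumed inputs, not measurements.
    losses = {"SMT": 0.25, "OoOE": 0.50, "shared caches": 0.225}
    left = combined_remaining(losses.values())
    print(f"remaining: {left:.0%}, lost: {1 - left:.0%}")
```

Under those assumptions about 29% of the original performance remains, which is in line with the “around 30% of the prior performance” estimate earlier in the thread.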
The point being, it’s not feasible. If you’re looking for that in your own computer, you can do it already. I doubt anyone will follow you, though.
I can’t wait for a non-speculative-execution, non-spooked, non-glowing CPU. I honestly don’t care how slow it would be, so long as it can run Linux, Firefox and VSCodium (or, if forced, I’ll learn Neovim). I just hope RISC will make my dreams real.
I’m no expert here, but I’m pretty sure branch prediction logic is not part of the instruction set, so I don’t see how RISC alone would “fix” these types of issues.
I think you have to go back 20-30 years to get CPUs without branch prediction logic. And VSCodium is quite the resource hog (as is the modern web), so good luck with that.
Guess I’m fucked 🥲
Thank you for your informative answer.
Got a 486 DX4 to sell you 🤣
Why do AMD always have such a terrible response to these vulnerabilities? The article seems to suggest they’ve just decided to ignore this. They almost left Zen 2 CPUs out of the Sinkclose fix, and they took ages to release the Zenbleed fix for consumer CPUs despite it being available for enterprise ones when the vulnerability was released. And their microcode patches on Linux are only for server CPUs; desktop CPUs have to hope that their motherboard vendor releases a firmware update fairly quickly.
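If you want to see which microcode revision your own machine is actually running, x86 Linux kernels expose it as a `microcode` field in `/proc/cpuinfo`. A minimal sketch for reading it, where `microcode_revisions` is a hypothetical helper name. It only reports the revision. It says nothing about whether that revision contains a given fix.

```python
def microcode_revisions(cpuinfo_text):
    """Return the distinct values of the 'microcode' field in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(microcode_revisions(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

Comparing the result before and after a firmware or distro microcode update is one way to tell whether the update actually took effect.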