The “problem” is that the more you understand the engineering, the less you believe Intel when they say they can fix it in microcode. Without writing an entire essay, the TL/DR is that the instability gets worse over time, and the only way that happens is if applied voltages are breaking down dielectric barriers within the chip. This damage is irreparable, 100% of chips in the wild are irreparably damaging themselves over time.
Even if Intel can slow the bleeding with microcode, they can’t repair the damage, and every chip that has ever ran under the bad code will have a measurably shorter lifespan. For the average gamer, that sometimes hasn’t even been the average warranty period.
+1. Lots of people are also likely to not have any idea about the situation and just think their PC crashes or acts up more. More of these issues can pop up over time.
A recall forces them to notify customers of the issue so the customer can act on it.
They can most likely prevent further breakdown through software. If the meters and controls are functioning correctly, they can undervolt the CPU. But it’s not really a fix if that comes with a performance penalty. If it’s a bug where the CPU maxes out the voltage when idle so it can do nothing faster, that could be fixed with no performance penalty, but that seems unlikely.
I’m sorry but this is just a fundamentally incorrect take on the physics at play here.
You unfortunately can’t ever prevent further breakdown. Every time you run any voltage through any CPU, you are always slowly breaking down gate-oxides. This is a normal, non-thermal failure mode of consumer CPUs. The issue is that this breakdown is non-linear. As the breakdown process increases, it increases resistance inside the die, and as a consequence requires higher minimum voltages to remain stable. That higher voltage accelerates the rate of idle damage, making time disproportionately more damaging the more damaged a chip is.
If you want to read more on these failure modes, I’d recommend the following papers:
L. Shi et al., “Effects of Oxide Electric Field Stress on the Gate Oxide Reliability of Commercial SiC Power MOSFETs,” 2022 IEEE 9th Workshop on Wide Bandgap Power Devices & Applications
Y. Qian et al., “Modeling of Hot Carrier Injection on Gate-Induced Drain Leakage in PDSOI nMOSFET,” 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications
The “problem” is that the more you understand the engineering, the less you believe Intel when they say they can fix it in microcode. Without writing an entire essay, the TL/DR is that the instability gets worse over time, and the only way that happens is if applied voltages are breaking down dielectric barriers within the chip. This damage is irreparable, 100% of chips in the wild are irreparably damaging themselves over time.
Even if Intel can slow the bleeding with microcode, they can’t repair the damage, and every chip that has ever ran under the bad code will have a measurably shorter lifespan. For the average gamer, that sometimes hasn’t even been the average warranty period.
+1. Lots of people are also likely to not have any idea about the situation and just think their PC crashes or acts up more. More of these issues can pop up over time.
A recall forces them to notify customers of the issue so the customer can act on it.
They can most likely prevent further breakdown through software. If the meters and controls are functioning correctly, they can undervolt the CPU. But it’s not really a fix if that comes with a performance penalty. If it’s a bug where the CPU maxes out the voltage when idle so it can do nothing faster, that could be fixed with no performance penalty, but that seems unlikely.
I’m sorry but this is just a fundamentally incorrect take on the physics at play here.
You unfortunately can’t ever prevent further breakdown. Every time you run any voltage through any CPU, you are always slowly breaking down gate-oxides. This is a normal, non-thermal failure mode of consumer CPUs. The issue is that this breakdown is non-linear. As the breakdown process increases, it increases resistance inside the die, and as a consequence requires higher minimum voltages to remain stable. That higher voltage accelerates the rate of idle damage, making time disproportionately more damaging the more damaged a chip is.
If you want to read more on these failure modes, I’d recommend the following papers:
L. Shi et al., “Effects of Oxide Electric Field Stress on the Gate Oxide Reliability of Commercial SiC Power MOSFETs,” 2022 IEEE 9th Workshop on Wide Bandgap Power Devices & Applications
Y. Qian et al., “Modeling of Hot Carrier Injection on Gate-Induced Drain Leakage in PDSOI nMOSFET,” 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications