I’ve set up my gaming computer to dual boot between Fedora Linux and Windows 11. The Windows 11 installation doesn’t see much use these days since PlanetSide 2 became available on Linux. For the last two months, the system has suffered from stuttering issues during regular use and gaming. The display stops painting or gets partially corrupted, audio goes silent, and all operations halt for 0.5–2 seconds. Sometimes, the system even crashes, but it usually recovers by itself.
The unusual thing about this problem is that it manifests the same way on both Linux and Windows. In my earlier experience, similar problems have often been caused by driver issues. Drivers are operating system (OS) specific, and it’s unlikely that two different OSs would have the same problem. I feared it was a hardware issue, but I struggled to identify the problem.
I’ve run dozens of performance benchmarks, system stability and latency tests, and tests designed to detect specific hardware problems. Tests included MdRes, MemTest, LatencyMon, Prime95, PCMark, 3DMark, OCCT, BurnInTest, and others. Every test came back clean. I found it even stranger that the stutters didn’t negatively affect any performance benchmark tests. The problem was observable, but not measurable. Oh, joy.
It was difficult to rule anything out, as an issue like this one can be caused by just about any component in the system. I’ve installed Windows and Linux on separate storage media, so storage was just about the only component I could rule out. My ASUS mainboard has a bug where it randomly doesn’t initialize the first M.2 connector at boot. So, while storage issues complicated my troubleshooting, they’re not the cause of this fault.
I tried removing hardware and changing different UEFI settings to see if anything could help to isolate the issue. You can waste days experimentally flipping switches and trying combinations of different settings. Unfortunately, nothing I tried made any positive difference.
However, I was put on the right track when checking whether there were any firmware updates available for my ASUS PRIME X570-P mainboard. I had already installed the latest version, but its changelog mentioned that it had fixed an issue with the firmware-backed trusted platform module (fTPM) causing stuttering!
There is a known stuttering problem with some combinations of AMD processors and mainboards. I was a bit confused why this issue hadn’t shown up in my search results before; I must have tried a hundred keyword combinations that should have dug up this tidbit.
Anyhow, I ran out just before closing time and snatched up a compatible hardware Trusted Platform Module (TPM). I bought the ASUS TPM-SPI module. There are three different TPM connectors, so check that you get the right one! Minutes after I’d installed it onto my mainboard, I could confidently say that things had finally gotten better!
The stuttering didn’t go away completely, nor did it happen any less often. However, the duration of the stuttering was significantly reduced on both Linux and Windows. Instead of lasting up to two seconds, the screen and audio only cut out for a fraction of a second.
Microsoft made a TPM version 2.0 module a requirement to run Windows 11. I had to audit my Linux installation to verify this, but it isn’t set up to use the TPM at all. I don’t understand how the fTPM should affect Linux, but its presence might cause some stability issues somewhere in the system. It could also have been introduced by the very mainboard firmware update that was supposed to address the issue.
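A first step in an audit like this is to check whether the Linux kernel exposes a TPM device at all. Here is a minimal Python sketch of that check. It assumes the kernel lists TPM devices under `/sys/class/tpm` and that recent kernels provide a `tpm_version_major` attribute for each device. The function name `list_tpm_devices` is made up for illustration. It only shows whether the kernel sees a TPM, not whether it is the fTPM or a hardware module, and not whether any software actually uses it.

```python
from pathlib import Path
from typing import Dict, Optional


def list_tpm_devices(sysfs_root: str = "/sys/class/tpm") -> Dict[str, Optional[str]]:
    """Map each TPM device name to its major version, or None if unknown.

    Assumption: devices appear as entries such as tpm0 under sysfs_root, and
    newer kernels add a tpm_version_major file inside each entry.
    """
    root = Path(sysfs_root)
    devices: Dict[str, Optional[str]] = {}
    if not root.is_dir():
        # No TPM class directory: the kernel has not registered any TPM.
        return devices
    for entry in sorted(root.iterdir()):
        version_file = entry / "tpm_version_major"
        if version_file.is_file():
            devices[entry.name] = version_file.read_text().strip()
        else:
            # Older kernels may not expose the version attribute.
            devices[entry.name] = None
    return devices


if __name__ == "__main__":
    found = list_tpm_devices()
    if not found:
        print("The kernel does not expose any TPM device.")
    for name, version in found.items():
        print(f"{name}: TPM major version {version or 'unknown'}")
```

An empty result would suggest that the kernel hasn’t registered a TPM, but it says nothing about what the firmware does with the fTPM behind the OS’s back.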
I removed the hardware TPM, and the stuttering immediately got worse in both Linux and Windows. The mainboard reverts to fTPM if the hardware TPM is removed. It doesn’t have a way to disable the TPM. The stuttering got better again after I reinstalled the TPM. The fTPM clearly affects the system stability, even under an OS where it’s not in use.
I had finally made some progress after spending a dozen hours on the issue over several days. I still don’t know what the underlying problem is, but at least all my random trial and error had resulted in a change for the better.
On a whim, I tried reseating (unplugging and replugging) the memory modules. I was just about out of other things to test. I had run multiple memory speed and stability tests, and they had not detected any problems.
I first booted into Windows and didn’t notice any improvements. However, I immediately noticed improvements when I booted into Linux. The stuttering was almost completely gone!
The desktop still sporadically locks up for a fraction of a second now and then. Occasionally, it also crashes outright. However, it doesn’t happen when playing games or watching videos anymore. Importantly, it now always recovers from the stuttering and no longer locks up indefinitely. Despite the remaining issues, I jotted it down as a success.
I guess the lesson I learned from this experience was to not trust software-backed assessments of hardware. The slots for the memory modules on my mainboard are especially finicky to click into place. You need to apply an excessive amount of force to install the modules. I presumably didn’t seat them properly when I installed them two years ago. Replugging them shouldn’t have reset or influenced anything else on the mainboard.
I’ve previously experienced another connector problem that caused subtle and difficult-to-troubleshoot problems. My gaming computer is stationary, so the connector problem must have been caused by something like thermal cycling. Whatever the cause, the faults must have been error-correctable at some level to a degree where no test tools noticed the issue. I haven’t got memory modules with error-correcting code (ECC). They’re too slow and too expensive for a gaming-focused system. However, modern memory modules are so dense that errors are expected, so they need to include some simpler internal error-correction mechanisms.
At this point, I pulled out the TPM and reverted to the fTPM again. I expected that it wouldn’t impact the system stability and the stuttering anymore. Contrary to my belief, this reintroduced the stuttering on Linux. I switched back to the hardware TPM, and the stuttering stopped. Whatever the underlying cause, it involves a complex set of factors.
Windows was still stuttering, though. The stuttering must be caused by yet another issue. At this point — with two other underlying problems out of the way — I had more luck with a more traditional troubleshooting technique that I’d already tried and dismissed as ineffective.
I’d been using the stable driver for my AMD Radeon RX 6700 XT graphics card (GPU). It was last updated nearly four months ago. AMD has published four beta versions since the last stable release. The changelogs for all four betas mentioned that they address stuttering and stability issues.
I upgraded to the latest beta release and the stuttering on Windows disappeared completely. I had already tried using an earlier beta release but had reverted to the stable version after it didn’t fix the stuttering. A bit curious, I removed the hardware TPM one last time to see what would happen. The stuttering re-manifested with the fTPM, so I quickly reinstalled the hardware TPM.
I also needed to find a solution for the desktop crashes on Linux. As I’ve mentioned, it doesn’t crash when I’m watching videos or playing games. So, the system seems to handle high loads fine but not lower loads. I disabled Dynamic Power Management (DPM) in the kernel, and that seemed to help a lot.
I went from experiencing one or two crashes every hour to maybe once every four hours when I disabled DPM. I’d already tried this workaround some weeks earlier, but it didn’t make a difference with all the other system issues. The AMDGPU driver has a history of stability issues with DPM. Disabling it is a commonly suggested workaround for any driver crashes.
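I haven’t described how I disabled DPM. One common way is the `amdgpu.dpm=0` kernel parameter. Assuming that approach, the Python sketch below checks whether the parameter is present on the running kernel’s command line in `/proc/cmdline`. The helper names `amdgpu_dpm_setting` and `describe_dpm` are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def amdgpu_dpm_setting(cmdline: str) -> Optional[str]:
    """Return the amdgpu.dpm value from a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.dpm" and sep:
            value = val  # if the parameter is repeated, keep the last one
    return value


def describe_dpm(cmdline: str) -> str:
    """Summarize the DPM setting found on the command line."""
    value = amdgpu_dpm_setting(cmdline)
    if value is None:
        return "amdgpu.dpm is not set (driver default applies)"
    if value == "0":
        return "amdgpu.dpm=0 (DPM disabled)"
    return f"amdgpu.dpm={value}"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print(describe_dpm(proc_cmdline.read_text()))
    else:
        print("/proc/cmdline not found; is this a Linux system?")
```

This only confirms what was passed to the kernel at boot. It doesn’t prove that the driver honored the setting, so it’s a sanity check after editing the bootloader configuration and nothing more.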
I tried one more time to remove the TPM while DPM was disabled on the GPU. The system started stuttering again, so I quickly reinstalled the TPM. I expect that this is the last time I’ll ever attempt to remove it. It observably increases system stability for both Windows and Linux.
I’ll be the first to admit that I have no idea what’s going on inside my computer. It’s a common setup with average hardware components. Gaming and computer enthusiast forums are filled with people complaining about similar issues with identical or similar hardware. What works for some doesn’t work for others.
It took me weeks to fix these problems. Without any clear indications of what was wrong, I was left with random experimentation to see if I could influence the observed behavior. By the time my system was up and running reliably again, I expect most consumers would have given up and either bought a new computer or found another hobby.
I would probably not have gotten anywhere without decades’ worth of hard-fought-for experience. Even so, my three-part solution was mostly discovered by random chance and an unwillingness to admit defeat.
It’s incredibly difficult and time-consuming to resolve this type of hard-to-diagnose hardware problem. There simply aren’t any good tools or reliable methods to detect and diagnose hardware issues. Consumers are left to fend for themselves, as they can’t even accurately point the finger at any one hardware vendor as the culprit behind their hardware woes.
Consumers either need to spend hours and become experts in the field or find another hobby. The computer enthusiast and gaming markets will die unless vendors can come together to develop better standards for troubleshooting tools. We need better fault detection and reporting, more insights and logs, and it needs to come from every component and level.
I believe more in initiatives to make problems easier to diagnose than I believe the industry will magically overcome entropy and incompatibility issues. It can’t develop more reliable products without overcoming the same lack of knowledge and insight that its consumers face on their own every day.