Failing hardware?
Oh, how the hardware hates me. My poor Gentoo development box and primary desktop workstation has been suffering a long string of random lockups lately.
It's not heat. CPU temps are anywhere from 26C to 42C depending on load. GPU between 48C and 62C depending on load. Hard disks are steady at about 30C. All well below anything approaching a danger zone.
I figure it ain't my power supply. I have a quality Seasonic SII-380W that's given me two years of sterling service. My system doesn't come close to exceeding its capacity.
I get the most lockups when doing 3D-intensive stuff, such as playing UT2004. However, they also happen when I'm just doing desktop work or have a browser open. I normally run Xfce with the compositor enabled, so at first I suspected it was unstable. Turning it off made no difference to the frequency of lockups. Scratch that.
Today I cleaned out the machine, getting rid of a fair amount of dust. I had to remove the graphics card to get at its cooling fins, and ever since reinstalling it and rebooting, there are minor graphical glitches covering the screen at bootup, at least until the initrd is loaded. Everything's fine once fbsplash and X kick in. Maybe I shoulda wiped off the PCIe contacts or something?
About the only thing I can do to check the graphics issue is upgrade my drivers, which I'm doing now. I'll go with the latest 177 version. nVidia has listed several stability fixes that may help.
I checked my logs -- there aren't any messages that give me immediate clues. No kernel panics or graphics errors or anything useful in dmesg. Except possibly one thing: in every case of a lockup, immediately before and during it there's a stream of bad/invalid packets running in and out of my NIC.
I'm only noticing this because I have my iptables rules set up to notify me of anything that doesn't fit the chains I've established. I'm wondering if my NIC is on the fritz; perhaps the hardware is getting confused, and it's doing something to . . . something else.
Coincidentally, my router has been quite screwy lately. I usually have to power-cycle it in order to connect; it just stops working overnight when my computer's off. And even during the day, it tends to go down while I'm in the middle of doing something. My router's weird, and I have weird stuff in my NIC logs. Could they be related?
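One way to check whether the bad-packet bursts really line up with the lockups is to count the logged packets per minute and compare the busiest minutes against the times the box froze. The following is only a rough sketch: it assumes the iptables LOG rule writes syslog-style lines with a prefix of `INVALID: ` (the prefix, the sample lines, and the function names are all made up for illustration; substitute whatever your own rules emit).

```python
import re
from collections import Counter

# Assumed line shape (classic syslog timestamp, hypothetical prefix):
#   "Aug 12 14:03:21 box kernel: INVALID: IN=eth0 OUT= SRC=... DST=..."
# The regex captures the timestamp truncated to the minute.
MINUTE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s")


def packets_per_minute(lines, prefix="INVALID: "):
    """Count log lines containing `prefix`, grouped by minute."""
    counts = Counter()
    for line in lines:
        if prefix not in line:
            continue  # not one of our firewall log entries
        match = MINUTE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busy_minutes(counts, threshold=10):
    """Return (minute, count) pairs at or above threshold, in first-seen order."""
    return [(minute, n) for minute, n in counts.items() if n >= threshold]
```

Feeding it the lines of the kernel log (for example `packets_per_minute(open("/var/log/messages"))`, if that is where your syslog daemon puts kernel messages) gives a per-minute tally; minutes that clear the threshold are the ones to compare against the lockup times.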
The only test I can do right now is start using my other NIC; my motherboard comes with two, each with a separate controller. And I have a few spare ethernet cables, though in my experience they aren't as fragile/problem-prone as SATA cables. May as well swap those too.
To sum up, if the easy software fixes or NIC swaps don't work, I'm going to have to throw hardware at the problem until I get a solution. I could spend $30 (new NIC or router), $140 (graphics card), or even $400 for a new motherboard and CPU.
I hate troubleshooting hardware. It gets expensive real fast.
6 comments
Even if it's not a permanent solution it might help to isolate the problem.
No, I hadn't thought of that. I'll try it out. So far I've tried the latest hardmasked binary driver, and it doesn't make any difference.
Also tried switching NICs; no joy.
There's further graphical corruption during the early boot stages; I can't even reach the grub screen now. It just loads Gentoo without giving me access to the selection screen. Lots of sparklies and other artifacts on the screen.
Peak voltages can be the cause of such random, erratic hangups. There are special power outlet filters (check your local electronics dealer, probably Radio Shack in the US?) that smooth out such voltage peaks, or you could buy a UPS (uninterruptible power supply) and have the added benefit of no power losses.
That is how I ended up with my 2nd desktop a while back. Kept replacing stuff until I only needed one part (the broken one) for my "new" desktop. sigh..
I own a computer which hangs every time I play a high-resolution movie or run any CPU-intensive task. Watching the temperatures, I noticed that the hangups occur because the chipset becomes too hot. It's very hard to notice because the temperatures are OK most of the time, and when they become high the system hangs. The only way to see it is to keep a log of the temperatures.
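For anyone wanting to try the temperature-log idea, here is a minimal sketch. It assumes a Linux box whose sensor drivers expose readings through the kernel's hwmon sysfs files (`/sys/class/hwmon/hwmon*/temp*_input`, in millidegrees Celsius); the function names, log path, and interval are arbitrary choices for illustration. Each line is flushed and synced to disk right away so the last reading before a hard lockup has a chance of surviving.

```python
import os
import time
from pathlib import Path


def read_temps(base="/sys/class/hwmon"):
    """Return {sensor: degrees C} for every temp*_input file under base."""
    temps = {}
    for f in sorted(Path(base).glob("hwmon*/temp*_input")):
        try:
            # hwmon reports millidegrees Celsius as a plain integer
            temps[f"{f.parent.name}/{f.name}"] = int(f.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor vanished or returned junk; skip it
    return temps


def format_line(temps, now=None):
    """Build one timestamped log line; now=None means the current time."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join(f"{name}={deg:.1f}C" for name, deg in sorted(temps.items()))
    return f"{stamp} {readings}"


def log_temps(path="temps.log", interval=5.0, samples=None, base="/sys/class/hwmon"):
    """Append one line every `interval` seconds; samples=None runs until killed."""
    taken = 0
    with open(path, "a") as out:
        while samples is None or taken < samples:
            out.write(format_line(read_temps(base)) + "\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before a possible hang
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval)
```

Start it with `log_temps()` in a spare terminal before a gaming session, then read the tail of `temps.log` after the next lockup. Which `hwmonN` directory corresponds to the chipset, CPU, or GPU varies by machine, so the sensor names need to be matched up by hand.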