600X fell over dead -- resuscitation suggestions?
Posted: Sun Aug 20, 2006 4:18 pm
by Chris Thorne
TP600x here, a 450 unit upgraded to a 650. Has been relatively stable for some time save a minor hassle with a wonky backlight.
Was using it about an hour ago in Win2K Pro, when the UI suddenly froze. Mouse pointer disappeared. No response to keyboard. No response to three-finger salute. Closing and opening lid did not sleep the system.
Eventually resorted to pulling the AC power out and removing the battery. On returning power to the system, noted that there is no boot beep. The IBM memory test and splash screen do not appear. The hard drive LED use indicator comes on at once and stays on continuously. The green LEDs to the right of the hard drive indicator light briefly and then go out. That's it otherwise. It does this even if the hard drive has been removed from its bay.
I can't F1 into the diagnostics in this condition. Am wondering if that was a fatal CPU overtemp just experienced. This thing has consistently run very warm.
Posted: Sun Aug 20, 2006 4:45 pm
by rkawakami
Standard troubleshooting advice applies here: remove all hardware from the system (HDD, optical, memory, USB, PCMCIA, MiniPCI, etc.) and see if you can boot into BIOS. Assuming you haven't disabled the 64MB onboard memory via a BIOS edit, then you should get some response - memory report, IBM splash screen, beep. If not, then your problem is most likely bad onboard memory or bad CPU. Since you can disable the onboard memory easier than swapping CPUs, try this first. Instructions can be found in this thread:
http://forum.thinkpads.com/viewtopic.php?t=8720
Be sure to install one memory module after doing this. If that doesn't help, then the last recourse is to swap CPU modules.
Posted: Mon Aug 21, 2006 2:25 am
by Chris Thorne
Thanks, Ray.
I left the unit alone for a few hours and attempted to boot it upon return. Completely nominal memory check, splash screen, normal Win2K boot.
System worked beautifully for half an hour and then hung again. A hard power cycle left it again unable to boot.
I'll try removing components sequentially tomorrow morning.
Posted: Mon Aug 21, 2006 4:06 am
by rkawakami
Hmm.... sounds like a thermal issue so your first impression that it is being caused by an overheating CPU could be correct. Download and run the MobileMeter program from this site:
http://www.geocities.co.jp/SiliconValley-Oakland/8259/
The link to the ZIP file is near the top of the page. The program can monitor temperature sensors within the laptop. Does your fan run normally? By that I mean does it spin faster/louder when you are running a program which takes up a large percentage of the CPU cycles?
Posted: Mon Aug 21, 2006 12:01 pm
by Chris Thorne
rkawakami wrote: Does your fan run normally? By that I mean does it spin faster/louder when you are running a program which takes up a large percentage of the CPU cycles?
I do not recall having witnessed that behavior.
The fan does appear to be sensitive to long periods of use. After the machine has been on for a few hours and has built up some internal warmth, the fan will come on at a low level and will increase to what I assume is max speed within a few minutes. It will do that even with relatively undemanding Web browsing. I doubt that sustained processor utilization is ever over 50% for any length of time.
This system is running with the original 450MHz fan and heat sink. I am aware that the faster CPUs have a heatsink of different design, but I had no access to one at the time I installed the 650MHz CPU.
My assumption was that any thermal instability would manifest itself early on and that I would be able to go acquire a new H/S if required. The machine has in fact been beautifully stable for months, apart from having to Fn-F11 it to boot or wake up.
Assuming that the system is bootable again today, I will download the app you suggested and will report back with findings later on. Thanks!
Posted: Mon Aug 21, 2006 4:17 pm
by Chris Thorne
Okay, the system became bootable again after a few hours of cooling down. I downloaded MobileMeter and installed it. I am not sure how to parse the output of MM, which is very terse, but the topmost temperature reading on its list (which I assume is CPU temp) was 55C by the time I launched the app, and it climbed rapidly to 65C.
At that point I decided to shut the system down cleanly rather than risk a munged filesystem (the previous overtemp lockup had left a real mess).
I attempted to reboot immediately afterward in order to determine if the system was responsive. It would not boot.
I waited 90 minutes and tried again. System booted. I ran MM at once and found the temp to be 33C. It rose at the rate of about 4C/min for the next several minutes until I once again did a protective shutdown.
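For a rough sense of how little headroom that leaves, here is a minimal Python sketch of the arithmetic. It assumes the rise stays linear at the observed 4C/min and treats 65C (the reading at the earlier shutdown) as the cutoff; the function name and the choice of cutoff are illustrative assumptions, not anything MobileMeter reports.

```python
def minutes_until(limit_c, start_c, rate_c_per_min):
    """Linear extrapolation: minutes until the temperature reaches limit_c.

    Assumes a constant rate of rise, which is only a rough approximation
    of how a CPU without active cooling actually heats up.
    """
    if rate_c_per_min <= 0:
        raise ValueError("rate must be positive")
    # Already at or above the limit means no time is left.
    return max(0.0, (limit_c - start_c) / rate_c_per_min)


# Figures from the post: 33C at boot, rising about 4C/min.
# 65C is assumed as the cutoff because that is where the earlier run was stopped.
print(minutes_until(65, 33, 4))  # 8.0
```

Under those assumptions the machine has roughly eight minutes from a cold boot before it is back at the temperature where it was shut down last time.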
The fan does run when the system boots, and while W2K is loading. After that, it goes quiescent, and it does not run even when the CPU temp is climbing into the danger zone.
So the sensor certainly seems to be registering the overtemp condition, and the fan is physically capable of running, but the fan is not being throttled up when it should be.
Posted: Tue Aug 22, 2006 1:05 pm
by Chris Thorne
Curiouser and curiouser, said Alice.
So I had opened the case to swap subcards (in addition to the CPU overtemp, this machine has an occasionally dark display).
While I was in there, I attempted to boot the machine. The hard drive LED came on continuously upon powerup, and there was no boot and no splash screen.
Too hot? I put a finger cautiously on top of the heat sink. Seemed fairly cool to the touch. Tried powerup again. Hard drive LED lit, then went out. Splash screen appeared. Normal boot began, accompanied by the hard drive LED.
Hrm? Touch fixes the problem? How's that again?
I played around with it for two hours last night. The machine now seems to be incredibly sensitive to pressure or lack of same in certain areas of the MMC-2 daughtercard. I was able to get the machine booting reliably, and then to render it unbootable by turning one of the daughtercard corner mounting screws by a tiny fraction.
I removed the daughtercard and put in an old 400MHz daughtercard. No such problem there. Put the 650MHz daughtercard back in. Wouldn't boot. Nudged the screw settings around some. Back to reliable booting.
I am guessing that heat has warped the daughtercard structurally and perhaps cracked a trace on it somewhere. I did examine the pinout on the connector under magnification, and every pin looks just fine.
Posted: Tue Aug 22, 2006 1:56 pm
by rkawakami
Sounds like some good detective work there, Alice.
Intermittent contact problems are one of the most maddening things you can try to troubleshoot. One thing you should look for is a component which may have lifted partially off of the board. With surface mounted parts, there is always the possibility that there was not enough solder paste applied before the re-flow process. Since there is a good chance that thermal expansion and contraction cycles have affected your daughtercard, I would suspect this to be your problem before attempting to look for a broken trace. Good luck with your hunt!
Posted: Wed Aug 23, 2006 12:38 pm
by Chris Thorne
Yes, the daughtercard definitely has a stability issue. Almost certainly a cracked trace. Whether that is from overheating or from perhaps being overtightened in its mounts, I can't say.
Have put the bad D/C back into an old 600E board. That displays the same pattern of booting only irregularly, which is affected by pressing down on various points of the top of the D/C.
I put the original 450MHz D/C back in the 600X, and that works fine.
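The swap tests in this thread amount to a small fault-isolation table, sketched below in Python. The observations are simplified to "boots reliably: yes/no" (the 650MHz card's intermittent boots are recorded as a failure), and the function name and the two-partner threshold are assumptions made for illustration.

```python
from collections import defaultdict

# (daughtercard, system board, booted reliably?) as reported in the thread.
# Intermittent booting is simplified to False here.
observations = [
    ("650MHz", "600X", False),
    ("400MHz", "600X", True),
    ("650MHz", "600E", False),
    ("450MHz", "600X", True),
]


def fault_follows(obs, min_partners=2):
    """Return parts that never worked and failed with several different partners."""
    worked = set()
    failed_with = defaultdict(set)
    for card, board, ok in obs:
        c, b = ("card", card), ("board", board)
        if ok:
            # Any part seen in a working combination is cleared.
            worked.update((c, b))
        else:
            failed_with[c].add(b)
            failed_with[b].add(c)
    return sorted(
        part
        for part, partners in failed_with.items()
        if part not in worked and len(partners) >= min_partners
    )


print(fault_follows(observations))  # [('card', '650MHz')]
```

The 600E board is not flagged because it was only ever tried with the suspect card, so one failure there says nothing about the board itself. The 600X board is cleared because it boots with two other cards.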
So I will reluctantly have to mark this one as bad, and go in search of a new MMC-2 unit. If anyone has a line on a 750 or higher, sing out. I had not been all that impressed with the performance delta from 450->650 anyway.
Posted: Wed Aug 23, 2006 2:19 pm
by cmarti