Nvidia GPU Problems
Introduction
Starting with the T60 series, Lenovo began using Nvidia GPUs in some of their laptops. Whilst this was a step up in graphics performance, it was soon found that these chips had serious defects, resulting in blank and freezing screens, BSODs, pixelation etc.
Many other laptop manufacturers were also experiencing similar problems.
This article summarizes the Nvidia GPU issue, Lenovo’s response, and suggests some fixes.
The Problem
In 2008 a number of articles appeared highlighting the issue, typified by this one from Charlie Demerjian in the Inquirer, entitled “All Nvidia G84 and G86’s are Bad”:
http://www.theinquirer.net/inquirer/new ... 84-g86-bad.
A series of follow-up articles by the same author give more detail, and describe how the chips are manufactured, why the problems occurred, and what Nvidia’s doing about it:
http://www.theinquirer.net/inquirer/new ... -defective
In short, the problems were traced to the properties of the bumps (the connection points between the die and its package substrate) and to the type of underfill used (the material that fills the gap around those bumps).
A phrase in the third article sums it up:
"The flaw is a downright idiotic choice of multiple materials coupled with poor chip design and inadequate testing. It is a case of errors compounding errors. They are all defective."
Nvidia’s Response
In July 2008 Nvidia released this Press Statement:
http://www.nvidia.com/object/io_1215037160521.html. In it they state that they’ve set aside a proportion of their profits to:
“...cover anticipated warranty, repair, return, replacement and other costs and expenses, arising from a weak die/packaging material set in certain versions of its previous generation GPU and MCP products used in notebook systems.”
Lenovo’s Response
Lenovo is somewhat coy about the problem, and it’s difficult to find anything definitive from them; however this topic in their forums gives a good overview as well as a point of contact for affected owners:
http://forums.lenovo.com/t5/T61-and-pri ... td-p/46469.
The “Solution”
At present there is no permanent solution, since Nvidia appear not to offer redesigned chips to replace existing defective ones.
However it has been found that if the chips are heated, they sometimes work again.
There are also some software-based solutions which could help lower the chip operating temperature.
The following information and programs are provided for information only. Using them may cause permanent damage or invalidate warranties. Use at your own risk!
1. Rebonding the Chip by Heating
After the chip fails, heating it has been found to rebond the GPU materials, and the chip sometimes works again.
Various heating methods have been employed, such as hot air guns; however, the safest method is to find a reflow company that uses a proper infrared reflow station, where board and chip temperatures are closely monitored.
There’s no doubt that heating the chip does provide some sort of "cure"; however, anecdotal evidence of how successful it is remains mixed.
The bottom line is that the chip can never be permanently repaired this way, since reheating a fundamentally flawed chip does not solve the original manufacturing and design issues.
2. Software Fixes
Always check for any relevant BIOS or driver updates from the manufacturer relating to the Nvidia GPU.
If the GPU still works OK or has been successfully fixed by heating, underclocking the GPU may help reduce the internal chip temperature and prolong its operating life.
The following programs can be used to modify the settings on Powermizer
http://www.nvidia.com/object/feature_powermizer.html, a battery power optimising program which should be automatically installed at the same time as the Nvidia laptop GPU drivers.
Powermizer Manager:
http://somemorebytes.com/wp/index.php/nvpmmanager/, allows the GPU clock speeds to be manually changed.
Powermizer Switch:
http://forum.notebookreview.com/gaming- ... -card.html, allows the user to turn Powermizer on or off.
TechPowerUp GPU (shown in use in the Powermizer Switch link):
http://www.techpowerup.com/downloads/18 ... 0.4.4.html, shows the changes Powermizer Switch makes to the GPU clock speeds.
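As a rough illustration of why underclocking may help, the sketch below uses the standard first-order approximation that dynamic power in CMOS logic scales with clock frequency and with the square of supply voltage. The function name and the clock values are hypothetical, chosen only for illustration. They are not measurements from any particular GPU, and the estimate ignores static (leakage) power, so real temperature savings will differ.

```python
def relative_dynamic_power(clock_ratio: float, voltage_ratio: float = 1.0) -> float:
    """Estimate dynamic power relative to stock settings.

    Uses the first-order approximation P ~ C * V^2 * f, so the result is
    clock_ratio * voltage_ratio ** 2. Leakage power is not modelled.
    """
    if clock_ratio <= 0 or voltage_ratio <= 0:
        raise ValueError("ratios must be positive")
    return clock_ratio * voltage_ratio ** 2


# Hypothetical clock speeds, for illustration only (not real GPU figures).
stock_mhz = 400.0
underclocked_mhz = 275.0

ratio = underclocked_mhz / stock_mhz
estimate = relative_dynamic_power(ratio)
print(f"Clock reduced to {ratio:.0%} of stock")
print(f"Estimated dynamic power: {estimate:.0%} of stock (voltage unchanged)")
```

Since underclocking alone leaves the voltage unchanged, the estimated saving is only proportional to the clock reduction. This is consistent with treating it as a way to prolong the chip's life rather than as a fix.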
Other T6x Thinkpad Related Issues
T6x series machines fitted with Nvidia Quadro NVS 140M and Quadro FX 570M GPU chips are arriving in our workshops with debonded chip symptoms, and they are also presenting us with further problems.
Before applying "repair" heat to any chip, any epoxy present should be removed first, so that the chip can move freely during the heating process.
Generally either red, orange, or black dots, or continuous strips of red epoxy, are applied around the chip, and these are easily removed once softened with a bit of heat.
Unfortunately Lenovo also uses a type of clear epoxy (the only manufacturer we've seen who uses this), and this remains hard at all temperatures. It can be dissolved with solvents, but because it often extends under the first few rows of solder balls, it cannot be reached to remove it properly.
The only effective solution we’ve found is to grind the chip off the board. Any remaining epoxy is then removed using the heat from a soldering iron, which turns it into powder. There are numerous problems with this method, such as finding a good replacement chip, to say nothing of the very real danger of damage to the motherboard. It’s also possible that carcinogens are released in the fine powder produced during the grinding process.
We've managed to do it a few times on T43 boards (same epoxy issue) but whilst there's been no visible damage to the board, we've ended up with unrelated hardware faults.
Work on this is continuing.
Conclusion
Nvidia are unlikely to ever manufacture updated GPU chips to replace the existing faulty ones. This fact alone calls into question the long term viability of owning any laptop fitted with these chip families.
Short-term fixes exist, such as heating to rebond the chips and possibly underclocking them, but they remain just that: short-term fixes.