T60+WinXP - Cardbus performance issues - cause and solution
Posted: Mon Jan 07, 2013 9:48 am
In my recent investigations of various Cardbus controllers and their performance (USB controllers and SDHC readers) I noticed that the T60 (TI PCI-1510 Cardbus controller) was consistently underperforming compared to the other Cardbus-equipped laptops I tested: the T42 (which also uses a TI controller, the PCI-4520) and the A31p/X32 (which use Ricoh devices).
Primary symptom: The continuous read/write speeds (benchmarked through CrystalDiskMark) of flash memory (thumb drives / SD cards) connected through Cardbus are roughly 50% of the speeds of the same devices in any of the other laptops.
Secondary symptom: High CPU utilization by "hardware interrupts" during CardBus I/O was observed through Process Explorer (usually 15-30%).
The results were consistent across two different T60 laptops, and a variety of Cardbus controllers and flash memory devices.
The following were quickly disqualified as potential causes:
Bad Cardbus controller in T60: Performance in Fedora Linux was normal on both T60 machines.
Bad PCI-1510 XP driver: XP uses the same generic driver for the TI controller in the T42, and the performance is normal there. A different version of the driver (originally for Win2K) was tried, with the same results.
The symptoms led me to believe that the cause may be related to interrupts. A quick examination of the IRQ assignments in WinXP showed several devices sharing the same IRQ (16) with the Cardbus controller - among them the Intel PRO/1000 PL Ethernet controller.
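The check described above amounts to grouping devices by IRQ number and looking for lines with more than one device. A minimal sketch in Python, assuming the assignments have already been copied out of Device Manager by hand; the function name and the last two entries of the table are hypothetical, only the Cardbus and LAN entries on IRQ 16 come from this post:

```python
from collections import defaultdict


def shared_irqs(assignments):
    """Return {irq: [device names]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: sorted(devs) for irq, devs in by_irq.items() if len(devs) > 1}


# Hand-copied snapshot. Only the first two entries reflect the post;
# the remaining two are made-up placeholders for illustration.
assignments = {
    "TI PCI-1510 Cardbus controller": 16,
    "Intel PRO/1000 PL Ethernet": 16,
    "Example audio device": 17,         # hypothetical
    "Example USB host controller": 18,  # hypothetical
}

for irq, devices in shared_irqs(assignments).items():
    print("IRQ %d is shared by: %s" % (irq, ", ".join(devices)))
```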
Voila! Disabling the Ethernet controller immediately fixed the performance problems of all Cardbus devices.
The cause therefore seems to be a bug / lack of proper optimization in the Windows XP driver of the PRO/1000 PL. The problem appears to be specific to this device, as the PRO/1000 MT in the T42/X32 does not cause a high interrupt load despite sharing an IRQ (11) with the Cardbus device. It also appears to be specific to Windows, since the Linux driver for the PRO/1000 PL shows no such problem.
In fact, under Linux, the Intel LAN driver uses MSI interrupts, which work in a different manner and are not shared with the legacy IRQ. WinXP does not support MSI interrupts, so this approach would not be possible there. I haven't checked what happens under Vista/Win7, which do support MSI (that would depend on whether Intel chose to implement MSI interrupts in the NT6 driver for the PRO/1000 PL).
The root cause is consistent with similar findings in a thread on the Lenovo forums, where the Intel LAN was shown to interfere with Cardbus sound devices.
Now that the problem was identified, what about the solution? Unfortunately, most simple alternatives proved inadequate.
* Disabling the LAN is acceptable for someone who never uses it, but not for the rest of us
* Dealing with low Cardbus performance may be acceptable, but leaves a bad taste for the tech-geeks
* Trying different versions of the LAN driver did not solve the problem, although I must admit I haven't tried any of the really old ones. What's certain is that the problem exists in the newest drivers both from Lenovo's site and from Intel's.
* Locating the Intel driver developers and asking them to find and fix the bug is the ideal solution, and I may try it if ever I feel like it. But I have a hunch it won't be quick.
* Reassigning the Cardbus and the LAN to different IRQs appears to be impossible unless you give up ACPI, which most people wouldn't, and even then it is uncertain that it would work. The BIOS allows you to manually assign different IRQ numbers to different PCI interrupt requests, but the question is how the interrupt lines on the PCI/PCI-E devices are actually wired. It may not be possible to separate these particular devices.
So what worked?
The information in the rest of this post is correct but out of date, as a better solution was found and is outlined in a post below.
==================================
Disabling and re-enabling the LAN controller did.
Probably what happens is that whenever a disabled device is re-enabled, XP pushes its interrupt handler to the end of the queue assigned to that IRQ number. From then on, whenever an interrupt raised by another device arrives on the same IRQ, the LAN controller's interrupt handler is not called and has no chance to "hog" the CPU.
One final thing: it appears that during the enumeration that happens on boot, XP always (or almost always) puts the LAN controller before the Cardbus.
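To make the guess above concrete, here is a toy model in Python. It assumes that handlers on a shared IRQ are called in queue order until one of them claims the interrupt, and that the LAN handler is much more expensive per call than the Cardbus one. The function name and the cost figures are invented for illustration; nothing here was measured.

```python
def service_cost(chain, source, costs):
    """Cost of servicing one interrupt raised by `source`.

    Handlers are called in chain order; the walk stops at the handler
    whose device actually raised the interrupt (it "claims" it).
    """
    total = 0
    for device in chain:
        total += costs[device]
        if device == source:
            break
    return total


# Hypothetical per-call costs in arbitrary units.
costs = {"lan": 10, "cardbus": 1}

boot_order = ["lan", "cardbus"]      # LAN handler first, as seen after boot
after_reenable = ["cardbus", "lan"]  # LAN handler pushed to the end

print(service_cost(boot_order, "cardbus", costs))      # 11
print(service_cost(after_reenable, "cardbus", costs))  # 1
```

In this model every Cardbus interrupt pays for a wasted LAN handler call while the LAN is first in line, and pays nothing extra once the LAN has been moved behind it, which would match the symptoms if the guess is right.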
And since I don't want to always have to disable/re-enable the device manually - I did the following:
* Downloaded the DevCon Microsoft utility to control the devices from the command-line.
* Wrote a little script in AutoIt to disable and re-enable the LAN (based on vendor/device ID numbers), converted it to a small app, and put it in HKCU\Software\Microsoft\Windows\CurrentVersion\Run. Because it runs at user logon, the device enumeration is guaranteed to have already taken place, so once the script has run the LAN controller will definitely have been moved to the end of the IRQ line.
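The steps above can be sketched as follows. This is not the AutoIt script mentioned in this post, only a rough Python equivalent built on DevCon's disable and enable commands. The function names are made up, and the hardware ID is a placeholder: 8086 is Intel's PCI vendor ID, but the device part (XXXX) has to be replaced with the value Device Manager shows for your LAN controller. The sketch only prints the commands unless dry_run is set to False.

```python
import subprocess


def devcon_cycle_commands(hardware_id, devcon="devcon.exe"):
    """Build the two DevCon command lines: disable first, then enable."""
    return [
        [devcon, "disable", hardware_id],
        [devcon, "enable", hardware_id],
    ]


def cycle_device(hardware_id, devcon="devcon.exe", dry_run=True):
    """Disable and re-enable a device; with dry_run, only print the commands."""
    commands = devcon_cycle_commands(hardware_id, devcon)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Raises CalledProcessError if DevCon exits with a non-zero code.
            subprocess.check_call(cmd)
    return commands


# Placeholder ID: replace XXXX with the device ID of your LAN controller.
LAN_ID = r"PCI\VEN_8086&DEV_XXXX*"

cycle_device(LAN_ID)  # dry run: prints the two DevCon command lines
```

subprocess.check_call is used rather than newer helpers because it is available in the older Python versions that still run on XP.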
Hope this may help others who encounter the same issue and don't want to give up the LAN controller entirely.
I will post the script and the app later when I'm back on one of my T60s.