Beta Linux kernel killing Intel e1000e Network Cards

Solaris, RedHat, FreeBSD and the like
GomJabbar
Moderator
Posts: 9765
Joined: Tue Jun 07, 2005 6:57 am

Beta Linux kernel killing Intel e1000e Network Cards

#1 Post by GomJabbar » Tue Sep 23, 2008 12:01 pm

Serious e1000e Driver Issue in SLE 11 Beta 1 and openSUSE 11.1 Beta 1
The Intel e1000e driver on openSUSE 11.1 Beta 1 and SUSE Linux Enterprise 11 Beta 1 might have a serious issue with the potential to damage the network card in a way that it cannot be used any longer.
Mandriva is also reporting this problem in the latest kernel they are using for the Mandriva 2009 release candidate.
https://qa.mandriva.com/show_bug.cgi?id=44147
It seems the e1000e driver in 2.6.27-rc can corrupt the EEPROM of these cards, killing the hardware.
Last edited by GomJabbar on Sat Nov 15, 2008 11:17 am, edited 3 times in total.
DKB

frankausmtank
Freshman Member
Posts: 111
Joined: Thu Aug 03, 2006 5:06 am
Location: Berlin, Germany

#2 Post by frankausmtank » Tue Sep 23, 2008 2:56 pm

Thanks for posting this, I was just about to start a similar thread.
It seems this is an issue with all distros using 2.6.27-rc, including the current alphas of Ubuntu 8.10.

https://bugs.launchpad.net/ubuntu/+sour ... bug/263555

edit:
Under which circumstances will e1000e be loaded? In other words, which ThinkPads (if any) are at risk, and when?
I have always had just the e1000 module loaded (on 2.6.24). However, a quick search for "thinkpad e1000e" suggests that the e1000e module does get loaded on certain models, though I'm still not sure about it.

another edit: thanks for the sticky.
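
For anyone wanting to check the question above on their own machine, here is a minimal Python sketch that reads /proc/modules and reports whether e1000 or e1000e is currently loaded. The function name is made up for illustration, and it only sees loadable modules, so a driver compiled into the kernel would not show up.

```python
from pathlib import Path


def loaded_intel_modules(proc_modules="/proc/modules"):
    """Return which of e1000 / e1000e appear in the loaded-module list.

    Hypothetical helper for illustration; the first field of each line in
    /proc/modules is the module name.
    """
    wanted = {"e1000", "e1000e"}
    found = set()
    for line in Path(proc_modules).read_text().splitlines():
        fields = line.split()
        if fields and fields[0] in wanted:
            found.add(fields[0])
    return sorted(found)
```

Calling `loaded_intel_modules()` with no arguments reads the live /proc/modules; a result containing e1000e means the newer driver is in use on that boot.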

wswartzendruber
Junior Member
Posts: 377
Joined: Fri Apr 15, 2005 10:33 am
Location: Idaho, USA

#3 Post by wswartzendruber » Fri Sep 26, 2008 11:12 am

frankausmtank wrote:Thanks for posting this, I was just about to start a similar thread.
It seems this is an issue with all distros using 2.6.27-rc, including the current alphas of Ubuntu 8.10.

https://bugs.launchpad.net/ubuntu/+sour ... bug/263555

edit:
Under which circumstances will e1000e be loaded? In other words, which ThinkPads (if any) are at risk, and when?
I have always had just the e1000 module loaded (on 2.6.24). However, a quick search for "thinkpad e1000e" suggests that the e1000e module does get loaded on certain models, though I'm still not sure about it.

another edit: thanks for the sticky.
Affected chipsets seem to be GM965 and GM45 (that is, the ThinkPad *61 series and the newer models).
Model: Lenovo ThinkPad T400
CPU: Intel Core 2 Duo P8400 (2.26 GHz, 1067 MHz FSB, 3 MB L2 Cache)
RAM: 4 GB PC-8500 (1067 MHz, Dual-channel)
HDD: 500 GB, 5400 RPM
Audio: Conexant CX20561 (192 kHz, 24-bit)
Video: Intel GMA 4500MHD
Wireless: Intel 5300

tarvoke
Junior Member
Posts: 273
Joined: Sun Mar 25, 2007 12:45 pm
Location: Slightly Outside America

#4 Post by tarvoke » Fri Sep 26, 2008 9:42 pm

8086:1049 is the pci id of the 82566MM which we see in a lot of the newer thinkpads.
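
A quick way to see whether a machine has that device is to walk sysfs and compare vendor and device IDs. This is a rough Python sketch, assuming the usual /sys/bus/pci/devices layout. The function name is hypothetical, and 8086:1049 is only the one ID mentioned here, not a full list of e1000e hardware.

```python
from pathlib import Path


def find_pci_devices(vendor, device, sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses whose vendor/device IDs match.

    Hypothetical helper: each device directory in sysfs holds 'vendor' and
    'device' files containing values such as '0x8086'.
    """
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        try:
            v = (dev / "vendor").read_text().strip().lower()
            d = (dev / "device").read_text().strip().lower()
        except OSError:
            continue  # skip entries that cannot be read
        if v == f"0x{vendor:04x}" and d == f"0x{device:04x}":
            matches.append(dev.name)
    return matches
```

For the ID above, the call would be `find_pci_devices(0x8086, 0x1049)`; an empty list means no such device was found under sysfs.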

think I dodged a bullet... my x61t runs ubuntu 8.04.1, but with the 2.6.26/2.6.27 kernel from 8.10.

there is some theorizing that it is an interaction from latest xorg in 8.10 that causes this (e1000e in 2.6.26 is identical to 2.6.27 I think). but reading those bug reports earlier today I saw mention from someone who runs a setup like mine (i.e. xorg from 8.04 not 8.10), who got killed by the problem.

also read one or two people saying that booting into windows somehow resurrected the failed hardware. not sure if that's really true.

but...

my machine spends 99% of the time at home using wireless only. today I'm at a hotel where wireless just refused to work no matter what, and wired would not work. rebooted to xp64 and wired works fine.

either way, whew!

canonical (and heads of other distros) need to take some responsibility here. alpha/beta release, sure I understand it may trash my filesystem or make fun of my dog or something; but not trash hardware.

putting up warnings on the download page but not pulling the images is not cool. probably almost half of new laptops shipped use integrated intel pci-e nic. a lot of people get images from bittorrent and won't even see the download page / warning. sorry for the rant, this just really burns me up.
go away.

GomJabbar
Moderator
Posts: 9765
Joined: Tue Jun 07, 2005 6:57 am

#5 Post by GomJabbar » Fri Sep 26, 2008 11:55 pm

tarvoke wrote:canonical (and heads of other distros) need to take some responsibility here. alpha/beta release, sure I understand it may trash my filesystem or make fun of my dog or something; but not trash hardware.

putting up warnings on the download page but not pulling the images is not cool. probably almost half of new laptops shipped use integrated intel pci-e nic. a lot of people get images from bittorrent and won't even see the download page / warning. sorry for the rant, this just really burns me up.
I think Mandriva did what they could, but they have a limited amount of hardware to test internally. On August 30th, they posted the following announcement in the Cooker forum at http://forum.mandriva.com/:
Anne wrote:Hi

As written in copied mail above, we decided to move to kernel 2.6.27 for Mandriva Linux 2009. This is a big decision and needed lots of tests and discussions.

This was a hard one because we are late in planning and this kind of change can have lots of consequences. Because of maintenance issues and hardware support progress in the latest kernel, the kernel team decided to make the jump.

If some blocking issues are discovered, we will switch back to the 2.6.26 kernel.

In order to make this change in safe conditions, we need the widest possible testing on the largest range of hardware configurations. You can now test it from Cooker, then through the coming RC1 release.

Thanks in advance for this

Cheers

===================================================================

The announcement on Cooker:

Hi,

Today kernel 2.6.27-rc5 is arriving in cooker.

It seems a bit too bleeding edge for the state we are in now of the distro,
but after some internal discussion we decided to go with it. We hope that it
enhances users' experience of 2009.0, with better support in general (like
with wireless and other new hardware now supported by 2.6.27) and fewer
issues. Also it will make things easier in the maintenance area, with fewer
backports needed (unlike the case with 2.6.26), and considering that 2009.0
can be the base of the next corporate version of the distro, a more up-to-date
kernel eases things a bit.

We made many tests considering the stage 2009.0 is at now, with the rc1 release going
to happen next week, but please report especially any regressions you see
compared with previous 2.6.26 package releases. Until now we haven't found
anything critical that could hold the update to 2.6.27, but if something very
bad not detected yet happens we will revert to 2.6.26.

Just a note about 2.6.27-rc5: a boot issue when using ahci was reported on LKML
(http://lkml.org/lkml/2008/8/29/304). As can be seen in the thread, there
is already a fix for it, and we will release updated packages later. This is just a
warning for ahci users that you may want to wait a bit before using the new kernel.

--
[]'s
Herton
_________________
Mandriva engineering director
They also posted a warning announcement on Sept. 23rd at the top of the Cooker forum:
URGENT: major bug in all Mandriva Linux 2009 pre-releases
awilliamson wrote:This bug is known to be present in all current 2.6.27 pre-release kernels.

The issue potentially affects all current Mandriva Linux 2009 pre-releases (earlier pre-releases came with a 2.6.26 kernel, but if you installed one and updated regularly, you would be automatically updated to a 2.6.27 kernel). It will not affect Mandriva Linux 2009 RC2, which will be released soon with a kernel that works around the issue. The issue will not be present in the final release of Mandriva Linux 2009. It does not affect any Mandriva Linux stable release (including 2008 Spring) unless you are using a non-official kernel. If you are using any current Mandriva Linux 2009 pre-release, we highly recommend you check whether this issue may affect you.
<snip>
We recommend you then immediately switch to an unaffected kernel. Cooker kernels from version 2.6.27-0.rc7.1.1mnb work around the problem by disabling the driver; this will prevent your network interface from working, but will also ensure it is not affected by this issue. Kernel 2.6.27-0.rc7.1.1mnb will be available in the Cooker repositories on September 24th.
The kernel bug was submitted at Mandriva's bugzilla less than 3 hours before awilliamson's post above (see link in my original post).

IMO, Mandriva acted responsibly - and SUSE as well. If people want to run pre-releases of an operating system, they should do their homework, and expect problems from time to time. Pre-releases of an operating system should not be run on a production machine or on your daily driver. The kernel came from upstream, and this problem affected all distros that were trying it out (before the bug was discovered).
DKB

tarvoke
Junior Member
Posts: 273
Joined: Sun Mar 25, 2007 12:45 pm
Location: Slightly Outside America

#6 Post by tarvoke » Sat Sep 27, 2008 12:04 am

was mainly griping about ubuntu; I'm not up on the other affected distros much, so should not have just lumped them in, sorry. sounds like mandriva and suse did as well or better than can be expected.

the semi-good news is, if the magical boot-into-windows-fixes-it-fix actually restores the eeprom, it will be possible to write a tool to do the same and get it out to the poor barstids who've lost their nic.

no one in their right mind should be putting unstable bits on production machines, I think we can all agree there. we are going to be on etch with 2.6.18 for quite a long time to come at $DAY_JOB.
go away.

jglen490
Posts: 25
Joined: Sun Feb 10, 2008 11:51 pm
Location: Montgomery AL

#7 Post by jglen490 » Sat Sep 27, 2008 4:32 pm

Here is the first thing that is displayed on the Ubuntu Alpha 6 page - before you even get to the link to download.
WARNING

Due to an unresolved bug in the Linux kernel included in Alpha 6, it should not be used on Intel ethernet hardware handled by the e1000e driver (Intel GigE). Doing so may render your network hardware permanently inoperable.

Older Intel ethernet hardware which uses the e1000 driver is not affected by this; however, some hardware which used the e1000 driver in previous Ubuntu releases, such as hardware that uses a PCI Express bus, has been moved from e1000 to e1000e in the latest kernel releases. If in doubt, do not use Alpha 6, and subscribe to https://bugs.launchpad.net/bugs/263555 to be notified when the bug is fixed in the daily images.
Ubuntu, also, has done due diligence. Know your platform, or at least investigate what you have before using any distro, or any OS for that matter. Since Intrepid is still in the Alpha stage, you should be cautious anyway.
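
One way to investigate what you have: on a running Linux system, sysfs shows which kernel driver is bound to each network interface. The following is a rough Python sketch, assuming the standard /sys/class/net layout; the function name is invented for this example.

```python
import os


def interface_drivers(sysfs_net="/sys/class/net"):
    """Map each network interface to the name of its bound kernel driver.

    Hypothetical helper: <iface>/device/driver is a symlink whose target's
    last path component is the driver name. Virtual interfaces such as 'lo'
    have no such link and are skipped.
    """
    drivers = {}
    if not os.path.isdir(sysfs_net):
        return drivers
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers
```

If `interface_drivers()` reports e1000e for the wired interface, that is the hardware the warning above is about; e1000 means the older driver is handling the card.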
I feel more like I do now than I did when I got here.
Registered Linux User #270832

dk
Posts: 39
Joined: Tue Sep 30, 2008 8:49 pm
Location: Edmonton, CA

e1000e: corruption update

#8 Post by dk » Fri Oct 03, 2008 10:42 pm

Hello.

Bruce Allan has written a patch for the e1000e driver to prevent future corruption of the NVM, and it looks like it may be merged into the next 2.6.27 release candidate. He also mentions a coming patch to help those who have corrupted EEPROMs. I have an EEPROM dump of a working Intel 82567LM on the T400 if anyone needs it in the future.

http://lkml.org/lkml/2008/10/1/368

If you really need your internal NIC, it should be safe to apply his patch to your 2.6.27 source tree and then use your NIC.
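
Before experimenting, it may be worth saving a copy of a still-working EEPROM. Below is a hedged Python sketch that shells out to ethtool's EEPROM dump option (-e with raw on). The helper names, the interface name eth0 and the output file name are assumptions for illustration. It needs root, and whether the dump succeeds depends on the driver.

```python
import subprocess


def eeprom_dump_command(iface):
    """Build the ethtool invocation that dumps an interface's EEPROM as raw bytes."""
    return ["ethtool", "-e", iface, "raw", "on"]


def backup_eeprom(iface, out_path):
    """Write the raw EEPROM contents of `iface` to `out_path`.

    Hypothetical helper: requires ethtool to be installed and root privileges.
    Raises CalledProcessError if ethtool reports a failure.
    """
    result = subprocess.run(eeprom_dump_command(iface), check=True, capture_output=True)
    with open(out_path, "wb") as fh:
        fh.write(result.stdout)
    return len(result.stdout)  # number of bytes saved
```

For example, `backup_eeprom("eth0", "eeprom-backup.bin")` would write the raw bytes to that file. This only reads the EEPROM; it does not attempt any repair.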

carbon_unit
Moderator Emeritus
Posts: 2988
Joined: Sat Apr 24, 2004 9:10 pm
Location: South Central Iowa, USA

#9 Post by carbon_unit » Sat Oct 04, 2008 7:10 am

Cool! I am glad to see it is under control.
T60 2623-D7U, 3 GB Ram.
Dual boot XP and Linux Mint.
Registered linux user #160145

dk
Posts: 39
Joined: Tue Sep 30, 2008 8:49 pm
Location: Edmonton, CA

2.6.27.1: e1000e update

#10 Post by dk » Thu Oct 16, 2008 2:59 pm

Kudos to Intel and the kernel developers who appear to have sussed the e1000e corruption issue. See the changelog for a description of the dynamic ftrace code issue:

http://kernel.org/pub/linux/kernel/v2.6 ... g-2.6.27.1

GomJabbar
Moderator
Posts: 9765
Joined: Tue Jun 07, 2005 6:57 am

#11 Post by GomJabbar » Sat Nov 15, 2008 11:17 am

I believe this problem is no longer an issue, so I am going to unsticky this thread. If anyone feels differently, please respond.
DKB

carbon_unit
Moderator Emeritus
Posts: 2988
Joined: Sat Apr 24, 2004 9:10 pm
Location: South Central Iowa, USA

#12 Post by carbon_unit » Sat Nov 15, 2008 8:04 pm

Go for it. There is no longer an issue.
T60 2623-D7U, 3 GB Ram.
Dual boot XP and Linux Mint.
Registered linux user #160145
