Date:      Sun, 3 Mar 2013 18:14:44 +0800
From:      Sepherosa Ziehau <sepherosa@gmail.com>
To:        Nick Rogers <ncrogers@gmail.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Jack Vogel <jfvogel@gmail.com>, "Christopher D. Harrison" <harrison@biostat.wisc.edu>
Subject:   Re: igb network lockups
Message-ID:  <CAMOc5cz+knVK=skEz1z=WNAjd5mL3DeOVBasHnJ6ggsNtiQdbA@mail.gmail.com>
In-Reply-To: <CAKOb=YYRu94CRC8Fd1TrWezHig6Od_uNpO2f+tCBQTBNQVjtog@mail.gmail.com>
References:  <512BAA60.3060703@biostat.wisc.edu> <CAFOYbckDFJKRip+e=a+_JPHhk+HbAikRBK0dHEBDDEgdsZT6sw@mail.gmail.com> <512BAF8D.7080308@biostat.wisc.edu> <CAFOYbcnEN=Pzd9k4hvR+wqP3_HJj3-QRQSwocfHDSehUH5YPXA@mail.gmail.com> <CAKOb=YYyJZyKzpEBT+o-Vmn7dedRfVW+wVh1KVM7oaWT63+qBg@mail.gmail.com> <CAKOb=YYRu94CRC8Fd1TrWezHig6Od_uNpO2f+tCBQTBNQVjtog@mail.gmail.com>

On Sat, Mar 2, 2013 at 12:18 AM, Nick Rogers <ncrogers@gmail.com> wrote:
> On Fri, Mar 1, 2013 at 8:04 AM, Nick Rogers <ncrogers@gmail.com> wrote:
>> FWIW I have been experiencing a similar issue on a number of systems
>> using the em(4) driver under 9.1-RELEASE. This is after upgrading from
>> a snapshot of 8.3-STABLE. My systems use PF+ALTQ as well. The symptoms
>> are: interface stops passing traffic until the system is rebooted. I
>> have not yet been able to gain access to the systems to dig around
>> (after they have crashed), however my kernel/network settings are
>> properly tuned (high mbuf limit, hw.em.rxd/txd=4096, etc). It seems to
>> happen about once a day on systems with around a sustained 50Mb/s of
>> traffic.
>>
>> I realize this is not much to go on but perhaps it helps. I am
>> debating trying the e1000 driver in the latest CURRENT on top of
>> 9.1-RELEASE. I noticed the Intel shared code was updated about a week
>> ago. Would this change or perhaps another change to e1000 since
>> 9.1-RELEASE possibly affect stability in a positive way?
>>
>> Thanks.
>
> Here's the relevant pciconf output:
>
> em0@pci0:1:0:0: class=0x020000 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82574L Gigabit Network Connection'
>     class      = network
>     subclass   = ethernet
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit
>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>     cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected

For the 82574L, i.e. the chip handled by em(4), MSI-X must _not_ be
enabled; it is simply broken (you can check the 82574 errata on
Intel's website to confirm this).

The same goes for the 82575, i.e. the chip handled by igb(4): MSI-X
must _not_ be enabled there either; it is likewise broken (see the
82575 errata on Intel's website).
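
If MSI-X really is the culprit here, it can be turned off before the
driver attaches via loader tunables. A sketch, assuming the 9.x-era
e1000/igb tunable names; double-check them against your driver
version's man page:

    # /boot/loader.conf
    hw.em.enable_msix="0"     # em(4): fall back to MSI/legacy interrupts
    hw.igb.enable_msix="0"    # igb(4): likewise for 82575-class chips

After a reboot, "pciconf -lc em0" should no longer report the MSI-X
capability as enabled.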

Best Regards,
sephe

--
Tomorrow Will Never Die

> em1@pci0:2:0:0: class=0x020000 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82574L Gigabit Network Connection'
>     class      = network
>     subclass   = ethernet
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit
>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>     cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
> em2@pci0:7:0:0: class=0x020000 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82574L Gigabit Network Connection'
>     class      = network
>     subclass   = ethernet
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit
>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>     cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
> em3@pci0:8:0:0: class=0x020000 card=0x10d315d9 chip=0x10d38086 rev=0x00 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = '82574L Gigabit Network Connection'
>     class      = network
>     subclass   = ethernet
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit
>     cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>     cap 11[a0] = MSI-X supports 5 messages in map 0x1c enabled
> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
>
>
>>
>> On Mon, Feb 25, 2013 at 10:45 AM, Jack Vogel <jfvogel@gmail.com> wrote:
>>> Have you done any poking around, looking at stats to determine why the
>>> hangs? For instance,
>>> might your mbuf pool be depleted? Some other network resource perhaps?
>>>
>>> Jack
>>>
>>>
>>> On Mon, Feb 25, 2013 at 10:38 AM, Christopher D. Harrison <
>>> harrison@biostat.wisc.edu> wrote:
>>>
>>>>  Sure,
>>>> The problem appears on both systems running with ALTQ and vanilla.
>>>>     -C
>>>>
>>>> On 02/25/13 12:29, Jack Vogel wrote:
>>>>
>>>> I've not heard of this problem, but I think most users do not use ALTQ,
>>>> and we (Intel) do not
>>>> test using it. Can it be eliminated from the equation?
>>>>
>>>> Jack
>>>>
>>>>
>>>> On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison <
>>>> harrison@biostat.wisc.edu> wrote:
>>>>
>>>>> I recently have been experiencing network "freezes" and network "lockups"
>>>>> on our Freebsd 9.1 systems which are running zfs and nfs file servers.
>>>>> I upgraded from 9.0 to 9.1 about 2 months ago and we have been having
>>>>> issues with almost bi-monthly.   The issue manifests in the system becomes
>>>>> unresponsive to any/all nfs clients.   The system is not resource bound as
>>>>> our I/O is low to disk and our network is usually in the 20mbit/40mbit
>>>>> range.   We do notice a correlation between temporary i/o spikes and
>>>>> network freezes but not enough to send our system in to "lockup" mode for
>>>>> the next 5min.   Currently we have 4 igb nics in 2 aggr's with 8 queue's
>>>>> per nic and our dev.igb reports:
>>>>>
>>>>> dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
>>>>>
>>>>> I am almost certain the problem is with the ibg driver as a friend is
>>>>> also experiencing the same problem with the same intel igb nic.   He has
>>>>> addressed the issue by restarting the network using netif on his systems.
>>>>> According to my friend, once the network interfaces get cleared, everything
>>>>> comes back and starts working as expected.
>>>>>
>>>>> I have noticed an issue with the igb driver and I was looking for
>>>>> thoughts on how to help address this problem.
>>>>>
>>>>> http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
>>>>>
>>>>> Thoughts/Ideas are greatly appreciated!!!
>>>>>
>>>>>     -C
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-net@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
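
As a practical footnote to Jack's mbuf question quoted above: on
FreeBSD, "netstat -m" reports mbuf pool usage and, crucially, how many
allocation requests were denied. A minimal sketch of what to look for;
the sample line and its numbers are made up for illustration, not
taken from the affected systems:

```shell
# A nonzero "denied" count in 'netstat -m' output means mbuf/cluster
# allocations have failed, which would explain an interface that
# silently stops passing traffic. Illustrative sample line:
denied_line='0/123/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)'

# Pull out the denied-cluster count (second field of the a/b/c triple).
denied=$(printf '%s\n' "$denied_line" | awk '{split($1, n, "/"); print n[2]}')
echo "denied clusters: $denied"
```

On a live box, piping "netstat -m" through the same awk filter, or
simply eyeballing the "denied" lines, shows at a glance whether the
pool was exhausted at hang time.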





