Date: Wed, 09 Nov 2011 12:21:46 +0330
From: Hooman Fazaeli <hoomanfazaeli@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: pyunyh@gmail.com, freebsd-net@freebsd.org, Emil Muratov <gpm@hotplug.ru>, Jack Vogel <jfvogel@gmail.com>, Jason Wolfe <nitroboost@gmail.com>
Subject: Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled
Message-ID: <4EBA3F22.2060204@gmail.com>
In-Reply-To: <CAJ-Vmomf-wxb8dY7YF7qT_FGK5d-YLPU3BkPOeHnOtKZ%2BUrYeQ@mail.gmail.com>
References: <CAAAm0r0RXEJo4UiKS=Ui0e5OQTg6sg-xcYf3mYB5%2Bvk8i8557w@mail.gmail.com> <4E8F51D4.1060509@sentex.net> <CACqU3MVwLaepFymZJkaVk6p=SpykGhqs=VYFjLh9fP9S=AxDhg@mail.gmail.com> <CAAAm0r1DKvoL9=Ket9up=4%2B5xiCzTTZJK99FhF9jcCA28B0M%2BA@mail.gmail.com> <CAAAm0r3XdsMHZh%2BP_NF-txZasdExzwZ8ymmGQgGhJQds0fOiBQ@mail.gmail.com> <CAAAm0r1iS3z-7CBJ=xYDf%2BJOA1Q2nU0O54Twbyb7FjvgWHjKVw@mail.gmail.com> <4EA7E203.3020306@sepehrs.com> <CAAAm0r3Nr2t8cCetPkFnLQ-3KwqHw_0SpqbtvYPRUkSP=9n8CA@mail.gmail.com> <4EA80818.3030504@sentex.net> <4EA80F88.4000400@hotplug.ru> <4EA82715.2000404@gmail.com> <4EA8FA40.7010504@hotplug.ru> <4EA91836.2040508@gmail.com> <4EA959EE.2070806@hotplug.ru> <4EAD116A.8090006@gmail.com> <CAAAm0r3qm=nQQuAmZDD4k4X8K-xW6_kM9TukRT=1GoG9dYR3zw@mail.gmail.com> <4EAE58A2.9040803@gmail.com> <CAAAm0r0uoPPEQbq5rHkFr6ZLp-WJ4YVjDVvxxV6y%2BUh4eEKDEA@mail.gmail.com> <4EB96511.50701@gmail.com> <CAJ-Vmomf-wxb8dY7YF7qT_FGK5d-YLPU3BkPOeHnOtKZ%2BUrYeQ@mail.gmail.com>
On 11/8/2011 11:00 PM, Adrian Chadd wrote:
> On 8 November 2011 09:21, Hooman Fazaeli<hoomanfazaeli@gmail.com> wrote:
>
>> With MSIX enabled, the link task (em_handle_link) does _not_ triggers
>> _start when the link changes state from inactive to active (which it
>> should).
>> If if_snd quickly fills up during a temporary link loss, transmission is
>> stopped forever and the driver never recovers from that state.
>>
>> The last patch should have reduced the frequency of the problem
>> but it assumes every IFQ_ENQUEUE is followed by a if_start which
>> is not a true assumption.
>
> FWIW, I saw something very similar with the if_arge code port from
> Linux. If the TX queue filled up and wasn't serviced before it hit
> completely full, it was never drained.
>
> It may be worthwhile auditing some of the other NIC drivers to ensure
> this kind of situation isn't occuring. Especially if they came from
> Linux. :-)
>
> That's a great catch, I hope it finally fixes the if_em issues with MSIX. :-)
>
>
> Adrian

Just for the record, I should inform you that igb, ixgb and ixgbe have
the same issue. I have not checked other drivers.

And there is another subtle problem with all these drivers: if transmit
(xxx_xmit) fails because of a temporary memory shortage (i.e., a DMA
mapping failure with ENOMEM), the driver may enter the OACTIVE state and
_never_ recover! The scenario is similar to the previous one:

- if_start is executed.
- xxx_xmit fails with ENOMEM.
- xxx_start_locked sets OACTIVE. Note that this is different from a low
  TX descriptor condition, which also sets OACTIVE.
- The stack enqueues packets in if_snd but does not call if_start, since
  the driver is OACTIVE.
- The stack enqueues more packets until if_snd fills up and packets
  start to drop.
- Since there is nowhere in the driver's code that retries transmission
  when memory becomes available again (xxx_local_timer is a candidate),
  the driver remains OACTIVE forever, until it is re-initialized.
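To make the scenario above concrete, here is a small user-space simulation of it. This is only a sketch under stated assumptions: every sim_* name, the queue limit and the retry logic in the timer are hypothetical stand-ins, not the actual em/igb code and not the patches mentioned in this message.

```c
#include <errno.h>
#include <stdbool.h>

#define SIM_SND_QUEUE_MAX 8     /* hypothetical stand-in for the if_snd limit */

struct sim_ifnet {
    int  snd_len;               /* packets waiting in the send queue */
    int  sent;                  /* packets handed to the hardware */
    int  dropped;               /* packets dropped because the queue was full */
    bool oactive;               /* stand-in for the OACTIVE driver flag */
    bool mem_short;             /* true while DMA mapping would return ENOMEM */
};

/* Stand-in for xxx_xmit: fails with ENOMEM during the memory shortage. */
static int sim_xmit(struct sim_ifnet *ifp)
{
    if (ifp->mem_short)
        return ENOMEM;
    ifp->sent++;
    return 0;
}

/* Stand-in for xxx_start_locked: drain the queue, set OACTIVE on failure. */
static void sim_start_locked(struct sim_ifnet *ifp)
{
    while (ifp->snd_len > 0) {
        if (sim_xmit(ifp) != 0) {
            ifp->oactive = true;    /* packet stays queued */
            return;
        }
        ifp->snd_len--;
    }
}

/* Stand-in for the stack: enqueue, but call start only if not OACTIVE. */
static void sim_stack_output(struct sim_ifnet *ifp)
{
    if (ifp->snd_len >= SIM_SND_QUEUE_MAX) {
        ifp->dropped++;
        return;
    }
    ifp->snd_len++;
    if (!ifp->oactive)
        sim_start_locked(ifp);
}

/* Hypothetical retry in the periodic timer (assumption, not driver code). */
static void sim_local_timer(struct sim_ifnet *ifp)
{
    if (ifp->oactive && ifp->snd_len > 0) {
        ifp->oactive = false;
        sim_start_locked(ifp);
    }
}

/* Run the scenario; returns packets sent and stores drops in *dropped. */
int sim_run(bool retry_in_timer, int *dropped)
{
    struct sim_ifnet ifp = {0};
    int i;

    ifp.mem_short = true;
    sim_stack_output(&ifp);     /* xmit fails, OACTIVE is set */
    ifp.mem_short = false;      /* memory becomes available again */

    for (i = 0; i < 20; i++) {
        sim_stack_output(&ifp);
        if (retry_in_timer)
            sim_local_timer(&ifp);
    }
    *dropped = ifp.dropped;
    return ifp.sent;
}
```

Calling sim_run(false, &d) models the current behaviour: nothing is ever sent after the shortage ends and the queue overflows. Calling sim_run(true, &d) models a timer that retries: the queue drains on the first tick and no packet is dropped.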
I am working on patches for em/igb/ixgb/ixgbe to fix these issues and
would be happy to share them with anyone who is interested. Since these
are really severe problems, I hope the gurus apply official fixes ASAP.