Date: Thu, 31 Mar 2011 18:15:32 -0400
From: Arnaud Lacombe <lacombar@gmail.com>
To: Jack Vogel <jfvogel@gmail.com>
Cc: freebsd-net@freebsd.org
Subject: Re: em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not setup receive structures"]
Message-ID: <AANLkTina-MO4GuK66ZJN0hipp%2BVCa-CUxEz79rzRt-cZ@mail.gmail.com>
In-Reply-To: <AANLkTikvbvr%2BY=Fh2fPVieHkTRix%2Bni61jVPct10NKfD@mail.gmail.com>
References: <AANLkTin64gGxRituE2B%2BsfVpRXt2QetdNLaV7HCf0uNE@mail.gmail.com>
 <AANLkTi=OjzMrjCPZ2VFDBf6URTaMoAzQqXbxWLv3d9mW@mail.gmail.com>
 <AANLkTikvbvr%2BY=Fh2fPVieHkTRix%2Bni61jVPct10NKfD@mail.gmail.com>

Hi,

On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel@gmail.com> wrote:
> So, what is the evidence that the driver is stuck here?
>
About 800 pps (mostly SYN) present on the wire but never seen on em0,
plus a couple of ARP replies, which also never hit em0, plus the
`missed_packets' count increasing by the same 800 pps over the last
hour.

Is that enough ?

 - Arnaud

ps: I forgot to add that the MAC addresses on the wire are fine.

> I see that next_to_check != next_to_refresh, which is why the
> local timer won't schedule anything. OH, and I also realized there
> is a problem with local_timer anyway, it will run rxeof, but that won't help
> if you can't enter the loop, so I need to add some code at the top to
> call em_refresh_mbufs() when in this state.
>
> On this interrupt cause that you are focused upon, although its there in the
> design, I had talked with some of our most seasoned developers on both
> the Windows and Linux side of the house, and NO one has ever used this
> 'feature', because (and I'm quoting here) "there's no good use case for it".
> Meaning, there's always some simpler way of handling the issue.
>
> When you use MSIX you can't read causes btw, if you configured it, it would
> mean you'd just get into the regular RX handler, same as always, so why
> some special bother with this cause?
>
> On non-MSIX hardware there is just no particular reason to worry about the
> cause either, we can just handle the RX situation in the interrupt handler.
>
> Jack
>
>
> On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>>
>> Hi Jack,
>>
>> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>> > [...]
>> > I'll remove part of the changes I made to keep only `rx_forced_refill'
>> > and the associated sysctl, re-run the tests and come back with correct
>> > value, hopefully in a few hours.
>> >
>> Here it is:
>>
>> # sysctl dev.em.0.%desc
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
>>
>> # sysctl dev.em.0.mac_stats.missed_packets
>> dev.em.0.mac_stats.missed_packets: 917428
>>
>> # sysctl dev.em.0.debug=1
>> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
>> em0: hw tdh = 975, hw tdt = 975
>> em0: hw rdh = 884, hw rdt = 885
>> em0: Tx Queue Status = 0
>> em0: TX descriptors avail = 1024
>> em0: Tx Descriptors avail failure = 0
>> em0: RX discarded packets = 0
>> em0: RX Next to Check = 884
>> em0: RX Next to Refresh = 885
>> -> -1
>>
>> So the taskqueue cannot be scheduled to run and the driver is stuck.
>>
>> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>> >> Read the code in HEAD, em_local_timer() has a test of ALL the rx queues and
>> >> will schedule a task that refreshes mbufs if they are empty. This has exactly the
>> >> same effect as checking for some interrupt cause, a cause that is not available
>> >> when using MSIX on 82574, but this approach works for everything.
>> >>
>> Can you please point me to a reference datasheet (or errata), provided
>> by Intel, about the RX Overrun interrupt not being available with
>> MSI-X on the 82574 ?
>>
>> Currently, I only have access to [0], which precises the following:
>>
>> 7.4 Interrupts
>> 7.4.2 MSI-X Mode
>> [...]
>> The following configuration and parameters are involved:
>> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues and
>>   other events to 5 interrupt vectors
>> • The ICR[24:20] bits reflect specific interrupt causes
>> • Five MSI-X interrupt vectors are provided (calculated based on four
>>   vectors for queues and one vector for other causes). The requested
>>   number of vectors is loaded from the MSI_X_N fields in the EEPROM into
>>   the PCIe MSI-X capability structure of the function.
>>
>> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
>> [...]
>>
>> about bit 24:
>>
>> Other Interrupt. Indicates one of the following interrupts was set:
>> • Link Status Change.
>> • Receiver Overrun.
>> • MDIO Access Complete.
>> • Small Receive Packet Detected.
>> • Receive ACK Frame Detected.
>> • Manageability Event Detected.
>>
>> Thanks in advance,
>>  - Arnaud
>>
>> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
>
>
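
For readers following the thread, here is a minimal C sketch of the two checks under discussion. It is an illustration only, not code from the em(4) driver: every identifier prefixed with `sketch_` / `SKETCH_` is hypothetical. The timer predicate models only what Jack describes (the local timer schedules the refresh task when next_to_check equals next_to_refresh), and the bit positions come from the datasheet excerpt quoted above (ICR[24:20], with bit 24 being "Other Interrupt"). Whether ICR can usefully be read in MSI-X mode is the open question of this thread, and the sketch does not settle it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the em(4) driver's identifiers. */

/* ICR[24:20] "reflect specific interrupt causes" (datasheet section 7.4.2). */
#define SKETCH_ICR_CAUSE_MASK UINT32_C(0x01F00000)  /* bits 24..20 */
/* Bit 24 is "Other Interrupt", which covers Receiver Overrun among others. */
#define SKETCH_ICR_OTHER      (UINT32_C(1) << 24)

/*
 * Model of the timer check as described in the thread: the refresh task
 * is scheduled only when the two RX ring indices are equal.
 */
static bool
sketch_timer_would_schedule(uint32_t next_to_check, uint32_t next_to_refresh)
{
	return (next_to_check == next_to_refresh);
}

/* Returns true if the "Other Interrupt" bit is set in an ICR value. */
static bool
sketch_icr_other_set(uint32_t icr)
{
	return ((icr & SKETCH_ICR_OTHER) != 0);
}
```

With the values from the debug output above (RX Next to Check = 884, RX Next to Refresh = 885), `sketch_timer_would_schedule(884, 885)` returns false, which matches the observation that the taskqueue is never scheduled.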
