Date: Thu, 31 Mar 2011 17:28:02 -0700
From: Jack Vogel <jfvogel@gmail.com>
To: Arnaud Lacombe <lacombar@gmail.com>
Cc: freebsd-net@freebsd.org
Subject: Re: em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not setup receive structures"]
Message-ID: <AANLkTin1KKiPKEf_KquG0NrbqExDsGPU_tizam7tYV9Y@mail.gmail.com>
In-Reply-To: <AANLkTimut2BMxvhkkyREnK_izXek5tAT5jrw8tW%2BNKVY@mail.gmail.com>
References: <AANLkTin64gGxRituE2B%2BsfVpRXt2QetdNLaV7HCf0uNE@mail.gmail.com> <AANLkTi=OjzMrjCPZ2VFDBf6URTaMoAzQqXbxWLv3d9mW@mail.gmail.com> <AANLkTikvbvr%2BY=Fh2fPVieHkTRix%2Bni61jVPct10NKfD@mail.gmail.com> <AANLkTina-MO4GuK66ZJN0hipp%2BVCa-CUxEz79rzRt-cZ@mail.gmail.com> <AANLkTi=OVSOitMvdjHexbv-fu0fA1WWOHo7gm-=MtPRf@mail.gmail.com> <AANLkTikmjmBKf9XUuSrYQz4T7xsR5ynvxHm2cjEDtFE%2B@mail.gmail.com> <AANLkTimut2BMxvhkkyREnK_izXek5tAT5jrw8tW%2BNKVY@mail.gmail.com>

You know what Arnaud, I've looked at the numbers again, and I suddenly saw that next_to_check and next_to_refresh are NOT in a good state, exactly the opposite, check is BEHIND refresh, which means the whole ring is empty, the HEAD (next_to_check) is pointing at 929, but next_to_refresh is at 930, RIGHT IN FRONT of it, so the whole ring is depleted!!

What this means is that just a test of check == refresh is not going to be good enough to protect against all cases, so let me think about how to handle this...

Jack

On Thu, Mar 31, 2011 at 4:38 PM, Jack Vogel <jfvogel@gmail.com> wrote:

> My validation group has some kind of hang... happens when they use a certain number of clients each running a stress test to the SUT, it's like this, no real handle on what's wrong, if I knew what was wrong it would be half way or more to fixing it :)
>
> The evidence shows you have hit the max clusters at one point, but have freed most of them back up again, there is no shortage right at this point. Your previous data showed a normal idle head/tail relationship....
>
> Just as a data point, will you please disable msix, recompile and run in MSI mode, I just want to see if that makes a difference. Search in the driver for em_enable_msix and set it FALSE.
>
> Jack
>
> On Thu, Mar 31, 2011 at 4:06 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>
>> Hi,
>>
>> On Thu, Mar 31, 2011 at 6:28 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>> > OK, but those are not something present in this data, that was what I'm asking.
>> >
>> > So, you have a hang for which we do not have a certain cause. What does netstat -m show?
>> >
>> # netstat -m
>> 3073/74927/78000 mbufs in use (current/cache/total)
>> 3070/29698/32768/32768 mbuf clusters in use (current/cache/total/max)
>> 0/383 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 0/12800/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
>> 6908K/129327K/136236K bytes allocated to network (current/cache/total)
>> 0/1080/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0/7/6656 sfbufs in use (current/peak/max)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>> 0 calls to protocol drain routines
>>
>> Note that the mbuf allocation denials did not happen all at once. They have been increasing progressively, in blocks of ~200, over the 5h of uptime of the machine, until the current condition occurred.
>>
>> I have previously been trying to simulate the depletion and the hang, but the driver recovered. I assume the condition is met in em_local_timer() to refresh the ring, I'd still need to check that.
>>
>> - Arnaud
>>
>> > Jack
>> >
>> >
>> > On Thu, Mar 31, 2011 at 3:15 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>> >> > So, what is the evidence that the driver is stuck here?
>> >> >
>> >> About 800 pps (mostly SYN) present on the wire but never seen on em0, plus a couple of ARP replies, which still never hit em0, plus the `missed_packets' count increasing by the same 800 pps in the last hour. Is that enough?
>> >>
>> >> - Arnaud
>> >>
>> >> ps: I forgot to add that the MAC addresses on the wire are fine.
>> >>
>> >> > I see that next_to_check != next_to_refresh, which is why the local timer won't schedule anything. OH, and I also realized there is a problem with local_timer anyway, it will run rxeof, but that won't help if you can't enter the loop, so I need to add some code at the top to call em_refresh_mbufs() when in this state.
>> >> >
>> >> > On this interrupt cause that you are focused upon, although it's there in the design, I had talked with some of our most seasoned developers on both the Windows and Linux side of the house, and NO one has ever used this 'feature', because (and I'm quoting here) "there's no good use case for it". Meaning, there's always some simpler way of handling the issue.
>> >> >
>> >> > When you use MSIX you can't read causes btw, if you configured it, it would mean you'd just get into the regular RX handler, same as always, so why some special bother with this cause?
>> >> >
>> >> > On non-MSIX hardware there is just no particular reason to worry about the cause either, we can just handle the RX situation in the interrupt handler.
>> >> >
>> >> > Jack
>> >> >
>> >> >
>> >> > On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>> >> >>
>> >> >> Hi Jack,
>> >> >>
>> >> >> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>> >> >> > [...]
>> >> >> > I'll remove part of the changes I made to keep only `rx_forced_refill' and the associated sysctl, re-run the tests and come back with correct value, hopefully in a few hours.
>> >> >> >
>> >> >> Here it is:
>> >> >>
>> >> >> # sysctl dev.em.0.%desc
>> >> >> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
>> >> >>
>> >> >> # sysctl dev.em.0.mac_stats.missed_packets
>> >> >> dev.em.0.mac_stats.missed_packets: 917428
>> >> >>
>> >> >> # sysctl dev.em.0.debug=1
>> >> >> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
>> >> >> em0: hw tdh = 975, hw tdt = 975
>> >> >> em0: hw rdh = 884, hw rdt = 885
>> >> >> em0: Tx Queue Status = 0
>> >> >> em0: TX descriptors avail = 1024
>> >> >> em0: Tx Descriptors avail failure = 0
>> >> >> em0: RX discarded packets = 0
>> >> >> em0: RX Next to Check = 884
>> >> >> em0: RX Next to Refresh = 885
>> >> >> -> -1
>> >> >>
>> >> >> So the taskqueue cannot be scheduled to run and the driver is stuck.
>> >> >>
>> >> >> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>> >> >> >> Read the code in HEAD, em_local_timer() has a test of ALL the rx queues and will schedule a task that refreshes mbufs if they are empty. This has exactly the same effect as checking for some interrupt cause, a cause that is not available when using MSIX on 82574, but this approach works for everything.
>> >> >> >>
>> >> >> Can you please point me to a reference datasheet (or errata), provided by Intel, about the RX Overrun interrupt not being available with MSI-X on the 82574?
>> >> >>
>> >> >> Currently, I only have access to [0], which states the following:
>> >> >>
>> >> >> 7.4 Interrupts
>> >> >> 7.4.2 MSI-X Mode
>> >> >> [...]
>> >> >> The following configuration and parameters are involved:
>> >> >> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues and other events to 5 interrupt vectors
>> >> >> • The ICR[24:20] bits reflect specific interrupt causes
>> >> >> • Five MSI-X interrupt vectors are provided (calculated based on four vectors for queues and one vector for other causes). The requested number of vectors is loaded from the MSI_X_N fields in the EEPROM into the PCIe MSI-X capability structure of the function.
>> >> >>
>> >> >> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
>> >> >> [...]
>> >> >>
>> >> >> about bit 24:
>> >> >>
>> >> >> Other Interrupt. Indicates one of the following interrupts was set:
>> >> >> • Link Status Change.
>> >> >> • Receiver Overrun.
>> >> >> • MDIO Access Complete.
>> >> >> • Small Receive Packet Detected.
>> >> >> • Receive ACK Frame Detected.
>> >> >> • Manageability Event Detected.
>> >> >>
>> >> >> Thanks in advance,
>> >> >> - Arnaud
>> >> >>
>> >> >> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
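
The ring-index reasoning at the top of this message (next_to_check at 929, next_to_refresh at 930, so an equality test misses the depleted ring) can be illustrated with a small sketch. This is not the em(4) driver's code: the helper names, the low-water parameter, and the RX ring size of 1024 are all assumptions made for illustration, and the model simply follows the reading given above, where slots from next_to_check up to next_to_refresh still hold buffers and everything else awaits refill.

```c
#include <assert.h>

/* Assumed RX ring size (hypothetical); must be a power of two for the mask. */
#define RING_SIZE 1024u

/* Slots the driver has consumed but not yet refilled (hypothetical helper). */
static unsigned ring_pending_refill(unsigned next_to_check, unsigned next_to_refresh)
{
    /* Unsigned subtraction wraps, and the mask reduces it modulo RING_SIZE. */
    return (next_to_check - next_to_refresh) & (RING_SIZE - 1u);
}

/* Slots that still hold a buffer (hypothetical helper). */
static unsigned ring_populated(unsigned next_to_check, unsigned next_to_refresh)
{
    return (next_to_refresh - next_to_check) & (RING_SIZE - 1u);
}

/*
 * Threshold test: report a refill as needed when the populated count drops
 * to low_water or below, instead of testing next_to_check == next_to_refresh.
 */
static int ring_needs_refill(unsigned next_to_check, unsigned next_to_refresh,
                             unsigned low_water)
{
    assert(low_water < RING_SIZE);
    return ring_populated(next_to_check, next_to_refresh) <= low_water;
}
```

Under these assumptions, the indices quoted above (check 929, refresh 930) give one populated slot and 1023 slots awaiting refill, and the same holds for the 884/885 pair in the debug output. A plain equality test reports nothing wrong in either case, while ring_needs_refill(929, 930, 8) returns true.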