From owner-freebsd-net@FreeBSD.ORG  Fri Apr  1 01:16:09 2011
Date: Thu, 31 Mar 2011 18:16:08 -0700
Message-ID: <AANLkTi=0OkSLnz0cpv02Jrxz_piOhMT40m7xWK0NCiuH@mail.gmail.com>
From: Jack Vogel <jfvogel@gmail.com>
To: Arnaud Lacombe <lacombar@gmail.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-net@freebsd.org
Subject: Re: em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not
 setup receive structures"]

I know how I'm going to handle this and am formulating code for it. I should
have something that can be tested tomorrow; time to head out for the night.

Essentially, rather than just looking for equality, I will calculate the
number of unrefreshed mbufs given the check/refresh values, and then call
refresh when anything is unrefreshed. This will happen in rxeof, but I will
also put back the rx interrupt trigger into local timer. I'm pretty sure this
will be bulletproof, at least for this kind of hang.
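
A rough sketch of that calculation, to make the idea concrete. This is not
the driver code: the struct and function names below are made up for
illustration, and it assumes the unrefreshed descriptors are the ones from
next_to_refresh up to (but not including) next_to_check, counted modulo the
ring size.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-ring indices the driver keeps. */
struct rx_ring_state {
	uint32_t next_to_check;   /* next descriptor software will examine */
	uint32_t next_to_refresh; /* next descriptor that needs a new mbuf */
	uint32_t num_desc;        /* number of descriptors in the ring */
};

/*
 * Number of descriptors waiting for a fresh mbuf, assuming they span
 * [next_to_refresh, next_to_check) with wraparound at num_desc.
 */
static uint32_t
rx_unrefreshed(const struct rx_ring_state *r)
{
	return (r->next_to_check + r->num_desc - r->next_to_refresh) %
	    r->num_desc;
}

/* Nonzero when at least one descriptor still needs a refresh. */
static int
rx_needs_refresh(const struct rx_ring_state *r)
{
	return rx_unrefreshed(r) != 0;
}
```

With check = 884, refresh = 885 and an assumed ring of 1024 descriptors, this
gives 1023 unrefreshed, i.e. nearly the whole ring, where a plain equality
test would see nothing to do.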

Jack


On Thu, Mar 31, 2011 at 5:28 PM, Jack Vogel <jfvogel@gmail.com> wrote:

> You know what, Arnaud, I've looked at the numbers again, and I suddenly saw
> that next_to_check and next_to_refresh are NOT in a good state, exactly the
> opposite, check is BEHIND refresh, which means the whole ring is empty, the
> HEAD (next_to_check) is pointing at 929, but next_to_refresh is at 930,
> RIGHT IN FRONT of it, so the whole ring is depleted!!
>
> What this means is that just a test of check == refresh is not going to be
> good enough to protect against all cases, so let me think about how to
> handle this...
>
> Jack
>
>
>
> On Thu, Mar 31, 2011 at 4:38 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>
>> My validation group has some kind of hang... it happens when they use a
>> certain number of clients, each running a stress test against the SUT. It's
>> like this one: no real handle on what's wrong. If I knew what was wrong, it
>> would be halfway or more to fixing it :)
>>
>> The evidence shows you have hit the max clusters at one point, but have
>> freed most of them back up again; there is no shortage right at this point.
>> Your previous data showed a normal idle head/tail relationship....
>>
>> Just as a data point, will you please disable MSIX, recompile, and run in
>> MSI mode? I just want to see if that makes a difference. Search in the
>> driver for em_enable_msix and set it to FALSE.
>>
>> Jack
>>
>>
>>
>> On Thu, Mar 31, 2011 at 4:06 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> On Thu, Mar 31, 2011 at 6:28 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>>> > OK, but those are not present in this data; that is what I'm
>>> > asking.
>>> >
>>> > So, you have a hang for which we do not have a certain cause. What does
>>> > netstat -m show?
>>> >
>>> # netstat -m
>>> 3073/74927/78000 mbufs in use (current/cache/total)
>>> 3070/29698/32768/32768 mbuf clusters in use (current/cache/total/max)
>>> 0/383 mbuf+clusters out of packet secondary zone in use (current/cache)
>>> 0/12800/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
>>> 6908K/129327K/136236K bytes allocated to network (current/cache/total)
>>> 0/1080/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>> 0/7/6656 sfbufs in use (current/peak/max)
>>> 0 requests for sfbufs denied
>>> 0 requests for sfbufs delayed
>>> 0 requests for I/O initiated by sendfile
>>> 0 calls to protocol drain routines
>>>
>>> Note that the mbuf allocation denials did not happen all at once. The
>>> count has been progressively increasing in blocks of ~200 over the 5h of
>>> uptime of the machine, until the current condition occurred.
>>>
>>> I have previously been trying to simulate the depletion and the hang,
>>> but the driver recovered. I assume the condition is met in
>>> em_local_timer() to refresh the ring, I'd still need to check that.
>>>
>>>  - Arnaud
>>>
>>> > Jack
>>> >
>>> >
>>> > On Thu, Mar 31, 2011 at 3:15 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>>> >> > So, what is the evidence that the driver is stuck here?
>>> >> >
>>> >> About 800 pps (mostly SYN) present on the wire but never seen on em0,
>>> >> plus a couple of ARP replies, which also never hit em0, plus the
>>> >> `missed_packets' count increasing by the same 800 pps over the last
>>> >> hour. Is that enough?
>>> >>
>>> >>  - Arnaud
>>> >>
>>> >> PS: I forgot to add that the MAC addresses on the wire are fine.
>>> >>
>>> >> > I see that next_to_check != next_to_refresh, which is why the
>>> >> > local timer won't schedule anything. OH, and I also realized there
>>> >> > is a problem with local_timer anyway, it will run rxeof, but that
>>> >> > won't help if you can't enter the loop, so I need to add some code
>>> >> > at the top to call em_refresh_mbufs() when in this state.
>>> >> >
>>> >> > On this interrupt cause that you are focused upon: although it's
>>> >> > there in the design, I had talked with some of our most seasoned
>>> >> > developers on both the Windows and Linux side of the house, and NO
>>> >> > one has ever used this 'feature', because (and I'm quoting here)
>>> >> > "there's no good use case for it".
>>> >> > Meaning, there's always some simpler way of handling the issue.
>>> >> >
>>> >> > When you use MSIX you can't read causes btw, if you configured it,
>>> >> > it would mean you'd just get into the regular RX handler, same as
>>> >> > always, so why some special bother with this cause?
>>> >> >
>>> >> > On non-MSIX hardware there is just no particular reason to worry
>>> >> > about the cause either, we can just handle the RX situation in the
>>> >> > interrupt handler.
>>> >> >
>>> >> > Jack
>>> >> >
>>> >> >
>>> >> > On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>>> >> >>
>>> >> >> Hi Jack,
>>> >> >>
>>> >> >> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>>> >> >> > [...]
>>> >> >> > I'll remove part of the changes I made to keep only
>>> >> >> > `rx_forced_refill' and the associated sysctl, re-run the tests,
>>> >> >> > and come back with the correct values, hopefully in a few hours.
>>> >> >> >
>>> >> >> Here it is:
>>> >> >>
>>> >> >> # sysctl dev.em.0.%desc
>>> >> >> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
>>> >> >>
>>> >> >> # sysctl dev.em.0.mac_stats.missed_packets
>>> >> >> dev.em.0.mac_stats.missed_packets: 917428
>>> >> >>
>>> >> >> # sysctl dev.em.0.debug=1
>>> >> >> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
>>> >> >> em0: hw tdh = 975, hw tdt = 975
>>> >> >> em0: hw rdh = 884, hw rdt = 885
>>> >> >> em0: Tx Queue Status = 0
>>> >> >> em0: TX descriptors avail = 1024
>>> >> >> em0: Tx Descriptors avail failure = 0
>>> >> >> em0: RX discarded packets = 0
>>> >> >> em0: RX Next to Check = 884
>>> >> >> em0: RX Next to Refresh = 885
>>> >> >>  -> -1
>>> >> >>
>>> >> >> So the taskqueue cannot be scheduled to run and the driver is stuck.
>>> >> >>
>>> >> >> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>>> >> >> >> Read the code in HEAD, em_local_timer() has a test of ALL the rx
>>> >> >> >> queues and will schedule a task that refreshes mbufs if they are
>>> >> >> >> empty. This has exactly the same effect as checking for some
>>> >> >> >> interrupt cause, a cause that is not available when using MSIX
>>> >> >> >> on 82574, but this approach works for everything.
>>> >> >> >>
>>> >> >> Can you please point me to a reference datasheet (or errata),
>>> >> >> provided by Intel, about the RX Overrun interrupt not being
>>> >> >> available with MSI-X on the 82574?
>>> >> >>
>>> >> >> Currently, I only have access to [0], which specifies the following:
>>> >> >>
>>> >> >> 7.4 Interrupts
>>> >> >> 7.4.2 MSI-X Mode
>>> >> >> [...]
>>> >> >> The following configuration and parameters are involved:
>>> >> >> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues
>>> >> >>   and other events to 5 interrupt vectors
>>> >> >> • The ICR[24:20] bits reflect specific interrupt causes
>>> >> >> • Five MSI-X interrupt vectors are provided (calculated based on four
>>> >> >>   vectors for queues and one vector for other causes). The requested
>>> >> >>   number of vectors is loaded from the MSI_X_N fields in the EEPROM
>>> >> >>   into the PCIe MSI-X capability structure of the function.
>>> >> >>
>>> >> >> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
>>> >> >> [...]
>>> >> >>
>>> >> >> about bit 24:
>>> >> >>
>>> >> >> Other Interrupt. Indicates one of the following interrupts was set:
>>> >> >> • Link Status Change.
>>> >> >> • Receiver Overrun.
>>> >> >> • MDIO Access Complete.
>>> >> >> • Small Receive Packet Detected.
>>> >> >> • Receive ACK Frame Detected.
>>> >> >> • Manageability Event Detected.
>>> >> >>
>>> >> >> Thanks in advance,
>>> >> >>  - Arnaud
>>> >> >>
>>> >> >> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>