From owner-freebsd-net@FreeBSD.ORG  Thu Mar 31 22:28:48 2011
MIME-Version: 1.0
In-Reply-To: <AANLkTina-MO4GuK66ZJN0hipp+VCa-CUxEz79rzRt-cZ@mail.gmail.com>
References: <AANLkTin64gGxRituE2B+sfVpRXt2QetdNLaV7HCf0uNE@mail.gmail.com>
	<AANLkTi=OjzMrjCPZ2VFDBf6URTaMoAzQqXbxWLv3d9mW@mail.gmail.com>
	<AANLkTikvbvr+Y=Fh2fPVieHkTRix+ni61jVPct10NKfD@mail.gmail.com>
	<AANLkTina-MO4GuK66ZJN0hipp+VCa-CUxEz79rzRt-cZ@mail.gmail.com>
Date: Thu, 31 Mar 2011 15:28:46 -0700
Message-ID: <AANLkTi=OVSOitMvdjHexbv-fu0fA1WWOHo7gm-=MtPRf@mail.gmail.com>
From: Jack Vogel <jfvogel@gmail.com>
To: Arnaud Lacombe <lacombar@gmail.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-net@freebsd.org
Subject: Re: em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not
 setup receive structures"]

OK, but those are not present in this data; that is what I was asking
about.

So you have a hang whose cause we do not know for certain. What does
netstat -m show?

Jack


On Thu, Mar 31, 2011 at 3:15 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:

> Hi,
>
> On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel@gmail.com> wrote:
> > So, what is the evidence that the driver is stuck here?
> >
> About 800 pps (mostly SYN) present on the wire but never ever seen on
> em0, plus a couple of ARP replies, which still never hit em0, plus the
> `missed_packets' count increasing by the same 800 pps over the last
> hour. Is that enough?
>
>  - Arnaud
>
> PS: I forgot to add that the MAC addresses on the wire are fine.
>
> > I see that next_to_check != next_to_refresh, which is why the
> > local timer won't schedule anything. OH, and I also realized there
> > is a problem with local_timer anyway: it will run rxeof, but that won't
> > help if you can't enter the loop, so I need to add some code at the top
> > to call em_refresh_mbufs() when in this state.
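The rxeof problem described above can be modelled in a few lines. This is a hypothetical sketch, not the em(4) code: the `sim_*` names are invented, the driver's next_to_check/next_to_refresh bookkeeping is replaced by per-slot flags for clarity, and `sim_refresh()` only stands in for em_refresh_mbufs(). With `refresh_at_top` false, a depleted ring whose next descriptor never completes is never refilled; with it true, the refill happens before the loop is even considered.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NUM_RX_DESC 16  /* tiny ring for illustration only */

/* Hypothetical, heavily simplified RX ring; NOT the em(4) struct rx_ring. */
struct sim_rx_ring {
    bool dd[SIM_NUM_RX_DESC];     /* "descriptor done": hardware wrote a packet here */
    bool filled[SIM_NUM_RX_DESC]; /* slot currently holds a receive buffer (mbuf) */
    int  next_to_check;           /* next slot software inspects for DD */
};

/* True if any slot is missing its receive buffer. */
static bool sim_needs_refresh(const struct sim_rx_ring *r)
{
    for (int i = 0; i < SIM_NUM_RX_DESC; i++)
        if (!r->filled[i])
            return true;
    return false;
}

/* Stand-in for em_refresh_mbufs(): give every empty slot a buffer. */
static void sim_refresh(struct sim_rx_ring *r)
{
    for (int i = 0; i < SIM_NUM_RX_DESC; i++)
        r->filled[i] = true;
}

/*
 * Receive loop. With refresh_at_top == false, a depleted ring whose next
 * descriptor never gets DD set is never refilled: the loop is not entered,
 * so the refresh after it never runs. refresh_at_top == true adds the
 * check at the top that the message proposes.
 */
static int sim_rxeof(struct sim_rx_ring *r, bool refresh_at_top)
{
    int done = 0;

    assert(r->next_to_check >= 0 && r->next_to_check < SIM_NUM_RX_DESC);
    if (refresh_at_top && sim_needs_refresh(r))
        sim_refresh(r);

    while (r->dd[r->next_to_check]) {
        r->dd[r->next_to_check] = false;
        r->filled[r->next_to_check] = false;  /* buffer passed up the stack */
        r->next_to_check = (r->next_to_check + 1) % SIM_NUM_RX_DESC;
        done++;
    }
    if (done > 0)
        sim_refresh(r);  /* original behaviour: refill only after doing work */
    return done;
}
```

Calling `sim_rxeof()` on an all-empty ring with `refresh_at_top = false` returns 0 and leaves every slot empty; the same call with `true` also returns 0 but leaves the ring fully stocked, so the hardware has buffers again.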
> >
> > On this interrupt cause that you are focused upon: although it's there
> > in the design, I had talked with some of our most seasoned developers on
> > both the Windows and Linux side of the house, and NO one has ever used
> > this 'feature', because (and I'm quoting here) "there's no good use case
> > for it". Meaning, there's always some simpler way of handling the issue.
> >
> > When you use MSIX you can't read causes, btw; if you configured it, it
> > would just mean you'd get into the regular RX handler, same as always,
> > so why bother specially with this cause?
> >
> > On non-MSIX hardware there is no particular reason to worry about the
> > cause either; we can just handle the RX situation in the interrupt
> > handler.
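For the non-MSI-X case, the argument amounts to the decision below. This is a hypothetical sketch: `sim_icr_wants_rx()` and the `SIM_ICR_*` names are invented, and the legacy ICR bit positions (LSC = bit 2, RXDMT0 = bit 4, RXO = bit 6, RXT0 = bit 7) are assumed from Intel's shared e1000_defines.h, not taken from this thread. An overrun simply routes into the same RX path as any other RX cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy ICR bit positions, assumed from Intel's shared e1000_defines.h. */
#define SIM_ICR_LSC     (1u << 2)  /* link status change */
#define SIM_ICR_RXDMT0  (1u << 4)  /* RX descriptor minimum threshold reached */
#define SIM_ICR_RXO     (1u << 6)  /* receiver overrun */
#define SIM_ICR_RXT0    (1u << 7)  /* receiver timer interrupt */

#define SIM_ICR_RX_ANY  (SIM_ICR_RXDMT0 | SIM_ICR_RXO | SIM_ICR_RXT0)

/*
 * Hypothetical single-vector (non-MSI-X) decision: every RX-related cause,
 * overrun included, leads to the same action (run the normal RX path),
 * so the overrun bit needs no handler of its own.
 */
static bool sim_icr_wants_rx(uint32_t icr)
{
    return (icr & SIM_ICR_RX_ANY) != 0;
}
```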
> >
> > Jack
> >
> >
> > On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> >>
> >> Hi Jack,
> >>
> >> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
> >> > [...]
> >> > I'll remove part of the changes I made to keep only `rx_forced_refill'
> >> > and the associated sysctl, re-run the tests and come back with correct
> >> > values, hopefully in a few hours.
> >> >
> >> Here it is:
> >>
> >> # sysctl dev.em.0.%desc
> >> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
> >>
> >> # sysctl dev.em.0.mac_stats.missed_packets
> >> dev.em.0.mac_stats.missed_packets: 917428
> >>
> >> # sysctl dev.em.0.debug=1
> >> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
> >> em0: hw tdh = 975, hw tdt = 975
> >> em0: hw rdh = 884, hw rdt = 885
> >> em0: Tx Queue Status = 0
> >> em0: TX descriptors avail = 1024
> >> em0: Tx Descriptors avail failure = 0
> >> em0: RX discarded packets = 0
> >> em0: RX Next to Check = 884
> >> em0: RX Next to Refresh = 885
> >>  -> -1
> >>
> >> So the taskqueue cannot be scheduled to run and the driver is stuck.
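To read the head/tail values in the dump: on these descriptor rings the hardware owns the slots from head (inclusive) up to tail (exclusive), so the number it can still use is (tail - head) mod N. The helper below is a hypothetical sketch that assumes N = 1024; the RX ring size is not printed, and 1024 is assumed here only because the TX side reports 1024 descriptors available.

```c
#include <assert.h>

/*
 * Hypothetical helper: forward distance from ring index `from` to ring
 * index `to` on a ring of `n` descriptors, i.e. (to - from) mod n,
 * computed without going negative.
 */
static unsigned ring_dist(unsigned from, unsigned to, unsigned n)
{
    assert(n > 0 && from < n && to < n);
    return (to + n - from) % n;
}
```

With the values above, `ring_dist(884, 885, 1024)` is 1, so at the time of the dump the hardware had a single receive descriptor available. On the TX side, `ring_dist(975, 975, 1024)` is 0, which is consistent with all 1024 TX descriptors being free.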
> >>
> >> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel@gmail.com> wrote:
> >> >> Read the code in HEAD: em_local_timer() has a test of ALL the RX
> >> >> queues and will schedule a task that refreshes mbufs if they are
> >> >> empty. This has exactly the same effect as checking for some
> >> >> interrupt cause, a cause that is not available when using MSIX on
> >> >> 82574, but this approach works for everything.
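The timer check described here can be sketched as follows. This is hypothetical, not the code in HEAD. It assumes that "empty" is tested as next_to_check == next_to_refresh, which is how Jack's later remark that the timer won't schedule anything when the two differ reads. The `sim_*` names are invented, and `refresh_queued` stands in for enqueueing the refresh task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state: only the two indices shown in the debug dump. */
struct sim_rx_ring {
    unsigned next_to_check;
    unsigned next_to_refresh;
    bool     refresh_queued;  /* stands in for enqueueing the refresh task */
};

/*
 * Walk every RX queue and queue a refresh when the ring looks empty,
 * assumed here to mean the two indices are equal. Returns the number
 * of queues scheduled.
 */
static int sim_local_timer(struct sim_rx_ring *rings, int nqueues)
{
    int scheduled = 0;

    for (int i = 0; i < nqueues; i++) {
        if (rings[i].next_to_check == rings[i].next_to_refresh) {
            rings[i].refresh_queued = true;
            scheduled++;
        }
    }
    return scheduled;
}
```

Fed the values from the stuck em0 (next_to_check 884, next_to_refresh 885), that queue is never scheduled, which is the gap the later replies in this thread point out.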
> >> >>
> >> Can you please point me to a reference datasheet (or errata), provided
> >> by Intel, about the RX Overrun interrupt not being available with
> >> MSI-X on the 82574?
> >>
> >> Currently, I only have access to [0], which specifies the following:
> >>
> >> 7.4 Interrupts
> >> 7.4.2 MSI-X Mode
> >> [...]
> >> The following configuration and parameters are involved:
> >> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues and
> >> other events to 5 interrupt vectors
> >> • The ICR[24:20] bits reflect specific interrupt causes
> >> • Five MSI-X interrupt vectors are provided (calculated based on four
> >> vectors for queues and one vector for other causes). The requested
> >> number of vectors is loaded from the MSI_X_N fields in the EEPROM into
> >> the PCIe MSI-X capability structure of the function.
> >>
> >> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
> >> [...]
> >>
> >> about bit 24:
> >>
> >> Other Interrupt. Indicates one of the following interrupts was set:
> >> • Link Status Change.
> >> • Receiver Overrun.
> >> • MDIO Access Complete.
> >> • Small Receive Packet Detected.
> >> • Receive ACK Frame Detected.
> >> • Manageability Event Detected.
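The vector layout in the excerpt (two RX queues, two TX queues and one "other" vector, with causes in ICR[24:20]) can be decoded as below. The individual bit assignments (RxQ0 = 20, RxQ1 = 21, TxQ0 = 22, TxQ1 = 23, Other = 24) are assumed from Intel's shared e1000_defines.h rather than taken from the quoted text, so check them against section 10.2.4.1; the `sim_*` names are hypothetical. Per the excerpt, a receiver overrun is folded into the aggregated Other bit, which by itself says only that one of the listed events fired, not which one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 82574 MSI-X cause bits ICR[24:20]; positions assumed from e1000_defines.h. */
#define SIM_ICR_RXQ0   (1u << 20)
#define SIM_ICR_RXQ1   (1u << 21)
#define SIM_ICR_TXQ0   (1u << 22)
#define SIM_ICR_TXQ1   (1u << 23)
#define SIM_ICR_OTHER  (1u << 24)  /* link change, receiver overrun, MDIO, ... */

/* The five vector-related causes as a 5-bit mask (bit 0 = RxQ0, bit 4 = Other). */
static unsigned sim_msix_causes(uint32_t icr)
{
    return (icr >> 20) & 0x1fu;
}

/* "Other" is an aggregate: it says one of several events fired, not which. */
static bool sim_other_pending(uint32_t icr)
{
    return (icr & SIM_ICR_OTHER) != 0;
}
```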
> >>
> >> Thanks in advance,
> >>  - Arnaud
> >>
> >> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
> >
> >
>