From owner-freebsd-current@FreeBSD.ORG  Thu May  5 14:21:56 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C600D106566B
	for <current@freebsd.org>; Thu,  5 May 2011 14:21:56 +0000 (UTC)
	(envelope-from lacombar@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 7A5A78FC17
	for <current@freebsd.org>; Thu,  5 May 2011 14:21:56 +0000 (UTC)
Received: by iwn33 with SMTP id 33so2598176iwn.13
	for <current@freebsd.org>; Thu, 05 May 2011 07:21:55 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.42.150.8 with SMTP id y8mr987984icv.471.1304605315614; Thu, 05
	May 2011 07:21:55 -0700 (PDT)
Received: by 10.42.167.5 with HTTP; Thu, 5 May 2011 07:21:55 -0700 (PDT)
In-Reply-To: <BANLkTikkbpW6_jE5QznGjAt4Zcpee0RagQ@mail.gmail.com>
References: <BANLkTinrfZbO+MUDDuzsoaN1y-=_O8LgNA@mail.gmail.com>
	<4D94A354.9080903@sentex.net>
	<AANLkTik_XPsVWL-KqHkPic1KQ0SdCSk6u_9ykRefi3VE@mail.gmail.com>
	<BANLkTi=K5ASG9TWLAh5r+zo9Wy1stMf9WA@mail.gmail.com>
	<BANLkTikPPzxZ6XRAaqrvdeXBp=Ydvz7hNg@mail.gmail.com>
	<BANLkTi=rhZ0dyO6Zq13jY6-NKVE8n24YyQ@mail.gmail.com>
	<4DC07013.9070707@gmx.net>
	<BANLkTi=DmQsVvJOaoxMr5GPOLkjs7sdTxQ@mail.gmail.com>
	<4DC078BD.9080908@gmx.net>
	<BANLkTin1ykoo80+9iWe+g5ib1DXw+05BgQ@mail.gmail.com>
	<BANLkTi=STPT13-50dxMRgjLP_pyxL9Utyw@mail.gmail.com>
	<BANLkTikX8gs7Ln2KLZkA=MyieeCR+zKXzQ@mail.gmail.com>
	<BANLkTikj-wSOFWQX9Y_yN54Q_jk-=vD3LA@mail.gmail.com>
	<BANLkTin0ANtbWGv4CTr+O5xEL58hVRDefg@mail.gmail.com>
	<BANLkTikzpjxe+cMYiTRak0B0tnkhrW+Bow@mail.gmail.com>
	<BANLkTikUJOD+tzYoiHCoWHrD36PxLQgN7A@mail.gmail.com>
	<BANLkTin2j3QzO0pwVHe9Nm-L8otEf9pcbg@mail.gmail.com>
	<BANLkTinmKH40yx5Mgu9zgQ2qEF2O-n6HMQ@mail.gmail.com>
	<BANLkTikehcbxm0MQtb0SQ0giSfhmkHw99A@mail.gmail.com>
	<BANLkTikkbpW6_jE5QznGjAt4Zcpee0RagQ@mail.gmail.com>
Date: Thu, 5 May 2011 10:21:55 -0400
Message-ID: <BANLkTimVc2Chq9iKrRVCBfqg6WPmt_O=6w@mail.gmail.com>
From: Arnaud Lacombe <lacombar@gmail.com>
To: Jack Vogel <jfvogel@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Olivier Smedts <olivier@gid0.org>,
	FreeBSD current mailing list <current@freebsd.org>
Subject: Re: problems with em(4) since update to driver 7.2.2
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 14:21:56 -0000

Hi,

On Thu, May 5, 2011 at 2:59 AM, Jack Vogel <jfvogel@gmail.com> wrote:
> OK, but what this does not explain is why I do not see this if
> it's so easily reproduced; what causes the failure case, any idea?
>
It is completely random, as it depends on the content of the stack. I
spent 3 or 4 hours trying to reproduce it using different approaches on
different platforms, with different versions of the code, and failed. And
once `error' was explicitly colored, it popped up. That's the beauty
of errors related to uninitialized variables.
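To illustrate why coloring `error' makes the failure deterministic, here is a minimal sketch of a toy function (not the actual em_setup_receive_ring()) whose loop can run zero times. STATUS_POISON and refresh_ring_colored() are hypothetical names. With the sentinel in place, the zero-iteration path returns -1 every time instead of whatever happened to be on the stack.

```c
#include <assert.h>

/*
 * Hypothetical sentinel used to "color" a local status variable.
 * Any code path that returns it without assigning a real status
 * becomes visible instead of leaking whatever was on the stack.
 */
#define STATUS_POISON	(-1)

/*
 * Toy stand-in for a ring-refresh routine (not the driver code):
 * walks slots from `start' up to, but not including, `stop' on a
 * ring of `size' entries.  When start == stop the body never runs.
 */
static int
refresh_ring_colored(int start, int stop, int size)
{
	int error = STATUS_POISON;	/* colored, not left uninitialized */
	int j;

	assert(size > 0 && start >= 0 && start < size);
	assert(stop >= 0 && stop < size);
	for (j = start; j != stop; j = (j + 1) % size)
		error = 0;		/* pretend slot j was refreshed */
	return (error);
}
```

For example, refresh_ring_colored(2, 5, 8) walks slots 2 to 4 and returns 0, while refresh_ring_colored(3, 3, 8) never enters the loop and returns STATUS_POISON.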

 - Arnaud

> As I said, given the code was not feasible for igb anyway I would not
> be unhappy about returning to the old way of doing things.
>
I am not sure what you mean by "old way of doing things", but I'd guess
that the ring only needs to be set up on a few occasions, like
initialization and MTU transitions. I'm not sure either how other
drivers manage their rings.

> Jack
>
>
> On Wed, May 4, 2011 at 11:03 PM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>>
>> Hi,
>>
>> On Thu, May 5, 2011 at 1:20 AM, Arnaud Lacombe <lacombar@gmail.com> wrote:
>> > Hi,
>> >
>> > On Wed, May 4, 2011 at 5:38 PM, Jack Vogel <jfvogel@gmail.com> wrote:
>> >> I have had my validation engineer busy all day; we have tried both
>> >> a 9 kernel as well as 8.2, using the code from HEAD, and we
>> >> cannot reproduce this problem.
>> >>
>> > Actually, it can be trivially reproduced by tainting `error'. As it is
>> > uninitialized in HEAD, its value can be _anything_, so let's mark it
>> > as explicitly invalid.
>> >
>> > diff -u ./if_em.c /data/src/freebsd/em-7.2.2/src/if_em.c
>> > --- ./if_em.c   2011-02-18 01:18:23.000000000 -0500
>> > +++ /data/src/freebsd/em-7.2.2/src/if_em.c      2011-05-05 01:12:01.000000000 -0400
>> > @@ -3912,7 +3912,7 @@
>> >         struct  adapter         *adapter = rxr->adapter;
>> >         struct em_buffer        *rxbuf;
>> >         bus_dma_segment_t       seg[1];
>> > -       int                     i, j, nsegs, error;
>> > +       int                     i, j, nsegs, error = -1;
>> >
>> > The error pointed out in this thread pops up on the next boot.
>> >
>> I put a call to kdb_enter() at the beginning of the function and, with
>> the help of a textdump, got the backtraces [0] for every call to
>> em_setup_receive_ring(). All are exactly the same:
>>
>> kdb_enter_why(0,c09f6511,f391aaa8,c09be1e2,c09f6511,...) at kdb_enter_why+0x3b
>> kdb_enter(c09f6511,0,3810,ffffffff,5dc,...) at kdb_enter+0x19
>> em_setup_receive_ring(c3c8d600,c3c8d7a4,c3c96004,310000fa,c3c8d600,...) at em_setup_receive_ring+0x22
>> em_setup_receive_structures(c3c96000,f15f2000,38,8100,3,...) at em_setup_receive_structures+0x26
>> em_init_locked(c3c96000,0,c09f5de5,414,10000,...) at em_init_locked+0x2f2
>> em_ioctl(c3c7d000,80206934,c3ce9d00,c07b7a0b,c3f2a230,...) at em_ioctl+0x1c3
>> ifhwioctl(c3f2a230,f391ac34,c07b7a0b,c3f3e3d0,c08df1c0,...) at ifhwioctl+0x4b8
>> ifioctl(c3f3e3d0,80206934,c3ce9d00,c3f2a230,c3f2a230,...) at ifioctl+0x82
>> kern_ioctl(c3f2a230,3,80206934,c3ce9d00,c3ce9d00,...) at kern_ioctl+0xa8
>> ioctl(c3f2a230,f391acf8,c,c,f391ad2c,...) at ioctl+0xc5
>> syscall(f391ad38) at syscall+0x17d
>> Xint0x80_syscall() at Xint0x80_syscall+0x20
>> --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x4816ee23, esp = 0xbfbfe67c, ebp = 0xbfbfe698 ---
>>
>> This fully explains why the main loop in em_setup_receive_ring() is
>> never entered: we always verify `j == rxr->next_to_check' (provided
>> that mbufs have been refreshed if some packets were transferred) and
>> return the value on the stack. As of now, besides changing the
>> call-site of em_setup_receive_ring() to ensure it is never re-entered,
>> I'd guess that the patch I sent earlier today is the only way to
>> ensure that no junk is returned.
>>
>> I'd guess that the driver _would_ be able to transmit if the code did
>> not explicitly call em_stop() upon em_setup_receive_structures()
>> failure.
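A minimal sketch of the direction described above, assuming a toy ring rather than the real em(4) structures: `error' gets a defined value before the loop, so the zero-iteration path (indices already equal) returns 0 instead of stack contents, and the caller only stops the interface on a real failure. struct toy_ring, toy_setup_receive_ring() and toy_init() are hypothetical, and this is not the patch referenced in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ring; names are hypothetical, not the em(4) data structures. */
struct toy_ring {
	int	size;
	int	next_to_refresh;
	int	next_to_check;
	bool	stopped;	/* set when the caller gives up on the ring */
};

/*
 * Walk slots between next_to_refresh and next_to_check.  `error'
 * starts at 0, so when both indices are already equal the loop is
 * skipped and a defined "success" is returned.
 */
static int
toy_setup_receive_ring(struct toy_ring *r)
{
	int error = 0;
	int j;

	for (j = r->next_to_refresh; j != r->next_to_check;
	    j = (j + 1) % r->size) {
		/* a real driver would refill slot j here and set error on failure */
	}
	r->next_to_refresh = j;
	return (error);
}

/* Caller mirroring the "stop the interface on setup failure" pattern. */
static int
toy_init(struct toy_ring *r)
{
	int error = toy_setup_receive_ring(r);

	if (error != 0)
		r->stopped = true;
	return (error);
}
```

With a defined initial value, a re-entered setup with nothing to refresh no longer triggers the stop path by accident.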
>>
>>  - Arnaud
>>
>> [0]: I wish this had been as easy as in Linux, where a WARN() call
>> does all the work automatically, but still, I should not hope for
>> that much unless I am the one implementing it... yes, free whining,
>> it's 2 a.m. ...
>>
>> >  - Arnaud
>> >
>> >> The data your netstat -m shows suggests to me that, somehow, the setup
>> >> of the receive ring is running more than once, maybe??
>> >>
>> >> You asked at one point how this could go into STABLE. Well, because
>> >> not only here at Intel, but at lots of external customers, this code
>> >> has been used and tested thoroughly.
>> >>
>> >> I am not calling into question your problem, but until I understand
>> >> what it is I cannot "fix" it :)
>> >>
>> >> My guess right now is that the culprit is the setup code. The reason
>> >> is that when I ported it to the igb driver I found that it did not
>> >> work on our newer hardware, and so I went back to the older version
>> >> of setup for igb. Now, even though I have not seen hardware fail with
>> >> em, maybe there is some.
>> >>
>> >> To help me, give me a complete pciconf -lv, and if it's a name-brand
>> >> system, tell me that, including all hardware in it.
>> >>
>> >> If you like, Olivier, I can make a version of em for you that also
>> >> reverts the setup code the way I did for igb; see if that fixes it
>> >> for you?
>> >>
>> >> Thanks for your patience,
>> >>
>> >> Jack
>> >> _______________________________________________
>> >> freebsd-current@freebsd.org mailing list
>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> >> To unsubscribe, send any mail to
>> >> "freebsd-current-unsubscribe@freebsd.org"
>> >>
>> >
>
>