Date: Fri, 10 Aug 2012 11:26:08 +0300
From: Konstantin Belousov
To: Barney Cordoba
Cc: jfv@freebsd.org, Jack Vogel, John Baldwin, net@freebsd.org
Subject: Re: 82574L hangs (with r233708 e1000 driver).
Message-ID: <20120810082608.GB2425@deviant.kiev.zoral.com.ua>
In-Reply-To: <1344525935.85341.YahooMailClassic@web121605.mail.ne1.yahoo.com>
References: <1336775069.17927.YahooMailClassic@web126002.mail.ne1.yahoo.com> <1344525935.85341.YahooMailClassic@web121605.mail.ne1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

On Thu, Aug 09, 2012 at 08:25:35AM -0700, Barney Cordoba wrote:
>
> --- On Fri, 5/11/12, Barney Cordoba wrote:
>
> > From: Barney Cordoba
> > Subject: Re: 82574L hangs (with r233708 e1000 driver).
> > To: "John Baldwin", "Konstantin Belousov"
> > Cc: jfv@freebsd.org, "Jack Vogel", net@freebsd.org
> > Date: Friday, May 11, 2012, 6:24 PM
> >
> > --- On Tue, 5/8/12, Konstantin Belousov wrote:
> >
> > > From: Konstantin Belousov
> > > Subject: Re: 82574L hangs (with r233708 e1000 driver).
> > > To: "John Baldwin"
> > > Cc: jfv@freebsd.org, "Jack Vogel", net@freebsd.org
> > > Date: Tuesday, May 8, 2012, 4:24 AM
> > >
> > > On Mon, May 07, 2012 at 01:44:57PM -0400, John Baldwin wrote:
> > > > On Friday, May 04, 2012 6:18:19 pm Konstantin Belousov wrote:
> > > > > On Fri, May 04, 2012 at 11:30:22AM -0400, John Baldwin wrote:
> > > > > > On Tuesday, May 01, 2012 12:21:21 pm Konstantin Belousov wrote:
> > > > > > > On Thu, Apr 12, 2012 at 09:38:49PM +0300, Konstantin Belousov wrote:
> > > > > > > > On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> > > > > > > > > On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > > > > > > > > > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > > > > > > > > > Make sure you have any firmware up to the latest available, if that
> > > > > > > > > > > doesn't help let me know and I'll check internally to see if there
> > > > > > > > > > > are any outstanding issues in shared code, that will be after the
> > > > > > > > > > > weekend.
> > > > > > > > > >
> > > > > > > > > > I had BIOS rev. 151, after your hint I found rev. 154 on the site.
> > > > > > > > > > Now BIOS reports itself as MTCDT10N.86A.0154.2012.0323.1601,
> > > > > > > > > > March 23.
> > > > > > > > > >
> > > > > > > > > > Unfortunately, the upgrade did not change anything with regard to
> > > > > > > > > > the hanging interface.
> > > > > > > > >
> > > > > > > > > Does reverting 233708 make any difference?  Have you tried futzing
> > > > > > > > > around with kgdb when it is hung to see what state the device is in
> > > > > > > > > (software state at least)?
> > > > > > > > It does, in the sense that without r233708 the interface becomes stuck
> > > > > > > > almost immediately. I just upgraded to e1000@r234154, which does not
> > > > > > > > change much.
> > > > > > > >
> > > > > > > > I fiddled with the adapter state after the hang in kgdb more, and I
> > > > > > > > noted something interesting. Apparently, tx works. When I ping the remote
> > > > > > > > host from my suffering atom machine, the remote host sees the packet. Also
> > > > > > > > the remote machine sees some udp traffic originating from the atom, like
> > > > > > > > ntp queries.
> > > > > > > >
> > > > > > > > And, on receive, the atom board does receive interrupts, em0:rx 0 counter
> > > > > > > > in vmstat -i increases. Even more fun, the sysctl dev.em.0.debug
> > > > > > > > shows increasing hw rdh (as I understand, this is hardware 'last
> > > > > > > > received' packet pointer for rx ring). So I looked at the packet
> > > > > > > > descriptor at hw rdt index, and there I see
> > > > > > > > (kgdb) p/x ((struct adapter *)0xffffff80010e4000)->rx_rings->rx_base[78]
> > > > > > > > $11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0,
> > > > > > > >    errors = 0x0, special = 0x0}
> > > > > > > >
> > > > > > > > Apparently, the Descriptor Done bit is clear, so the em_rxeof() function
> > > > > > > > breaks from the loop, not consuming the current packet. Also, it returns
> > > > > > > > false due to the DD bit being clear. This prevents em_msix_rx() from
> > > > > > > > scheduling the taskqueue for processing. So the apparent cause of the hang
> > > > > > > > is the missing DD bit in the descriptor.
> > > > > > > >
> > > > > > > > I am not sure whether all this is obvious to anybody who knows em
> > > > > > > > internals, and where to go from there.
> > > > > > >
> > > > > > > Ok, nobody cares.
> > > > > > >
> > > > > > > Below is the workaround I use to prevent the interface wedging.
> > > > > > > It seems that the sole PCI register read (namely, the rx ring head read)
> > > > > > > and consequent recheck of the descriptor status greatly reduce the
> > > > > > > likelihood of the issue. Unfortunately, the read does not eliminate
> > > > > > > the hang completely. So it is not some PCIe coherency problem.
> > > > > > >
> > > > > > > With the patch applied, I am able to copy around blu-ray images, while
> > > > > > > previously the interface hung within 20-30 seconds of 100Mbit/s traffic.
> > > > > > > Sometimes the messages are printed:
> > > > > > > em0: Workaround: head 1018 tail 1002 cur 1010
> > > > > > > em0: Workaround: head 976 tail 973 cur 974
> > > > > > > em0: Workaround: head 950 tail 939 cur 946
> > > > > > > em0: Workaround: head 435 tail 419 cur 426
> > > > > > >
> > > > > > > Machine is still dead due to random memory corruption which I see, in
> > > > > > > particular, pmap sometimes reads garbage from PTEs. I have no idea whether
> > > > > > > it is related to em0 rx descriptor missed writes, or is a different issue.
> > > > > >
> > > > > > Humm, so if I'm reading this correctly, the card "skips" a receive
> > > > > > descriptor and stores a packet at the next descriptor?  That's just
> > > > > > bizarre.
> > > > > Either this, or it does store the packet but 'forgets' to update the
> > > > > rx descriptor.
> > > > > I think that your interpretation is closer to reality,
> > > > > since I get sustained 20MB/s over ssh with the patch even when the
> > > > > workaround activates. The lost packets probably should cause retransmit
> > > > > and speed drop.
> > > >
> > > > This is just weird.  I wonder if there is a known errata for this?
> > > > This really seems to be broken hardware and not a driver issue.
> > > I was not able to find anything even remotely resembling the described
> > > behaviour in the publicly available 82574L specification update. I looked
> > > at rev. 3.5, dated January 2012.
> > >
> > > I may indeed give up and relocate the hardware into the trash, but it would
> > > be a pity, since this is a new shiny Intel Atom 2800 m/b. I am not sure I
> > > can give convincing arguments to the supplier for warranty replacement.
> > >
> > > And, while I booted Debian to apply the f/w fix Jack recommended, I did
> > > a quick test and the interface looked stable.
> > >
> >
> > FWIW, I've got an X7SPE-HF-D525 MB with 82574L running on a 7.0 driver
> > that seems to work pretty well. It panics once in a blue moon when we
> > overload it (like 200Mb/s of traffic) but it generally works ok.
> >
> > BC
>
> Has anything been done or patched regarding this problem?

Yes, it was fixed by replacing the hardware (by the same model).
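
For readers following the thread, the receive stall discussed in the quoted
messages can be modelled in a few lines of C. This is only a user-space
sketch under stated assumptions, not the patch mentioned in the thread (the
patch itself is not quoted here) and not the em(4) driver code. The names
rx_ring, hw_receive, rx_poll and rx_poll_workaround are hypothetical, the
hw_head field merely stands in for the hardware rx ring head register, and
the descriptor layout only mirrors the fields visible in the kgdb dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE   8     /* illustrative; real rings are much larger */
#define RXD_STAT_DD 0x01  /* hypothetical name for the Descriptor Done bit */

/* Simplified rx descriptor; fields follow the kgdb dump in the thread. */
struct rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    uint16_t csum;
    uint8_t  status;
    uint8_t  errors;
    uint16_t special;
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    unsigned hw_head; /* models the head register: next slot hardware fills */
    unsigned next;    /* next descriptor software will examine */
};

void ring_init(struct rx_ring *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
}

/* Simulated hardware: store a packet, optionally "forgetting" the DD bit. */
void hw_receive(struct rx_ring *r, uint16_t len, bool write_dd)
{
    struct rx_desc *d = &r->desc[r->hw_head];

    d->length = len;
    d->status = write_dd ? RXD_STAT_DD : 0;
    r->hw_head = (r->hw_head + 1) % RING_SIZE;
}

/* Plain loop: stop at the first descriptor without DD, as described for
 * em_rxeof(). Returns the number of descriptors consumed. */
unsigned rx_poll(struct rx_ring *r)
{
    unsigned done = 0;

    while (r->desc[r->next].status & RXD_STAT_DD) {
        r->desc[r->next].status = 0;
        r->next = (r->next + 1) % RING_SIZE;
        done++;
    }
    return done;
}

/* Assumed workaround idea: when DD is clear, consult the head index. If
 * the head has already moved past this slot, treat the slot as filled.
 * A real driver would re-read the descriptor status after the register
 * read; the simulation has nothing to re-read. */
unsigned rx_poll_workaround(struct rx_ring *r, unsigned *skipped)
{
    unsigned done = 0;

    for (;;) {
        struct rx_desc *d = &r->desc[r->next];

        if (!(d->status & RXD_STAT_DD)) {
            if (r->next == r->hw_head)
                break;          /* ring is genuinely empty */
            (*skipped)++;       /* head moved on without DD being set */
        }
        d->status = 0;
        r->next = (r->next + 1) % RING_SIZE;
        done++;
    }
    return done;
}
```

With three simulated packets where the second one lacks DD, rx_poll()
consumes one descriptor and then stalls, while rx_poll_workaround() drains
the remaining two and counts one skipped DD bit. The sketch assumes fewer
than RING_SIZE packets are outstanding, so that head == next always means
"empty"; a real ring relies on the tail pointer for that guarantee.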