Date: Thu, 12 Apr 2012 21:38:49 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Cc: jfv@freebsd.org, Jack Vogel <jfvogel@gmail.com>, net@freebsd.org
Subject: Re: 82574L hangs (with r233708 e1000 driver).
Message-ID: <20120412183849.GA2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <201204091219.39580.jhb@freebsd.org>
References: <20120407133715.GU2358@deviant.kiev.zoral.com.ua> <CAFOYbc=hFg_jvohPVQrp4M%2BXQztoO6b-9Pop=PrVn6VxP6oaHQ@mail.gmail.com> <20120408051125.GA2358@deviant.kiev.zoral.com.ua> <201204091219.39580.jhb@freebsd.org>

On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > Make sure you have any firmware up to the latest available, if that
> > > doesn't help let me know and I'll check internally to see if there
> > > are any outstanding issues in shared code, that will be after the
> > > weekend.
> >
> > I had BIOS rev. 151, after you hint I found rev. 154 on the site.
> > Now BIOS reports itself as MTCDT10N.86A.0154.2012.0323.1601,
> > March 23.
> >
> > Unfortunately, upgrade did not changed anything in regard of hanging
> > interface.
>
> Does reverting 233708 make any difference? Have you tried futzing around with
> kgdb when it is hung to see what state the device is in (software state at
> least)?

It does, in the sense that without r233708 the interface becomes stuck
almost immediately. I just upgraded to e1000@r234154, which does not
change much.

I fiddled with the adapter state in kgdb some more after the hang, and I
noticed something interesting. Apparently, tx works. When I ping the
remote host from my suffering atom machine, the remote host sees the
packet. The remote machine also sees some udp traffic originating from
the atom, like ntp queries.

And, on receive, the atom board does receive interrupts: the em0:rx 0
counter in vmstat -i increases. Even more fun, the sysctl dev.em.0.debug
shows an increasing hw rdh (as I understand it, this is the hardware
'last received' packet pointer for the rx ring).

So I looked at the packet descriptor at the hw rdt index, and there I see

(kgdb) p/x ((struct adapter *)0xffffff80010e4000)->rx_rings->rx_base[78]
$11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0,
  errors = 0x0, special = 0x0}

Apparently, the Descriptor Done (DD) bit is clear, so the em_rxeof()
function breaks out of its loop without consuming the current packet. It
also returns false because the DD bit is clear, which prevents
em_msix_rx() from scheduling the taskqueue for processing.

So the apparent cause of the hang is the missing DD bit in the
descriptor. I am not sure whether all of this is obvious to anybody who
knows the em internals, or where to go from here.
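
To make the reasoning above concrete, here is a minimal sketch of the receive loop as I described it. It is not the driver code: rxeof_sketch(), RING_SIZE and the budget parameter are hypothetical names for illustration. The descriptor layout follows the fields printed by kgdb, and RXD_STAT_DD stands in for E1000_RXD_STAT_DD (0x01).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RXD_STAT_DD 0x01 /* Descriptor Done bit in the status byte */
#define RING_SIZE   4    /* toy ring size, hypothetical */

/* Legacy rx descriptor, same fields as in the kgdb output. */
struct rx_desc {
	uint64_t buffer_addr;
	uint16_t length;
	uint16_t csum;
	uint8_t  status;
	uint8_t  errors;
	uint16_t special;
};

/*
 * Simplified stand-in for the em_rxeof() loop (an assumption, not the
 * real function): consume descriptors starting at *next while DD is
 * set. Returns true only if the descriptor it stopped at has DD set,
 * i.e. there is more work to schedule.
 */
bool
rxeof_sketch(struct rx_desc *ring, unsigned *next, int budget, int *consumed)
{
	assert(ring != NULL && next != NULL && consumed != NULL);
	*consumed = 0;
	while (budget-- > 0) {
		struct rx_desc *cur = &ring[*next];

		if ((cur->status & RXD_STAT_DD) == 0)
			break;		/* hardware has not marked it done */
		cur->status = 0;	/* hand the slot back */
		*next = (*next + 1) % RING_SIZE;
		(*consumed)++;
	}
	return ((ring[*next].status & RXD_STAT_DD) != 0);
}
```

With a descriptor like the one at index 78 (length = 0x5ea, status = 0x0), this loop stops at that slot and returns false, even though the length field suggests a packet was written. That matches what I see: nothing is consumed and no further processing is scheduled.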