Date:      Thu, 12 Apr 2012 21:38:49 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        jfv@freebsd.org, Jack Vogel <jfvogel@gmail.com>, net@freebsd.org
Subject:   Re: 82574L hangs (with r233708 e1000 driver).
Message-ID:  <20120412183849.GA2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <201204091219.39580.jhb@freebsd.org>
References:  <20120407133715.GU2358@deviant.kiev.zoral.com.ua> <CAFOYbc=hFg_jvohPVQrp4M%2BXQztoO6b-9Pop=PrVn6VxP6oaHQ@mail.gmail.com> <20120408051125.GA2358@deviant.kiev.zoral.com.ua> <201204091219.39580.jhb@freebsd.org>

On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > Make sure you have any firmware up to the latest available, if that
> > > doesn't help let me know and I'll check internally to see if there are
> > > any outstanding issues in shared code, that will be after the weekend.
> >
> > I had BIOS rev. 151; after your hint I found rev. 154 on the site.
> > Now BIOS reports itself as MTCDT10N.86A.0154.2012.0323.1601,
> > March 23.
> >
> > Unfortunately, the upgrade did not change anything with regard to the
> > hanging interface.
>
> Does reverting 233708 make any difference?  Have you tried futzing around with
> kgdb when it is hung to see what state the device is in (software state at
> least)?
It does, in the sense that without r233708 the interface gets stuck
almost immediately. I just upgraded to e1000@r234154, which does not
change much.

I fiddled some more with the adapter state in kgdb after the hang, and I
noticed something interesting. Apparently, tx works. When I ping the remote
host from my suffering atom machine, the remote host sees the packet. The
remote machine also sees some udp traffic originating from the atom, like
ntp queries.

And on receive, the atom board does receive interrupts: the em0:rx 0 counter
in vmstat -i increases. Even more fun, the sysctl dev.em.0.debug
shows an increasing hw rdh (as I understand it, this is the hardware 'last
received' packet pointer for the rx ring). So I looked at the packet
descriptor at the hw rdt index, and there I see
(kgdb) p/x ((struct adapter *)0xffffff80010e4000)->rx_rings->rx_base[78]
$11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0,
  errors = 0x0, special = 0x0}
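
To make the dump easier to read, here is a minimal sketch of how those
fields can be interpreted. It assumes the legacy rx descriptor layout
with the same field names kgdb printed, and that Descriptor Done is bit 0
of the status byte. The struct, macro and helper names are made up for
illustration and are not the driver's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mirror of the fields kgdb printed; not the driver's type. */
struct rx_desc_sketch {
	uint64_t buffer_addr;	/* physical address of the receive buffer */
	uint16_t length;	/* bytes written into the buffer */
	uint16_t csum;		/* packet checksum */
	uint8_t  status;	/* status bits, DD among them */
	uint8_t  errors;
	uint16_t special;
};

/* Assumption: Descriptor Done is bit 0 of the status byte. */
#define RXD_STAT_DD_SKETCH	0x01

/* True if the descriptor is marked as completed by the hardware. */
static bool
rx_desc_done(const struct rx_desc_sketch *d)
{
	return ((d->status & RXD_STAT_DD_SKETCH) != 0);
}

/* The values from the kgdb dump above. */
static const struct rx_desc_sketch dumped = {
	.buffer_addr = 0x12a128800ULL, .length = 0x5ea, .csum = 0x3c2b,
	.status = 0x0, .errors = 0x0, .special = 0x0,
};
```

With status 0x0, rx_desc_done(&dumped) is false, although the length of
0x5ea (1514 bytes) would match a full-size Ethernet frame without the FCS.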

Apparently, the Descriptor Done (DD) bit is clear, so em_rxeof() breaks
out of its loop without consuming the current packet. It also returns
false because the DD bit is clear, which prevents em_msix_rx() from
scheduling the taskqueue for processing. So the apparent cause of the
hang is the missing DD bit in the descriptor.
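
The stall can be shown with a much simplified model of such a receive
loop. This is only a sketch of the behaviour as I read it, not the
em_rxeof() code: the ring is reduced to an array of status bytes, and all
names (rx_ring_sketch, rxeof_sketch, the ring size) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE_SKETCH	8
#define STAT_DD_SKETCH		0x01	/* assumed Descriptor Done bit */

/* Hypothetical ring: one status byte per descriptor. */
struct rx_ring_sketch {
	uint8_t status[RING_SIZE_SKETCH];
	size_t  next_to_check;		/* software consumer index */
};

/*
 * Consume descriptors while DD is set, up to 'budget' of them.
 * Returns true only if the next descriptor already has DD set,
 * i.e. if the caller should schedule more processing.
 */
static bool
rxeof_sketch(struct rx_ring_sketch *r, int budget, int *consumed)
{
	*consumed = 0;
	while (budget-- > 0) {
		if ((r->status[r->next_to_check] & STAT_DD_SKETCH) == 0)
			break;		/* stop at the first clear DD bit */
		r->status[r->next_to_check] = 0;	/* hand back to hw */
		r->next_to_check = (r->next_to_check + 1) % RING_SIZE_SKETCH;
		(*consumed)++;
	}
	return ((r->status[r->next_to_check] & STAT_DD_SKETCH) != 0);
}
```

In this model, a single descriptor that never gets its DD bit blocks
everything behind it: the loop stops there on every call and reports no
further work, even if later descriptors in the ring are complete.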

I am not sure whether all this is obvious to anybody who knows the em
internals, or where to go from here.
