Date: Thu, 12 Apr 2012 21:38:49 +0300
From: Konstantin Belousov
To: John Baldwin
Cc: jfv@freebsd.org, Jack Vogel, net@freebsd.org
Subject: Re: 82574L hangs (with r233708 e1000 driver).
Message-ID: <20120412183849.GA2358@deviant.kiev.zoral.com.ua>
In-Reply-To: <201204091219.39580.jhb@freebsd.org>
References: <20120407133715.GU2358@deviant.kiev.zoral.com.ua> <20120408051125.GA2358@deviant.kiev.zoral.com.ua> <201204091219.39580.jhb@freebsd.org>

On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > Make sure you have any firmware up to the latest available, if that
> > > doesn't help let me know and I'll check internally to see if there
> > > are any outstanding issues in shared code, that will be after the
> > > weekend.
> >
> > I had BIOS rev. 151, after you hint I found rev. 154 on the site.
> > Now BIOS reports itself as MTCDT10N.86A.0154.2012.0323.1601,
> > March 23.
> >
> > Unfortunately, upgrade did not changed anything in regard of hanging
> > interface.
>
> Does reverting 233708 make any difference?  Have you tried futzing around
> with kgdb when it is hung to see what state the device is in (software
> state at least)?

It does, in the sense that without r233708 the interface becomes stuck
almost immediately. I just upgraded to e1000@r234154, which does not
change much.

I fiddled with the adapter state in kgdb some more after the hang, and I
noted something interesting. Apparently, tx works. When I ping the remote
host from my suffering atom machine, the remote host sees the packet. The
remote machine also sees some udp traffic originating from the atom, like
ntp queries.

And, on receive, the atom board does receive interrupts: the em0:rx 0
counter in vmstat -i increases.
Even more fun, the sysctl dev.em.0.debug shows an increasing hw rdh (as I
understand it, this is the hardware 'last received' packet pointer for the
rx ring).

So I looked at the packet descriptor at the hw rdt index, and there I see
(kgdb) p/x ((struct adapter *)0xffffff80010e4000)->rx_rings->rx_base[78]
$11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0,
  errors = 0x0, special = 0x0}

Apparently, the Descriptor Done bit is clear, so the em_rxeof() function
breaks out of the loop without consuming the current packet. It also
returns false because the DD bit is clear, which prevents em_msix_rx()
from scheduling the taskqueue for processing.

So the apparent cause of the hang is the missing DD bit in the descriptor.
I am not sure whether all of this is obvious to anybody who knows the em
internals, or where to go from here.
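To make the described failure mode concrete, here is a minimal user-space
sketch of a receive loop gated on the Descriptor Done bit. It is only a
model under stated assumptions, not the em(4) driver code: the names
model_rxeof, struct rx_ring, next_to_check and RING_SIZE are hypothetical,
and only the descriptor fields (taken from the kgdb output) and the DD bit
value (0x01, as in the driver's E1000_RXD_STAT_DD) come from the real
hardware layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RXD_STAT_DD 0x01 /* Descriptor Done, same value as E1000_RXD_STAT_DD */
#define RING_SIZE   8    /* hypothetical ring size for this model */

/* Same fields as the legacy rx descriptor printed by kgdb. */
struct rx_desc {
	uint64_t buffer_addr;
	uint16_t length;
	uint16_t csum;
	uint8_t  status;
	uint8_t  errors;
	uint16_t special;
};

/* Hypothetical, heavily reduced stand-in for the driver's rx ring state. */
struct rx_ring {
	struct rx_desc base[RING_SIZE];
	unsigned next_to_check;
};

/*
 * Consume descriptors while the DD bit is set, up to `budget` packets.
 * Returns true only if the next descriptor also has DD set, i.e. if the
 * caller should schedule more processing.
 */
static bool
model_rxeof(struct rx_ring *r, int budget, int *consumed)
{
	int n = 0;

	assert(r->next_to_check < RING_SIZE);
	while (n < budget) {
		struct rx_desc *d = &r->base[r->next_to_check];

		/* DD clear: stop, even if length/csum look populated. */
		if ((d->status & RXD_STAT_DD) == 0)
			break;
		d->status = 0; /* hand the descriptor back to the hardware */
		r->next_to_check = (r->next_to_check + 1) % RING_SIZE;
		n++;
	}
	*consumed = n;
	return ((r->base[r->next_to_check].status & RXD_STAT_DD) != 0);
}
```

With a descriptor like the one shown by kgdb (length = 0x5ea but
status = 0x0), this model consumes nothing and returns false, so a caller
that schedules further work only on a true return never makes progress,
which matches the hang described above.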