From: Pawel Jakub Dawidek
To: Doug Rabson
Cc: freebsd-current@FreeBSD.org
Date: Tue, 18 Nov 2008 19:18:24 +0100
Subject: Re: NFS regression.
Message-ID: <20081118181824.GA1634@garage.freebsd.pl>
References: <20081117171017.GB1489@garage.freebsd.pl> <4AC8E131-CD12-4075-948F-DA187B4EE2AD@rabson.org> <20081117180253.GA1733@garage.freebsd.pl> <8A43CF07-D06F-4EAF-A171-DF7F10F036F5@rabson.org> <20081117183745.GB1733@garage.freebsd.pl>
X-OS: FreeBSD 8.0-CURRENT i386

On Tue, Nov 18, 2008 at 09:13:26AM +0000, Doug Rabson wrote:
>
> On 17 Nov 2008, at 18:37, Pawel Jakub Dawidek wrote:
>
> >On Mon, Nov 17, 2008 at 06:07:52PM +0000, Doug Rabson wrote:
> >>
> >>On 17 Nov 2008, at 18:02, Pawel Jakub Dawidek wrote:
> >>
> >>>On Mon, Nov 17, 2008 at 05:54:02PM +0000, Doug Rabson wrote:
> >>>>
> >>>>On 17 Nov 2008, at 17:10, Pawel Jakub Dawidek wrote:
> >>>>
> >>>>>Hi.
> >>>>>
> >>>>>I'm seeing this panic very often now with few days old HEAD:
> >>>>>
> >>>>>
> >>>>>Any ideas?
> >>>>
> >>>>Can you reproduce this with INVARIANTS turned on? That should trigger
> >>>>a KASSERT a bit earlier and give me a chance to fix the thing.
> >>>
> >>>I've INVARIANTS on... Is there some assertion added recently you are
> >>>expecting?
> >>
> >>Hmm. I added an assert in r184921 which ought to have caught this.
> >>Could you try this patch and see if it changes anything:
> >>
> >>Index: rpc/clnt_dg.c
> >>===================================================================
> >>--- rpc/clnt_dg.c (revision 184968)
> >>+++ rpc/clnt_dg.c (working copy)
> >>@@ -543,7 +543,7 @@
> >>
> >>         if (tv > 0) {
> >>                 if (cu->cu_closing || cu->cu_closed)
> >>-                        error = 0;
> >>+                        error = ESHUTDOWN;
> >>                 else
> >>                         error = msleep(cr, &cs->cs_lock,
> >>                             cu->cu_waitflag, cu->cu_waitchan, tv);
> >>
> >
> >Ok, my source is older and doesn't contain the assertion you added.
> >I applied the patch above and also added assertion by hand (I'm not
> >setup now to upgrade entire system). This is the panic I get with the
> >new kernel:
> >
> >...
> >
> >If you want me to convert some of those to file:line, just let me
> >know.
>
> Don't worry about line numbers - I can see where its calling from. Do
> you have a recipe for reproducing this? Also, could you try this patch
> instead of the previous:
>
> Index: rpc/clnt_dg.c
> ===================================================================
> --- rpc/clnt_dg.c (revision 184968)
> +++ rpc/clnt_dg.c (working copy)
[...]

With this patch it still panics here:

panic: xdrmbuf_create with NULL mbuf chain
cpuid = 0
KDB: enter: panic
[thread pid 8305 tid 100055 ]
Stopped at kdb_enter+0x3a: movl $0,kdb_why
db> tr
Tracing pid 8305 tid 100055 td 0x840f3b40
kdb_enter(80686620,80686620,806a1861,83ac78b4,0,...) at kdb_enter+0x3a
panic(806a1861,83ac7988,805c6746,83ac7954,0,...) at panic+0x136
xdrmbuf_create(83ac7954,0,1,2a3,bb9,...) at xdrmbuf_create+0x1f
clnt_dg_call(83f9b5c0,83ac7a1c,e,84111900,83ac7a58,...) at clnt_dg_call+0xca6
clnt_reconnect_call(83f9b540,83ac7a1c,e,84111900,83ac7a58,...) at clnt_reconnect_call+0x5a0
nfs_request(84218d9c,84111900,e,840f3b40,841fbe00,...) at nfs_request+0x1dd
nfs_renamerpc(84218d9c,83e23610,15,841fbe00,840f3b40,...) at nfs_renamerpc+0x1ab
nfs_sillyrename(84c0a430,8,0,0,84218d9c,...) at nfs_sillyrename+0x10a
nfs_remove(83ac7c30,83ac7c30,0,83ac7c30,84c0a430,...) at nfs_remove+0x12f
VOP_REMOVE_APV(806cfea0,83ac7c30,2,841c429c,7fbfdd34,...) at VOP_REMOVE_APV+0xa5
kern_unlinkat(840f3b40,ffffff9c,7fbfdd34,0,83ac7c80,...) at kern_unlinkat+0x187
kern_unlink(840f3b40,7fbfdd34,0,83ac7d2c,8065a4c3,...) at kern_unlink+0x27
unlink(840f3b40,83ac7cf8,4,840f3b40,806bab90,...) at unlink+0x22
syscall(83ac7d38) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (10, FreeBSD ELF32, unlink), eip = 0x807d5d3, esp = 0x7fbfdc7c, ebp = 0x7fbfdcf8 ---

I can reproduce it easily. I have a netbooted system where I start
'make -ssj4 buildworld', but both the src/ and obj/ directories are on a
local ZFS file system, so only the system tools and libraries are on NFS.
I'm using UDP for NFS, BTW. Sorry for not mentioning it earlier:

/boot/loader.conf:
boot.nfsroot.options="nolockd,udp"

/etc/fstab:
# Device                Mountpoint  FStype  Options                             Dump  Pass#
192.168.5.1:/zoo/camel  /           nfs     rw,noatime,nolockd,mntudp,intr,-3   0     0
192.168.5.1:/zoo/pjd    /zoo/pjd    nfs     rw,noatime,nolockd,mntudp,intr,-3   0     0

If you can't reproduce that, I can give you access to this machine;
it sits in the netperf cluster.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!