Date: Tue, 18 Nov 2008 19:18:24 +0100
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Doug Rabson <dfr@rabson.org>
Cc: freebsd-current@FreeBSD.org
Subject: Re: NFS regression.
Message-ID: <20081118181824.GA1634@garage.freebsd.pl>
In-Reply-To: <ABFB69E0-FC9C-44B0-BD07-3FB7AF7AC927@rabson.org>
References: <20081117171017.GB1489@garage.freebsd.pl>
 <4AC8E131-CD12-4075-948F-DA187B4EE2AD@rabson.org>
 <20081117180253.GA1733@garage.freebsd.pl>
 <8A43CF07-D06F-4EAF-A171-DF7F10F036F5@rabson.org>
 <20081117183745.GB1733@garage.freebsd.pl>
 <ABFB69E0-FC9C-44B0-BD07-3FB7AF7AC927@rabson.org>

On Tue, Nov 18, 2008 at 09:13:26AM +0000, Doug Rabson wrote:
>
> On 17 Nov 2008, at 18:37, Pawel Jakub Dawidek wrote:
>
> >On Mon, Nov 17, 2008 at 06:07:52PM +0000, Doug Rabson wrote:
> >>
> >>On 17 Nov 2008, at 18:02, Pawel Jakub Dawidek wrote:
> >>
> >>>On Mon, Nov 17, 2008 at 05:54:02PM +0000, Doug Rabson wrote:
> >>>>
> >>>>On 17 Nov 2008, at 17:10, Pawel Jakub Dawidek wrote:
> >>>>
> >>>>>Hi.
> >>>>>
> >>>>>I'm seeing this panic very often now with few days old HEAD:
> >>>>>
> >>>>>
> >>>>>Any ideas?
> >>>>
> >>>>Can you reproduce this with INVARIANTS turned on? That should trigger
> >>>>a KASSERT a bit earlier and give me a chance to fix the thing.
> >>>
> >>>I've INVARIANTS on... Is there some assertion added recently you are
> >>>expecting?
> >>
> >>Hmm. I added an assert in r184921 which ought to have caught this.
> >>Could you try this patch and see if it changes anything:
> >>
> >>Index: rpc/clnt_dg.c
> >>===================================================================
> >>--- rpc/clnt_dg.c	(revision 184968)
> >>+++ rpc/clnt_dg.c	(working copy)
> >>@@ -543,7 +543,7 @@
> >>
> >> 		if (tv > 0) {
> >> 			if (cu->cu_closing || cu->cu_closed)
> >>-				error = 0;
> >>+				error = ESHUTDOWN;
> >> 			else
> >> 				error = msleep(cr, &cs->cs_lock,
> >> 				    cu->cu_waitflag, cu->cu_waitchan, tv);
> >>
> >
> >Ok, my source is older and doesn't contain the assertion you added. I
> >applied the patch above and also added assertion by hand (I'm not setup
> >now to upgrade entire system). This is the panic I get with the new
> >kernel:
> >
> >...
> >
> >If you want me to convert some of those to file:line, just let me know.
>
> Don't worry about line numbers - I can see where its calling from. Do
> you have a recipe for reproducing this? Also, could you try this patch
> instead of the previous:
>
> Index: rpc/clnt_dg.c
> ===================================================================
> --- rpc/clnt_dg.c	(revision 184968)
> +++ rpc/clnt_dg.c	(working copy)
[...]

With this patch it still panics here:

panic: xdrmbuf_create with NULL mbuf chain
cpuid = 0
KDB: enter: panic
[thread pid 8305 tid 100055 ]
Stopped at      kdb_enter+0x3a: movl    $0,kdb_why
db> tr
Tracing pid 8305 tid 100055 td 0x840f3b40
kdb_enter(80686620,80686620,806a1861,83ac78b4,0,...) at kdb_enter+0x3a
panic(806a1861,83ac7988,805c6746,83ac7954,0,...) at panic+0x136
xdrmbuf_create(83ac7954,0,1,2a3,bb9,...) at xdrmbuf_create+0x1f
clnt_dg_call(83f9b5c0,83ac7a1c,e,84111900,83ac7a58,...) at clnt_dg_call+0xca6
clnt_reconnect_call(83f9b540,83ac7a1c,e,84111900,83ac7a58,...) at clnt_reconnect_call+0x5a0
nfs_request(84218d9c,84111900,e,840f3b40,841fbe00,...) at nfs_request+0x1dd
nfs_renamerpc(84218d9c,83e23610,15,841fbe00,840f3b40,...) at nfs_renamerpc+0x1ab
nfs_sillyrename(84c0a430,8,0,0,84218d9c,...) at nfs_sillyrename+0x10a
nfs_remove(83ac7c30,83ac7c30,0,83ac7c30,84c0a430,...) at nfs_remove+0x12f
VOP_REMOVE_APV(806cfea0,83ac7c30,2,841c429c,7fbfdd34,...) at VOP_REMOVE_APV+0xa5
kern_unlinkat(840f3b40,ffffff9c,7fbfdd34,0,83ac7c80,...) at kern_unlinkat+0x187
kern_unlink(840f3b40,7fbfdd34,0,83ac7d2c,8065a4c3,...) at kern_unlink+0x27
unlink(840f3b40,83ac7cf8,4,840f3b40,806bab90,...) at unlink+0x22
syscall(83ac7d38) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (10, FreeBSD ELF32, unlink), eip = 0x807d5d3, esp = 0x7fbfdc7c, ebp = 0x7fbfdcf8 ---

I can reproduce it easily.
I've a netbooted system where I start 'make -ssj4 buildworld', but both
the src/ and obj/ directories are on a local ZFS file system, so only
the system tools and libraries are on NFS. I'm using UDP for NFS, BTW.
Sorry for not mentioning it earlier:

/boot/loader.conf:

	boot.nfsroot.options="nolockd,udp"

/etc/fstab:

	# Device                Mountpoint  FStype  Options                            Dump  Pass#
	192.168.5.1:/zoo/camel  /           nfs     rw,noatime,nolockd,mntudp,intr,-3  0     0
	192.168.5.1:/zoo/pjd    /zoo/pjd    nfs     rw,noatime,nolockd,mntudp,intr,-3  0     0

If you can't reproduce it, I can give you access to this machine; it
sits in the netperf cluster.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!