Date:      Tue, 18 Nov 2008 19:18:24 +0100
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Doug Rabson <dfr@rabson.org>
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: NFS regression.
Message-ID:  <20081118181824.GA1634@garage.freebsd.pl>
In-Reply-To: <ABFB69E0-FC9C-44B0-BD07-3FB7AF7AC927@rabson.org>
References:  <20081117171017.GB1489@garage.freebsd.pl> <4AC8E131-CD12-4075-948F-DA187B4EE2AD@rabson.org> <20081117180253.GA1733@garage.freebsd.pl> <8A43CF07-D06F-4EAF-A171-DF7F10F036F5@rabson.org> <20081117183745.GB1733@garage.freebsd.pl> <ABFB69E0-FC9C-44B0-BD07-3FB7AF7AC927@rabson.org>


On Tue, Nov 18, 2008 at 09:13:26AM +0000, Doug Rabson wrote:
>
> On 17 Nov 2008, at 18:37, Pawel Jakub Dawidek wrote:
>
> >On Mon, Nov 17, 2008 at 06:07:52PM +0000, Doug Rabson wrote:
> >>
> >>On 17 Nov 2008, at 18:02, Pawel Jakub Dawidek wrote:
> >>
> >>>On Mon, Nov 17, 2008 at 05:54:02PM +0000, Doug Rabson wrote:
> >>>>
> >>>>On 17 Nov 2008, at 17:10, Pawel Jakub Dawidek wrote:
> >>>>
> >>>>>Hi.
> >>>>>
> >>>>>I'm seeing this panic very often now with few days old HEAD:
> >>>>>
> >>>>>
> >>>>>Any ideas?
> >>>>
> >>>>Can you reproduce this with INVARIANTS turned on? That should trigger
> >>>>a KASSERT a bit earlier and give me a chance to fix the thing.
> >>>
> >>>I've INVARIANTS on... Is there some assertion added recently you are
> >>>expecting?
> >>
> >>Hmm. I added an assert in r184921 which ought to have caught this.
> >>Could you try this patch and see if it changes anything:
> >>
> >>Index: rpc/clnt_dg.c
> >>===================================================================
> >>--- rpc/clnt_dg.c	(revision 184968)
> >>+++ rpc/clnt_dg.c	(working copy)
> >>@@ -543,7 +543,7 @@
> >>
> >>		if (tv > 0) {
> >>			if (cu->cu_closing || cu->cu_closed)
> >>-				error = 0;
> >>+				error = ESHUTDOWN;
> >>			else
> >>				error = msleep(cr, &cs->cs_lock,
> >>				    cu->cu_waitflag, cu->cu_waitchan, tv);
> >>
> >
> >Ok, my source is older and doesn't contain the assertion you added. I
> >applied the patch above and also added assertion by hand (I'm not setup
> >now to upgrade entire system). This is the panic I get with the new
> >kernel:
> >
> >...
> >
> >If you want me to convert some of those to file:line, just let me know.
>
> Don't worry about line numbers - I can see where its calling from. Do
> you have a recipe for reproducing this? Also, could you try this patch
> instead of the previous:
>
> Index: rpc/clnt_dg.c
> ===================================================================
> --- rpc/clnt_dg.c	(revision 184968)
> +++ rpc/clnt_dg.c	(working copy)
[...]

With this patch it still panics here:

panic: xdrmbuf_create with NULL mbuf chain
cpuid = 0
KDB: enter: panic
[thread pid 8305 tid 100055 ]
Stopped at      kdb_enter+0x3a: movl    $0,kdb_why
db> tr
Tracing pid 8305 tid 100055 td 0x840f3b40
kdb_enter(80686620,80686620,806a1861,83ac78b4,0,...) at kdb_enter+0x3a
panic(806a1861,83ac7988,805c6746,83ac7954,0,...) at panic+0x136
xdrmbuf_create(83ac7954,0,1,2a3,bb9,...) at xdrmbuf_create+0x1f
clnt_dg_call(83f9b5c0,83ac7a1c,e,84111900,83ac7a58,...) at clnt_dg_call+0xca6
clnt_reconnect_call(83f9b540,83ac7a1c,e,84111900,83ac7a58,...) at clnt_reconnect_call+0x5a0
nfs_request(84218d9c,84111900,e,840f3b40,841fbe00,...) at nfs_request+0x1dd
nfs_renamerpc(84218d9c,83e23610,15,841fbe00,840f3b40,...) at nfs_renamerpc+0x1ab
nfs_sillyrename(84c0a430,8,0,0,84218d9c,...) at nfs_sillyrename+0x10a
nfs_remove(83ac7c30,83ac7c30,0,83ac7c30,84c0a430,...) at nfs_remove+0x12f
VOP_REMOVE_APV(806cfea0,83ac7c30,2,841c429c,7fbfdd34,...) at VOP_REMOVE_APV+0xa5
kern_unlinkat(840f3b40,ffffff9c,7fbfdd34,0,83ac7c80,...) at kern_unlinkat+0x187
kern_unlink(840f3b40,7fbfdd34,0,83ac7d2c,8065a4c3,...) at kern_unlink+0x27
unlink(840f3b40,83ac7cf8,4,840f3b40,806bab90,...) at unlink+0x22
syscall(83ac7d38) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (10, FreeBSD ELF32, unlink), eip = 0x807d5d3, esp = 0x7fbfdc7c, ebp = 0x7fbfdcf8 ---
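
For anyone following the trace: the panic string says xdrmbuf_create() was
handed a NULL mbuf chain by clnt_dg_call(). Below is a minimal user-space
sketch of that kind of guard, not the kernel code. The struct and function
names (sketch_mbuf, sketch_xdr, sketch_xdr_create) are hypothetical stand-ins,
and the sketch returns -1 where the kernel panics, so it can be exercised
outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct mbuf or XDR types. */
struct sketch_mbuf {
	struct sketch_mbuf *next;	/* next buffer in the chain */
	size_t len;			/* bytes held by this buffer */
};

struct sketch_xdr {
	struct sketch_mbuf *chain;	/* head of the chain being encoded */
	size_t total_len;		/* sum of len over the whole chain */
};

/*
 * Refuse to build a stream on top of a NULL chain.  The trace above shows
 * the kernel panicking at the equivalent point; this sketch returns -1
 * instead so the caller can observe the failure.
 */
static int
sketch_xdr_create(struct sketch_xdr *xdrs, struct sketch_mbuf *m)
{
	assert(xdrs != NULL);
	if (m == NULL)
		return (-1);
	xdrs->chain = m;
	xdrs->total_len = 0;
	for (; m != NULL; m = m->next)
		xdrs->total_len += m->len;
	return (0);
}
```

Calling sketch_xdr_create() with a NULL second argument takes the early
return, which is the path the real call appears to hit during the rename RPC
above.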

I can reproduce it easily. I have a netbooted system where I start
'make -ssj4 buildworld', but both the src/ and obj/ directories are on a
local ZFS file system, so only the system tools and libraries are on NFS.
I'm using UDP for NFS, BTW. Sorry for not mentioning it earlier:

/boot/loader.conf:

boot.nfsroot.options="nolockd,udp"

/etc/fstab:

# Device                Mountpoint      FStype  Options                                 Dump    Pass#
192.168.5.1:/zoo/camel  /               nfs     rw,noatime,nolockd,mntudp,intr,-3       0       0
192.168.5.1:/zoo/pjd    /zoo/pjd        nfs     rw,noatime,nolockd,mntudp,intr,-3       0       0

If you can't reproduce it, I can give you access to this machine; it
sits in the netperf cluster.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
