Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 15 Dec 2006 15:15:45 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Ulrich Spoerlein <uspoerlein@gmail.com>
Cc:        stable@freebsd.org, rockowallaby@bellsouth.net, bug-followup@freebsd.org, Mohan Srinivasan <mohans@freebsd.org>
Subject:   Re: kern/92785: Using exported filesystem on OS/2 NFS client causes filesystem freeze
Message-ID:  <20061215131545.GB23698@deviant.kiev.zoral.com.ua>
In-Reply-To: <7ad7ddd90612150444ifd95974gf429a7f22c177508@mail.gmail.com>
References:  <7ad7ddd90612150444ifd95974gf429a7f22c177508@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--TakKZr9L6Hm6aLOc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Dec 15, 2006 at 01:44:23PM +0100, Ulrich Spoerlein wrote:
> Hi,
>=20
> we too, ran into this problem. OS/2 Clients kill our NFS server. It is
> running a RELENG_6 snapshot from 2006-11-14. rpc.lockd and rpc.statd
> are running. I'll conduct a test without those two services shortly.
>=20
> You can still log in the system with ssh and cruse around, but mountd
> is stuck in ufs state and is no longer serving requests.
>=20
> root@fs2:~# ps axl | grep ufs
>    0 39370     1   0  -4  0  3052  2200 ufs    Ds    ??    0:00.01
> /usr/sbin/mountd -r
>=20
> db> show lockedvnods
> Locked vnodes
>=20
> 0xc87b9414: tag ufs, type VDIR
>    usecount 0, writecount 0, refcount 4 mountedhere 0
>    flags (VV_ROOT)
>    v_object 0xc8c43c60 ref 0 pages 1
>     lock type ufs: EXCL (count 4) by thread 0xc8bac300 (pid 6926)
> with 1 pending#0 0xc0668bf9 at lockmgr+0x4ed
> #1 0xc078572e at ffs_lock+0x76
> #2 0xc0838287 at VOP_LOCK_APV+0x87
> #3 0xc06d663c at vn_lock+0xac
> #4 0xc06ca4ca at vget+0xc2
> #5 0xc06c24a9 at vfs_hash_get+0x8d
> #6 0xc07844af at ffs_vget+0x27
> #7 0xc078b253 at ufs_lookup+0xa4b
> #8 0xc083641b at VOP_CACHEDLOOKUP_APV+0x9b
> #9 0xc06bf499 at vfs_cache_lookup+0xb5
> #10 0xc0836347 at VOP_LOOKUP_APV+0x87
> #11 0xc06c3626 at lookup+0x46e
> #12 0xc0734fba at nfs_namei+0x40e
> #13 0xc0726d81 at nfsrv_lookup+0x1dd
> #14 0xc0736765 at nfssvc_nfsd+0x3d9
> #15 0xc07360b4 at nfssvc+0x18c
> #16 0xc0825a07 at syscall+0x25b
> #17 0xc0811f7f at Xint0x80_syscall+0x1f
>=20
>        ino 2, on dev da1s2e
>=20
>=20
> db> trace 6926
> Tracing pid 6926 tid 100106 td 0xc8bac300
> sched_switch(c8bac300,0,1) at sched_switch+0x177
> mi_switch(1,0) at mi_switch+0x270
> sleepq_switch(c8678200) at sleepq_switch+0xc1
> sleepq_wait_sig(c8678200) at sleepq_wait_sig+0x1d
> msleep(c8678200,c09c9f00,158,c088bec9,0,...) at msleep+0x26a
> nfssvc_nfsd(c8bac300) at nfssvc_nfsd+0xe5
> nfssvc(c8bac300,eafd4d04) at nfssvc+0x18c
> syscall(3b,3b,3b,1,0,...) at syscall+0x25b
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (155, FreeBSD ELF32, nfssvc), eip =3D 0x280bd1b7, esp =3D
> 0xbfbfe90c, ebp =3D 0xbfbfe928 ---
> db> trace 39370
> Tracing pid 39370 tid 100102 td 0xc8bac900
> sched_switch(c8bac900,0,1) at sched_switch+0x177
> mi_switch(1,0) at mi_switch+0x270
> sleepq_switch(c87b946c,c0973440,0,c089798c,211,...) at sleepq_switch+0xc1
> sleepq_wait(c87b946c,0,c87b94dc,b7,c08929b8,...) at sleepq_wait+0x46
> msleep(c87b946c,c0972500,50,c089c1c1,0,...) at msleep+0x279
> acquire(eafe094c,40,60000,c8bac900,0,...) at acquire+0x76
> lockmgr(c87b946c,2002,c87b94dc,c8bac900) at lockmgr+0x44e
> ffs_lock(eafe09a4) at ffs_lock+0x76
> VOP_LOCK_APV(c0943320,eafe09a4) at VOP_LOCK_APV+0x87
> vn_lock(c87b9414,2002,c8bac900,c87b9414) at vn_lock+0xac
> vget(c87b9414,2002,c8bac900) at vget+0xc2
> vfs_hash_get(c86cf2e4,2,2,c8bac900,eafe0abc,0,0) at vfs_hash_get+0x8d
> ffs_vget(c86cf2e4,2,2,eafe0abc) at ffs_vget+0x27
> ufs_root(c86cf2e4,2,eafe0b00,c8bac900,0,...) at ufs_root+0x19
> lookup(eafe0ba0) at lookup+0x743
> namei(eafe0ba0) at namei+0x39a
> kern_lstat(c8bac900,bfbfd2a0,0,eafe0c74) at kern_lstat+0x47
> lstat(c8bac900,eafe0d04) at lstat+0x1b
> syscall(3b,3b,3b,281512fb,bfbfc9f1,...) at syscall+0x25b
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (190, FreeBSD ELF32, lstat), eip =3D 0x2813d427, esp =3D
> 0xbfbfc5ac, ebp =3D 0xbfbfd268 ---
>=20
> I was under the impression, that you are not allowed to sleep while
> holding a lock in the FreeBSD kernel. Doesn't this also apply to the
> lockmgr itself?
>=20
> Upon shutting down the system, I had a panic coming in:
>=20
> panic: userret: Returning with 4 locks held.
> cpuid =3D 1
> KDB: stack backtrace:
> kdb_backtrace(100,c8bac300,c8bac3c8,c8bad218,c8bac300,...) at=20
> kdb_backtrace+0x29
> panic(c089806f,4,0,c8bac300,c8bad218,...) at panic+0x114
> userret(c8bac300,eafd4d38,0,2,0,...) at userret+0x183
> syscall(3b,3b,3b,1,0,...) at syscall+0x321
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (0, FreeBSD ELF32, nosys), eip =3D 0x280bd1b7, esp =3D
> 0xbfbfe90c, ebp =3D 0xbfbfe928 ---
> KDB: enter: panic
> [thread pid 6926 tid 100106 ]
> Stopped at      kdb_enter+0x2b: nop
> db> bt
> Tracing pid 6926 tid 100106 td 0xc8bac300
> kdb_enter(c0894aec) at kdb_enter+0x2b
> panic(c089806f,4,0,c8bac300,c8bad218,...) at panic+0x127
> userret(c8bac300,eafd4d38,0,2,0,...) at userret+0x183
> syscall(3b,3b,3b,1,0,...) at syscall+0x321
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (0, FreeBSD ELF32, nosys), eip =3D 0x280bd1b7, esp =3D
> 0xbfbfe90c, ebp =3D 0xbfbfe928 ---
> db> show lockedvnods
> Locked vnodes
>=20
> 0xc8761c3c: tag ufs, type VDIR
>    usecount 1, writecount 0, refcount 1 mountedhere 0xc86cf2e4
>    flags ()
>     lock type ufs: EXCL (count 1) by thread 0xc8bac780 (pid 59934)#0
> 0xc0668bf9 at lockmgr+0x4ed
> #1 0xc078572e at ffs_lock+0x76
> #2 0xc0838287 at VOP_LOCK_APV+0x87
> #3 0xc06d663c at vn_lock+0xac
> #4 0xc06c5eba at dounmount+0x62
> #5 0xc06c5e31 at unmount+0x1e5
> #6 0xc0825a07 at syscall+0x25b
> #7 0xc0811f7f at Xint0x80_syscall+0x1f
>=20
>        ino 8260, on dev ufs/root
>=20
> 0xc87b9414: tag ufs, type VDIR
>    usecount 0, writecount 0, refcount 4 mountedhere 0
>    flags (VV_ROOT)
>    v_object 0xc8c43c60 ref 0 pages 1
>     lock type ufs: EXCL (count 4) by thread 0xc8bac300 (pid 6926)
> with 1 pending#0 0xc0668bf9 at lockmgr+0x4ed
> #1 0xc078572e at ffs_lock+0x76
> #2 0xc0838287 at VOP_LOCK_APV+0x87
> #3 0xc06d663c at vn_lock+0xac
> #4 0xc06ca4ca at vget+0xc2
> #5 0xc06c24a9 at vfs_hash_get+0x8d
> #6 0xc07844af at ffs_vget+0x27
> #7 0xc078b253 at ufs_lookup+0xa4b
> #8 0xc083641b at VOP_CACHEDLOOKUP_APV+0x9b
> #9 0xc06bf499 at vfs_cache_lookup+0xb5
> #10 0xc0836347 at VOP_LOOKUP_APV+0x87
> #11 0xc06c3626 at lookup+0x46e
> #12 0xc0734fba at nfs_namei+0x40e
> #13 0xc0726d81 at nfsrv_lookup+0x1dd
> #14 0xc0736765 at nfssvc_nfsd+0x3d9
> #15 0xc07360b4 at nfssvc+0x18c
> #16 0xc0825a07 at syscall+0x25b
> #17 0xc0811f7f at Xint0x80_syscall+0x1f
>=20
>        ino 2, on dev da1s2e
> db> ps
>  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
> 41369 75806 93096     0  R                           sleep
> 59934 55518 59934     0  S+      vfslock  0xc86cf308 umount
> 75806 62826 93096     0  R                           sh
> 62826     1 93096     0  R                           sh
> 39370     1 39370     0  Ss      ufs      0xc87b946c mountd
> 6926     1  3231     0  R       CPU 1               nfsd
>=20
> So this seems to be umount and mountd tripping over each other.
>=20
> Uli

This looks like lock leak in nfsd. Could you supply the tcpdump of the
session that causes the problem ? Also, it would be very helpful if you cou=
ld
note exact rpc that wedges the server.

--TakKZr9L6Hm6aLOc
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (FreeBSD)

iD8DBQFFgqAAC3+MBN1Mb4gRAsr0AKC+zN6OykbDcH2dGHazulrBruq0ewCgwKTu
zsHKtcnJhnFSFJ2S4a989YA=
=QZC9
-----END PGP SIGNATURE-----

--TakKZr9L6Hm6aLOc--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061215131545.GB23698>