Date: Wed, 28 Jan 1998 09:32:31 -0800 (PST)
From: Matt Dillon <dillon@best.net>
To: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: kern/5592: Kernel crash due to ufslk2/ffs_vget deadlock
Message-ID: <199801281732.JAA14306@flea.best.net>
>Number: 5592
>Category: kern
>Synopsis: ffs_inode_hash_lock can get permanently locked, causing the filesystem to lock up
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Jan 28 09:40:02 PST 1998
>Last-Modified:
>Originator: Matt Dillon
>Organization:
Best Internet Communications
>Release: FreeBSD 2.2.5-STABLE i386
>Environment:
PPro 200s running moderately to heavily loaded shell environments.
Lots of RAM, moderate paging.
>Description:
I tracked down a crash of one of our shell machines. The crash
occurred in the socket code, but was due to processes getting stuck
in ufslk2 (inetd kept forking on new connections, which ran the
system out of network bufs).
Tracking the bug down, I found the following situation:
* most processes stuck in ufslk2
* the ufslk2 chain terminated with a process that had the vnode locked
but was stuck in ffs_vget()
* the process was stuck in ffs_vget() attempting to get
ffs_inode_hash_lock and being unable to.
* I found a second process which HAD ffs_inode_hash_lock but which was
stuck as follows:
(kgdb) #0 mi_switch () at ../../kern/kern_synch.c:635
#1 0xf0114eda in tsleep (ident=0xf26e2b00, priority=0x8,
wmesg=0xf01a1071 "ufslk2", timo=0x0) at ../../kern/kern_synch.c:398
#2 0xf01a10a1 in ufs_lock (ap=0xefbffc90) at ../../ufs/ufs/ufs_vnops.c:1707
#3 0xf0132a27 in vclean (vp=0xf24fd600, flags=0x8) at vnode_if.h:731
#4 0xf0132c3b in vgone (vp=0xf24fd600) at ../../kern/vfs_subr.c:1167
#5 0xf0131e52 in getnewvnode (tag=VT_UFS, mp=0xf21d3a00, vops=0xf2196800,
vpp=0xefbffd2c) at ../../kern/vfs_subr.c:380
#6 0xf019a25c in ffs_vget (mp=0xf21d3a00, ino=0x67205, vpp=0xefbffda8)
at ../../ufs/ffs/ffs_vfsops.c:896
#7 0xf019d034 in ufs_lookup (ap=0xefbffe18) at ../../ufs/ufs/ufs_lookup.c:561
#8 0xf0131339 in lookup (ndp=0xefbffeac) at vnode_if.h:31
#9 0xf0130e7b in namei (ndp=0xefbffeac) at ../../kern/vfs_lookup.c:156
#10 0xf0135050 in lstat (p=0xf2764800, uap=0xefbfff94, retval=0xefbfff84)
at ../../kern/vfs_syscalls.c:1324
#11 0xf01bf437 in syscall (frame={tf_es = 0x27, tf_ds = 0x27,
tf_edi = 0xffffffff, tf_esi = 0x35a00, tf_ebp = 0xefbfd758,
tf_isp = 0xefbfffe4, tf_ebx = 0x35a50, tf_edx = 0x33000,
tf_ecx = 0x35a40, tf_eax = 0xbe, tf_trapno = 0x7, tf_err = 0x7,
tf_eip = 0x18a85, tf_cs = 0x1f, tf_eflags = 0x246, tf_esp = 0xefbfd6e0,
tf_ss = 0x27}) at ../../i386/i386/trap.c:914
#12 0x18a85 in ?? ()
#13 0x7a35 in ?? ()
#14 0x7742 in ?? ()
#15 0x1e99 in ?? ()
#16 0x1d11 in ?? ()
#17 0x107e in ?? ()
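The situation above can be read as a wait-for cycle: one process holds a
vnode lock and sleeps on ffs_inode_hash_lock, while the holder of
ffs_inode_hash_lock sleeps in "ufslk2" from vclean(). The following is a
minimal user-space sketch of that reasoning, not kernel code. It assumes
(the report does not state this outright) that the vnode being cleaned is
locked by a process in the same ufslk2 chain. The names toy_in_wait_cycle,
TOY_NOBODY and the waits_for array are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker: the process is not sleeping on anyone. */
#define TOY_NOBODY (-1)

/*
 * waits_for[i] is the index of the process that holds the lock process i
 * is sleeping on, or TOY_NOBODY if process i is runnable.  Every entry
 * other than TOY_NOBODY is assumed to be a valid index below n.
 *
 * Returns true if following the chain from 'start' leads back to 'start',
 * i.e. 'start' is part of a deadlock cycle.  A process that is merely
 * queued behind a cycle (like most of the ufslk2 sleepers) returns false.
 */
bool
toy_in_wait_cycle(const int *waits_for, size_t n, size_t start)
{
    assert(start < n);

    size_t cur = start;
    for (size_t steps = 0; steps < n; steps++) {
        int next = waits_for[cur];

        if (next == TOY_NOBODY)
            return false;       /* chain ends at a runnable process */
        if ((size_t)next == start)
            return true;        /* walked back to where we began */
        cur = (size_t)next;
    }
    return false;               /* blocked behind a cycle, not in it */
}
```

With three processes where 0 is an ordinary ufslk2 sleeper, 1 is the
process stuck in ffs_vget() waiting for ffs_inode_hash_lock, and 2 is the
hash-lock holder stuck in vclean(), the array {1, 2, 1} puts processes 1
and 2 in the cycle and leaves process 0 queued behind it.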
>How-To-Repeat:
>Fix:
I submit that calling vgone() on what is essentially a random
vnode within getnewvnode() can lead to deadlock situations in the
filesystem, especially when called from other critical filesystem
routines that hold critical global locks.
The correct solution, I believe, is to NOT have getnewvnode() attempt
to vgone/vclean the vnode it wishes to allocate if said vnode's inode
is locked at the time.
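A minimal user-space sketch of the proposed selection rule follows. It is
not a patch against vfs_subr.c. struct toy_vnode, its fields and
toy_pick_reusable() are hypothetical stand-ins, and how the real
getnewvnode() would test whether an inode is locked is left as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in; not the kernel's struct vnode. */
struct toy_vnode {
    int  id;
    bool inode_locked;  /* models the lock that ufs_lock() sleeps on */
    int  usecount;      /* nonzero means the vnode is still referenced */
};

/*
 * Scan a free list and return the first vnode that can be recycled
 * without sleeping: unreferenced and with an unlocked inode.  Locked
 * candidates are skipped rather than waited for, so the caller never
 * blocks in "ufslk2" while holding a global lock.  Returns NULL if no
 * candidate qualifies; a caller could then allocate a fresh vnode.
 */
struct toy_vnode *
toy_pick_reusable(struct toy_vnode *freelist, size_t n)
{
    assert(freelist != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct toy_vnode *vp = &freelist[i];

        if (vp->usecount != 0)
            continue;           /* still in use */
        if (vp->inode_locked)
            continue;           /* cleaning it would sleep; skip it */
        return vp;
    }
    return NULL;
}
```

The design choice here is to skip rather than wait: a locked candidate
is passed over, and the cost is at most a longer scan or a fresh
allocation instead of a sleep with ffs_inode_hash_lock held.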
>Audit-Trail:
>Unformatted:
