Date: Thu, 12 Jan 2006 15:19:51 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: dsh@vlink.ru
Cc: freebsd-stable@FreeBSD.org
Subject: Re: Recurring problem: processes block accessing UFS file system
Message-ID: <200601122319.k0CNJp9G046391@gw.catspoiler.org>
In-Reply-To: <87wth6byey.fsf@neva.vlink.ru>
On 11 Jan, Denis Shaposhnikov wrote:
> Hi!
>
>>>>>> "Don" == Don Lewis <truckman@FreeBSD.org> writes:
>
> Don> Are you using any unusual file systems, such as nullfs or
> Don> unionfs?
>
> >> Yes, I use a lot of nullfs.  This is a host system for about 20
> >> jails with the system nullfs-mounted read-only:
>
> Don> That would be my guess as to the cause of the problem.  Hopefully
> Don> DEBUG_VFS_LOCKS will help pinpoint the bug.
>
> I've got the problem again.  Now I have a debug kernel and a crash
> dump.  Here is the output from kdb.  Do you need any additional
> information which I can get from the crash dump?
>
> 0xc6bed3fc: tag ufs, type VDIR
>     usecount 3, writecount 0, refcount 4 mountedhere 0
>     flags ()
>     v_object 0xc84fce0c ref 0 pages 1
>     lock type ufs: EXCL (count 1) by thread 0xc69d1d80 (pid 546) with 2 pending
> #0 0xc04cd7ad at lockmgr+0x3c6
> #1 0xc053d780 at vop_stdlock+0x32
> #2 0xc0655075 at VOP_LOCK_APV+0x3b
> #3 0xc05db299 at ffs_lock+0x19
> #4 0xc0655075 at VOP_LOCK_APV+0x3b
> #5 0xc0557513 at vn_lock+0x6c
> #6 0xc054a320 at vget+0x87
> #7 0xc053a1f0 at cache_lookup+0xde
> #8 0xc053ac93 at vfs_cache_lookup+0xad
> #9 0xc0652eaf at VOP_LOOKUP_APV+0x3b
> #10 0xc053f786 at lookup+0x419
> #11 0xc0540262 at namei+0x2e7
> #12 0xc0558d37 at vn_open_cred+0x1c2
> #13 0xc055918a at vn_open+0x33
> #14 0xc054e6d3 at kern_open+0xd2
> #15 0xc054f055 at open+0x36
> #16 0xc0648a47 at syscall+0x23a
> #17 0xc06326ff at Xint0x80_syscall+0x1f
>
>     ino 2072795, on dev ad4s1g
>
> 0xc6c07bf4: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 3 mountedhere 0
>     flags ()
>     v_object 0xc940a744 ref 0 pages 1
>     lock type ufs: EXCL (count 1) by thread 0xc9146c00 (pid 33016) with 1 pending
> #0 0xc04cd7ad at lockmgr+0x3c6
> #1 0xc053d780 at vop_stdlock+0x32
> #2 0xc0655075 at VOP_LOCK_APV+0x3b
> #3 0xc05db299 at ffs_lock+0x19
> #4 0xc0655075 at VOP_LOCK_APV+0x3b
> #5 0xc0557513 at vn_lock+0x6c
> #6 0xc054a320 at vget+0x87
> #7 0xc053a1f0 at cache_lookup+0xde
> #8 0xc053ac93 at vfs_cache_lookup+0xad
> #9 0xc0652eaf at VOP_LOOKUP_APV+0x3b
> #10 0xc053f786 at lookup+0x419
> #11 0xc0540262 at namei+0x2e7
> #12 0xc0554134 at kern_rmdir+0x98
> #13 0xc055436e at rmdir+0x22
> #14 0xc0648a47 at syscall+0x23a
> #15 0xc06326ff at Xint0x80_syscall+0x1f
>
>     ino 2072767, on dev ad4s1g
>
> db> alltrace
> Tracing command parser3.cgi pid 33016 tid 100652 td 0xc9146c00
> sched_switch(c9146c00,0,1,5e57bc72,aaef377f) at sched_switch+0xd8
> mi_switch(1,0,0,e93d86c4,c04f425b) at mi_switch+0x150
> sleepq_switch(e93d86f8,c04e4cc6,c6bed454,50,c0672a31) at sleepq_switch+0x115
> sleepq_wait(c6bed454,50,c0672a31,0,c602a080) at sleepq_wait+0xb
> msleep(c6bed454,c06b9c98,50,c0672a31,0) at msleep+0x454
> acquire(60000,da1dcb4c,c0504340,da1dcb4c,4c) at acquire+0x7a
> lockmgr(c6bed454,2002,c6bed4c4,c9146c00,e93d87e0) at lockmgr+0x4ce
> vop_stdlock(e93d8838,120,c06a4f20,e93d8838,e93d87f0) at vop_stdlock+0x32
> VOP_LOCK_APV(c06a55c0,e93d8838,e93d8808,c0655075,e93d8838) at VOP_LOCK_APV+0x3b
> ffs_lock(e93d8838,f,2,c6bed3fc,e93d8854) at ffs_lock+0x19
> VOP_LOCK_APV(c06a4f20,e93d8838,c0653a6a,c053acd2,c0652eaf) at VOP_LOCK_APV+0x3b
> vn_lock(c6bed3fc,2002,c9146c00,c06326ff,6) at vn_lock+0x6c
> vget(c6bed3fc,2002,c9146c00,c0653a6a,c053acd2) at vget+0x87
> vfs_hash_get(c6268c00,1fa0db,2,c9146c00,e93d89bc) at vfs_hash_get+0xf5
> ffs_vget(c6268c00,1fa0db,2,e93d89bc,e93d89c0) at ffs_vget+0x49
> ufs_lookup(e93d8a6c,c0685ac5,c6c07bf4,e93d8c34,e93d8aa8) at ufs_lookup+0x965
> VOP_CACHEDLOOKUP_APV(c06a4f20,e93d8a6c,e93d8c34,c9146c00,c7fac400) at VOP_CACHEDLOOKUP_APV+0x59
> vfs_cache_lookup(e93d8b14,e93d8ac0,c6c07bf4,e93d8c34,e93d8b30) at vfs_cache_lookup+0xec
> VOP_LOOKUP_APV(c06a4f20,e93d8b14,c9146c00,c065510a,c054a6f6) at VOP_LOOKUP_APV+0x3b
> lookup(e93d8c0c,c954a800,400,e93d8c28,0) at lookup+0x419
> namei(e93d8c0c,ffffffdf,2,0,c6268c00) at namei+0x2e7
> kern_rmdir(c9146c00,824b500,0,e93d8d30,c0648a47) at kern_rmdir+0x98
> rmdir(c9146c00,e93d8d04,4,d,e93d8d38) at rmdir+0x22
> syscall(804003b,e50003b,bfbf003b,8310560,8310560) at syscall+0x23a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (137, FreeBSD ELF32, rmdir), eip = 0xe8201bf, esp = 0xbfbfbcfc, ebp = 0xbfbfbd18 ---
>
> Tracing command nginx pid 546 tid 100122 td 0xc69d1d80
> sched_switch(c69d1d80,0,1,f002cf9e,6c43d35e) at sched_switch+0xd8
> mi_switch(1,0,c0502e44,e8a73670,c04f425b) at mi_switch+0x150
> sleepq_switch(e8a736a4,c04e4cc6,c6c07c4c,50,c0672a31) at sleepq_switch+0x115
> sleepq_wait(c6c07c4c,50,c0672a31,0,c044b82a) at sleepq_wait+0xb
> msleep(c6c07c4c,c06ba45c,50,c0672a31,0) at msleep+0x454
> acquire(60000,c0655075,c05db299,c0655075,c0557513) at acquire+0x7a
> lockmgr(c6c07c4c,2002,c6c07cbc,c69d1d80,e8a7378c) at lockmgr+0x4ce
> vop_stdlock(e8a737e4,c06a4f20,c06a4f20,e8a737e4,e8a7379c) at vop_stdlock+0x32
> VOP_LOCK_APV(c06a55c0,e8a737e4,e8a737b4,c0655075,e8a737e4) at VOP_LOCK_APV+0x3b
> ffs_lock(e8a737e4,c0673faf,2,c6c07bf4,e8a73800) at ffs_lock+0x19
> VOP_LOCK_APV(c06a4f20,e8a737e4,e8a73814,c0504340,e8a73814) at VOP_LOCK_APV+0x3b
> vn_lock(c6c07bf4,2002,c69d1d80,c05e52a9,c6bed3fc) at vn_lock+0x6c
> vget(c6c07bf4,2002,c69d1d80,0,e8a738a8) at vget+0x87
> cache_lookup(c6bed3fc,e8a73b80,e8a73b94,c69d1d80,c6202900) at cache_lookup+0xde
> vfs_cache_lookup(e8a73940,e8a738ec,c6bed3fc,e8a73b94,e8a7395c) at vfs_cache_lookup+0xad
> VOP_LOOKUP_APV(c06a4f20,e8a73940,e8a7392c,c69d1d80,0) at VOP_LOOKUP_APV+0x3b
> lookup(e8a73b6c,c9630000,400,e8a73b88,c057cbf1) at lookup+0x419
> namei(e8a73b6c,c0584a9f,c7ef4900,0,e8a739a8) at namei+0x2e7
> vn_open_cred(e8a73b6c,e8a73c6c,1a4,c6202900,37) at vn_open_cred+0x1c2
> vn_open(e8a73b6c,e8a73c6c,1a4,37,c83b0d48) at vn_open+0x33
> kern_open(c69d1d80,80f0424,0,1,1a4) at kern_open+0xd2
> open(c69d1d80,e8a73d04,c,c602a480,e8a73cc8) at open+0x36
> syscall(80e003b,814003b,bfbf003b,80f0475,8151000) at syscall+0x23a
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (5, FreeBSD ELF32, open), eip = 0x282f9d6b, esp = 0xbfbfea6c, ebp = 0xbfbfeb18 ---

Process 33016 is executing rmdir().  While doing the lookup, it is
holding a lock on vnode 0xc6c07bf4 and attempting to lock vnode
0xc6bed3fc.  Vnode 0xc6c07bf4 should be the parent directory of
0xc6bed3fc.

Process 546 is executing open().  While doing the lookup, it is holding
a lock on vnode 0xc6bed3fc while attempting to lock vnode 0xc6c07bf4.
Vnode 0xc6bed3fc should be the parent directory of 0xc6c07bf4, but this
is inconsistent with the previous paragraph.  Each process is waiting
for the vnode lock that the other one holds, so the two processes are
deadlocked against each other.  This situation should not be possible.

Using kgdb on your saved crash dump, print "fmode" and "*ndp" in the
vn_open_cred() stack frame of process 546, and "*nd" in the
kern_rmdir() stack frame of process 33016.  The path names being looked
up may be helpful.

Are there any symbolic links in the path names?  If so, what are the
link contents?

Is either of these processes jailed?  If so, same or different jails?

What are inodes 2072767 and 2072795 on ad4s1g?

Are you using snapshots?
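To get at the nameidata contents I asked about above, a kgdb session
along these lines should work.  The thread and frame numbers below are
only placeholders; substitute the ones kgdb actually reports for pid
546 (nginx, tid 100122) and pid 33016 (parser3.cgi, tid 100652), and
point kgdb at wherever your debug kernel and vmcore actually live:

  # kgdb kernel.debug /var/crash/vmcore.0
  (kgdb) info threads        <- locate the threads for the two pids
  (kgdb) thread 42           <- switch to the thread for pid 546
  (kgdb) bt                  <- note the frame number of vn_open_cred()
  (kgdb) frame 12
  (kgdb) print fmode
  (kgdb) print *ndp
  (kgdb) thread 43           <- switch to the thread for pid 33016
  (kgdb) bt                  <- note the frame number of kern_rmdir()
  (kgdb) frame 11
  (kgdb) print *nd

The ni_dirp and ni_cnd fields of the two nameidata structures should
show the path names the two lookups were chasing when they got stuck.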