Date: Thu, 5 Jan 2006 10:45:16 -0800 (PST) From: Don Lewis <truckman@FreeBSD.org> To: dsh@vlink.ru Cc: gcr+freebsd-stable@tharned.org, freebsd@McKusick.COM, freebsd-stable@FreeBSD.org, kris@obsecurity.org Subject: Re: Recurring problem: processes block accessing UFS file system Message-ID: <200601051845.k05IjGCI015543@gw.catspoiler.org> In-Reply-To: <8764oyinh7.fsf@neva.vlink.ru>
next in thread | previous in thread | raw e-mail | index | archive | help
On 5 Jan, Denis Shaposhnikov wrote: > Hi! > >>>>>> "Greg" == Greg Rivers <gcr+freebsd-stable@tharned.org> writes: > > Greg> It's taken more than a month, but the problem has recurred > Greg> without snapshots ever having been run. I've got a good trace > > I think that I have the same problem on a fresh CURRENT. For some > processes I see MWCHAN = ufs and "D" in the STAT. And I can't kill > such processes even with -9. And system can't kill them too on > shutdown. So, system can't do shutdown and wait forever after "All > buffers synced" message. At this moment I've entered to KDB do "show > lockedvnods": > > Locked vnodes > > 0xc687cb58: tag ufs, type VDIR > usecount 1, writecount 0, refcount 2 mountedhere 0 > flags () > v_object 0xcb5b1934 ref 0 pages 0 > lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending > ino 2072602, on dev ad4s1g > > 0xc687ca50: tag ufs, type VDIR > usecount 31, writecount 0, refcount 32 mountedhere 0 > flags () > v_object 0xc85d2744 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending > ino 2072603, on dev ad4s1g > > 0xc687c948: tag ufs, type VDIR > usecount 2, writecount 0, refcount 3 mountedhere 0 > flags () > v_object 0xc875d000 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending > ino 2072615, on dev ad4s1g > > 0xc691f420: tag ufs, type VDIR > usecount 2, writecount 0, refcount 3 mountedhere 0 > flags () > v_object 0xc8a773e0 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending > ino 2072680, on dev ad4s1g > > 0xc691f318: tag ufs, type VDIR > usecount 3, writecount 0, refcount 4 mountedhere 0 > flags () > v_object 0xc8a7b2e8 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc7019780 (pid 74103) with 2 pending > ino 2072795, on dev ad4s1g > > 0xc69bb528: tag ufs, type VDIR > usecount 2, writecount 0, refcount 3 mountedhere 0 > flags () > v_object 0xc7890744 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc91f4600 (pid 74129) with 1 pending > ino 2072767, on dev ad4s1g > Locked vnodes > > 0xc687cb58: tag ufs, type VDIR > usecount 1, writecount 0, refcount 2 mountedhere 0 > flags () > v_object 0xcb5b1934 ref 0 pages 0 > lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending > ino 2072602, on dev ad4s1g > > 0xc687ca50: tag ufs, type VDIR > usecount 31, writecount 0, refcount 32 mountedhere 0 > flags () > v_object 0xc85d2744 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending > ino 2072603, on dev ad4s1g > > 0xc687c948: tag ufs, type VDIR > usecount 2, writecount 0, refcount 3 mountedhere 0 > flags () > v_object 0xc875d000 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending > ino 2072615, on dev ad4s1g > > 0xc691f420: tag ufs, type VDIR > usecount 2, writecount 0, refcount 3 mountedhere 0 > flags () > v_object 0xc8a773e0 ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending > ino 2072680, on dev ad4s1g > > 0xc691f318: tag ufs, type VDIR > usecount 3, writecount 0, refcount 4 mountedhere 0 > flags () > v_object 0xc8a7b2e8 ref 0 pages 1 > lock type ufs: EXCL (count 1) by t(kgdb) > > After that I've done "call doadump" and got vmcore. > > ps show me: > > (kgdb) ps > During symbol reading, Incomplete CFI data; unspecified registers at 0xc04d97eb. > pid proc uid ppid pgrp flag stat comm wchan > 74686 c9464000 0 1 1 000000 1 sh ufs c687caa8 > 74195 c970d000 0 3074 74195 4000100 1 sshd ufs c687caa8 > 74178 c7682adc 0 3074 74178 004000 1 sshd ufs c687c9a0 > 74129 c9b82adc 1008 1 5504 004000 1 parser3.cgi ufs c691f370 > 74103 c70b5458 1008 1 5504 000000 1 httpd ufs c69bb580 > 65610 c92c0458 1005 1 65610 004000 1 sftp-server ufs c691f478 > 5518 c6247458 1008 1 5516 004002 1 perl5.8.7 ufs c687caa8 > 3081 c7523d08 0 1 3081 000000 1 cron ufs c687caa8 > 3074 c7682d08 0 1 3074 000100 1 sshd ufs c687caa8 > 3016 c7523adc 0 1 3016 000000 1 syslogd ufs c687caa8 > 519 c68e4d08 80 1 518 000100 1 nginx ufs c691f370 > 34 c6260000 0 0 0 000204 1 schedcpu - e88b3cf0 > 33 c62438b0 0 0 0 000204 1 syncer ktsusp c6243938 > 32 c6243adc 0 0 0 000204 1 vnlru ktsusp c6243b64 > 31 c6243d08 0 0 0 000204 1 bufdaemon ktsusp c6243d90 > 30 c6244000 0 0 0 00020c 1 pagezero pgzero c06c21a0 > 29 c624422c 0 0 0 000204 1 vmdaemon psleep c06c1d08 > 28 c6244458 0 0 0 000204 1 pagedaemon psleep c06c1cc8 > 27 c602e684 0 0 0 000204 1 irq1: atkbd0 > 26 c602e8b0 0 0 0 000204 1 swi0: sio > 25 c602eadc 0 0 0 000204 1 irq18: atapci1 > 24 c602ed08 0 0 0 000204 1 irq15: ata1 > 23 c6074000 0 0 0 000204 1 irq14: ata0 > 22 c607422c 0 0 0 000204 1 irq27: em1 > 21 c6074458 0 0 0 000204 1 irq26: em0 > 20 c6074684 0 0 0 000204 1 irq9: acpi0 > 19 c60748b0 0 0 0 000204 1 swi2: cambio > 18 c6074adc 0 0 0 000204 1 swi6: task queue > 9 c5fd822c 0 0 0 000204 1 acpi_task2 - c6061e40 > 8 c5fd8458 0 0 0 000204 1 acpi_task1 - c6061e40 > 7 c5fd8684 0 0 0 000204 1 acpi_task0 - c6061e40 > 17 c5fd88b0 0 0 0 000204 1 swi6: Giant taskq > 6 c5fd8adc 0 0 0 000204 1 thread taskq - c6062040 > 16 c5fd8d08 0 0 0 000204 1 swi5: Fast taskq > 5 c602e000 0 0 0 000204 1 kqueue taskq - c6062100 > 15 c602e22c 0 0 0 000204 1 yarrow - c06b1720 > 4 c602e458 0 0 0 000204 1 g_down - c06b1fc0 > 3 c5fd3000 0 0 0 000204 1 g_up - c06b1fbc > 2 c5fd322c 0 0 0 000204 1 g_event - c06b1fb4 > 14 c5fd3458 0 0 0 000204 1 swi3: vm > 13 c5fd3684 0 0 0 00020c 1 swi4: clock sio > 12 c5fd38b0 0 0 0 000204 1 swi1: net > 11 c5fd3adc 0 0 0 00020c 1 idle: cpu0 > 10 c5fd3d08 0 0 0 00020c 1 idle: cpu1 > 1 c5fd8000 0 0 1 004200 1 init ufs c687cbb0 > 0 c06b20c0 0 0 0 000200 1 swapper > There is no member named p_pptr. This is different because only vnode locks appear to be involved. One thread holding a vnode lock is getting stuck, and then all the other threads pile up behind it. I traced the list of locked vnodes and get stuck here: > 0xc691f318: tag ufs, type VDIR > usecount 3, writecount 0, refcount 4 mountedhere 0 > flags () > v_object 0xc8a7b2e8 ref 0 pages 1 > lock type ufs: EXCL (count 1) by t(kgdb) pid 519 wants to lock this vnode but some other thread is holding the vnode lock. Unfortunately we don't know who the lock holder is because the message is truncated. This might just be a vnode lock leak. Build a debug kernel with the DEBUG_VFS_LOCKS and DEBUG_LOCKS options and see if anything shows up.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200601051845.k05IjGCI015543>