From: Kostik Belousov
To: Mike Jakubik
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, Dmitriy Kirhlarov
Date: Fri, 30 Jun 2006 12:28:29 +0300
Subject: md deadlocks on wdrain. Was: [Re: quota and snapshots in 6.1-RELEASE]
Message-ID: <20060630092829.GE1258@deviant.kiev.zoral.com.ua>
In-Reply-To: <44A490E6.1000502@rogers.com>

On Thu, Jun 29, 2006 at 10:48:06PM -0400, Mike Jakubik wrote:
> Konstantin Belousov wrote:
> >On Tue, Jun 06, 2006 at 01:49:04PM -0400, Mike Jakubik wrote:
> >
> >>Scott Long wrote:
> >>
> >>>Dmitriy Kirhlarov wrote:
> >>>
> >>>>Hi!
> >>>>
> >>>>On Tue, May 23, 2006 at 04:35:21PM -0400, Kris Kennaway wrote:
> >>>>
> >>>>>>>>6.1-STABLE after 6.1-RELEASE is releases. So I think you may want
> >>>>>>>>
> >>>>>If you use snapshots with your quotas, update to 6.1-STABLE. If you
> >>>>>
> >>>>Sorry, guys. You are mean RELENG_6_1 or RELENG_6?
> >>>>
> >>>>WBR
> >>>>
> >>>RELENG_6. However, the changes will likely make their way into
> >>>RELENG_6_1 in a few weeks as part of an errata update.
> >>>
> >>>Scott
> >>>
> >>I have just done tests on 6.1-R and RELENG_6 as of yesterday evening.
> >>Unfortunately both still lock up hard, no crash, just a frozen system. I
> >>cant enter the KDB (ddb) via the console, but its unusable, as it wont
> >>let me type in anything. There must be some other change in -CURRENT
> >>that fixes this, as -CURRENT did not freeze during my previous tests.
> >>
> >>Just to confirm, here is the ID of ufs_quota.c on my RELENG_6 system:
> >>
> >>/usr/src/sys/ufs/ufs/ufs_quota.c:
> >>     $FreeBSD: src/sys/ufs/ufs/ufs_quota.c,v 1.74.2.4 2006/05/14 00:23:27 tegge Exp $
> >>
> >The hangs are mostly related to snapshots. It would be better to
> >update to the latest RELENG_6.
> >
> >Hangs on RELENG_6_1 is not so much interesting. For
> >hanged RELENG_6 system, please do what described below and post
> >the log of the ddb session.
> >
> >I'm not sure whether kbdmux was MFCed into RELENG_6 (AFAIR, yes).
> >If you have it in your kernel, add the line
> >hint.kbdmux.0.disabled="1"
> >into the /boot/device.hints to make ddb usable.
> >
> >After that, on the hang, enter ddb, and
> >do ps and tr for all suspected processes.
> >Better yet, add the following options to your kernel:
> >
> >options INVARIANTS
> >options INVARIANT_SUPPORT
> >options WITNESS
> >options DEBUG_LOCKS
> >options DEBUG_VFS_LOCKS
> >options DIAGNOSTIC
> >
> >and, after hang, do in ddb
> >
> >show allpcpu
> >show alllocks
> >show lockedvnods
> >ps
> >
> >For each process mentioned in show output, do where
> >(for threaded processes, do thread ; where).
> >
> >BTW, it would be great to add this instructions to the FAQ.
> >
>
> Well, i finally got around to setting up a serial console on this box,
> the following is the output from the debugger after the system stopped
> responding. Let me know if you need any more/different information, i
> also made the kernel changes you recommended.
>
> FreeBSD 6.1-STABLE #1: Thu Jun 10 00:22:29 EDT 2006
>
> ---
> KDB: enter: Line break on console
> [thread pid 12 tid 100004 ]
> Stopped at kdb_enter+0x30: leave
> db> ps
> pid proc uid ppid pgrp flag stat wmesg wchan cmd
> 552 c3622830 2 550 549 0004000 [SLPQ flswai 0xc0707c24][SLP] rm
> 550 c3570830 2 549 549 0004000 [SLPQ wait 0xc3570830][SLP] sh
> 549 c342ec48 2 548 549 0004000 [SLPQ wait 0xc342ec48][SLP] sh
> 548 c3622624 0 422 422 0000000 [SLPQ piperd 0xc36027f8][SLP] cron
> 547 c361f830 0 524 547 0004002 [SLPQ ufs 0xc3777c94][SLP] ls
> 546 c36bc418 0 544 544 0004002 [SLPQ wdrain 0xc0707be4][SLP] fsck_4.2bsd
> 544 c36bcc48 0 511 544 0004002 [SLPQ wait 0xc36bcc48][SLP] fsck
> 524 c35e020c 0 522 524 0004002 [SLPQ wait 0xc35e020c][SLP] bash
> 522 c3570c48 0 406 522 0004100 [SLPQ flswai 0xc0707c24][SLP] sshd
> 515 c36bc20c 0 0 0 0000204 [SLPQ wdrain 0xc0707be4][SLP] md0
> 511 c36bb624 0 500 511 0004002 [SLPQ wait 0xc36bb624][SLP] bash
> 509 c3570418 65 1 509 0000100 [SLPQ select 0xc0707644][SLP] dhclient
> 500 c361fa3c 0 406 500 0004100 [SLPQ flswai 0xc0707c24][SLP] sshd
> 480 c342ea3c 0 1 256 0000000 [SLPQ select 0xc0707644][SLP] dhclient
> 465 c361f624 0 1 465 0004002 [SLPQ ttyin 0xc342b010][SLP] getty
> 464 c35e0c48 0 1 464 0004002 [SLPQ ttyin 0xc3429410][SLP] getty
> 463 c356fa3c 0 1 463 0004002 [SLPQ ttyin 0xc3429810][SLP] getty
> 462 c356f418 0 1 462 0004002 [SLPQ ttyin 0xc343f010][SLP] getty
> 422 c342e624 0 1 422 0000000 [SLPQ nanslp 0xc06ba32c][SLP] cron
> 416 c356f000 25 1 416 0000100 [SLPQ pause 0xc356f034][SLP] sendmail
> 412 c356f624 0 1 412 0000100 [SLPQ select 0xc0707644][SLP] sendmail
> 406 c35e0000 0 1 406 0000100 [SLPQ select 0xc0707644][SLP] sshd
> 290 c361f20c 0 1 290 0000000 [SLPQ flswai 0xc0707c24][SLP] syslogd
> 256 c3622418 0 1 256 0000000 [SLPQ select 0xc0707644][SLP] devd
> 145 c356f830 0 1 145 0000000 [SLPQ pause 0xc356f864][SLP] adjkerntz
> 38 c3378c48 0 0 0 0000204 [SLPQ - 0xd56f5cf8][SLP] schedcpu
> 37 c342d000 0 0 0 0000204 [SLPQ sdflush 0xc070a3b4][SLP] softdepflush
> 36 c342d20c 0 0 0 0000204 [SLPQ vlruwt 0xc342d20c][SLP] vnlru
> 35 c342d418 0 0 0 0000204 [SLPQ ufs 0xc363c46c][SLP] syncer
> 34 c342d624 0 0 0 0000204 [SLPQ wdrain 0xc0707be4][SLP] bufdaemon
> 33 c342d830 0 0 0 000020c [SLPQ pgzero 0xc070b324][SLP] pagezero
> 32 c342da3c 0 0 0 0000204 [SLPQ psleep 0xc070ae74][SLP] vmdaemon
> 31 c342dc48 0 0 0 0000204 [SLPQ psleep 0xc070ae30][SLP] pagedaemon
> 30 c342e000 0 0 0 0000204 [IWAIT] irq7: ppc0
> 29 c342e20c 0 0 0 0000204 [IWAIT] swi0: sio
> 28 c342e418 0 0 0 0000204 [IWAIT] irq1: atkbd0
> 27 c3319624 0 0 0 0000204 [SLPQ - 0xc32c943c][SLP] fdc0
> 26 c3319830 0 0 0 0000204 [IWAIT] irq16: fxp0
> 25 c3319a3c 0 0 0 0000204 [SLPQ aifthd 0xc3319a3c][SLP] aac0aif
> 24 c3319c48 0 0 0 0000204 [SLPQ idle 0xc32c8400][SLP] aic_recovery0
> 23 c3378000 0 0 0 0000204 [IWAIT] irq30: ahc0
> 22 c337820c 0 0 0 0000204 [SLPQ idle 0xc32c8400][SLP] aic_recovery0
> 21 c3378418 0 0 0 0000204 [IWAIT] irq9: acpi0
> 9 c3378624 0 0 0 0000204 [SLPQ - 0xc3321200][SLP] thread taskq
> 20 c3378830 0 0 0 0000204 [IWAIT] swi6: +
> 19 c3378a3c 0 0 0 0000204 [IWAIT] swi6: task queue
> 8 c32da20c 0 0 0 0000204 [SLPQ - 0xc3321480][SLP] acpi_task2
> 7 c32da418 0 0 0 0000204 [SLPQ - 0xc3321480][SLP] acpi_task1
> 6 c32da624 0 0 0 0000204 [SLPQ - 0xc3321480][SLP] acpi_task0
> 5 c32da830 0 0 0 0000204 [SLPQ - 0xc3321500][SLP] kqueue taskq
> 18 c32daa3c 0 0 0 0000204 [IWAIT] swi2: cambio
> 17 c32dac48 0 0 0 0000204 [IWAIT] swi5: +
> 16 c3319000 0 0 0 0000204 [SLPQ - 0xc06b6b60][SLP] yarrow
> 4 c331920c 0 0 0 0000204 [SLPQ - 0xc06b7828][SLP] g_down
> 3 c3319418 0 0 0 0000204 [SLPQ - 0xc06b7824][SLP] g_up
> 2 c32d5000 0 0 0 0000204 [SLPQ - 0xc06b781c][SLP] g_event
> 15 c32d520c 0 0 0 0000204 [IWAIT] swi1: net
> 14 c32d5418 0 0 0 0000204 [IWAIT] swi3: vm
> 13 c32d5624 0 0 0 000020c [IWAIT] swi4: clock sio
> 12 c32d5830 0 0 0 000020c [CPU 0] idle: cpu0
> 11 c32d5a3c 0 0 0 000020c [CPU 1] idle: cpu1
> 1 c32d5c48 0 0 1 0004200 [SLPQ wait 0xc32d5c48][SLP] init
> 10 c32da000 0 0 0 0000204 [SLPQ ktrace 0xc06b8258][SLP] ktrace
> 0 c06b7920 0 0 0 0000200 [IWAIT] swapper
> db> tr 524
> Tracing pid 524 tid 100057 td 0xc35e1d80
> sched_switch(c35e1d80,0,1,10a,73683eb3) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,c35e020c) at mi_switch+0x2e6
> sleepq_switch(c35e020c,c06b9a00,1,c0661d3b,0) at sleepq_switch+0x112
> sleepq_wait_sig(c35e020c,0,c0663282,c8,100) at sleepq_wait_sig+0x25
> msleep(c35e020c,c35e0274,15c,c0667749,0) at msleep+0x326
> kern_wait(c35e1d80,ffffffff,dab4cc7c,6,0) at kern_wait+0x8bd
> wait4(c35e1d80,dab4cd04,10,41d,4) at wait4+0x3c
> syscall(3b,3b,3b,1,0) at syscall+0x300
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x2829d273, esp = 0xbfbfe56c, ebp = 0xbfbfe588 ---
> db> tr 544
> Tracing pid 544 tid 100090 td 0xc36c0000
> sched_switch(c36c0000,0,1,10a,753725b3) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,c36bcc48) at mi_switch+0x2e6
> sleepq_switch(c36bcc48,c06b9a00,1,c0661d3b,0) at sleepq_switch+0x112
> sleepq_wait_sig(c36bcc48,0,c0663282,c8,100) at sleepq_wait_sig+0x25
> msleep(c36bcc48,c36bccb0,15c,c0667749,0) at msleep+0x326
> kern_wait(c36c0000,222,dabc3c7c,0,0) at kern_wait+0x8bd
> wait4(c36c0000,dabc3d04,10,41d,4) at wait4+0x3c
> syscall(3b,3b,3b,8050100,2) at syscall+0x300
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x280d4273, esp = 0xbfbfe30c, ebp = 0xbfbfe328 ---
> db> tr 511
> Tracing pid 511 tid 100080 td 0xc35e2c00
> sched_switch(c35e2c00,0,1,10a,ba5fc6b3) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,c36bb624) at mi_switch+0x2e6
> sleepq_switch(c36bb624,c06b9a00,1,c0661d3b,0) at sleepq_switch+0x112
> sleepq_wait_sig(c36bb624,0,c0663282,c8,100) at sleepq_wait_sig+0x25
> msleep(c36bb624,c36bb68c,15c,c0667749,0) at msleep+0x326
> kern_wait(c35e2c00,ffffffff,dab67c7c,6,0) at kern_wait+0x8bd
> wait4(c35e2c00,dab67d04,10,41d,4) at wait4+0x3c
> syscall(3b,3b,bfbf003b,1,0) at syscall+0x300
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x2829d273, esp = 0xbfbfe88c, ebp = 0xbfbfe8a8 ---
> db> show allpcpu
> Current CPU: 0
>
> cpuid = 0
> curthread = 0xc32d6900: pid 12 "idle: cpu0"
> curpcb = 0xd44dad90
> fpcurthread = none
> idlethread = 0xc32d6900: pid 12 "idle: cpu0"
> APIC ID = 1
> currentldt = 0x50
> spin locks held:
>
> cpuid = 1
> curthread = 0xc32d6780: pid 11 "idle: cpu1"
> curpcb = 0xd44d7d90
> fpcurthread = none
> idlethread = 0xc32d6780: pid 11 "idle: cpu1"
> APIC ID = 0
> currentldt = 0x50
> spin locks held:
>
> db> show alllocks
> db> show lockedvnods
> Locked vnodes
>
> 0xc35d76cc: tag syncer, type VNON
>     usecount 1, writecount 0, refcount 2 mountedhere 0
>     flags ()
>     lock type syncer: EXCL (count 1) by thread 0xc32dbc00 (pid 35)#0 0xc04d300c at lockmgr+0x5bc
> #1 0xc0541a72 at vop_stdlock+0x32
> #2 0xc06419d4 at VOP_LOCK_APV+0xb4
> #3 0xc055b77c at vn_lock+0xec
> #4 0xc054beb2 at sync_vnode+0x132
> #5 0xc054c1ff at sched_sync+0x26f
> #6 0xc04c7851 at fork_exit+0xc1
> #7 0xc0614fac at fork_trampoline+0x8
>
>
> 0xc363c414: tag ufs, type VREG
>     usecount 1, writecount 1, refcount 1536 mountedhere 0
>     flags ()
>     v_object 0xc36c2210 ref 0 pages 52780
>     lock type ufs: EXCL (count 1) by thread 0xc35e2480 (pid 515) with 1 pending#0 0xc04d300c at lockmgr+0x5bc
> #1 0xc05b7ac6 at ffs_lock+0xa6
> #2 0xc06419d4 at VOP_LOCK_APV+0xb4
> #3 0xc055b77c at vn_lock+0xec
> #4 0xc373e485 at mdstart_vnode+0xe5
> #5 0xc373ec5f at md_kthread+0x14f
> #6 0xc04c7851 at fork_exit+0xc1
> #7 0xc0614fac at fork_trampoline+0x8
>
>     ino 1515, on dev aacd0s1f
>
> 0xc368f000: tag ufs, type VDIR
>     usecount 4, writecount 0, refcount 6 mountedhere 0
>     flags ()
>     v_object 0xc360318c ref 0 pages 1
>     lock type ufs: EXCL (count 1) by thread 0xc3573d80 (pid 547)#0 0xc04d300c at lockmgr+0x5bc
> #1 0xc05b7ac6 at ffs_lock+0xa6
> #2 0xc06419d4 at VOP_LOCK_APV+0xb4
> #3 0xc055b77c at vn_lock+0xec
> #4 0xc0543ce6 at lookup+0xe6
> #5 0xc0543918 at namei+0x488
> #6 0xc055467f at kern_lstat+0x4f
> #7 0xc05545ff at lstat+0x2f
> #8 0xc062aea0 at syscall+0x300
> #9 0xc0614f9f at Xint0x80_syscall+0x1f
>
>     ino 3, on dev md0a
>
> 0xc3777c3c: tag ufs, type VREG
>     usecount 1, writecount 0, refcount 239 mountedhere 0
>     flags ()
>     lock type ufs: EXCL (count 1) by thread 0xc35e2300 (pid 546) with 1 pending#0 0xc04d300c at lockmgr+0x5bc
> #1 0xc0542cd6 at vfs_hash_insert+0x36
> #2 0xc05b657e at ffs_vget+0x1ce
> #3 0xc05960c7 at ffs_valloc+0x137
> #4 0xc05c4d19 at ufs_makeinode+0x79
> #5 0xc05c1826 at ufs_create+0x36
> #6 0xc063f332 at VOP_CREATE_APV+0xd2
> #7 0xc05a015a at ffs_snapshot+0x33a
> #8 0xc05b3cb1 at ffs_mount+0xa81
> #9 0xc05469fe at vfs_domount+0x6be
> #10 0xc05460ea at vfs_donmount+0x47a
> #11 0xc054906e at kernel_mount+0x7e
> #12 0xc05b3ee4 at ffs_cmount+0x84
> #13 0xc0546326 at mount+0x1e6
> #14 0xc062aea0 at syscall+0x300
> #15 0xc0614f9f at Xint0x80_syscall+0x1f
>
>     ino 4, on dev md0a
> db> where 35
> Tracing pid 35 tid 100030 td 0xc32dbc00
> sched_switch(c32dbc00,0,1,10a,3a53d433) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,1) at mi_switch+0x2e6
> sleepq_switch(c363c46c,0,c066587f,20c,d451ca24) at sleepq_switch+0x112
> sleepq_wait(c363c46c,0,c0663282,c8,0) at sleepq_wait+0x65
> msleep(c363c46c,c06b96fc,50,c0669f9f,0) at msleep+0x335
> acquire(d451cad0,40,60000,b1,c32dbc00) at acquire+0x8e
> lockmgr(c363c46c,2002,c363c4dc,c32dbc00,c363c4dc) at lockmgr+0x516
> ffs_lock(d451cb38,d451cb1c,c04d73fd,2002,c363c414) at ffs_lock+0xa6
> VOP_LOCK_APV(c06a5920,d451cb38,c0673d22,d451cb3c,c050ccb0) at VOP_LOCK_APV+0xb4
> vn_lock(c363c414,2002,c32dbc00,7a5,2002) at vn_lock+0xec
> vget(c363c414,2002,c32dbc00,2fc,c3558c90) at vget+0xff
> qsync(c3558c00,0,c0673039,47c,c0661d3b) at qsync+0x13d
> ffs_sync(c3558c00,3,c32dbc00,c32dbc00,c3558c00) at ffs_sync+0x392
> sync_fsync(d451cca0,c0680e2f,c35d76cc,c35d76cc,c35d77d8) at sync_fsync+0x19e
> VOP_FSYNC_APV(c069f540,d451cca0,c32dbc00,620,0) at VOP_FSYNC_APV+0xd2
> sync_vnode(c35d77d8,c32dbc00,c066c1b0,657,0) at sync_vnode+0x158
> sched_sync(0,d451cd38,c065fd98,31d,0) at sched_sync+0x26f
> fork_exit(c054bf90,0,d451cd38) at fork_exit+0xc1
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xd451cd6c, ebp = 0 ---
> db> where 515
> Tracing pid 515 tid 100085 td 0xc35e2480
> sched_switch(c35e2480,0,1,10a,f85dbeb3) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,1) at mi_switch+0x2e6
> sleepq_switch(c0707be4,0,c066587f,20c,dab58894) at sleepq_switch+0x112
> sleepq_wait(c0707be4,0,c0663282,c8,0) at sleepq_wait+0x65
> msleep(c0707be4,c0707c00,44,c066a543,0) at msleep+0x335
> waitrunningbufspace(c363c520,cd895584,cd8955e4,4000,cd895584) at waitrunningbufspace+0x72
> bufwrite(cd895584,c05b8419,c06400ac,246,c06969c4) at bufwrite+0x1a1
> vfs_bio_awrite(cd895584,0,c06733a7,e4,c80000) at vfs_bio_awrite+0x29e
> ffs_syncvnode(c363c414,2,c06a5920,dab58980,c0640922) at ffs_syncvnode+0x352
> ffs_fsync(dab589bc,c0680e2f,c35e2480,c35e2480,cd81e4b8) at ffs_fsync+0x1c
> VOP_FSYNC_APV(c06a5920,dab589bc,c066a527,37f,c363c414) at VOP_FSYNC_APV+0xd2
> bdwrite(cd81e4b8,c3671948,cd851fa8,7fa,54c540) at bdwrite+0x12b
> ffs_balloc_ufs2(c363c414,f6352000,2f,2000,c3655300) at ffs_balloc_ufs2+0x193e
> ffs_write(dab58c78,c0680c61,0,0,0) at ffs_write+0x369
> VOP_WRITE_APV(c06a5920,dab58c78,c35e2480,1ea,c3558c00) at VOP_WRITE_APV+0x17c
> mdstart_vnode(c36b2000,c37df39c,c373ff9e,2a3,0) at mdstart_vnode+0x126
> md_kthread(c36b2000,dab58d38,c065fd98,31d,c35e2480) at md_kthread+0x14f
> fork_exit(c373eb10,c36b2000,dab58d38) at fork_exit+0xc1
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xdab58d6c, ebp = 0 ---
> db> where 547
> Tracing pid 547 tid 100067 td 0xc3573d80
> sched_switch(c3573d80,0,1,10a,5da4c133) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,1) at mi_switch+0x2e6
> sleepq_switch(c3777c94,0,c066587f,20c,dab2e71c) at sleepq_switch+0x112
> sleepq_wait(c3777c94,0,c0663282,c8,0) at sleepq_wait+0x65
> msleep(c3777c94,c06b8814,50,c0669f9f,0) at msleep+0x335
> acquire(dab2e7c8,40,60000,b1,c3573d80) at acquire+0x8e
> lockmgr(c3777c94,2002,c3777d04,c3573d80,c3777d04) at lockmgr+0x516
> ffs_lock(dab2e830,dab2e814,c04d73fd,2002,c3777c3c) at ffs_lock+0xa6
> VOP_LOCK_APV(c06a5920,dab2e830,c066b6c8,dab2e834,c050ccb0) at VOP_LOCK_APV+0xb4
> vn_lock(c3777c3c,2002,c3573d80,7a5,2002) at vn_lock+0xec
> vget(c3777c3c,2002,c3573d80,50,dab2ebc0) at vget+0xff
> vfs_hash_get(c3429c00,4,2,c3573d80,dab2e98c) at vfs_hash_get+0xe2
> ffs_vget(c3429c00,4,2,dab2e98c,dab2e990) at ffs_vget+0x49
> ufs_lookup(dab2ea40,c0680949,c368f000,c368f000,dab2ebc0) at ufs_lookup+0xbdf
> VOP_CACHEDLOOKUP_APV(c06a5920,dab2ea40,dab2ebc0,c3573d80,c376f380) at VOP_CACHEDLOOKUP_APV+0xd2
> vfs_cache_lookup(dab2eaec,dab2eaec,0,c368f000,dab2ebc0) at vfs_cache_lookup+0xd0
> VOP_LOOKUP_APV(c06a5920,dab2eaec,c3573d80,3,1) at VOP_LOOKUP_APV+0xb4
> lookup(dab2eb98,0,c066b85d,b6,6b2) at lookup+0x528
> namei(dab2eb98,dab2ebe8,60,854,c3573d80) at namei+0x488
> kern_lstat(c3573d80,80524a8,0,dab2ec6c,dab2ec88) at kern_lstat+0x4f
> lstat(c3573d80,dab2ed04,8,41d,2) at lstat+0x2f
> syscall(3b,3b,3b,8052448,8052400) at syscall+0x300
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (190, FreeBSD ELF32, lstat), eip = 0x28182613, esp = 0xbfbfe55c, ebp = 0xbfbfe5f8 ---
> db> where 546
> Tracing pid 546 tid 100086 td 0xc35e2300
> sched_switch(c35e2300,0,1,10a,6af4f133) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,2) at mi_switch+0x2e6
> sleepq_switch(c0707be4,0,c066587f,20c,dab5560c) at sleepq_switch+0x112
> sleepq_wait(c0707be4,0,c0663282,c8,0) at sleepq_wait+0x65
> msleep(c0707be4,c0707c00,44,c066a543,0) at msleep+0x335
> waitrunningbufspace(c3777d48,cd832380,c3527c00,c3527c00,c3580000) at waitrunningbufspace+0x72
> bufwrite(cd832380,0,dab55934,c05a043f,cd832380) at bufwrite+0x1a1
> bawrite(cd832380,4a030000,2b,4000,c3655300) at bawrite+0x6b
> ffs_snapshot(c3429c00,c3572680,dab559a8,6c,1) at ffs_snapshot+0x61f
> ffs_mount(c3429c00,c35e2300,c066ba58,331,c06c0ea0) at ffs_mount+0xa81
> vfs_domount(c35e2300,c354a680,c3391030,1211000,c3391230) at vfs_domount+0x6be
> vfs_donmount(c35e2300,1211000,dab55bf4,c3779080,e) at vfs_donmount+0x47a
> kernel_mount(c3391910,1211000,dab55c38,6c,805bb00) at kernel_mount+0x7e
> ffs_cmount(c3391910,bfbfec70,1211000,c35e2300,c06a5600) at ffs_cmount+0x84
> mount(c35e2300,dab55d04,c067d6fd,3cb,4) at mount+0x1e6
> syscall(3b,3b,3b,bfbfec70,805dab8) at syscall+0x300
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (21, FreeBSD ELF32, mount), eip = 0x280cddb7, esp = 0xbfbfea0c, ebp = 0xbfbfed18 ---
> db>
>

First, I have set the followup to the right mailing list.

Second, I am really curious about what you are doing. My understanding
is as follows: you set up a vnode-backed md device (md0a) on a sparse
file, created a UFS2 filesystem on it, mounted it with quotas, and ran
background fsck on that filesystem. At the same time, you ran rm on the
snapshot file created by fsck. Right?

Anyway, the problem seems to be related to neither snapshots nor quotas.
In your trace, process 35 (syncer) tries to sync vnode 0xc363c414, which
is inode 1515 on aacd0s1f, the file that backs md0. That vnode is already
locked by process 515 (the md0 kthread). Process 515 is stuck in the
wdrain state, sleeping in waitrunningbufspace() until in-flight buffer
writes drain.

It seems that snapshotting the filesystem produced a huge number of
dirty buffers to be written to md0. As a result, the system deadlocks:
the md0 thread hangs waiting for running buffer space, and that space is
occupied by the pending write requests to md0 itself, which only the md0
thread can complete. Do -fs@ readers agree with this analysis?
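The chain of waiters can be written down as a small wait-for table. This is only an illustrative sketch in Python, not a FreeBSD tool: the pids, vnode addresses and wait channels are copied from the ps, show lockedvnods and where output quoted above, while the dictionary layout and the blocked_chain() helper are assumptions made for the illustration.

```python
# Facts copied from the quoted ddb session; the data layout and the helper
# below are illustrative assumptions, not anything that exists in FreeBSD.

# vnode address -> pid of the thread holding its exclusive lock
vnode_owner = {
    "0xc35d76cc": 35,    # syncer vnode, held by syncer
    "0xc363c414": 515,   # ino 1515 on aacd0s1f (backs md0), held by md0
    "0xc368f000": 547,   # ino 3 on md0a, held by ls
    "0xc3777c3c": 546,   # ino 4 on md0a, held by fsck_4.2bsd
}

# pid -> what the process sleeps on: ("vnode", address) or ("wchan", name)
waits_on = {
    35:  ("vnode", "0xc363c414"),   # syncer: qsync -> vget -> vn_lock
    547: ("vnode", "0xc3777c3c"),   # ls: ufs_lookup -> ffs_vget -> vget
    515: ("wchan", "wdrain"),       # md0: bufwrite -> waitrunningbufspace
    546: ("wchan", "wdrain"),       # fsck_4.2bsd: bawrite -> waitrunningbufspace
}


def blocked_chain(pid):
    """Follow vnode-lock waits from pid until a non-lock wait is reached.

    Returns (list of pids in the chain, final wait reason).
    """
    chain = [pid]
    seen = {pid}
    while True:
        kind, what = waits_on.get(chain[-1], ("running", None))
        if kind != "vnode":
            return chain, what
        owner = vnode_owner[what]
        if owner in seen:            # pure lock cycle; not the case here
            return chain + [owner], "lock cycle"
        chain.append(owner)
        seen.add(owner)


for pid in (35, 547):
    chain, reason = blocked_chain(pid)
    print(" -> ".join(str(p) for p in chain), "sleeps on", reason)
```

Both chains end in a thread sleeping on wdrain rather than in a lock cycle, which is why the vnode locks look like a symptom here and the running buffer space looks like the root of the hang.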

I propose setting the TDP_NORUNNINGBUF thread flag for both swap- and
file-backed md threads to prevent such deadlocks. That I/O is already
accounted for in the upper layer. Moreover, the requests already
accounted for there do not really differ from the requests (re)issued
by md.

Please comment.
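To make the effect of such an exemption concrete, here is a toy model in Python. It is a sketch under stated assumptions, not kernel code: simulate(), its parameters and the numbers are hypothetical, and the accounting is reduced to a single counter standing in for the running buffer space. The only behaviour taken from the discussion above is that a non-exempt md thread sleeps on wdrain while the in-flight writes exceed the limit, and that those in-flight writes are requests to md0 that only the md thread can complete.

```python
def simulate(pending_md_writes, hi_running_space, md_exempt):
    """Toy model of the md thread draining writes queued to md0.

    pending_md_writes: writes to md0 already issued (and accounted) by the
                       upper layer; hypothetical unit-sized requests.
    hi_running_space:  hypothetical limit on in-flight writes.
    md_exempt:         True models a thread that skips the running-space
                       wait (the proposed TDP_NORUNNINGBUF behaviour).

    Returns the number of md requests completed before the md thread would
    sleep forever, or pending_md_writes if it never blocks.
    """
    running = pending_md_writes      # stands in for the running buffer space
    done = 0
    while done < pending_md_writes:
        # The md thread re-issues the request as a write to its backing file.
        if not md_exempt and running > hi_running_space:
            # It would sleep on "wdrain". Only md completions can lower
            # `running`, and md is the sleeper, so nothing ever wakes it.
            return done
        running -= 1                 # the md request completes
        done += 1
    return done


print(simulate(8, 4, md_exempt=False))   # blocks before completing anything
print(simulate(8, 4, md_exempt=True))    # drains all 8 requests
print(simulate(3, 4, md_exempt=False))   # under the limit, no problem
```

In this model the exempt thread always drains its queue, while the non-exempt one stalls as soon as the upper layer has queued more than the limit.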