From owner-freebsd-current@FreeBSD.ORG Wed Dec 3 06:39:18 2003 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 12C8416A4CE; Wed, 3 Dec 2003 06:39:18 -0800 (PST) Received: from ptcnat.era.pl (ptcnat.era.pl [213.158.197.100]) by mx1.FreeBSD.org (Postfix) with ESMTP id 93E2A43F3F; Wed, 3 Dec 2003 06:39:13 -0800 (PST) (envelope-from zaks@era.pl) Received: by localhost (Postfix, from userid 1001) id 40C6D11426; Wed, 3 Dec 2003 15:39:11 +0100 (CET) To: Robert Watson From: =?iso-8859-2?q?S=B3awek_=AFak?= References: Date: Wed, 03 Dec 2003 15:39:11 +0100 In-Reply-To: (Robert Watson's message of "Tue, 2 Dec 2003 01:45:43 -0500 (EST)") Message-ID: <86k75eufsg.fsf@thirst.unx.era.pl> User-Agent: Gnus/5.1003 (Gnus v5.10.3) XEmacs/21.4 (Reasonable Discussion, berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii cc: Jun Kuriyama cc: Current Subject: Re: -current lockup (how to diagnose?) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Dec 2003 14:39:18 -0000 Robert Watson writes: > On Tue, 2 Dec 2003, Jun Kuriyama wrote: > >> At Mon, 1 Dec 2003 09:23:21 -0500 (EST), >> Robert Watson wrote: >> > This could be a sign of a VM or VFS lock leak or deadlock. I'd advise >> > hooking up a serial console, dropping to DDB over serial line, and posting >> > the results of "ps" and "show lockedvnods". We might then ask you to use >> > the "show locks" command on various processes. You'll need to have DDB >> > and WITNESS compiled in. >> >> I got it. >> >> http://www.imgsrc.co.jp/~kuriyama/BSD/lock-20031202.log > > "ouch" > > Could you try compiling in DEBUG_LOCKS into your kernel and doing "show > lockedvnods" with that? Unfortunately, someone removed the pid from the > output of that command, but didn't add the thread pointer to the DDB ps > output, so you'll probably need to modify the lockmgr_printinfo() function > in vfs_subr.c to print out lkp->lk_lockholder->td_proc->p_pid as well for > exclusive locks. It looks like maybe something isn't releasing a vnode > lock before returning to userspace. I have some patches to assert that no > lockmgr locks are held on the return to userspace, but I'll have to dig > them up tomorrow and send them to you. Basically, it adds a per-thread > lockmgr lock count in a thread-local variable, incrementing for each lock, > and decrementing for each release, and then KASSERT()'s in userret that > the variable is 0. I have the same problem. find is offending in my case: db> ps pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd 35331 c955654c f0bba000 0 91902 35331 0004002 [SLP]getblk 0xd5782e3c] find 85666 c9556388 f0bb9000 0 86354 85666 0004002 [CV]select 0xc06d0b04] top 86354 c91a4710 ee7f5000 0 29327 86354 0004002 [SLP]pause 0xc9544000] csh 29327 c91a4e20 ee7f9000 0 40805 29327 0000100 [CV]select 0xc06d0b04] sshd 91902 c9385710 f0b54000 0 44297 91902 0004002 [SLP]pause 0xc9566000] csh 44297 c9273000 f0a46000 0 40805 44297 0000100 [CV]select 0xc06d0b04] sshd 89288 c92731c4 f0a8f000 0 1 89288 0004002 [SLP]ttyin 0xc91ea010] getty 30656 c9557388 f0bc2000 0 1 30656 0004002 [SLP]ttyin 0xc93c4010] getty 23066 c8f1e710 e60d6000 0 1 23066 0004002 [SLP]ttyin 0xc92bfe10] getty 3582 c9557000 f0bc0000 0 1 3582 0004002 [SLP]ttyin 0xc93c3210] getty 38198 c9273c5c f0a95000 0 1 38198 0004002 [SLP]ttyin 0xc93c3810] getty 32813 c95568d4 f0bbc000 0 1 32813 0004002 [SLP]ttyin 0xc93c3e10] getty 25214 c9556a98 f0bbd000 0 1 25214 0004002 [SLP]ttyin 0xc93c4410] getty 85142 c9231000 f09ea000 0 1 85142 0004002 [SLP]ttyin 0xc91ea210] getty 27067 c9385e20 f0ba0000 0 1 27067 0004002 [SLP]ttyin 0xc8fc1410] getty 89169 c955754c f0bc3000 0 1 89169 0000000 [SLP]nanslp 0xc06a971c] cron 40805 c9231710 f0a36000 0 1 40805 0000100 [CV]select 0xc06d0b04] sshd 13151 c92a1710 f0aff000 0 1 13151 0000000 [CV]select 0xc06d0b04] syslogd 83 c91a3388 ee7c6000 0 0 0 0000204 [SLP]- 0xc06d312c] nfsiod 3 82 c91a354c ee7c7000 0 0 0 0000204 [SLP]- 0xc06d3128] nfsiod 2 81 c91a3710 ee7c8000 0 0 0 0000204 [SLP]- 0xc06d3124] nfsiod 1 80 c91a38d4 ee7c9000 0 0 0 0000204 [SLP]- 0xc06d3120] nfsiod 0 79 c91a3a98 ee7ca000 0 0 0 0000204 [SLP]vlruwt 0xc91a3a98] vnlru 78 c91a3c5c ee7cb000 0 0 0 0000204 [SLP]syncer 0xc06a90e0] syncer 77 c91a3e20 ee7cc000 0 0 0 0000204 [SLP]psleep 0xc06d0f6c] bufdaen 76 c91a4000 ee7cd000 0 0 0 000020c [SLP]pgzero 0xc06d98e8] pagezeo 75 c8ee7a98 e60a2000 0 0 0 0000204 [SLP]psleep 0xc06d9940] vmdaemn 74 c8ee7c5c e60a3000 0 0 0 0000204 [SLP]psleep 0xc06d992c] pagedan 73 c8ee7e20 e60a4000 0 0 0 0000204 [IWAIT] swi0: tty:sio 9 c8f1b000 e60a5000 0 0 0 0000204 [SLP]aifthd 0xc8f1b000] aac0aif 8 c8f1b1c4 e60a6000 0 0 0 0000204 [SLP]actask 0xc06a0dcc] acpi_t2 7 c8f1b388 e60a7000 0 0 0 0000204 [SLP]actask 0xc06a0dcc] acpi_t1 6 c8f1b54c e60a8000 0 0 0 0000204 [SLP]actask 0xc06a0dcc] acpi_t0 72 c8f1b710 e60a9000 0 0 0 0000204 new [IWAIT] swi5:+ 5 c8f1b8d4 e60aa000 0 0 0 0000204 [SLP]tqthr 0xc06aab68] taskquee 71 c8f1ba98 e60ab000 0 0 0 0000204 new [IWAIT] swi6:+ 70 c8f1bc5c e60ac000 0 0 0 0000204 [IWAIT] swi7: task queue 69 c8f1be20 e60ad000 0 0 0 0000204 [IWAIT] swi7: acpitaskq 68 c8f1e000 e60ae000 0 0 0 0000204 new [IWAIT] swi3: cambio 67 c8f1e1c4 e60af000 0 0 0 0000204 new [IWAIT] swi2: camnet 66 c8f1e388 e60b0000 0 0 0 0000204 [SLP]- 0xc06a1040] random 4 c8ed21c4 e604c000 0 0 0 0000204 [SLP]- 0xc06a3d60] g_down 3 c8ed2388 e604d000 0 0 0 0000204 [SLP]- 0xc06a3d5c] g_up 2 c8ed254c e604e000 0 0 0 0000204 [SLP]- 0xc06a3d54] g_event 65 c8ed2710 e604f000 0 0 0 0000204 [IWAIT] swi1: net 64 c8ed28d4 e6050000 0 0 0 0000204 new [IWAIT] swi4: vm 63 c8ed2a98 e6051000 0 0 0 000020c [LOCK Giant c9299200] swi8: tty:sio clock 62 c8ed2c5c e6052000 0 0 0 0000204 new [IWAIT] irq0: clk 61 c8ed2e20 e6053000 0 0 0 0000204 new [IWAIT] irq47: 60 c8ee7000 e6054000 0 0 0 0000204 new [IWAIT] irq46: 59 c8ee71c4 e6055000 0 0 0 0000204 new [IWAIT] irq45: 58 c8ee7388 e6056000 0 0 0 0000204 new [IWAIT] irq44: 57 c8ee754c e6057000 0 0 0 0000204 new [IWAIT] irq43: 56 c8ee7710 e6058000 0 0 0 0000204 new [IWAIT] irq42: 55 c8ee78d4 e60a1000 0 0 0 0000204 new [IWAIT] irq41: 54 c8ec3a98 e601b000 0 0 0 0000204 new [IWAIT] irq40: 53 c8ec3c5c e601c000 0 0 0 0000204 new [IWAIT] irq39: 52 c8ec3e20 e601d000 0 0 0 0000204 new [IWAIT] irq38: 51 c8ed0000 e601e000 0 0 0 0000204 new [IWAIT] irq37: 50 c8ed01c4 e601f000 0 0 0 0000204 new [IWAIT] irq36: 49 c8ed0388 e6020000 0 0 0 0000204 new [IWAIT] irq35: 48 c8ed054c e6021000 0 0 0 0000204 new [IWAIT] irq34: 47 c8ed0710 e6022000 0 0 0 0000204 new [IWAIT] irq33: 46 c8ed08d4 e6023000 0 0 0 0000204 new [IWAIT] irq32: 45 c8ed0a98 e6024000 0 0 0 0000204 new [IWAIT] irq31: 44 c8ed0c5c e6049000 0 0 0 0000204 [IWAIT] irq30: aac0 43 c8ed0e20 e604a000 0 0 0 0000204 new [IWAIT] irq29: bge1 42 c8ed2000 e604b000 0 0 0 0000204 [IWAIT] irq28: bge0 41 c8eb554c e5feb000 0 0 0 0000204 new [IWAIT] irq27: 40 c8eb5710 e5fec000 0 0 0 0000204 new [IWAIT] irq26: 39 c8eb58d4 e5fed000 0 0 0 0000204 new [IWAIT] irq25: 38 c8eb5a98 e5fee000 0 0 0 0000204 new [IWAIT] irq24: 37 c8eb5c5c e5fef000 0 0 0 0000204 new [IWAIT] irq23: 36 c8eb5e20 e5ff0000 0 0 0 0000204 new [IWAIT] irq22: 35 c8ec3000 e5ff1000 0 0 0 0000204 new [IWAIT] irq21: 34 c8ec31c4 e6016000 0 0 0 0000204 new [IWAIT] irq20: 33 c8ec3388 e6017000 0 0 0 0000204 new [IWAIT] irq19: 32 c8ec354c e6018000 0 0 0 0000204 new [IWAIT] irq18: 31 c8ec3710 e6019000 0 0 0 0000204 new [IWAIT] irq17: 30 c8ec38d4 e601a000 0 0 0 0000204 new [IWAIT] irq16: 29 c3bc51c4 e3db8000 0 0 0 0000204 new [IWAIT] irq15: ata1 28 c3bc5388 e3db9000 0 0 0 0000204 [IWAIT] irq14: ata0 27 c3bc554c e3dba000 0 0 0 0000204 new [IWAIT] irq13: 26 c3bc5710 e3dbb000 0 0 0 0000204 new [IWAIT] irq12: psm0 25 c3bc58d4 e3de0000 0 0 0 0000204 new [IWAIT] irq11: 24 c3bc5a98 e3de1000 0 0 0 0000204 new [IWAIT] irq10: 23 c3bc5c5c e3de2000 0 0 0 0000204 new [IWAIT] irq9: acpi0 22 c3bc5e20 e3de3000 0 0 0 0000204 new [IWAIT] irq8: rtc 21 c8eb5000 e5fe8000 0 0 0 0000204 new [IWAIT] irq7: 20 c8eb51c4 e5fe9000 0 0 0 0000204 [IWAIT] irq6: fdc0 19 c8eb5388 e5fea000 0 0 0 0000204 new [IWAIT] irq5: 18 c3bbe000 e3d66000 0 0 0 0000204 new [IWAIT] irq4: sio0 17 c3bbe1c4 e3daf000 0 0 0 0000204 new [IWAIT] irq3: sio1 16 c3bbe388 e3db0000 0 0 0 0000204 new [IWAIT] irq2: 15 c3bbe54c e3db1000 0 0 0 0000204 [CPU 0] irq1: atkbd0 14 c3bbe710 e3db2000 0 0 0 000020c [Can run] idle: cpu0 13 c3bbe8d4 e3db3000 0 0 0 000020c [CPU 1] idle: cpu1 12 c3bbea98 e3db4000 0 0 0 000020c [CPU 2] idle: cpu2 11 c3bbec5c e3db5000 0 0 0 000020c [CPU 3] idle: cpu3 1 c3bbee20 e3db6000 0 0 1 0004200 [SLP]wait 0xc3bbee20] init 10 c3bc5000 e3db7000 0 0 0 0000204 [CV]ktrace 0xc06a7304] ktrace 0 c06a3e60 c081f000 0 0 0 0000200 [SLP]sched 0xc06a3e60] swapper db> show lockedvnods Locked vnodes 0xc96734a0: tag ufs, type VDIR, usecount 0, writecount 0, refcount 1, flags (VV_OBJBUF), l0 ino 4764471, on dev aacd0s2d (4, 21) db> trace 35331 sched_switch(c9386c80,2,c0657786,1d4,5a0daf6f) at sched_switch+0x9c mi_switch(c9386c80,50,c0657786,ca,1) at mi_switch+0x20a msleep(d5782e3c,c06a8868,50,c065d71e,0) at msleep+0x4b5 acquire(f0b78abc,2000020,600,e7,c9386c80) at acquire+0xa0 debuglockmgr(d5782e3c,2090022,c9259128,c9386c80,c0650ee8) at debuglockmgr+0x3f7 getblk(c9259128,48813a0,0,4000,0) at getblk+0x184 breadn(c9259128,48813a0,0,4000,0) at breadn+0x52 bread(c9259128,48813a0,0,4000,0) at bread+0x4c ffs_update(c96734a0,0,1,54,246) at ffs_update+0x1ff ufs_inactive(f0b78c78,f0b78c90,c0548934,f0b78c78,0) at ufs_inactive+0x1f0 ufs_vnoperate(f0b78c78,0,c065e9c5,89a,897) at ufs_vnoperate+0x18 vrele(c96734a0,0,c065f03f,2cb,2b0) at vrele+0x147 fchdir(c9386c80,f0b78d14,c066d42a,3ee,1) at fchdir+0x1ca syscall(2f,2f,2f,5,8055080) at syscall+0x292 Xint0x80_syscall() at Xint0x80_syscall+0x1d --- syscall (13, FreeBSD ELF32, fchdir), eip = 0x280c4fcf, esp = 0xbfbfeb0c, ebp = 0xbfbfe- db> /S