Date: Thu, 22 Aug 2013 12:16:07 -0400 From: J David <j.david.lists@gmail.com> To: freebsd-stable <freebsd-stable@freebsd.org> Cc: Rick Macklem <rmacklem@uoguelph.ca> Subject: Re: NFS deadlock on 9.2-Beta1 Message-ID: <CABXB=RSGrpUr34EdDu_6kab4nDm4NmYAoJZR8MPJOitpVzSNrA@mail.gmail.com> In-Reply-To: <CABXB=RQZNWg7wmajNWrBLQAiUsAYXqMFAF1GVpFTMf2QvqLqWw@mail.gmail.com> References: <20130821131032.GX4972@kib.kiev.ua> <461961460.12238255.1377133690607.JavaMail.root@uoguelph.ca> <CABXB=RQZNWg7wmajNWrBLQAiUsAYXqMFAF1GVpFTMf2QvqLqWw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
One deadlocked process cropped up overnight, but I managed to panic the box before getting too much debugging info. :( The process was in state T instead of D, which I guess must be a side effect of some of the debugging code compiled in. Here are the details I was able to capture: db> show proc 7692 Process 7692 (httpd) at 0xfffffe0158793000: state: NORMAL uid: 25000 gids: 25000 parent: pid 1 at 0xfffffe00039c3950 ABI: FreeBSD ELF64 arguments: /nfsn/apps/tapache22/bin/httpd threads: 3 100674 D newnfs 0xfffffe021cdd9848 httpd 100597 D pgrbwt 0xfffffe02fda788b8 httpd 100910 s httpd db> show thread 100674 Thread 100674 at 0xfffffe0108c79480: proc (pid 7692): 0xfffffe0158793000 name: httpd stack: 0xffffff834c80f000-0xffffff834c812fff flags: 0x2a804 pflags: 0 state: INHIBITED: {SLEEPING} wmesg: newnfs wchan: 0xfffffe021cdd9848 priority: 96 container lock: sleepq chain (0xffffffff813c03c8) db> tr 100674 Tracing pid 7692 tid 100674 td 0xfffffe0108c79480 sched_switch() at sched_switch+0x234/frame 0xffffff834c812360 mi_switch() at mi_switch+0x15c/frame 0xffffff834c8123a0 sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c8123e0 sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c812410 sleeplk() at sleeplk+0x11a/frame 0xffffff834c812460 __lockmgr_args() at __lockmgr_args+0x9a9/frame 0xffffff834c812580 nfs_lock1() at nfs_lock1+0x87/frame 0xffffff834c8125b0 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xffffff834c8125e0 _vn_lock() at _vn_lock+0x63/frame 0xffffff834c812640 ncl_upgrade_vnlock() at ncl_upgrade_vnlock+0x5e/frame 0xffffff834c812670 ncl_bioread() at ncl_bioread+0x195/frame 0xffffff834c8127e0 VOP_READ_APV() at VOP_READ_APV+0xd1/frame 0xffffff834c812810 vn_rdwr() at vn_rdwr+0x2bc/frame 0xffffff834c8128d0 kern_sendfile() at kern_sendfile+0xa90/frame 0xffffff834c812ac0 do_sendfile() at do_sendfile+0x92/frame 0xffffff834c812b20 amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c812c30 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c812c30 --- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b26f4c, rsp = 0x7ffffe9f43c8, rbp = 0x7ffffe9f4700 --- db> show lockchain 100674 thread 100674 (pid 7692, httpd) inhibited db> show thread 100597 Thread 100597 at 0xfffffe021c976000: proc (pid 7692): 0xfffffe0158793000 name: httpd stack: 0xffffff834c80a000-0xffffff834c80dfff flags: 0x28804 pflags: 0 state: INHIBITED: {SLEEPING} wmesg: pgrbwt wchan: 0xfffffe02fda788b8 priority: 84 container lock: sleepq chain (0xffffffff813c0148) db> tr 100597 Tracing pid 7692 tid 100597 td 0xfffffe021c976000 sched_switch() at sched_switch+0x234/frame 0xffffff834c80d750 mi_switch() at mi_switch+0x15c/frame 0xffffff834c80d790 sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c80d7d0 sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c80d800 _sleep() at _sleep+0x30f/frame 0xffffff834c80d890 vm_page_grab() at vm_page_grab+0x120/frame 0xffffff834c80d8d0 kern_sendfile() at kern_sendfile+0x992/frame 0xffffff834c80dac0 do_sendfile() at do_sendfile+0x92/frame 0xffffff834c80db20 amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c80dc30 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c80dc30 --- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b26f4c, rsp = 0x7ffffebf53c8, rbp = 0x7ffffebf5700 --- db> show lockchain 100597 thread 100597 (pid 7692, httpd) inhibited The "inhibited" is not something I'm familiar with and didn't match the example output; I thought that maybe the T state was overpowering the locks, and that maybe I should continue the system and then -CONT the process. However, a few seconds after I issued "c" at the DDB prompt, the system panicked in the console driver ("mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/sys/kern/kern_cons.c:500"), so I guess that's not a thing to do. :( Sorry my stupidity and ignorance is dragging this out. :( This is all well outside my comfort zone, but next time I'll get it for sure. Thanks!
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CABXB=RSGrpUr34EdDu_6kab4nDm4NmYAoJZR8MPJOitpVzSNrA>