FreeBSD Mail Archives

Date:      Thu, 22 Aug 2013 12:16:07 -0400
From:      J David <j.david.lists@gmail.com>
To:        freebsd-stable <freebsd-stable@freebsd.org>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>
Subject:   Re: NFS deadlock on 9.2-Beta1
Message-ID:  <CABXB=RSGrpUr34EdDu_6kab4nDm4NmYAoJZR8MPJOitpVzSNrA@mail.gmail.com>
In-Reply-To: <CABXB=RQZNWg7wmajNWrBLQAiUsAYXqMFAF1GVpFTMf2QvqLqWw@mail.gmail.com>
References:  <20130821131032.GX4972@kib.kiev.ua> <461961460.12238255.1377133690607.JavaMail.root@uoguelph.ca> <CABXB=RQZNWg7wmajNWrBLQAiUsAYXqMFAF1GVpFTMf2QvqLqWw@mail.gmail.com>

One deadlocked process cropped up overnight, but I managed to panic
the box before getting too much debugging info. :(

The process was in state T instead of D, which I guess must be a side
effect of some of the debugging code compiled in.

Here are the details I was able to capture:

db>  show proc 7692
Process 7692 (httpd) at 0xfffffe0158793000:
 state: NORMAL
 uid: 25000  gids: 25000
 parent: pid 1 at 0xfffffe00039c3950
 ABI: FreeBSD ELF64
 arguments: /nfsn/apps/tapache22/bin/httpd
 threads: 3
100674                   D       newnfs   0xfffffe021cdd9848 httpd
100597                   D       pgrbwt   0xfffffe02fda788b8 httpd
100910                   s                                   httpd

db> show thread 100674
Thread 100674 at 0xfffffe0108c79480:
 proc (pid 7692): 0xfffffe0158793000
 name: httpd
 stack: 0xffffff834c80f000-0xffffff834c812fff
 flags: 0x2a804  pflags: 0
 state: INHIBITED: {SLEEPING}
 wmesg: newnfs  wchan: 0xfffffe021cdd9848
 priority: 96
 container lock: sleepq chain (0xffffffff813c03c8)

db> tr 100674
Tracing pid 7692 tid 100674 td 0xfffffe0108c79480
sched_switch() at sched_switch+0x234/frame 0xffffff834c812360
mi_switch() at mi_switch+0x15c/frame 0xffffff834c8123a0
sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c8123e0
sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c812410
sleeplk() at sleeplk+0x11a/frame 0xffffff834c812460
__lockmgr_args() at __lockmgr_args+0x9a9/frame 0xffffff834c812580
nfs_lock1() at nfs_lock1+0x87/frame 0xffffff834c8125b0
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbe/frame 0xffffff834c8125e0
_vn_lock() at _vn_lock+0x63/frame 0xffffff834c812640
ncl_upgrade_vnlock() at ncl_upgrade_vnlock+0x5e/frame 0xffffff834c812670
ncl_bioread() at ncl_bioread+0x195/frame 0xffffff834c8127e0
VOP_READ_APV() at VOP_READ_APV+0xd1/frame 0xffffff834c812810
vn_rdwr() at vn_rdwr+0x2bc/frame 0xffffff834c8128d0
kern_sendfile() at kern_sendfile+0xa90/frame 0xffffff834c812ac0
do_sendfile() at do_sendfile+0x92/frame 0xffffff834c812b20
amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c812c30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c812c30
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b26f4c, rsp
= 0x7ffffe9f43c8, rbp = 0x7ffffe9f4700 ---

db> show lockchain 100674
thread 100674 (pid 7692, httpd) inhibited

db> show thread 100597
Thread 100597 at 0xfffffe021c976000:
 proc (pid 7692): 0xfffffe0158793000
 name: httpd
 stack: 0xffffff834c80a000-0xffffff834c80dfff
 flags: 0x28804  pflags: 0
 state: INHIBITED: {SLEEPING}
 wmesg: pgrbwt  wchan: 0xfffffe02fda788b8
 priority: 84
 container lock: sleepq chain (0xffffffff813c0148)

db> tr 100597
Tracing pid 7692 tid 100597 td 0xfffffe021c976000
sched_switch() at sched_switch+0x234/frame 0xffffff834c80d750
mi_switch() at mi_switch+0x15c/frame 0xffffff834c80d790
sleepq_switch() at sleepq_switch+0x17d/frame 0xffffff834c80d7d0
sleepq_wait() at sleepq_wait+0x43/frame 0xffffff834c80d800
_sleep() at _sleep+0x30f/frame 0xffffff834c80d890
vm_page_grab() at vm_page_grab+0x120/frame 0xffffff834c80d8d0
kern_sendfile() at kern_sendfile+0x992/frame 0xffffff834c80dac0
do_sendfile() at do_sendfile+0x92/frame 0xffffff834c80db20
amd64_syscall() at amd64_syscall+0x259/frame 0xffffff834c80dc30
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff834c80dc30
--- syscall (393, FreeBSD ELF64, sys_sendfile), rip = 0x801b26f4c, rsp
= 0x7ffffebf53c8, rbp = 0x7ffffebf5700 ---

db> show lockchain 100597
thread 100597 (pid 7692, httpd) inhibited

The "inhibited" is not something I'm familiar with and didn't match
the example output; I thought that maybe the T state was overpowering
the locks, and that maybe I should continue the system and then -CONT
the process.  However, a few seconds after I issued "c" at the DDB
prompt, the system panicked in the console driver ("mtx_lock_spin:
recursed on non-recursive mutex cnputs_mtx @
/usr/src/sys/kern/kern_cons.c:500"), so I guess that's not a thing to
do. :(

Sorry my stupidity and ignorance is dragging this out. :(  This is all
well outside my comfort zone, but next time I'll get it for sure.

Thanks!

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CABXB=RSGrpUr34EdDu_6kab4nDm4NmYAoJZR8MPJOitpVzSNrA>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation