Date:      Tue, 16 Oct 2012 08:25:24 -0700
From:      Dennis Glatting <freebsd@penx.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: I have a DDB session open to a crashed ZFS server
Message-ID:  <1350401124.72003.38.camel@btw.pki2.com>
In-Reply-To: <1350400597.72003.32.camel@btw.pki2.com>
References:  <1350317019.71982.50.camel@btw.pki2.com> <201210160844.41042.jhb@freebsd.org> <1350400597.72003.32.camel@btw.pki2.com>


On Tue, 2012-10-16 at 08:44 -0400, John Baldwin wrote:
> On Monday, October 15, 2012 12:03:39 pm Dennis Glatting wrote:
> > FreeBSD/amd64 (mc) (ttyu0)
> > 
> > login: NMI ... going to debugger
> > [ thread pid 11 tid 100003 ]
> 
> You got an NMI, not a crash.  What happens if you just continue ('c'
> command) from DDB?
> 

I hit the NMI button to get into DDB because of the "crash," which was
a poor choice of word on my part.

The problem I am having is that ZFS file systems go dormant under load
within 24 hours. Specifically, the processes are still alive but stuck
in disk wait, and there is no disk I/O. I have had this problem across
four machines for months.

The network and console still work, but if I enter a command that has
to read from disk, nothing comes back.


> I have heard of machines sending spurious NMIs in the past.  If that
> is what you are seeing, there is a sysctl to disable dropping into DDB
> due to an NMI:
> 
> machdep.kdb_on_nmi: 1
> 
> If you keep getting NMIs, try setting that to 0.
> 

The DDB session is still open but I don't see why this system is stuck.
I have been looking at locked processes (two below) but I'm not familiar
with the code. Maybe a deadlock? Maybe a missed interrupt? Maybe an
unsupported controller? I dunno.




0xfffffe0b989803f0: tag zfs, type VDIR
    usecount 0, writecount 0, refcount 2 mountedhere 0
    flags (VI_DOINGINACT|VI(0x200))
    v_object 0xfffffe0b87af7488 ref 0 pages 0
    lock type zfs: EXCL by thread 0xfffffe09b48fe900 (pid 70646)

db> show thread 0xfffffe09b48fe900
Thread 104609 at 0xfffffe09b48fe900:
 proc (pid 70646): 0xfffffe0af7e79940
 name: find
 stack: 0xffffffa3329b2000-0xffffffa3329b5fff
 flags: 0x4  pflags: 0
 state: INHIBITED: {SLEEPING}
 wmesg: tx->tx_quiesce_done_cv)  wchan: 0xfffffe0059b60240
 priority: 120
 container lock: sleepq chain (0xffffffff8126d498)

db> sh proc 70646
Process 70646 (find) at 0xfffffe0af7e79940:
 state: NORMAL
 uid: 0  gids: 0, 5
 parent: pid 70645 at 0xfffffe006e4974a0
 ABI: FreeBSD ELF64
 arguments: find
 threads: 1
104609                   D       tx->tx_q 0xfffffe0059b60240 find

db> tr 104609
Tracing pid 70646 tid 104609 td 0xfffffe09b48fe900
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
txg_wait_open() at txg_wait_open+0x85
dmu_tx_assign() at dmu_tx_assign+0x38
zfs_inactive() at zfs_inactive+0x8e
zfs_freebsd_inactive() at zfs_freebsd_inactive+0xd
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0x5d
vinactive() at vinactive+0xef
vputx() at vputx+0x244
sys_fchdir() at sys_fchdir+0x3f0
amd64_syscall() at amd64_syscall+0x334
Xfast_syscall() at Xfast_syscall+0xfb
--- syscall (13, FreeBSD ELF64, sys_fchdir), rip = 0x80088396c, rsp = 0x7fffffffd998, rbp = 0x7fffffffda40 ---




0xfffffe005c8855e8: tag syncer, type VNON
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags (VI(0x200))
    lock type syncer: EXCL by thread 0xfffffe0039147480 (pid 20)

db> sh thread 0xfffffe0039147480
Thread 100242 at 0xfffffe0039147480:
 proc (pid 20): 0xfffffe003f4bc940
 name: syncer
 stack: 0xffffffa2fc73b000-0xffffffa2fc73efff
 flags: 0x4  pflags: 0x240800
 state: INHIBITED: {SLEEPING}
 wmesg: zio->io_cv)  wchan: 0xfffffe01187c5320
 priority: 116
 container lock: sleepq chain (0xffffffff8126e140)

db> sh proc 20
Process 20 (syncer) at 0xfffffe003f4bc940:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0xffffffff812cbdd8
 ABI: null
 threads: 1
100242                   D       zio->io_ 0xfffffe01187c5320 [syncer]

db> tr 100242
Tracing pid 20 tid 100242 td 0xfffffe0039147480
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
zio_wait() at zio_wait+0x5b
zil_commit() at zil_commit+0x833
zfs_sync() at zfs_sync+0xaa
sync_fsync() at sync_fsync+0x168
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x5d
sync_vnode() at sync_vnode+0x1b0
sched_sync() at sched_sync+0x29f
fork_exit() at fork_exit+0x9a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffa2fc73eb30, rbp = 0 ---
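
To make the two traces above easier to compare, here is a minimal
Python sketch (not part of the DDB session) that pulls the function
names out of a "tr" listing and reports the first frame above the
generic sleep machinery. The helper names and the GENERIC set are my
own assumptions for illustration; the frames are copied, abbreviated,
from the traces above.

```python
import re

# Matches DDB frame lines such as "txg_wait_open() at txg_wait_open+0x85".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+$")

# Scheduler/sleep frames common to any sleeping thread (assumed list,
# taken from the frames the two traces share).
GENERIC = {"sched_switch", "mi_switch", "sleepq_wait", "_cv_wait"}


def frames(trace_text):
    """Return function names from a DDB 'tr' listing, innermost first."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def blocking_frame(trace_text):
    """Return the first frame that is not generic sleep code, or None."""
    for name in frames(trace_text):
        if name not in GENERIC:
            return name
    return None


# Innermost frames of the two traces above (outer frames omitted).
FIND_TRACE = """\
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
txg_wait_open() at txg_wait_open+0x85
dmu_tx_assign() at dmu_tx_assign+0x38
zfs_inactive() at zfs_inactive+0x8e
"""

SYNCER_TRACE = """\
sched_switch() at sched_switch+0x28b
mi_switch() at mi_switch+0xdf
sleepq_wait() at sleepq_wait+0x3a
_cv_wait() at _cv_wait+0x164
zio_wait() at zio_wait+0x5b
zil_commit() at zil_commit+0x833
zfs_sync() at zfs_sync+0xaa
"""

if __name__ == "__main__":
    for label, text in (("find", FIND_TRACE), ("syncer", SYNCER_TRACE)):
        print(label, "is sleeping in", blocking_frame(text))
```

For these two traces it reports txg_wait_open for find and zio_wait
for syncer, which matches the wmesg values shown by "show thread". It
only summarizes where each thread sleeps; it says nothing about why
the wakeup never arrives.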









