Date:      Tue, 25 Dec 2012 10:18:23 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Derek Kulinski <takeda@takeda.tk>
Cc:        freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Message-ID:  <50D9614F.6040306@FreeBSD.org>
In-Reply-To: <574019558.20121224161156@takeda.tk>
References:  <1824023197.20121223142308@takeda.tk> <50D87C56.70709@FreeBSD.org> <331959998.20121224101719@takeda.tk> <50D8E500.1070408@FreeBSD.org> <574019558.20121224161156@takeda.tk>

on 25/12/2012 02:11 Derek Kulinski said the following:
> Hello Andriy,
> 
> Monday, December 24, 2012, 3:28:00 PM, you wrote:
> 
>> I've looked through the cores and it does look like in all cases some sort of
>> memory corruption is a precursor to a subsequent crash.
> 
>> I can't decidedly say if the corruptions are caused by the hardware, by some
>> code overwriting random memory locations ("rogue" driver) or by a "simpler" bug
>> like use after free.
> 
>> I am always inclined to suspect the hardware first.
> 
>> You can try to reproduce the problem with some additional checks enabled in the
>> kernel.  Those should catch the problem earlier and thus make its source clearer.
> 
>> I recommend the following:
>> options         INVARIANTS
>> options         INVARIANT_SUPPORT
>> options         WITNESS
>> options         DEBUG_MEMGUARD
>> makeoptions     DEBUG+="-DDEBUG"
> 
>> The last is really needed only for the ZFS and OpenSolaris compat code.  It may
>> result in some extra noise from unrelated subsystems.
>> Perhaps you could just add "#define DEBUG" to
>> sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
>> approach though.
> 
>> Also, please put vm.memguard.desc="arc_buf_hdr_t" into loader.conf.
> 
>> Please note that these options will make your system significantly slower.
> 
> I recompiled the kernel and am now running with the options you specified (I
> enabled DEBUG in the file).
> 
> Anyway, even at boot time I started getting the following warnings. Is this
> anything to worry about:

These witness warnings are OK-ish.
Watch for panics.

BTW, I should have said this earlier.  Whatever the kind of corruption, it
would be much worse if it got propagated to stable storage,
especially into any kind of pool metadata.

So, your data is at great risk now.
Please also take measures to back it up, preferably using a different system.

> Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
> Dec 24 16:06:03 chinatsu kernel: lock order reversal:
> Dec 24 16:06:03 chinatsu kernel: 1st 0xffffffff80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
> Dec 24 16:06:03 chinatsu kernel: .
> Dec 24 16:06:03 chinatsu kernel: 2nd 0xfffffe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384
> Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
> Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
> Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
> Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
> Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
> Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
> Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
> Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
> Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
> Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
> Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
> Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
> Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
> Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
> Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
> Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
> Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
> Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
> Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
> Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
> Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at taskqueue_run_locked+0x93
> Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0x3e
> Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
> Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
> Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 0xffffff85fb2ebbb0, rbp = 0 ---
> Dec 24 16:06:03 chinatsu kernel: No core dumps found.
> Dec 24 16:06:04 chinatsu kernel: lock order reversal:
> Dec 24 16:06:04 chinatsu kernel: 1st 0xffffff85b9cb8dd8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2677
> Dec 24 16:06:04 chinatsu kernel: 2nd 0xfffffe00092c5c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
> Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
> Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
> Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
> Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
> Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
> Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at ufsdirhash_acquire+0x33
> Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove() at
> Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove+0x16
> Dec 24 16:06:04 chinatsu kernel: ufs_dirremove() at ufs_dirremove+0x1bb
> Dec 24 16:06:04 chinatsu kernel: ufs_remove() at ufs_remove+0x92
> Dec 24 16:06:04 chinatsu kernel: VOP_REMOVE_APV() at VOP_REMOVE_APV+0xb7
> Dec 24 16:06:04 chinatsu kernel: kern_unlinkat() at kern_unlinkat+0x2eb
> Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
> Dec 24 16:06:04 chinatsu kernel: Xfast_syscall() at Xfast_syscall+0xf7
> Dec 24 16:06:04 chinatsu kernel: --- syscall (10, FreeBSD ELF64, sys_unlink), rip = 0x80090a22c, rsp = 0x7fffffff
> Dec 24 16:06:04 chinatsu kernel: ca88, rbp = 0x7fffffffdf20 ---
> Dec 24 16:06:04 chinatsu kernel: lock order reversal:
> Dec 24 16:06:04 chinatsu kernel: 1st 0xfffffe00266ddbd8 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:849
> Dec 24 16:06:04 chinatsu kernel: 2nd 0xfffffe002679a818 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2158
> Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
> Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
> Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
> Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
> Dec 24 16:06:04 chinatsu kernel: __lockmgr_args() at __lockmgr_args+0x10d9
> Dec 24 16:06:04 chinatsu kernel: vop_stdlock() at vop_stdlock+0x39
> Dec 24 16:06:04 chinatsu kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbf
> Dec 24 16:06:04 chinatsu kernel: _vn_lock() at _vn_lock+0x47
> Dec 24 16:06:04 chinatsu kernel: vget() at vget+0x7b
> Dec 24 16:06:04 chinatsu kernel: devfs_allocv() at devfs_allocv+0x13f
> Dec 24 16:06:04 chinatsu kernel: devfs_root() at devfs_root+0x4d
> Dec 24 16:06:04 chinatsu kernel: vfs_donmount() at vfs_donmount+0xafa
> Dec 24 16:06:04 chinatsu kernel: sys_nmount() at sys_nmount+0x66
> Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
> Dec 24 16:06:04 chinatsu kernel: Xfast_syscall() at Xfast_syscall+0xf7
> Dec 24 16:06:04 chinatsu kernel: --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x800a8d71c, rsp = 0x7fffffffccc8, rbp = 0x801009048 ---
> Dec 24 16:06:05 chinatsu named[1387]: starting BIND 9.8.3-P4 -t /var/named -u bind
> Dec 24 16:06:05 chinatsu kernel: Starting named.
> 
> 
> 
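Each witness report in the log quoted above has the same shape: a "lock order reversal:" line, then a "1st" and a "2nd" line naming the two locks and the source locations where they were taken.  A minimal sketch, assuming log text in exactly that format, that pulls out the lock pairs so that repeated reversal reports are easier to tell apart from anything more serious.  The names LOCK_LINE and lock_order_reversals are made up for this illustration and are not part of any FreeBSD tool:

```python
import re

# Matches the "1st"/"2nd" lines of a witness report, for example:
#   ... kernel: 1st 0xffffffff80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
# Assumption: the lock name and the lock class in parentheses are separated
# by " (" and the location is given as "path:line".
LOCK_LINE = re.compile(
    r"kernel: (1st|2nd) 0x[0-9a-f]+ (.+?) \((.+?)\) @ (\S+):(\d+)"
)


def lock_order_reversals(lines):
    """Yield ((name1, where1), (name2, where2)) for each reported reversal."""
    first = None
    for line in lines:
        match = LOCK_LINE.search(line)
        if match is None:
            # Backtrace frames and unrelated messages are skipped.
            continue
        order, name, _lock_class, path, lineno = match.groups()
        entry = (name, f"{path}:{lineno}")
        if order == "1st":
            first = entry
        elif first is not None:
            yield first, entry
            first = None
```

It accepts any iterable of lines, so it can be fed an open log file (for instance /var/log/messages, assuming the kernel messages are logged there) or a saved copy of the console output.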


-- 
Andriy Gapon


