From: Andriy Gapon
Date: Tue, 25 Dec 2012 10:18:23 +0200
To: Derek Kulinski
Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Message-ID: <50D9614F.6040306@FreeBSD.org>
In-Reply-To: <574019558.20121224161156@takeda.tk>

on 25/12/2012 02:11 Derek Kulinski said the following:
> Hello Andriy,
>
> Monday, December 24, 2012, 3:28:00 PM, you wrote:
>
>> I've looked through the cores and it does look like in all cases some sort of
>> memory corruption is a precursor to a subsequent crash.
>
>> I can't say definitively whether the corruptions are caused by the hardware, by some
>> code overwriting random memory locations ("rogue" driver) or by a "simpler" bug
>> like use after free.
>
>> I am always inclined to suspect the hardware first.
>
>> You can try to reproduce the problem with some additional checks enabled in the
>> kernel. Those should catch the problem earlier and thus make its source clearer.
>
>> I recommend the following:
>> options INVARIANTS
>> options INVARIANT_SUPPORT
>> options WITNESS
>> options DEBUG_MEMGUARD
>> makeoptions DEBUG+="-DDEBUG"
>
>> The last is really needed only for the ZFS and OpenSolaris compat code. It may
>> result in some extra noise from unrelated subsystems.
>> Perhaps you could just add "#define DEBUG" to
>> sys/cddl/contrib/opensolaris/uts/common/sys/debug.h. I haven't tested this
>> approach though.
>
>> Also, please put vm.memguard.desc="arc_buf_hdr_t" into loader.conf.
>
>> Please note that these options will make your system significantly slower.
>
> I recompiled the kernel and am running with the options you specified (I
> enabled DEBUG in the file).
>
> Anyway, even at boot time I started getting the following warnings. Is this
> anything:

These witness warnings are OK-ish. Watch for panics.

BTW, I should have said this earlier. Whatever kind of corruption this is, it
would be much worse if it got propagated to stable storage, especially if it
ended up in any kind of pool metadata. So your data is at great risk now.
Please also take measures to back it up, preferably by using a different system.

> Dec 24 16:06:03 chinatsu kernel: Creating and/or trimming log files
> Dec 24 16:06:03 chinatsu kernel: lock order reversal:
> Dec 24 16:06:03 chinatsu kernel: 1st 0xffffffff80bf5780 pf task mtx (pf task mtx) @ /usr/src/sys/contrib/pf/net/pf.c:3330
> Dec 24 16:06:03 chinatsu kernel: .
> Dec 24 16:06:03 chinatsu kernel: 2nd 0xfffffe0009211af8 radix node head (radix node head) @ /usr/src/sys/net/route.c:384
> Dec 24 16:06:03 chinatsu kernel: KDB: stack backtrace:
> Dec 24 16:06:03 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Dec 24 16:06:03 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
> Dec 24 16:06:03 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
> Dec 24 16:06:03 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
> Dec 24 16:06:03 chinatsu kernel: _rw_rlock() at
> Dec 24 16:06:03 chinatsu kernel: Starting syslogd.
> Dec 24 16:06:03 chinatsu kernel: _rw_rlock+0x81
> Dec 24 16:06:03 chinatsu kernel: rtalloc1_fib() at rtalloc1_fib+0x11c
> Dec 24 16:06:03 chinatsu kernel: rtalloc_ign_fib() at rtalloc_ign_fib+0xc5
> Dec 24 16:06:03 chinatsu kernel: pf_routable() at pf_routable+0x1fd
> Dec 24 16:06:03 chinatsu kernel: pf_test_rule() at pf_test_rule+0x6cf
> Dec 24 16:06:03 chinatsu kernel: pf_test() at pf_test+0xf58
> Dec 24 16:06:03 chinatsu kernel: pf_check_in() at pf_check_in+0x2b
> Dec 24 16:06:03 chinatsu kernel: pfil_run_hooks() at pfil_run_hooks+0xd2
> Dec 24 16:06:03 chinatsu kernel: ip_input() at ip_input+0x2dc
> Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
> Dec 24 16:06:03 chinatsu kernel: ether_demux() at ether_demux+0x17d
> Dec 24 16:06:03 chinatsu kernel: ether_nh_input() at ether_nh_input+0x209
> Dec 24 16:06:03 chinatsu kernel: netisr_dispatch_src() at netisr_dispatch_src+0x170
> Dec 24 16:06:03 chinatsu kernel: alc_int_task() at alc_int_task+0x2ff
> Dec 24 16:06:03 chinatsu kernel: taskqueue_run_locked() at taskqueue_run_locked+0x93
> Dec 24 16:06:03 chinatsu kernel: taskqueue_thread_loop() at taskqueue_thread_loop+0x3e
> Dec 24 16:06:03 chinatsu kernel: fork_exit() at fork_exit+0x133
> Dec 24 16:06:03 chinatsu kernel: fork_trampoline() at fork_trampoline+0xe
> Dec 24 16:06:03 chinatsu kernel: --- trap 0, rip = 0, rsp = 0xffffff85fb2ebbb0, rbp = 0 ---
> Dec 24 16:06:03 chinatsu kernel: No core dumps found.
> Dec 24 16:06:04 chinatsu kernel: lock order reversal:
> Dec 24 16:06:04 chinatsu kernel: 1st 0xffffff85b9cb8dd8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2677
> Dec 24 16:06:04 chinatsu kernel: 2nd 0xfffffe00092c5c00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284
> Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
> Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
> Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
> Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
> Dec 24 16:06:04 chinatsu kernel: _sx_xlock() at _sx_xlock+0x61
> Dec 24 16:06:04 chinatsu kernel: ufsdirhash_acquire() at ufsdirhash_acquire+0x33
> Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove() at
> Dec 24 16:06:04 chinatsu kernel: ufsdirhash_remove+0x16
> Dec 24 16:06:04 chinatsu kernel: ufs_dirremove() at ufs_dirremove+0x1bb
> Dec 24 16:06:04 chinatsu kernel: ufs_remove() at ufs_remove+0x92
> Dec 24 16:06:04 chinatsu kernel: VOP_REMOVE_APV() at VOP_REMOVE_APV+0xb7
> Dec 24 16:06:04 chinatsu kernel: kern_unlinkat() at kern_unlinkat+0x2eb
> Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
> Dec 24 16:06:04 chinatsu kernel: Xfast_syscall() at Xfast_syscall+0xf7
> Dec 24 16:06:04 chinatsu kernel: --- syscall (10, FreeBSD ELF64, sys_unlink), rip = 0x80090a22c, rsp = 0x7fffffff
> Dec 24 16:06:04 chinatsu kernel: ca88, rbp = 0x7fffffffdf20 ---
> Dec 24 16:06:04 chinatsu kernel: lock order reversal:
> Dec 24 16:06:04 chinatsu kernel: 1st 0xfffffe00266ddbd8 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:849
> Dec 24 16:06:04 chinatsu kernel: 2nd 0xfffffe002679a818 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2158
> Dec 24 16:06:04 chinatsu kernel: KDB: stack backtrace:
> Dec 24 16:06:04 chinatsu kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Dec 24 16:06:04 chinatsu kernel: kdb_backtrace() at kdb_backtrace+0x37
> Dec 24 16:06:04 chinatsu kernel: _witness_debugger() at _witness_debugger+0x2c
> Dec 24 16:06:04 chinatsu kernel: witness_checkorder() at witness_checkorder+0x844
> Dec 24 16:06:04 chinatsu kernel: __lockmgr_args() at __lockmgr_args+0x10d9
> Dec 24 16:06:04 chinatsu kernel: vop_stdlock() at vop_stdlock+0x39
> Dec 24 16:06:04 chinatsu kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xbf
> Dec 24 16:06:04 chinatsu kernel: _vn_lock() at _vn_lock+0x47
> Dec 24 16:06:04 chinatsu kernel: vget() at vget+0x7b
> Dec 24 16:06:04 chinatsu kernel: devfs_allocv() at devfs_allocv+0x13f
> Dec 24 16:06:04 chinatsu kernel: devfs_root() at devfs_root+0x4d
> Dec 24 16:06:04 chinatsu kernel: vfs_donmount() at vfs_donmount+0xafa
> Dec 24 16:06:04 chinatsu kernel: sys_nmount() at sys_nmount+0x66
> Dec 24 16:06:04 chinatsu kernel: amd64_syscall() at amd64_syscall+0x30e
> Dec 24 16:06:04 chinatsu kernel: Xfast_syscall() at Xfast_syscall+0xf7
> Dec 24 16:06:04 chinatsu kernel: --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x800a8d71c, rsp = 0x7fffffffccc8, rbp = 0x801009048 ---
> Dec 24 16:06:05 chinatsu named[1387]: starting BIND 9.8.3-P4 -t /var/named -u bind
> Dec 24 16:06:05 chinatsu kernel: Starting named.
>
>

-- 
Andriy Gapon