From owner-freebsd-stable@FreeBSD.ORG Fri Nov 21 10:28:52 2014
Date: Fri, 21 Nov 2014 13:28:22 +0300 (MSK)
From: Dmitry Morozovsky
To: Steven Hartland
Cc: "stable@freebsd.org", "smh@freebsd.org"
Subject: Re: ZFS panic: [Re: stable/10 panic under disk load]
List-Id: Production branch of FreeBSD source code

Steven, colleagues,

any news on this? I now have a bunch of cores, most from my own experiments,
but also from the daily find, all with similar symptoms.

On Tue, 18 Nov 2014, Dmitry Morozovsky wrote:

> On Tue, 18 Nov 2014, Dmitry Morozovsky wrote:
>
> > On Tue, 18 Nov 2014, Steven Hartland wrote:
> >
> > > Can u plug a usb drive in to get a dump?
> >
> > Hm, will it work over the USB stack? I can try this.
> >
> > BTW: it seems some internal ZFS locking trouble exists, as there are 3 cases:
> >
> > pool/R/fs1 mounted as /fs1
> > pool/R/fs2
> > pool/R/fs3
> >
> > tar cf - /fs1 >/dev/null works ok
> > tar cf - /fs2 >/dev/null works ok
> > rsync -avHP /fs1/ /fs2/ panics in a few minutes
> >
> > will try to configure dump to USB SATA
>
> wow, it works ;)
>
> not on the first try, but anyway, here we go:
>
> #0  doadump (textdump=1621911824) at pcpu.h:219
> #1  0xffffffff803471d5 in db_fncall (dummy1=<value optimized out>,
>     dummy2=<value optimized out>, dummy3=<value optimized out>,
>     dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:568
> #2  0xffffffff80346ebd in db_command (cmd_table=0x0)
>     at /usr/src/sys/ddb/db_command.c:440
> #3  0xffffffff80346c34 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:493
> #4  0xffffffff80349580 in db_trap (type=<value optimized out>, code=0)
>     at /usr/src/sys/ddb/db_main.c:231
> #5  0xffffffff80940cd9 in kdb_trap (type=3, code=0, tf=<value optimized out>)
>     at /usr/src/sys/kern/subr_kdb.c:656
> #6  0xffffffff80ce8ca3 in trap (frame=0xfffffe0860ac6d40)
>     at /usr/src/sys/amd64/amd64/trap.c:556
> #7  0xffffffff80ccf492 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:232
> #8  0xffffffff8094043e in kdb_enter (why=0xffffffff80f5b27c "panic",
>     msg=<value optimized out>) at cpufunc.h:63
> #9  0xffffffff80908f76 in vpanic (fmt=<value optimized out>,
>     ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:752
> #10 0xffffffff80908fe3 in panic (fmt=0xffffffff8154f850 "\004")
>     at /usr/src/sys/kern/kern_shutdown.c:688
> #11 0xffffffff80b64502 in vm_fault_hold (map=<value optimized out>,
>     vaddr=<value optimized out>, fault_type=<value optimized out>,
>     fault_flags=<value optimized out>, m_hold=<value optimized out>)
>     at /usr/src/sys/vm/vm_fault.c:341
> #12 0xffffffff80b62b87 in vm_fault (map=0xfffff80002000000,
>     vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=128)
>     at /usr/src/sys/vm/vm_fault.c:281
> #13 0xffffffff80ce9551 in trap_pfault (frame=0xfffffe0860ac7400, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:752
> #14 0xffffffff80ce8cba in trap (frame=0xfffffe0860ac7400) at
>     /usr/src/sys/amd64/amd64/trap.c:440
> #15 0xffffffff80ccf492 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:232
> #16 0xffffffff81b69d04 in zap_leaf_lookup_closest (l=0xfffff801bd6ec880,
>     h=1441072784640835584, cd=1, zeh=0xfffffe0860ac7518)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:466
> #17 0xffffffff81b688ee in fzap_cursor_retrieve (zap=0xfffff8001676ce80,
>     zc=0xfffffe0860ac77d8, za=0xfffffe0860ac76c0)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:1190
> #18 0xffffffff81b6dc97 in zap_cursor_retrieve (zc=0xfffffe0860ac77d8,
>     za=0xfffffe0860ac76c0)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:1290
> #19 0xffffffff81ba8f16 in zfs_freebsd_readdir (ap=<value optimized out>)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2565
> #20 0xffffffff80e03967 in VOP_READDIR_APV (vop=<value optimized out>,
>     a=<value optimized out>) at vnode_if.c:1821
> #21 0xffffffff809b1d1c in kern_getdirentries (td=0xfffff80025ff1490,
>     fd=<value optimized out>,
>     buf=0x801428000
>     , count=<value optimized out>, basep=0xfffffe0860ac7990, residp=0x0)
>     at vnode_if.h:758
> #22 0xffffffff809b1ad8 in sys_getdirentries (td=0xfffff801bd6ec880,
>     uap=0xfffffe0860ac7a40) at /usr/src/sys/kern/vfs_syscalls.c:4030
> #23 0xffffffff80ce9aca in amd64_syscall (td=0xfffff80025ff1490, traced=0)
>     at subr_syscall.c:134
> #24 0xffffffff80ccf77b in Xfast_syscall ()
>     at /usr/src/sys/amd64/amd64/exception.S:391
> #25 0x000000080091043a in ?? ()
>
> #16 0xffffffff81b69d04 in zap_leaf_lookup_closest (l=0xfffff801bd6ec880,
>     h=1441072784640835584, cd=1, zeh=0xfffffe0860ac7518)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:466
> 466             if (HCD_GTEQ(le->le_hash, le->le_cd, h, cd) &&
> (kgdb) p *le->le_hash
> No symbol "le" in current context.
> (kgdb) p le
> No symbol "le" in current context.
> (kgdb) p h
> $1 = 1441072784640835584
> (kgdb) p *h
> Cannot access memory at address 0x13ffb81000000000
> (kgdb) p cd
> $2 = 1
>
> where now?
>
> >
> > >
> > >
> > > > On 18 Nov 2014, at 16:57, Dmitry Morozovsky wrote:
> > > >
> > > >> On Tue, 18 Nov 2014, Dmitry Morozovsky wrote:
> > > >>
> > > >> my backup server after upgrade to fresh stable/10
> > > >>
> > > >> started panicking on heavy disk load like rsync at
> > > >
> > > > Yes, it is easily reproducible and now I'm at the ddb prompt with
> > > >
> > > > cpuid = 0
> > > > KDB: stack backtrace:
> > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0860864d60
> > > > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0860864e10
> > > > vpanic() at vpanic+0x126/frame 0xfffffe0860864e50
> > > > panic() at panic+0x43/frame 0xfffffe0860864eb0
> > > > vm_fault_hold() at vm_fault_hold+0x1932/frame 0xfffffe0860865100
> > > > vm_fault() at vm_fault+0x77/frame 0xfffffe0860865140
> > > > trap_pfault() at trap_pfault+0x201/frame 0xfffffe08608651e0
> > > > trap() at trap+0x47a/frame 0xfffffe08608653f0
> > > > calltrap() at calltrap+0x8/frame 0xfffffe08608653f0
> > > > --- trap 0xc, rip =
> > > > 0xffffffff81b69d04, rsp = 0xfffffe08608654b0, rbp =
> > > > 0xfffffe0860865500 ---
> > > > zap_leaf_lookup_closest() at zap_leaf_lookup_closest+0xb4/frame
> > > > 0xfffffe0860865500
> > > > fzap_cursor_retrieve() at fzap_cursor_retrieve+0x16e/frame 0xfffffe0860865570
> > > > zap_cursor_retrieve() at zap_cursor_retrieve+0x1f7/frame 0xfffffe0860865600
> > > > zfs_freebsd_readdir() at zfs_freebsd_readdir+0x426/frame 0xfffffe0860865840
> > > > VOP_READDIR_APV() at VOP_READDIR_APV+0xa7/frame 0xfffffe0860865870
> > > > kern_getdirentries() at kern_getdirentries+0x21c/frame 0xfffffe0860865970
> > > > sys_getdirentries() at sys_getdirentries+0x28/frame 0xfffffe08608659a0
> > > > amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe0860865ab0
> > > > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0860865ab0
> > > > --- syscall (196, FreeBSD ELF64, sys_getdirentries), rip = 0x80091043a, rsp =
> > > > 0x7fffffffb538, rbp = 0x7fffffffb560 ---
> > > > KDB: enter: panic
> > > > [ thread pid 1167 tid 100461 ]
> > > > Stopped at kdb_enter+0x3e: movq $0,kdb_why
> > > > db>
> > > >
> > > > Can I obtain something useful from here? I'm afraid it's not easy to attach
> > > > an additional disk for crash dumps to this server...
> > > >
> > > >
> > > >>
> > > >> FreeBSD whale.rinet.ru 10.1-STABLE FreeBSD 10.1-STABLE #195 r274646: Tue Nov 18
> > > >> 12:15:24 MSK 2014
> > > >> marck@castor.rinet.ru:/usr/obj/FreeBSD/pristine/src.10/sys/GENERIC amd64
> > > >>
> > > >>
> > > >> panic: vm_fault: fault on nofault entry, addr: fffffe001805b000
> > > >> cpuid = 0
> > > >> KDB: stack backtrace:
> > > >> #0 0xffffffff80964fa0 at kdb_backtrace+0x60
> > > >> #1 0xffffffff8092a085 at panic+0x155
> > > >> #2 0xffffffff80ba168e at vm_fault_hold+0x1b6e
> > > >> #3 0xffffffff80b9fad7 at vm_fault+0x77
> > > >> #4 0xffffffff80d2861c at trap_pfault+0x19c
> > > >> #5 0xffffffff80d27dea at trap+0x47a
> > > >> #6 0xffffffff80d0db92 at calltrap+0x8
> > > >> #7 0xffffffff819df8ee at fzap_cursor_retrieve+0x16e
> > > >> #8 0xffffffff819e4c97 at zap_cursor_retrieve+0x1f7
> > > >> #9 0xffffffff81a1fed6 at zfs_freebsd_readdir+0x426
> > > >> #10 0xffffffff80e456b7 at VOP_READDIR_APV+0xa7
> > > >> #11 0xffffffff809d68cc at kern_getdirentries+0x21c
> > > >> #12 0xffffffff809d6688 at sys_getdirentries+0x28
> > > >> #13 0xffffffff80d28da1 at amd64_syscall+0x351
> > > >> #14 0xffffffff80d0de7b at Xfast_syscall+0xfb
> > > >> Uptime: 1m51s
> > > >>
> > > >> Unfortunately it's ZFS only, so I have no space to write a panic dump.
> > > >>
> > > >> I'm now trying to rebuild the kernel with the debugger turned on, as luckily
> > > >> I have a working console@SOL...
> > > >>
> > > >> Any preliminary hints?
> > > >>
> > > >>
> > > >
> > > > --
> > > > Sincerely,
> > > > D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
> > > > [ FreeBSD committer: marck@FreeBSD.org ]
> > > > ------------------------------------------------------------------------
> > > > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
> > > > ------------------------------------------------------------------------
> > > _______________________________________________
> > > freebsd-stable@freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
> > >
> >
> >
>
>

--
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
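
A note on the kgdb session quoted above: in frame #16, `h` is the 64-bit hash argument of zap_leaf_lookup_closest(), not a pointer, so the failure of `p *h` is probably expected and does not by itself point at the bad access. The sketch below only checks the arithmetic: that the decimal value kgdb printed for `h` is the same number as the address in the "Cannot access memory" message, and that this value is not a canonical amd64 address. The helper name `is_canonical_amd64` is made up for illustration, and 48-bit virtual addressing is assumed.

```python
# Values copied from the quoted kgdb session (frame #16).
h = 1441072784640835584          # "p h" output, decimal
bad_addr = 0x13ffb81000000000    # address from the "p *h" error message
leaf_ptr = 0xfffff801bd6ec880    # "l" argument in the same frame, a real kernel pointer


def is_canonical_amd64(addr: int) -> bool:
    """Return True if bits 63..47 are all zeros or all ones.

    Hypothetical helper; assumes 48-bit virtual addresses.
    """
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


print(hex(h))                        # hex form of the hash
print(h == bad_addr)                 # is it the address kgdb refused to read?
print(is_canonical_amd64(h))         # could it be a valid pointer at all?
print(is_canonical_amd64(leaf_ptr))  # for comparison: the leaf pointer
```

If that reading is right, the more interesting objects to inspect in this frame would be `l` and `zeh`, since `le` has been optimized out of the debug info.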