Date:      Wed, 22 Aug 2012 16:11:32 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Roger Hammerstein <cheeky.m@live.com>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: panic while zfs scrubbing
Message-ID:  <5034DA84.8050507@FreeBSD.org>
In-Reply-To: <BAY170-W8668C02B4DAF69B54EE657F9B80@phx.gbl>
References:  <BAY170-W8668C02B4DAF69B54EE657F9B80@phx.gbl>

on 22/08/2012 00:09 Roger Hammerstein said the following:
> 
> 
> I have a zpool where scrub seems to cause panics.
> 
> I do not have zfs in rc.conf, but import manually
> on boot.
> 
> I start a scrub on a zpool, and some time through will get a panic
> and reboot.
> After panic and reboot, re-importing the pool and allowing
> the scrub to restart on its own will cause another panic.
> So I import and immediately stop the scrub for now.
> 
> ls -la *.{9,8,10}
> -rw-------  1 root  wheel      150744 Aug 21 16:46 core.txt.10
> -rw-------  1 root  wheel      147280 Aug 21 11:04 core.txt.8
> -rw-------  1 root  wheel      148572 Aug 21 14:53 core.txt.9
> -rw-------  1 root  wheel         457 Aug 21 16:45 info.10
> -rw-------  1 root  wheel         456 Aug 21 11:04 info.8
> -rw-------  1 root  wheel         458 Aug 21 14:52 info.9
> -rw-------  1 root  wheel   643919872 Aug 21 16:46 vmcore.10
> -rw-------  1 root  wheel   767168512 Aug 21 11:04 vmcore.8
> -rw-------  1 root  wheel  1097850880 Aug 21 14:53 vmcore.9
> 
> 
>  9.1-BETA1 FreeBSD 9.1-BETA1 #34: Thu Jul 12 05:57:44 EDT 2012
> amd64
> 4GB of ram, 4gb of swap.
> 
> 
> panic: integer divide fault
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 5; apic id = 05
> instruction pointer     = 0x20:0xffffffff81674a14
> stack pointer           = 0x28:0xffffff810c3d4520
> frame pointer           = 0x28:0xffffff810c3d4540
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 9480 (txg_thread_enter)
> trap number             = 18
> panic: integer divide fault
> cpuid = 5
> 
> KDB: stack backtrace:
> #0 0xffffffff80920346 at kdb_backtrace+0x66
> #1 0xffffffff808ea35e at panic+0x1ce
> #2 0xffffffff80bd7a30 at trap_fatal+0x290
> #3 0xffffffff80bd80c5 at trap+0x105
> #4 0xffffffff80bc295f at calltrap+0x8
> #5 0xffffffff816818cf at vdev_mirror_io_start+0x2bf
> #6 0xffffffff81699542 at zio_vdev_io_start+0x232
> #7 0xffffffff81698fe3 at zio_execute+0xc3
> #8 0xffffffff8165ea1c at dsl_scan_scrub_cb+0x3ec
> #9 0xffffffff8165fe14 at dsl_scan_visitbp+0x534
> #10 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
> #11 0xffffffff81660c84 at dsl_scan_visitdnode+0x84
> #12 0xffffffff81660070 at dsl_scan_visitbp+0x790
> #13 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
> #14 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
> #15 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
> #16 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
> #17 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
> Uptime: 1h51m55s
> Dumping 614 out of 3818 MB:..3%..11%..21%..32%..42%..53%..63%..71%..81%..92%
> 
> 
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1  0xffffffff808e9e41 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> #2  0xffffffff808ea337 in panic (fmt=0x1 <Address 0x1 out of bounds>)
>     at /usr/src/sys/kern/kern_shutdown.c:636
> #3  0xffffffff80bd7a30 in trap_fatal (frame=0x12, eva=Variable "eva" is not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:857
> #4  0xffffffff80bd80c5 in trap (frame=0xffffff810c3d4470)
>     at /usr/src/sys/amd64/amd64/trap.c:599
> #5  0xffffffff80bc295f in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:228
> #6  0xffffffff81674a14 in spa_get_random (range=0)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:1165

I'm not sure what triggers this problem, but it looks like a zio is issued for
a block pointer with no valid DVA, which would explain spa_get_random() being
called with range=0.  It's either the result of a logic bug in the ZFS code or
of some severe on-disk corruption.
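
To illustrate the failure mode, here is a minimal user-space sketch, not the
actual ZFS code.  It assumes that the random helper reduces a random value
modulo its range argument and that the mirror code passes the block pointer's
DVA count as that range; both are inferred from the backtrace above, not quoted
from the source.  All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block pointer: only the DVA count matters. */
struct model_bp {
	unsigned ndvas;		/* number of valid DVAs */
};

/*
 * Assumed shape of the failing helper: a random value reduced modulo
 * range.  With range == 0 the modulo is an integer division by zero,
 * which on amd64 raises the divide fault seen in the panic.  This
 * sketch refuses instead of dividing.
 */
static int
model_get_random(uint64_t range, uint64_t *out)
{
	if (range == 0)
		return (-1);	/* would have trapped here */
	*out = (uint64_t)rand() % range;
	return (0);
}

/* Pick a preferred copy to read from, one candidate per valid DVA. */
static int
model_pick_preferred(const struct model_bp *bp, uint64_t *child)
{
	assert(bp != NULL);
	return (model_get_random(bp->ndvas, child));
}
```

With ndvas set to 3, model_pick_preferred() returns 0 and stores an index below
3; with ndvas set to 0 it returns -1 at the point where the unguarded version
would divide by zero.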

> #7  0xffffffff816818cf in vdev_mirror_io_start (zio=0xfffffe0037e5e000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:89

Could you please print *zio and *zio->io_bp in this frame?
It might also be a good idea to report this issue to zfs-discuss@opensolaris.org.

> #8  0xffffffff81699542 in zio_vdev_io_start (zio=0xfffffe0037e5e000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2305
> #9  0xffffffff81698fe3 in zio_execute (zio=0xfffffe0037e5e000)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1196
> #10 0xffffffff8165ea1c in dsl_scan_scrub_cb (dp=0xffffff810c3d4538, 
>     bp=0xffffff8003c53480, zb=0xffffff810c3d4970)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1737

And *bp and *scn here too.

> #11 0xffffffff8165fe14 in dsl_scan_visitbp (bp=0xffffff8003c53480, 
>     zb=0xffffff810c3d4970, dnp=0xffffff8003642200, pbuf=Variable "pbuf" is not available.
> )
>   at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:858
> #12 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff8003642240, 
>     zb=0xffffff810c3d4a00, dnp=0xffffff8003642200, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #13 0xffffffff81660c84 in dsl_scan_visitdnode (scn=0xfffffe001523dc00, 
>     ds=0xfffffe0037abf400, ostype=DMU_OST_ZFS, dnp=0xffffff8003642200, 
>     buf=0xfffffe00befda9c0, object=291417, tx=0xfffffe00151fc400)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:770
> #14 0xffffffff81660070 in dsl_scan_visitbp (bp=0xffffff800359b900, 
>     zb=0xffffff810c3d4cb0, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:718
> #15 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033e5380, 
>     zb=0xffffff810c3d4e10, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #16 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033df000, 
>     zb=0xffffff810c3d4f70, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #17 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033db000, 
>     zb=0xffffff810c3d50d0, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #18 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff8003451000, 
>     zb=0xffffff810c3d5230, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #19 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033d7000, 
>     zb=0xffffff810c3d5390, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #20 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xfffffe0008076040, 
>     zb=0xffffff810c3d5420, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
> #21 0xffffffff81660c84 in dsl_scan_visitdnode (scn=0xfffffe001523dc00, 
>     ds=0xfffffe0037abf400, ostype=DMU_OST_ZFS, dnp=0xfffffe0008076000, 
>     buf=0xfffffe00375996e8, object=0, tx=0xfffffe00151fc400)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:770
> #22 0xffffffff8165ff9a in dsl_scan_visitbp (bp=0xfffffe003729e280, 
>     zb=0xffffff810c3d55f0, dnp=0x0, pbuf=Variable "pbuf" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:736
> #23 0xffffffff816600d7 in dsl_scan_visit_rootbp (scn=Variable "scn" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:872
> #24 0xffffffff81660172 in dsl_scan_visitds (scn=0xfffffe001523dc00, dsobj=21, 
>     tx=0xfffffe00151fc400)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1099
> #25 0xffffffff81660695 in dsl_scan_sync (dp=0xfffffe0037335000, 
>     tx=0xfffffe00151fc400)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1355
> #26 0xffffffff81667e30 in spa_sync (spa=0xfffffe0008161000, txg=97010)
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:5711
> #27 0xffffffff81678749 in txg_sync_thread (arg=Variable "arg" is not available.
> )
>     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:423
> #28 0xffffffff808bb4cf in fork_exit (
>     callout=0xffffffff81678610 <txg_sync_thread>, arg=0xfffffe0037335000, 
>     frame=0xffffff810c3d5c40) at /usr/src/sys/kern/kern_fork.c:992
> #29 0xffffffff80bc2e8e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:602

[snip]

-- 
Andriy Gapon


