Date:      Tue, 21 Aug 2012 17:09:28 -0400
From:      Roger Hammerstein <cheeky.m@live.com>
To:        <freebsd-fs@freebsd.org>
Subject:   panic while zfs scrubbing
Message-ID:  <BAY170-W8668C02B4DAF69B54EE657F9B80@phx.gbl>



I have a zpool where scrub seems to cause panics.

I do not have zfs in rc.conf, but import manually
on boot.

I start a scrub on the zpool, and partway through I get a panic
and a reboot.
After the panic and reboot, re-importing the pool and allowing
the scrub to restart on its own causes another panic.
So for now I import the pool and immediately stop the scrub.

ls -la *.{9,8,10}
-rw-------  1 root  wheel      150744 Aug 21 16:46 core.txt.10
-rw-------  1 root  wheel      147280 Aug 21 11:04 core.txt.8
-rw-------  1 root  wheel      148572 Aug 21 14:53 core.txt.9
-rw-------  1 root  wheel         457 Aug 21 16:45 info.10
-rw-------  1 root  wheel         456 Aug 21 11:04 info.8
-rw-------  1 root  wheel         458 Aug 21 14:52 info.9
-rw-------  1 root  wheel   643919872 Aug 21 16:46 vmcore.10
-rw-------  1 root  wheel   767168512 Aug 21 11:04 vmcore.8
-rw-------  1 root  wheel  1097850880 Aug 21 14:53 vmcore.9


 9.1-BETA1 FreeBSD 9.1-BETA1 #34: Thu Jul 12 05:57:44 EDT 2012
amd64
4GB of RAM, 4GB of swap.


panic: integer divide fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 18: integer divide fault while in kernel mode
cpuid = 5; apic id = 05
instruction pointer     = 0x20:0xffffffff81674a14
stack pointer           = 0x28:0xffffff810c3d4520
frame pointer           = 0x28:0xffffff810c3d4540
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9480 (txg_thread_enter)
trap number             = 18
panic: integer divide fault
cpuid = 5

KDB: stack backtrace:
#0 0xffffffff80920346 at kdb_backtrace+0x66
#1 0xffffffff808ea35e at panic+0x1ce
#2 0xffffffff80bd7a30 at trap_fatal+0x290
#3 0xffffffff80bd80c5 at trap+0x105
#4 0xffffffff80bc295f at calltrap+0x8
#5 0xffffffff816818cf at vdev_mirror_io_start+0x2bf
#6 0xffffffff81699542 at zio_vdev_io_start+0x232
#7 0xffffffff81698fe3 at zio_execute+0xc3
#8 0xffffffff8165ea1c at dsl_scan_scrub_cb+0x3ec
#9 0xffffffff8165fe14 at dsl_scan_visitbp+0x534
#10 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#11 0xffffffff81660c84 at dsl_scan_visitdnode+0x84
#12 0xffffffff81660070 at dsl_scan_visitbp+0x790
#13 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#14 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#15 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#16 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
#17 0xffffffff8165fd99 at dsl_scan_visitbp+0x4b9
Uptime: 1h51m55s
Dumping 614 out of 3818 MB:..3%..11%..21%..32%..42%..53%..63%..71%..81%..92%


Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kerne=
l/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /bo=
ot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
#0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1  0xffffffff808e9e41 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
#2  0xffffffff808ea337 in panic (fmt=0x1 <Address 0x1 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:636
#3  0xffffffff80bd7a30 in trap_fatal (frame=0x12, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:857
#4  0xffffffff80bd80c5 in trap (frame=0xffffff810c3d4470)
    at /usr/src/sys/amd64/amd64/trap.c:599
#5  0xffffffff80bc295f in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#6  0xffffffff81674a14 in spa_get_random (range=0)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:1165
#7  0xffffffff816818cf in vdev_mirror_io_start (zio=0xfffffe0037e5e000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:89
#8  0xffffffff81699542 in zio_vdev_io_start (zio=0xfffffe0037e5e000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2305
#9  0xffffffff81698fe3 in zio_execute (zio=0xfffffe0037e5e000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1196
#10 0xffffffff8165ea1c in dsl_scan_scrub_cb (dp=0xffffff810c3d4538,
    bp=0xffffff8003c53480, zb=0xffffff810c3d4970)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1737
#11 0xffffffff8165fe14 in dsl_scan_visitbp (bp=0xffffff8003c53480,
    zb=0xffffff810c3d4970, dnp=0xffffff8003642200, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:858
#12 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff8003642240,
    zb=0xffffff810c3d4a00, dnp=0xffffff8003642200, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#13 0xffffffff81660c84 in dsl_scan_visitdnode (scn=0xfffffe001523dc00,
    ds=0xfffffe0037abf400, ostype=DMU_OST_ZFS, dnp=0xffffff8003642200,
    buf=0xfffffe00befda9c0, object=291417, tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:770
#14 0xffffffff81660070 in dsl_scan_visitbp (bp=0xffffff800359b900,
    zb=0xffffff810c3d4cb0, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:718
#15 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033e5380,
    zb=0xffffff810c3d4e10, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#16 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033df000,
    zb=0xffffff810c3d4f70, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#17 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033db000,
    zb=0xffffff810c3d50d0, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#18 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff8003451000,
    zb=0xffffff810c3d5230, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#19 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xffffff80033d7000,
    zb=0xffffff810c3d5390, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#20 0xffffffff8165fd99 in dsl_scan_visitbp (bp=0xfffffe0008076040,
    zb=0xffffff810c3d5420, dnp=0xfffffe0008076000, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:684
#21 0xffffffff81660c84 in dsl_scan_visitdnode (scn=0xfffffe001523dc00,
    ds=0xfffffe0037abf400, ostype=DMU_OST_ZFS, dnp=0xfffffe0008076000,
    buf=0xfffffe00375996e8, object=0, tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:770
#22 0xffffffff8165ff9a in dsl_scan_visitbp (bp=0xfffffe003729e280,
    zb=0xffffff810c3d55f0, dnp=0x0, pbuf=Variable "pbuf" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:736
#23 0xffffffff816600d7 in dsl_scan_visit_rootbp (scn=Variable "scn" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:872
#24 0xffffffff81660172 in dsl_scan_visitds (scn=0xfffffe001523dc00, dsobj=21,
    tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1099
#25 0xffffffff81660695 in dsl_scan_sync (dp=0xfffffe0037335000,
    tx=0xfffffe00151fc400)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scan.c:1355
#26 0xffffffff81667e30 in spa_sync (spa=0xfffffe0008161000, txg=97010)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:5711
#27 0xffffffff81678749 in txg_sync_thread (arg=Variable "arg" is not available.
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:423
#28 0xffffffff808bb4cf in fork_exit (
    callout=0xffffffff81678610 <txg_sync_thread>, arg=0xfffffe0037335000,
    frame=0xffffff810c3d5c40) at /usr/src/sys/kern/kern_fork.c:992
#29 0xffffffff80bc2e8e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000001 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000005 in ?? ()
#55 0xffffffff81242b00 in tdq_cpu ()
#56 0xfffffe0015e9d470 in ?? ()
#57 0x0000000000000000 in ?? ()
#58 0xffffff810c3d4580 in ?? ()
#59 0xffffff810c3d4528 in ?? ()
#60 0xfffffe00028848e0 in ?? ()
#61 0xffffffff80912fce in sched_switch (td=0xfffffe00370b1470,
    newtd=0xfffffe0037335000, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1921
Previous frame inner to this frame (corrupt stack?)
(kgdb)
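
Frame #6 of the kgdb backtrace shows spa_get_random called with a range
of 0 from vdev_mirror_io_start, and trap 18 is an integer divide fault,
which is what a modulo by zero would produce. A minimal Python model of
that arithmetic follows. It is a sketch under assumptions, not the kernel
code: get_random() and pick_preferred_child() are hypothetical names, and
it assumes the random value is reduced modulo the caller-supplied range.
Python raises ZeroDivisionError where the kernel would take the trap.

```python
import random


def get_random(range_: int) -> int:
    """Hypothetical model of a modulo-based pick in [0, range_).

    Assumption: the real routine reduces a pseudo-random 64-bit value
    modulo its argument. This is a stand-in, not the ZFS source.
    """
    r = random.getrandbits(64)
    # With range_ == 0 this line raises ZeroDivisionError, the user-space
    # analogue of the kernel's integer divide fault.
    return r % range_


def pick_preferred_child(num_children: int):
    """Guarded variant (hypothetical): refuse to pick from zero children."""
    if num_children <= 0:
        # The caller has to treat this as bad input instead of dividing.
        return None
    return get_random(num_children)


if __name__ == "__main__":
    print(pick_preferred_child(3))  # an index below 3
    print(pick_preferred_child(0))  # None, no division attempted
    try:
        get_random(0)
    except ZeroDivisionError as exc:
        print("divide fault analogue:", exc)
```

If this reading is right, the zero comes from whatever count the caller
passed in, which would point at the block being scrubbed rather than at
the random-number routine itself.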



 pool: zzzz
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub canceled on Tue Aug 21 16:53:03 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zzzz        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada5    ONLINE       0     0     0

errors: 4 data errors, use '-v' for a list

The data errors will go away if the scrub completes; it has shown that before.

And yes, here it is after 'zpool clear zzzz':

  pool: zzzz
 state: ONLINE
  scan: scrub canceled on Tue Aug 21 17:02:53 2012
config:

    NAME        STATE     READ WRITE CKSUM
    zzzz        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        ada3    ONLINE       0     0     0
        ada7    ONLINE       0     0     0
        ada6    ONLINE       0     0     0
        ada9    ONLINE       0     0     0
        ada4    ONLINE       0     0     0
        ada2    ONLINE       0     0     0
        ada5    ONLINE       0     0     0

errors: No known data errors



The machine passes a 'memtest' memory check of over 12 hours.
Bad disk? One of the disks shows command errors, but no pending
sectors to reallocate in smartctl output, and there are no disk
errors in /var/log/messages.

Two SATA port multipliers:
pmp0 at siisch0 bus 0 scbus6 target 15 lun 0
pmp0: <Port Multiplier 37261095 1706> ATA-0 device
pmp0: 300.000MB/s transfers (SATA 2.x, NONE, PIO 8192bytes)
pmp0: 5 fan-out ports

pmp1 at siisch4 bus 0 scbus10 target 15 lun 0
pmp1: <Port Multiplier 37261095 1706> ATA-0 device
pmp1: 300.000MB/s transfers (SATA 2.x, NONE, PIO 8192bytes)
pmp1: 5 fan-out ports







