From: Boris Kochergin
Date: Tue, 04 Aug 2009 17:38:57 -0400
To: freebsd-fs@freebsd.org
Subject: ZFS RAID-Z panic on vdev failure + subsequent panics and hangs
Message-ID: <4A78AA71.9050107@acm.poly.edu>

Ahoy. I have a seven-disk RAID-Z pool in an 8-BETA2/amd64 machine. One of
the disks (ad13) failed to write something today, and the system
proceeded to panic. I couldn't get a dump or any otherwise useful
information, but the panic made reference to "vdev_is_dead".

Upon reboot, it panics again, probably when "zfs mount" is called by its
rc.d script:

Fatal trap 9: general protection fault while in kernel mode
instruction pointer     = 0x20:0xffffffff807cbdbb
stack pointer           = 0x28:0xffffff8077bf54c0
frame pointer           = 0x28:0xffffff8077bf54d0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 82 (zfs)
panic: from debugger
Uptime: 13s
Physical memory: 4081 MB
Dumping 1245 MB: 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070
 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798
 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510
 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222
 206 190 174 158 142 126 110 94 78 62 46 30 14

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:223
#1  0xffffffff8058ff11 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff805902eb in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:575
#3  0xffffffff801d9997 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801d9da1 in db_command (last_cmdp=0xffffffff80bd5120, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801d9ff0 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801dbf79 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805bbd94 in kdb_trap (type=9, code=0, tf=Variable "tf" is not available.
) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff8086dc5d in trap_fatal (frame=0xffffff8077bf5410, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff8086e74d in trap (frame=0xffffff8077bf5410)
    at /usr/src/sys/amd64/amd64/trap.c:639
#10 0xffffffff80857403 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#11 0xffffffff807cbdbb in slab_alloc_item (zone=Variable "zone" is not available.
) at /usr/src/sys/vm/uma_core.c:2300
#12 0xffffffff807ce80e in zone_alloc_item (zone=0xffffff00dffae000, udata=0x0, flags=259)
    at /usr/src/sys/vm/uma_core.c:2475
#13 0xffffffff807cee03 in keg_alloc_slab (keg=0xffffff00dffad460, zone=0xffffff00dffac380, wait=259)
    at /usr/src/sys/vm/uma_core.c:826
#14 0xffffffff807cf177 in keg_fetch_slab (keg=0xffffff00dffad460, zone=0xffffff00dffac380, flags=259)
    at /usr/src/sys/vm/uma_core.c:2152
#15 0xffffffff807cf21e in zone_fetch_slab (zone=0xffffff00dffac380, keg=0xffffff00dffad460, flags=259)
    at /usr/src/sys/vm/uma_core.c:2212
#16 0xffffffff807d05eb in uma_zalloc_arg (zone=0xffffff00dffac380, udata=0x0, flags=259)
    at /usr/src/sys/vm/uma_core.c:2381
#17 0xffffffff8057e727 in malloc (size=Variable "size" is not available.
) at uma.h:305
#18 0xffffffff81060365 in metaslab_init (mg=0xffffff0004472980, smo=0xffffff8077bf5730, start=530428461056, size=2147483648, txg=0)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:294
#19 0xffffffff81071b3e in vdev_metaslab_init (vd=0xffffff0001ecf800, txg=0)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:796
#20 0xffffffff81071da5 in vdev_load (vd=0xffffff0001ecf800)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1531
#21 0xffffffff81071c75 in vdev_load (vd=0xffffff0001ed1800)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1526
#22 0xffffffff8106539c in spa_load (spa=0xffffff0001ff0000, config=Variable "config" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1361
#23 0xffffffff81064ee1 in spa_load (spa=0xffffff0001ff0000, config=Variable "config" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1189
#24 0xffffffff810658fd in spa_open_common (pool=Variable "pool" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1474
#25 0xffffffff81065a52 in spa_get_stats (name=0xffffff0001ff5000 "home", config=0xffffff8077bf59e0, altroot=0xffffff0001ff5400 "", buflen=1024)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1671
#26 0xffffffff81093e7c in zfs_ioc_pool_stats (zc=0xffffff0001ff5000)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:914
#27 0xffffffff810941c4 in zfsdev_ioctl (dev=Variable "dev" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3022
#28 0xffffffff80511c76 in devfs_ioctl_f (fp=0xffffff0001f4bc80, com=3425196549, data=0xffffff0001ff5000, cred=Variable "cred" is not available.
) at /usr/src/sys/fs/devfs/devfs_vnops.c:659
#29 0xffffffff805cb166 in kern_ioctl (td=0xffffff0001f0c390, fd=3, com=3425196549, data=0xffffff0001ff5000 "home")
    at file.h:262
#30 0xffffffff805cb38e in ioctl (td=0xffffff0001f0c390, uap=0xffffff8077bf5bf0)
    at /usr/src/sys/kern/sys_generic.c:678
#31 0xffffffff8086e28f in syscall (frame=0xffffff8077bf5c80)
    at /usr/src/sys/amd64/amd64/trap.c:984
#32 0xffffffff808576e1 in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:373
#33 0x0000000800fe1d0c in ?? ()

Booting the system without the disk causes any "zfs" or "zpool" commands
to hang the system after a while. Breaking to DDB doesn't work using a
keyboard and VGA (I don't have any other kind of gear here).

In case it is relevant, the pool started life as version 6 and was
upgraded using 7.2-STABLE shortly after the version 13 MFC. The output
of "zdb" with all disks connected:

home
    version=13
    name='home'
    state=0
    txg=16061492
    pool_guid=14089219607492705674
    hostid=413956888
    hostname='unset'
    vdev_tree
        type='root'
        id=0
        guid=14089219607492705674
        children[0]
            type='raidz'
            id=0
            guid=17899218839424019335
            nparity=1
            metaslab_array=14
            metaslab_shift=31
            ashift=9
            asize=2800585539584
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=15839907043443901501
                path='/dev/ad4'
                devid='ad:3QK08728'
                whole_disk=0
                DTL=389
            children[1]
                type='disk'
                id=1
                guid=13623369126078337737
                path='/dev/ad16'
                devid='ad:9QH04HJN'
                whole_disk=0
                DTL=391
            children[2]
                type='disk'
                id=2
                guid=15619490422714555908
                path='/dev/ad14'
                devid='ad:5NF1DDXR'
                whole_disk=0
                DTL=390
            children[3]
                type='disk'
                id=3
                guid=6995275135550350664
                path='/dev/ad15'
                devid='ad:9QG93JHX'
                whole_disk=0
                DTL=386
            children[4]
                type='disk'
                id=4
                guid=10651992494569677081
                path='/dev/ad13'
                devid='ad:9QH04GTY'
                whole_disk=0
                DTL=388
            children[5]
                type='disk'
                id=5
                guid=10503557489947490214
                path='/dev/ad18'
                devid='ad:5NF1DDVB'
                whole_disk=0
                DTL=387
            children[6]
                type='disk'
                id=6
                guid=17574056058658811312
                path='/dev/ad12'
                devid='ad:9QG90QA2'
                whole_disk=0
                DTL=392

Can anyone help? I would be content to at least have access to the
filesystem in degraded mode.

-Boris