Date: Sat, 12 Jul 2008 18:37:11 +1000
From: Duncan Young <duncan.young@pobox.com>
To: d@delphij.net
Cc: freebsd-current@freebsd.org
Subject: Re: Boot from ZFS
Message-ID: <200807121837.11812.duncan.young@pobox.com>
In-Reply-To: <48780181.2080905@delphij.net>
References: <4877A343.2010602@ibctech.ca> <200807121043.10473.duncan.young@pobox.com> <48780181.2080905@delphij.net>

I did use whole disks (the root disks, which are mirrored, are sliced into
boot, swap, and root).

zpool status
  pool: big
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        big         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors

  pool: rootzfs
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        rootzfs              ONLINE       0     0     0
          mirror             ONLINE       0     0     0
            label/sysdisk_A  ONLINE       0     0     0
            label/sysdisk_B  ONLINE       0     0     0

errors: No known data errors

The zpool import -f didn't want to work. The problem with scrubbing is that
the pool needs to be online first (I think). I couldn't import it; it just
said that the metadata was corrupt. The frustrating thing was that the
problem wasn't from the disks, but from the controller. Upon reboot all the
disks were OK, but the metadata wasn't. Kind of frustrating.

If you're interested, from /var/log/messages:

root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=469395789824 size=512
kernel: hptrr: start channel [0,2]
kernel: hptrr: channel [0,2] started successfully
kernel: hptrr: start channel [0,2]
kernel: hptrr: channel [0,2] started successfully
kernel: hptrr: start channel [0,0]
kernel: hptrr: start channel [0,2]
kernel: hptrr: channel [0,2] started successfully
kernel: hptrr: channel [0,0] started successfully
root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=468971378176 size=512
root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=468971382272 size=512
root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=468971412480 size=512
kernel: hptrr: start channel [0,2]
kernel: hptrr: [0 2 ] failed to perform Soft Reset
kernel: hptrr: [0,2,0] device disconnected on channel
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035899904 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797371392 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=91641628160 size=1536 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035900928 size=4608 error=22
<100 lines snipped>
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035913216 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035914240 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=191856965120 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=191856964608 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797383680 size=1536 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797384704 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797383680 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035915264 size=512 error=22
kernel: (da0:hptrr0:0:0:0): Synchronize cache failed, status == 0x39, scsi status == 0x0
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035914752 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035915776 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=191856965632 size=2560 error=22
<90 lines snipped>
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797380608 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797385216 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797382656 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797386240 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=455680 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=193536 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=500027814912 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=500028077056 size=1024 error=22
root: ZFS: vdev failure, zpool=big type=vdev.open_failed
kernel: hptrr: start channel [0,0]
kernel: hptrr: [0 0 ] failed to perform Soft Reset
kernel: hptrr: [0,0,0] device disconnected on channel
root: ZFS: vdev I/O failure, zpool=big path=/dev/da1 offset=56768110080 size=512 error=22
syslogd: kernel boot file is /boot/kernel/kernel
kernel: Copyright (c) 1992-2008 The FreeBSD Project.
kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
etc

regards

Duncan

On Sat, 12 Jul 2008 10:57:37 am Xin LI wrote:
> Duncan Young wrote:
> | Be careful, I've just had a 6 disk raidz array die. Complete failure which
> | required restore from backup (the controller card which had access to 4 of
> | the disks lost one disk, then a second, at which point the machine
> | panicked). Upon reboot the raidz array was useless (metadata corrupted).
> | I'm also getting reasonably frequent machine lockups (panics) in the zfs
> | code. I'm going to start collecting crash dumps to see if anyone can help
> | in the next week or two.
>
> That's really unfortunate. Some sort of automated disk monitoring stuff
> would be essential for RAID, this includes RAID-Z. Did you use the
> whole disk dedicatedly for the pool, or (g)label it before adding it into
> the zpool? Did 'zpool import -f' help?
>
> | I guess what I'm trying to say is that you can still lose everything on an
> | entire pool, so backups are still essential, and a couple of smaller pools
> | is probably preferable to one big pool (restore time is less). zfs is not
> | 100% (yet?). The lack of any type of fsck still causes me concern.
>
> It's always true that backup is important if data is valuable :)
> The benefit of having a larger pool is that the administrator would have
> the ability to use larger disk space in one ZFS file system (which cannot
> cross a zpool boundary), but it is recommended that when creating the
> zpool, we use smaller raid-z groups, e.g. don't use 48 disks within one
> raid-z group; a few disks (like 3-5) within one raid-z group would be fine.
>
> Regarding fsck, 'zpool scrub' is pretty much like a fsck plus a data
> integrity check. It would be, however, almost impossible to recover
> data if the zpool is completely corrupt according to some Sun sources, but
> my experience with bad disks within raid-z has not put me into an
> unrecoverable state (yet).
>
> Cheers,
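
On the point about automated disk monitoring: the ZFS lines in the /var/log/messages excerpt above have a regular enough shape that a small script could tally them per device and flag a disk (or controller channel) that is starting to misbehave before the pool is lost. The following is only a rough sketch, assuming the log lines look exactly like the ones quoted in this message; the function names and the threshold are made up for illustration and are not part of any ZFS or FreeBSD tool.

```python
import re
from collections import Counter

# Matches the two per-device ZFS event types seen in the log excerpt, e.g.
#   root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=... size=512
#   root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=... error=22
# The exact message format is an assumption based only on that excerpt.
ZFS_EVENT = re.compile(
    r"ZFS: (?P<kind>checksum mismatch|vdev I/O failure), "
    r"zpool=(?P<pool>\S+) path=(?P<path>\S+)"
)


def tally_zfs_events(lines):
    """Count ZFS error events per (pool, device path, event kind)."""
    counts = Counter()
    for line in lines:
        m = ZFS_EVENT.search(line)
        if m:
            counts[(m["pool"], m["path"], m["kind"])] += 1
    return counts


def devices_over_threshold(counts, threshold):
    """Return sorted device paths whose total event count is >= threshold."""
    per_device = Counter()
    for (_pool, path, _kind), n in counts.items():
        per_device[path] += n
    return sorted(path for path, n in per_device.items() if n >= threshold)


if __name__ == "__main__":
    # A few lines copied from the excerpt above, used as a smoke test.
    sample = [
        "root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=469395789824 size=512",
        "kernel: hptrr: start channel [0,2]",
        "root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035899904 size=512 error=22",
        "root: ZFS: vdev I/O failure, zpool=big path=/dev/da1 offset=56768110080 size=512 error=22",
    ]
    counts = tally_zfs_events(sample)
    for key, n in sorted(counts.items()):
        print(key, n)
    print("suspect devices:", devices_over_threshold(counts, 2))
```

Run from cron against /var/log/messages, something along these lines could mail a warning once a device crosses the threshold. It would not have distinguished a failing controller from failing disks in this case, but several devices on the same hptrr channel showing up together would at least be a hint.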