Date: Sun, 24 May 2009 21:33:53 +0200
From: Thomas Backman <serenity@exscape.org>
To: freebsd-current@freebsd.org
Subject: Re: ZFS panic under extreme circumstances (2/3 disks corrupted)
Message-ID: <4FE794E9-075D-4563-B395-BD5E459937DF@exscape.org>
In-Reply-To: <4E6E325D-BB18-4478-BCFD-633D6F4CFD88@exscape.org>
On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
> So, I was playing around with RAID-Z and self-healing, when I
> decided to take it another step and corrupt the data on *two* disks
> (well, files via ggate) and see what happened. I obviously expected
> the pool to go offline, but I didn't expect a kernel panic to follow!
>
> What I did was something resembling:
> 1) create three 100MB files, then use ggatel create to make GEOM
> providers out of them
> 2) zpool create test raidz ggate{1..3}
> 3) create a 100MB file inside the pool, md5 the file
> 4) overwrite 10-20MB (IIRC) of disk2 with /dev/random, with
> dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40, or so
> (I now know that I wanted *seek*, not *skip*, but it still
> shouldn't panic!)
> 5) Check the md5 of the file: everything OK, and zpool status shows
> a degraded pool.
> 6) Repeat step #4, but with disk 3.
> 7) zpool scrub test
> 8) Panic!
> [...]
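For reference, the whole sequence boils down to something like this
(a sketch from memory; the file names, mountpoint and ggate unit
numbers are illustrative):

# 1) three 100MB backing files, attached as GEOM gate devices
for i in 1 2 3; do
    truncate -s 100m ./disk$i
    ggatel create -u $i ./disk$i    # provides /dev/ggate$i
done

# 2) raidz pool on top of them (mounted at /test by default)
zpool create test raidz ggate1 ggate2 ggate3

# 3) a 100MB file inside the pool, checksummed for later comparison
dd if=/dev/random of=/test/bigfile bs=1m count=100
md5 /test/bigfile

# 4) corrupt disk2's backing file. As originally run, with skip:
#    skip discards *input* blocks, so the write still lands at the
#    start of the file, and since seek= is absent, dd truncates the
#    output file first, shrinking it to ~20MB.
dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40

# 5) verify the file and the pool
md5 /test/bigfile
zpool status test

# 6) same treatment for disk3, then 7) scrub -> 8) panic
dd if=/dev/random of=./disk3 bs=1000k count=20 skip=40
zpool scrub test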
FWIW, I couldn't replicate this when using seek (i.e. corrupting the
middle of the "disk" rather than the beginning).
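The corrupting writes this time were along these lines (conv=notrunc
assumed here so dd doesn't truncate the backing files):

dd if=/dev/random of=./disk2 bs=1000k count=20 seek=40 conv=notrunc
dd if=/dev/random of=./disk3 bs=1000k count=20 seek=40 conv=notrunc

With the scrub under way, the pool looked like this: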
[root@clone ~/zfscrash]# zpool status test
  pool: test
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 7.72% done, 0h6m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0    18
          raidz1    ONLINE       0     0   161
            ggate0  ONLINE       0     0     0  512 repaired  ## note that
                I did *not* touch this "disk" at all, so why "512 repaired"?
            ggate1  ONLINE       0     0   702  73K repaired
            ggate2  ONLINE       0     0    62  64.5K repaired

errors: 9 data errors, use '-v' for a list
After overwriting the *beginning* of disk2 and disk3 as well, "zpool
scrub" appears to hang. Two vdev failure messages showed up on the
console, and zpool status hangs as well. No panic this time around
(I've waited 5 minutes and nothing appears to happen, but the computer
is usable on other ttys). The failmode property was set to the default,
i.e. wait, in both cases.
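For anyone wanting to experiment with the other settings, the property
can be inspected and changed per pool, e.g.:

zpool get failmode test
zpool set failmode=continue test   # accepted values: wait (default),
                                   # continue, panic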
Regards,
Thomas
