Date: Wed, 6 Mar 2013 12:10:18 -0500
From: Nathaniel W Filardo <nwf@cs.jhu.edu>
To: freebsd-fs@freebsd.org, rercola@acm.jhu.edu
Subject: Cyclic permutations of "zpool replace" on raidz devices lead to corrupt data?
Message-ID: <20130306171017.GP17094@gradx.cs.jhu.edu>

--b5Ibld7S3Mj9Y6Fc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Greetings freebsd-fs,

I had a zpool that looked like this:

        NAME        STATE     READ WRITE CKSUM
        tank0       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
        logs
          mirror-1  ONLINE       0     0     0
            ada2a   ONLINE       0     0     0
            ada4a   ONLINE       0     0     0
        cache
          ada2d     ONLINE       0     0     0
          ada4d     ONLINE       0     0     0

and, in a fit of OCD, I decided to attach a spare disk on ata6 and use
it to reorder the disks so that they were ada{0,1,3,5}.  I had thought
this would be painless, by running (and waiting for each resilver to
complete)

        zpool replace tank0 ada5 ada6
        zpool replace tank0 ada1 ada5
        zpool replace tank0 ada0 ada1
        zpool replace tank0 ada6 ada0

Nothing funny, just a cyclic permutation.  I realize now that I should
have run a "zpool scrub" between each pass, but I didn't, so, oops.
(The last of these commands has run to completion, but never removed
the replacing-0 node in the vdev tree; the pool is currently
resilvering itself again after the panic reported later in this mail.)

In any case, while I do not have exact numbers to report, the following
symptoms occurred during this chain of events.

"zpool replace tank0 ada5 ada6" seemed to run without problem.

"zpool replace tank0 ada1 ada5" discovered 170-something checksum
errors on ada6.

"zpool replace tank0 ada0 ada1" discovered 35-ish checksum errors on
ada5.

"zpool replace tank0 ada6 ada0" discovered 9 checksum errors on ada1
and reported 8 checksum errors for the raidz1 vdev, including the
corruption of a file in my freebsd svn mirror.

I then removed the svn mirror, which seemed to go off without a hitch,
and started to rebuild it.
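
As a sanity check on the claim that the four replaces above amount to a
plain relabelling, here is a minimal Python sketch. It is only a toy
model: the raidz1-0 members are a list of names, and `replace` and
`run_sequence` are hypothetical helpers invented for this illustration.
Nothing in it talks to ZFS or models resilvering.

```python
# Toy model of the raidz1-0 member list; it does not touch ZFS at all.
# `replace` is a hypothetical helper that mimics only the renaming effect
# of "zpool replace <pool> <old> <new>" on the list of member devices.

def replace(slots, old, new):
    """Return a new slot list with device `old` swapped for device `new`."""
    if old not in slots:
        raise ValueError(f"{old} is not a member of the vdev")
    if new in slots:
        raise ValueError(f"{new} is already a member of the vdev")
    return [new if dev == old else dev for dev in slots]


def run_sequence(slots, steps):
    """Apply each (old, new) step in order and return every layout seen."""
    layouts = [list(slots)]
    for old, new in steps:
        layouts.append(replace(layouts[-1], old, new))
    return layouts


# Member order as shown in the pool listing, then the four replaces as run.
initial = ["ada5", "ada0", "ada3", "ada1"]
steps = [("ada5", "ada6"), ("ada1", "ada5"), ("ada0", "ada1"), ("ada6", "ada0")]
layouts = run_sequence(initial, steps)

for layout in layouts:
    print(" ".join(layout))
final = layouts[-1]
```

In this model the spare ada6 only ever acts as a temporary holding
slot, every intermediate layout has exactly four distinct members, and
the last layout is ada{0,1,3,5} with ada6 free again, which is the
outcome the sequence was meant to produce.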

Much later, having decided to wait on rebuilding the mirror, when
shuffling files off of its host filesystem to another (from
tank0/mirrors/freebsd to tank0/mirrors/misc, in preparation for
deleting the former, though this has not been done), I was met with

        panic: trap: fast data access mmu miss (kernel)
        cpuid = 0
        KDB: stack backtrace:
        panic() at panic+0x290
        trap() at trap+0x554
        -- fast data access mmu miss tar=0 %o7=0xc09b8df4 --
        userland() at ddt_phys_decref
        user trace: trap %o7=0xc09b8df4
        pc 0xc0948e00, sp 0xf3a38b21
        done
        Uptime: 76d13h22m31s
        Automatic reboot in 15 seconds - press a key on the console to abort
        Rebooting...

As a wild guess, this seems likely to be
http://mail.opensolaris.org/pipermail/zfs-discuss/2012-February/050972.html
in which a corrupt DDT yields a NULL pointer dereference when a DDT
entry is not found.

My suspicion (and it is just a guess at this point) is that somebody
somewhere in the stack is holding on to the "old" zpool configuration
across replace operations and issuing writes to the incorrect
device(s).

A bit about the machine, in case it matters: It's a Sun V240 running
9-CURRENT (git rev id 1b82c3b) with 16GB of RAM.  All the devices in
this pool are connected by mvs0, a "Marvell 88SX6081 SATA controller".
There has been no prior indication of checksum errors on any of the
devices, despite routine scrubbing every two weeks for as long as I can
remember.  The disks themselves are all WDC WD7500AADS-00L5B1; ada2 is
an OCZ-VERTEX2 and ada4 is an OCZ-SOLID3.  At no point during this
(including across the panic reboot) did the disks ever lose power.

A friend is helping me to test my hypothesis, but on Illumos (we do not
have easy access to another FBSD machine with sufficient spare disks).
We shall report our findings.

Thoughts?

Thanks in advance.
--nwf;

--b5Ibld7S3Mj9Y6Fc
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)

iEYEARECAAYFAlE3eHkACgkQTeQabvr9Tc8TLQCdFH5GtRwqF5J62AmZRqugcHvT
J9wAni3HHkjuF7eO4f5/tkdZSFI0nNgn
=fAzy
-----END PGP SIGNATURE-----

--b5Ibld7S3Mj9Y6Fc--