Date: Thu, 26 Aug 2010 14:08:29 -0400
From: Adam Stylinski <kungfujesus06@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Problem with zpool
Message-ID: <AANLkTimJSw30SSnt%2B49%2BpfqgZxrtoW6yYpgr7WvFo-qB@mail.gmail.com>
OK, so I was sending snapshots into the pool, and I may have sent one that was bad, because at some point I got a "bad magic number" error. At the time I figured it was my disk (ad6), which was going bad. No matter what I tried, it would not let me offline the ad6 device; it claimed insufficient replicas existed. So I removed the disk, put a replacement on the same port of the same controller, and ran the replace command, only to have it sit there pretending to replace, with zpool stuck at the top of top(1) in state g_wait.

I googled around and found someone on one of the mailing lists with a similar bug; they said it was fixed in a revision to ZFS (I can't remember which MFC), but that upgrading to FreeBSD 8.1 would fix the problem. This is a v13 pool, and now that I've upgraded to 8.1 I'm running a scrub, which forced the resilver, and it's still claiming to "replace". As it turns out, I had a cron job running freebsd-update cron, which happened to pull in 8.0 bits at some point during the freebsd-update, so I ended up on an 8.1 kernel with an 8.0 userland. That did allow me to scrub (I guess this update didn't break the ABI between zpool and libzpool). The scrub finished, but status still shows this:

        8991447011275450347  UNAVAIL      0 10.0K     0  was /dev/ad6/old
        ada2                 ONLINE       0     0     0  14.1G resilvered

In other words, it's still counting errors against the old device, which it won't let me offline, while resilvering to ada2. Even though it told me the resilver completed with no read, write, or checksum errors, the entry is still there. I am now scrubbing with a matched 8.1 userland and kernel; do you think it will finally let me remove the device being replaced?

It also claims there is corruption in the pool, but the files that are "affected" are 100% fine, as I've md5'd them against remote copies. I realize the metadata could be bad, and I'm not sure how to go about fixing that. Anyway, please don't tl;dr, and let me know whatever advice you can give.
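For reference, the rough sequence of commands was (reconstructed from memory, so the exact device arguments are approximate; pool and device names as described above):

```shell
# Try to offline the suspected-bad disk -- this failed every time,
# claiming insufficient replicas:
zpool offline share ad6

# After physically swapping in the new disk (same port, same
# controller; it shows up as ada2 under 8.1), start the replacement:
zpool replace share ad6 ada2

# Kick off a scrub, which also forced the resilver:
zpool scrub share
zpool status share
```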
I have moderately complete backups of some of the data (the entire pool has 2.7TB occupied), but I'd like to avoid a destroy-and-recreate process. So far this is what zpool status reports:

[adam@nasbox ~]$ zpool status
  pool: share
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h9m, 2.16% done, 6h53m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        share                      DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            ada1                   ONLINE       0     0     0  60.9M resilvered
            ada3                   ONLINE       0     0     0  60.9M resilvered
            replacing              DEGRADED     0     0     0
              8991447011275450347  UNAVAIL      0 10.0K     0  was /dev/ad6/old
              ada2                 ONLINE       0     0     0  14.1G resilvered
            ada4                   ONLINE       0     0     0  56.7M resilvered
          raidz1                   ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            da2                    ONLINE       0     0     0
            da3                    ONLINE       0     0     0
            da1                    ONLINE       0     0     0
          raidz1                   ONLINE       0     0     0
            aacd0                  ONLINE       0     0     0
            aacd1                  ONLINE       0     0     0
            aacd2                  ONLINE       0     0     0
            aacd3                  ONLINE       0     0     0
        logs                       DEGRADED     0     0     0
          ada0                     ONLINE       0     0     0
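In case it helps, here is what I'm tempted to try once the scrub finishes (untested on my end; the long number is the GUID of the dead half of the "replacing" vdev, taken from the status output above):

```shell
# Detach the stale old-device entry by its GUID; if it works, this
# should collapse the "replacing" vdev down to just ada2:
zpool detach share 8991447011275450347

# Then clear the pool's error counters and re-check, listing any
# files still flagged as corrupt:
zpool clear share
zpool status -v share
```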