Date: Wed, 31 Oct 2012 17:25:09 -0000
From: "Steven Hartland" <steven@multiplay.co.uk>
To: <freebsd-stable@freebsd.org>, <freebsd-fs@FreeBSD.ORG>
Subject: ZFS corruption due to lack of space?
Message-ID: <27087376D1C14132A3CC1B4016912F6D@multiplay.co.uk>
I've been running some tests on new hardware here to verify all is good. One of the tests was to fill the ZFS array, which seems to have totally corrupted the tank.

The hardware is 7 x 3TB disks in RAIDZ2 with dual 13GB ZIL partitions and dual 100GB L2ARC on enterprise SSDs. All disks are connected to an LSI 2208 RAID controller run by the mfi driver: the HDDs via a SAS2X28 backplane and the SSDs via a passive backplane.

The file system has 31 test files, most of them random data from /dev/random and one blank from /dev/zero. The test was around 20 dd's running concurrently under screen, all but one reading from /dev/random and the final one from /dev/zero, e.g.

    dd if=/dev/random bs=1m of=/tank2/random10

No hardware errors were raised, so no disk timeouts etc. On completion each dd reported no space left, as you would expect, e.g.

    dd if=/dev/random bs=1m of=/tank2/random13
    dd: /tank2/random13: No space left on device
    503478+0 records in
    503477+0 records out
    527933898752 bytes transferred in 126718.731762 secs (4166187 bytes/sec)
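For anyone wanting to reproduce this, the fill test described above can be sketched as a small sh function. This is a sketch under assumptions, not the exact commands used here: fill_test and its optional per-writer block count are hypothetical (the count only exists so the function can be dry-run against a scratch directory), and bs is spelled out as 1048576 so that both BSD and GNU dd accept it.

```sh
#!/bin/sh
# Hypothetical helper sketching the fill test: start several dd writers in
# parallel, each streaming 1 MiB blocks into its own file, then wait until
# all of them stop (normally with "No space left on device").
#   $1  target directory, e.g. /tank2
#   $2  number of parallel writers (the run above used around 20)
#   $3  optional blocks per writer; omit it to write until the pool is full
fill_test() {
    dir=$1
    writers=$2
    limit=${3:+count=$3}    # expands to nothing when $3 is unset or empty

    i=1
    while [ "$i" -lt "$writers" ]; do
        # All but one writer stream random data ($limit left unquoted on purpose).
        dd if=/dev/random of="$dir/random$i" bs=1048576 $limit &
        i=$((i + 1))
    done
    # The final writer streams zeros, like the blank file in the original run.
    dd if=/dev/zero of="$dir/zero" bs=1048576 $limit &

    wait    # block until every dd has exited
}
```

Something like `fill_test /tank2 20` under screen should approximate the original run; `zpool status -v tank2` afterwards shows whether the pool picked up checksum errors.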
At that point, with the test seemingly successful, I went to delete the test files, which resulted in:

    rm random*
    rm: random1: Unknown error: 122
    rm: random10: Unknown error: 122
    rm: random11: Unknown error: 122
    rm: random12: Unknown error: 122
    rm: random13: Unknown error: 122
    rm: random14: Unknown error: 122
    rm: random18: Unknown error: 122
    rm: random2: Unknown error: 122
    rm: random3: Unknown error: 122
    rm: random4: Unknown error: 122
    rm: random5: Unknown error: 122
    rm: random6: Unknown error: 122
    rm: random7: Unknown error: 122
    rm: random9: Unknown error: 122

I assume error 122 is ECKSUM. At this point the pool was showing checksum errors:

    zpool status
      pool: tank
     state: ONLINE
      scan: none requested
    config:

            NAME                                            STATE     READ WRITE CKSUM
            tank                                            ONLINE       0     0     0
              mirror-0                                      ONLINE       0     0     0
                gptid/41fb7e5c-21cf-11e2-92a3-002590881138  ONLINE       0     0     0
                gptid/42a1b53c-21cf-11e2-92a3-002590881138  ONLINE       0     0     0

    errors: No known data errors

      pool: tank2
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
      scan: none requested
    config:

            NAME             STATE     READ WRITE CKSUM
            tank2            ONLINE       0     0 4.22K
              raidz2-0       ONLINE       0     0 16.9K
                mfisyspd0    ONLINE       0     0     0
                mfisyspd1    ONLINE       0     0     0
                mfisyspd2    ONLINE       0     0     0
                mfisyspd3    ONLINE       0     0     0
                mfisyspd4    ONLINE       0     0     0
                mfisyspd5    ONLINE       0     0     0
                mfisyspd6    ONLINE       0     0     0
            logs
              mfisyspd7p3    ONLINE       0     0     0
              mfisyspd8p3    ONLINE       0     0     0
            cache
              mfisyspd9      ONLINE       0     0     0
              mfisyspd10     ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

            tank2:<0x3>
            tank2:<0x8>
            tank2:<0x9>
            tank2:<0xa>
            tank2:<0xb>
            tank2:<0xf>
            tank2:<0x10>
            tank2:<0x11>
            tank2:<0x12>
            tank2:<0x13>
            tank2:<0x14>
            tank2:<0x15>

So I tried a scrub, which looks like it's going to take 5 days to complete and is reporting many, many more errors:

      pool: tank2
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
      scan: scrub in progress since Wed Oct 31 16:13:53 2012
        118G scanned out of 18.7T at 42.2M/s, 128h19m to go
        49.0M repaired, 0.62% done
    config:

            NAME             STATE     READ WRITE CKSUM
            tank2            ONLINE       0     0  596K
              raidz2-0       ONLINE       0     0 1.20M
                mfisyspd0    ONLINE       0     0     0  (repairing)
                mfisyspd1    ONLINE       0     0     0  (repairing)
                mfisyspd2    ONLINE       0     0     0  (repairing)
                mfisyspd3    ONLINE       0     0     2  (repairing)
                mfisyspd4    ONLINE       0     0     1  (repairing)
                mfisyspd5    ONLINE       0     0     0  (repairing)
                mfisyspd6    ONLINE       0     0     1  (repairing)
            logs
              mfisyspd7p3    ONLINE       0     0     0
              mfisyspd8p3    ONLINE       0     0     0
            cache
              mfisyspd9      ONLINE       0     0     0
              mfisyspd10     ONLINE       0     0     0

    errors: 596965 data errors, use '-v' for a list

At this point I decided to cancel the scrub, but had no joy with that:

    zpool scrub -s tank2
    cannot cancel scrubbing tank2: out of space

So, some questions:

1. Given the information above, does it look like the concurrent writes filling the disk caused metadata corruption?
2. Is there any way to stop the scrub?
3. Surely low space should never prevent stopping a scrub?
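As a rough cross-check of the 5-day figure, the ETA can be recomputed from the numbers zpool printed for the scrub. A minimal sketch in sh/awk: scrub_eta is a hypothetical helper, and it assumes zpool's T/G/M suffixes are binary (1T = 1024G, 1G = 1024M). Because its inputs are zpool's rounded display values, the result lands a few minutes away from the reported 128h19m.

```sh
#!/bin/sh
# Hypothetical helper: estimate remaining scrub time from zpool's figures.
#   $1  amount scanned so far, in G
#   $2  total to scan, in T
#   $3  scan rate, in M/s
# Assumes binary units (1T = 1024G, 1G = 1024M).
scrub_eta() {
    awk -v done_g="$1" -v total_t="$2" -v rate_m="$3" 'BEGIN {
        remaining_m = total_t * 1024 * 1024 - done_g * 1024   # MiB left to scan
        secs = remaining_m / rate_m
        printf "%dh%02dm (%.1f days)\n", int(secs / 3600), int((secs % 3600) / 60), secs / 86400
    }'
}

scrub_eta 118 18.7 42.2    # prints: 128h16m (5.3 days)
```

So at the observed rate the scrub really would run for a little over five days.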
Regards
Steve