Date:      Wed, 31 Oct 2012 17:55:43 -0000
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        <freebsd-stable@freebsd.org>, <freebsd-fs@FreeBSD.ORG>
Subject:   Re: ZFS corruption due to lack of space?
Message-ID:  <A394192F694F49488291020AB9FBF00E@multiplay.co.uk>
References:  <27087376D1C14132A3CC1B4016912F6D@multiplay.co.uk>

Other info:
zpool list tank2
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank2    19T  18.7T   304G    98%  1.00x  ONLINE  -

zfs list tank2
NAME    USED  AVAIL  REFER  MOUNTPOINT
tank2  13.3T      0  13.3T  /tank2

Running: 8.3-RELEASE-p4, zpool: v28, zfs: v5
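
For reference on the space figures (my understanding of the accounting,
not verified against the code): zpool list reports raw capacity
including parity, while zfs list reports usable space after raidz2
parity, so for a 7-disk raidz2 roughly 5/7 of the 19T raw is usable:

echo "scale=1; 19 * 5 / 7" | bc
13.5

which is in line with the 13.3T used and 0 avail above.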


----- Original Message ----- 
From: "Steven Hartland" <steven@multiplay.co.uk>
To: <freebsd-stable@freebsd.org>; <freebsd-fs@FreeBSD.ORG>
Sent: Wednesday, October 31, 2012 5:25 PM
Subject: ZFS corruption due to lack of space?


> I've been running some tests on new hardware here to verify all
> is good. One of the tests was to fill the ZFS array, which
> seems to have totally corrupted the tank.
> 
> The HW is 7 x 3TB disks in RAIDZ2 with dual 13GB ZIL
> partitions and dual 100GB L2ARC on enterprise SSDs.
> 
> All disks are connected to an LSI 2208 RAID controller
> run by the mfi driver: HDs via a SAS2X28 backplane and
> SSDs via a passive backplane.
> 
> The file system has 31 test files, most containing random data
> from /dev/random and one blank file from /dev/zero.
> 
> The test was ~20 concurrent dd's running under screen, with
> all but one reading from /dev/random and the final one from /dev/zero
> 
> e.g. dd if=/dev/random bs=1m of=/tank2/random10
> 
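> Something along these lines would reproduce it (a rough sketch only;
> the actual invocations, counts and file names differed):
> 
> #!/bin/sh
> # launch ~20 detached dd writers under screen until the pool fills
> i=1
> while [ $i -le 19 ]; do
>     screen -dmS fill$i dd if=/dev/random bs=1m of=/tank2/random$i
>     i=$((i + 1))
> done
> screen -dmS fill20 dd if=/dev/zero bs=1m of=/tank2/zero
> 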
> No hardware errors were raised, so no disk timeouts etc.
> 
> On completion each dd reported no space left, as you would expect,
> e.g. dd if=/dev/random bs=1m of=/tank2/random13
> dd: /tank2/random13: No space left on device
> 503478+0 records in
> 503477+0 records out
> 527933898752 bytes transferred in 126718.731762 secs (4166187 bytes/sec)
> 
> At that point, with the test seemingly successful, I went
> to delete the test files, which resulted in:-
> rm random*
> rm: random1: Unknown error: 122
> rm: random10: Unknown error: 122
> rm: random11: Unknown error: 122
> rm: random12: Unknown error: 122
> rm: random13: Unknown error: 122
> rm: random14: Unknown error: 122
> rm: random18: Unknown error: 122
> rm: random2: Unknown error: 122
> rm: random3: Unknown error: 122
> rm: random4: Unknown error: 122
> rm: random5: Unknown error: 122
> rm: random6: Unknown error: 122
> rm: random7: Unknown error: 122
> rm: random9: Unknown error: 122
> 
> Error 122 is, I assume, ECKSUM.
> 
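> Assuming that is right, the "Unknown error" text makes sense: ECKSUM
> is a ZFS-private value above ELAST, so libc's strerror() has no string
> for 122. A quick way to see where the errno table ends:
> 
> grep -w ELAST /usr/include/sys/errno.h
> 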
> At this point the pool was showing checksum errors
> zpool status
>  pool: tank
> state: ONLINE
>  scan: none requested
> config:
> 
>        NAME                                            STATE     READ WRITE CKSUM
>        tank                                            ONLINE       0     0     0
>          mirror-0                                      ONLINE       0     0     0
>            gptid/41fb7e5c-21cf-11e2-92a3-002590881138  ONLINE       0     0     0
>            gptid/42a1b53c-21cf-11e2-92a3-002590881138  ONLINE       0     0     0
> 
> errors: No known data errors
> 
>  pool: tank2
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
>        corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>        entire pool from backup.
>   see: http://www.sun.com/msg/ZFS-8000-8A
>  scan: none requested
> config:
> 
>        NAME           STATE     READ WRITE CKSUM
>        tank2          ONLINE       0     0 4.22K
>          raidz2-0     ONLINE       0     0 16.9K
>            mfisyspd0  ONLINE       0     0     0
>            mfisyspd1  ONLINE       0     0     0
>            mfisyspd2  ONLINE       0     0     0
>            mfisyspd3  ONLINE       0     0     0
>            mfisyspd4  ONLINE       0     0     0
>            mfisyspd5  ONLINE       0     0     0
>            mfisyspd6  ONLINE       0     0     0
>        logs
>          mfisyspd7p3  ONLINE       0     0     0
>          mfisyspd8p3  ONLINE       0     0     0
>        cache
>          mfisyspd9    ONLINE       0     0     0
>          mfisyspd10   ONLINE       0     0     0
> 
> errors: Permanent errors have been detected in the following files:
> 
>        tank2:<0x3>
>        tank2:<0x8>
>        tank2:<0x9>
>        tank2:<0xa>
>        tank2:<0xb>
>        tank2:<0xf>
>        tank2:<0x10>
>        tank2:<0x11>
>        tank2:<0x12>
>        tank2:<0x13>
>        tank2:<0x14>
>        tank2:<0x15>
> 
> So I tried a scrub, which looks like it's going to
> take 5 days to complete and is reporting many, many more
> errors:-
> 
>  pool: tank2
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
>        corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>        entire pool from backup.
>   see: http://www.sun.com/msg/ZFS-8000-8A
>  scan: scrub in progress since Wed Oct 31 16:13:53 2012
>        118G scanned out of 18.7T at 42.2M/s, 128h19m to go
>        49.0M repaired, 0.62% done
> config:
> 
>        NAME           STATE     READ WRITE CKSUM
>        tank2          ONLINE       0     0  596K
>          raidz2-0     ONLINE       0     0 1.20M
>            mfisyspd0  ONLINE       0     0     0  (repairing)
>            mfisyspd1  ONLINE       0     0     0  (repairing)
>            mfisyspd2  ONLINE       0     0     0  (repairing)
>            mfisyspd3  ONLINE       0     0     2  (repairing)
>            mfisyspd4  ONLINE       0     0     1  (repairing)
>            mfisyspd5  ONLINE       0     0     0  (repairing)
>            mfisyspd6  ONLINE       0     0     1  (repairing)
>        logs
>          mfisyspd7p3  ONLINE       0     0     0
>          mfisyspd8p3  ONLINE       0     0     0
>        cache
>          mfisyspd9    ONLINE       0     0     0
>          mfisyspd10   ONLINE       0     0     0
> 
> errors: 596965 data errors, use '-v' for a list
> 
> 
> At this point I decided to cancel the scrub, but no joy on that:
> 
> zpool scrub -s tank2
> cannot cancel scrubbing tank2: out of space
> 
> So questions:-
> 
> 1. Given the information above, does it seem like the multiple writes
> filling the disk may have caused metadata corruption?
> 2. Is there any way to stop the scrub?
> 3. Surely low space should never prevent stopping a scrub?


