Date: Mon, 3 Oct 2011 14:21:05 -0400
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: freebsd-current@freebsd.org
Subject: Strange ZFS filesystem corruption
Message-ID: <8B59D754-9062-4499-9873-7C2167622032@gromit.dlib.vt.edu>
I wasn't sure whether to post this here or on stable@freebsd.org. The system now runs RELENG_9, but the ZFS pool exhibiting problems was created, IIRC, under 9-CURRENT. I believe RELENG_9 is sufficiently close to HEAD at this stage that this list is probably the correct place for this message.

I have a raidz2 ZFS pool on a system that I have recently been using as a mirror for about 6.5 TiB of data. The data are mirrored nightly using rsync. I noticed during these nightly rsync copies I would get some errors like this:

=====
file has vanished: "/backups/storage/san/DLA/DLA_Records/05DLAAdmin"
rsync: stat "/backups/storage/san/DLA/DLA_Records/05DLAAdmin" failed: No such file or directory (2)
rsync: recv_generator: mkdir "/backups/storage/san/DLA/DLA_Records/05DLAAdmin/05DI_business copy" failed: No such file or directory (2)
*** Skipping any contents from this failed directory ***
=====

It appears that 05DLAAdmin is a directory that is corrupted. It shows in an "ls" but any attempt to descend into that directory or discern its attributes fails with a "No such file or directory" error. Furthermore, I cannot delete this directory (even with "rm -rf").
E.g.:

=====
tape# pwd
/backups/storage/san/DLA
tape# whoami
root
tape# rm -rf DLA_Records
rm: DLA_Records/07DLAAdmin/07Digital_Imaging_Work: Directory not empty
rm: DLA_Records/07DLAAdmin/FY07IAWAprep: Directory not empty
rm: DLA_Records/07DLAAdmin: Directory not empty
rm: DLA_Records: Directory not empty
tape# cd DLA_Records
tape# ls
05DLAAdmin 07DLAAdmin
tape# ls -l
ls: 05DLAAdmin: No such file or directory
total 3
drwxrws--- 4 500 501 4 Oct 3 11:53 07DLAAdmin
tape# file 05DLAAdmin
05DLAAdmin: cannot open `05DLAAdmin' (No such file or directory)
tape# ls -R 07DLAAdmin
07Digital_Imaging_Work FY07IAWAprep
07DLAAdmin/07Digital_Imaging_Work:
ls: 07Proposals: No such file or directory
07DLAAdmin/FY07IAWAprep:
ls: Budget: No such file or directory
tape# ls 07DLAAdmin
07Digital_Imaging_Work FY07IAWAprep
tape# ls 07DLAAdmin/07Digital_Imaging_Work
07Proposals
tape# ls -l 07DLAAdmin/07Digital_Imaging_Work/07Proposals
ls: 07DLAAdmin/07Digital_Imaging_Work/07Proposals: No such file or directory
tape# ls 07DLAAdmin/FY07IAWAprep
Budget
tape# ls 07DLAAdmin/FY07IAWAprep/Budget
ls: 07DLAAdmin/FY07IAWAprep/Budget: No such file or directory
tape# file 07DLAAdmin/FY07IAWAprep/Budget
07DLAAdmin/FY07IAWAprep/Budget: cannot open `07DLAAdmin/FY07IAWAprep/Budget' (No such file or directory)
tape# cd 05DLAAdmin
05DLAAdmin: No such file or directory.
=====

The pool itself reports no errors.
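The manual probing above (an entry is named by "ls" but cannot be stat'ed) could be automated to list every affected path under a tree. What follows is only a minimal sketch, assuming Python is available on the host; it has not been run against the pool in question, and the function name find_phantom_entries and its injectable lstat parameter are hypothetical names chosen for illustration.

```python
import os
import stat


def find_phantom_entries(root, lstat=os.lstat):
    """Return paths that a directory listing names but lstat() cannot find.

    The lstat parameter exists only so the check can be exercised against
    a healthy filesystem by substituting a failing function.
    """
    phantoms = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            names = os.listdir(current)
        except OSError:
            # The directory could not be read at all; skip it.
            continue
        for name in sorted(names):
            path = os.path.join(current, name)
            try:
                info = lstat(path)
            except FileNotFoundError:
                # Listed by the parent directory, yet ENOENT on lstat():
                # the same symptom as 05DLAAdmin in the transcript.
                phantoms.append(path)
                continue
            if stat.S_ISDIR(info.st_mode):
                # Only descend into entries that stat as directories.
                stack.append(path)
    return phantoms
```

Calling find_phantom_entries("/backups/storage/san/DLA") would return a list of the paths that fail in this way. The sketch only reports; it makes no attempt to repair or remove anything.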
I performed a scrub on the pool yet this bizarre filesystem corruption persists:

=====
tape# zpool status backups
  pool: backups
 state: ONLINE
 scan: scrub repaired 15K in 7h33m with 0 errors on Sat Oct 1 19:22:35 2011
config:

        NAME            STATE     READ WRITE CKSUM
        backups         ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            gpt/disk02  ONLINE       0     0     0
            gpt/disk03  ONLINE       0     0     0
            gpt/disk04  ONLINE       0     0     0
            gpt/disk05  ONLINE       0     0     0
            gpt/disk06  ONLINE       0     0     0
            gpt/disk07  ONLINE       0     0     0

errors: No known data errors
tape# uname -a
FreeBSD tape.private.lib.vt.edu 9.0-BETA3 FreeBSD 9.0-BETA3 #0: Wed Sep 28 15:18:59 EDT 2011 pmather@tape.private.lib.vt.edu:/usr/obj/usr/src/sys/TAPE amd64
tape# zpool get all backups
NAME     PROPERTY       VALUE                SOURCE
backups  size           10.9T                -
backups  capacity       62%                  -
backups  altroot        -                    default
backups  health         ONLINE               -
backups  guid           1352318175125790395  default
backups  version        28                   default
backups  bootfs         -                    default
backups  delegation     on                   default
backups  autoreplace    off                  default
backups  cachefile      -                    default
backups  failmode       wait                 default
backups  listsnapshots  off                  default
backups  autoexpand     off                  default
backups  dedupditto     0                    default
backups  dedupratio     1.00x                -
backups  free           4.07T                -
backups  allocated      6.80T                -
backups  readonly       off                  -
tape# zfs get all backups/storage
NAME             PROPERTY              VALUE                 SOURCE
backups/storage  type                  filesystem            -
backups/storage  creation              Fri Sep 2 14:43 2011  -
backups/storage  used                  4.26T                 -
backups/storage  available             2.60T                 -
backups/storage  referenced            4.26T                 -
backups/storage  compressratio         1.51x                 -
backups/storage  mounted               yes                   -
backups/storage  quota                 none                  default
backups/storage  reservation           none                  default
backups/storage  recordsize            128K                  default
backups/storage  mountpoint            /backups/storage      default
backups/storage  sharenfs              off                   default
backups/storage  checksum              fletcher4             local
backups/storage  compression           gzip-9                local
backups/storage  atime                 on                    default
backups/storage  devices               on                    default
backups/storage  exec                  off                   local
backups/storage  setuid                on                    default
backups/storage  readonly              off                   default
backups/storage  jailed                off                   default
backups/storage  snapdir               hidden                default
backups/storage  aclmode               discard               default
backups/storage  aclinherit            restricted            default
backups/storage  canmount              on                    default
backups/storage  xattr                 off                   temporary
backups/storage  copies                1                     default
backups/storage  version               5                     -
backups/storage  utf8only              off                   -
backups/storage  normalization         none                  -
backups/storage  casesensitivity       sensitive             -
backups/storage  vscan                 off                   default
backups/storage  nbmand                off                   default
backups/storage  sharesmb              off                   default
backups/storage  refquota              none                  default
backups/storage  refreservation        none                  default
backups/storage  primarycache          all                   default
backups/storage  secondarycache        all                   default
backups/storage  usedbysnapshots       0                     -
backups/storage  usedbydataset         4.26T                 -
backups/storage  usedbychildren        0                     -
backups/storage  usedbyrefreservation  0                     -
backups/storage  logbias               latency               default
backups/storage  dedup                 off                   default
backups/storage  mlslabel                                    -
backups/storage  sync                  standard              default
backups/storage  refcompressratio      1.51x                 -
=====

I know ZFS does not have a fsck utility ("because it doesn't need one":), but does anyone know of any way of fixing this corruption short of destroying the pool, creating a new one, and restoring from backup? Is there some way of exporting and re-importing the pool that has the side-effect of doing some kind of fsck-like repairing of subtle corruption like this?

Cheers,

Paul.