From: Paul Mather
Date: Mon, 3 Oct 2011 14:21:05 -0400
To: freebsd-current@freebsd.org
Subject: Strange ZFS filesystem corruption
Message-Id: <8B59D754-9062-4499-9873-7C2167622032@gromit.dlib.vt.edu>

I wasn't sure whether to post this here or on stable@freebsd.org. The
system now runs RELENG_9, but the ZFS pool exhibiting problems was
created, IIRC, under 9-CURRENT. I believe RELENG_9 is sufficiently
close to HEAD at this stage that this list is probably the correct place
for this message.

I have a raidz2 ZFS pool on a system that I have recently been using as
a mirror for about 6.5 TiB of data. The data are mirrored nightly using
rsync. I noticed that during these nightly rsync copies I would get
errors like this:

=====
file has vanished: "/backups/storage/san/DLA/DLA_Records/05DLAAdmin"
rsync: stat "/backups/storage/san/DLA/DLA_Records/05DLAAdmin" failed: No such file or directory (2)
rsync: recv_generator: mkdir "/backups/storage/san/DLA/DLA_Records/05DLAAdmin/05DI_business copy" failed: No such file or directory (2)
*** Skipping any contents from this failed directory ***
=====

It appears that 05DLAAdmin is a corrupted directory. It shows up in an
"ls", but any attempt to descend into that directory or discern its
attributes fails with a "No such file or directory" error. Furthermore,
I cannot delete this directory (even with "rm -rf").
E.g.:

=====
tape# pwd
/backups/storage/san/DLA
tape# whoami
root
tape# rm -rf DLA_Records
rm: DLA_Records/07DLAAdmin/07Digital_Imaging_Work: Directory not empty
rm: DLA_Records/07DLAAdmin/FY07IAWAprep: Directory not empty
rm: DLA_Records/07DLAAdmin: Directory not empty
rm: DLA_Records: Directory not empty
tape# cd DLA_Records
tape# ls
05DLAAdmin      07DLAAdmin
tape# ls -l
ls: 05DLAAdmin: No such file or directory
total 3
drwxrws---  4 500  501  4 Oct  3 11:53 07DLAAdmin
tape# file 05DLAAdmin
05DLAAdmin: cannot open `05DLAAdmin' (No such file or directory)
tape# ls -R 07DLAAdmin
07Digital_Imaging_Work  FY07IAWAprep

07DLAAdmin/07Digital_Imaging_Work:
ls: 07Proposals: No such file or directory

07DLAAdmin/FY07IAWAprep:
ls: Budget: No such file or directory
tape# ls 07DLAAdmin
07Digital_Imaging_Work  FY07IAWAprep
tape# ls 07DLAAdmin/07Digital_Imaging_Work
07Proposals
tape# ls -l 07DLAAdmin/07Digital_Imaging_Work/07Proposals
ls: 07DLAAdmin/07Digital_Imaging_Work/07Proposals: No such file or directory
tape# ls 07DLAAdmin/FY07IAWAprep
Budget
tape# ls 07DLAAdmin/FY07IAWAprep/Budget
ls: 07DLAAdmin/FY07IAWAprep/Budget: No such file or directory
tape# file 07DLAAdmin/FY07IAWAprep/Budget
07DLAAdmin/FY07IAWAprep/Budget: cannot open `07DLAAdmin/FY07IAWAprep/Budget' (No such file or directory)
tape# cd 05DLAAdmin
05DLAAdmin: No such file or directory.
=====

The pool itself reports no errors.
I performed a scrub on the pool, yet this bizarre filesystem corruption
persists:

=====
tape# zpool status backups
  pool: backups
 state: ONLINE
  scan: scrub repaired 15K in 7h33m with 0 errors on Sat Oct  1 19:22:35 2011
config:

        NAME            STATE     READ WRITE CKSUM
        backups         ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            gpt/disk02  ONLINE       0     0     0
            gpt/disk03  ONLINE       0     0     0
            gpt/disk04  ONLINE       0     0     0
            gpt/disk05  ONLINE       0     0     0
            gpt/disk06  ONLINE       0     0     0
            gpt/disk07  ONLINE       0     0     0

errors: No known data errors
tape# uname -a
FreeBSD tape.private.lib.vt.edu 9.0-BETA3 FreeBSD 9.0-BETA3 #0: Wed Sep 28 15:18:59 EDT 2011     pmather@tape.private.lib.vt.edu:/usr/obj/usr/src/sys/TAPE  amd64
tape# zpool get all backups
NAME     PROPERTY       VALUE                SOURCE
backups  size           10.9T                -
backups  capacity       62%                  -
backups  altroot        -                    default
backups  health         ONLINE               -
backups  guid           1352318175125790395  default
backups  version        28                   default
backups  bootfs         -                    default
backups  delegation     on                   default
backups  autoreplace    off                  default
backups  cachefile      -                    default
backups  failmode       wait                 default
backups  listsnapshots  off                  default
backups  autoexpand     off                  default
backups  dedupditto     0                    default
backups  dedupratio     1.00x                -
backups  free           4.07T                -
backups  allocated      6.80T                -
backups  readonly       off                  -
tape# zfs get all backups/storage
NAME             PROPERTY              VALUE                  SOURCE
backups/storage  type                  filesystem             -
backups/storage  creation              Fri Sep  2 14:43 2011  -
backups/storage  used                  4.26T                  -
backups/storage  available             2.60T                  -
backups/storage  referenced            4.26T                  -
backups/storage  compressratio         1.51x                  -
backups/storage  mounted               yes                    -
backups/storage  quota                 none                   default
backups/storage  reservation           none                   default
backups/storage  recordsize            128K                   default
backups/storage  mountpoint            /backups/storage       default
backups/storage  sharenfs              off                    default
backups/storage  checksum              fletcher4              local
backups/storage  compression           gzip-9                 local
backups/storage  atime                 on                     default
backups/storage  devices               on                     default
backups/storage  exec                  off                    local
backups/storage  setuid                on                     default
backups/storage  readonly              off                    default
backups/storage  jailed                off                    default
backups/storage  snapdir               hidden                 default
backups/storage  aclmode               discard                default
backups/storage  aclinherit            restricted             default
backups/storage  canmount              on                     default
backups/storage  xattr                 off                    temporary
backups/storage  copies                1                      default
backups/storage  version               5                      -
backups/storage  utf8only              off                    -
backups/storage  normalization         none                   -
backups/storage  casesensitivity       sensitive              -
backups/storage  vscan                 off                    default
backups/storage  nbmand                off                    default
backups/storage  sharesmb              off                    default
backups/storage  refquota              none                   default
backups/storage  refreservation        none                   default
backups/storage  primarycache          all                    default
backups/storage  secondarycache        all                    default
backups/storage  usedbysnapshots       0                      -
backups/storage  usedbydataset         4.26T                  -
backups/storage  usedbychildren        0                      -
backups/storage  usedbyrefreservation  0                      -
backups/storage  logbias               latency                default
backups/storage  dedup                 off                    default
backups/storage  mlslabel                                     -
backups/storage  sync                  standard               default
backups/storage  refcompressratio      1.51x                  -
=====

I know ZFS does not have a fsck utility ("because it doesn't need
one" :), but does anyone know of any way of fixing this corruption short
of destroying the pool, creating a new one, and restoring from backup?
Is there some way of exporting and re-importing the pool that has the
side-effect of doing some kind of fsck-like repairing of subtle
corruption like this?

Cheers,

Paul.
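
The symptom in the transcript (a name that a directory listing returns but that then fails with "No such file or directory" when examined) could be enumerated across the whole tree with a short script. What follows is only a minimal sketch, assuming a Python 3 interpreter is available on the machine (the message does not say so). The function name find_ghost_entries and its listdir/lstat parameters are hypothetical names introduced here for illustration. It only reports affected paths and repairs nothing. It should be run while nothing is modifying the tree, since a file deleted between the listing and the lstat() call would be reported as a false positive.

```python
import os
import stat


def find_ghost_entries(root, listdir=os.listdir, lstat=os.lstat):
    """Return sorted paths that a directory listing names but lstat() cannot find.

    listdir and lstat are injectable (a hypothetical design choice) so the
    walk can be exercised without a damaged filesystem.
    """
    ghosts = []
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            names = listdir(directory)
        except OSError:
            # Cannot list this directory (permissions, or the root itself
            # is missing); skip it rather than abort the whole walk.
            continue
        for name in names:
            path = os.path.join(directory, name)
            try:
                info = lstat(path)
            except FileNotFoundError:
                # Listed by the parent directory, yet ENOENT on lstat():
                # the same behaviour "ls" and "ls -l" show in the transcript.
                ghosts.append(path)
                continue
            except OSError:
                # Some other failure (e.g. permission denied); not a ghost.
                continue
            if stat.S_ISDIR(info.st_mode):
                pending.append(path)
    return sorted(ghosts)


# Example use (path taken from the message above):
#     for path in find_ghost_entries("/backups/storage"):
#         print(path)
```

lstat() is used rather than stat() so that dangling symbolic links, which are legitimate, are not reported as damaged entries.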