Date: Fri, 16 Dec 2005 11:16:10 -0800 (PST)
From: Nate Eldredge <nge@cs.hmc.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/90512: Snapshot corruption after fs activity
Message-ID: <Pine.GSO.4.63.0512161112560.13263@turing>
Resent-Message-ID: <200512161920.jBGJK4nJ080461@freefall.freebsd.org>
>Number:         90512
>Category:       kern
>Synopsis:       Snapshot corruption after fs activity
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Dec 16 19:20:03 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Nate Eldredge
>Release:        FreeBSD 6.0-RELEASE amd64
>Organization:
>Environment:
System: FreeBSD vulcan.lan 6.0-RELEASE FreeBSD 6.0-RELEASE #0: Wed Dec 14 20:08:57 PST 2005 nate@vulcan.lan:/usr/obj/usr/src/sys/VULCAN amd64

>Description:
When you use mksnap_ffs to make a snapshot on a filesystem which then
has a lot of stuff deleted and re-created, the snapshot becomes corrupt.

I think this is fairly serious since snapshots may be used for backup
purposes. That's how I originally discovered the problem; I made a
snapshot on /usr before making a bunch of changes, during which I
accidentally moved most of /usr/local to another partition :). I moved
it back but wanted to verify that everything was back as it was, which
is when I discovered my snapshot was no good.

Note this is on amd64. I have not tried i386.

>How-To-Repeat:
# dd if=/dev/zero of=snaptest.img bs=1024k count=1000
# mdconfig -a -t vnode -f snaptest.img
md0
# newfs /dev/md0
# mount /dev/md0 /mnt/md0
# cd /mnt/md0
# tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2
# mksnap_ffs /mnt/md0 /mnt/md0/.snap/snap1
# mdconfig -a -t vnode -f .snap/snap1
WARNING: opening backing store: /mnt/md0/.snap/snap1 readonly
md1
# mount -r /dev/md1 /mnt/md1
###### inspecting /mnt/md1 reveals the snapshot is apparently okay
# rm -r gap4r4
###### snapshot still apparently okay
# !tar
tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2
# ls -l /mnt/md1/gap4r4
ls: Makefile.in: Bad file descriptor
ls: bin: Bad file descriptor
ls: cnf: Bad file descriptor
ls: configure: Bad file descriptor
ls: doc: Bad file descriptor
ls: etc: Bad file descriptor
ls: gap.shi: Bad file descriptor
ls: grp: Bad file descriptor
ls: pkg: Bad file descriptor
ls: prim: Bad file descriptor
ls: small: Bad file descriptor
ls: src: Bad file descriptor
ls: sysinfo.in: Bad file descriptor
ls: trans: Bad file descriptor
ls: tst: Bad file descriptor
total 38
-rw-r--r--  1 nate  nate   4782 Aug 29 06:19 README
-rw-r--r--  1 nate  nate   9725 May 11  2005 description4r4p5
-rw-r--r--  1 nate  nate  11660 Aug 29 06:05 description4r4p6
drwxr-xr-x  2 nate  nate   9728 Aug 30 06:27 lib

Doing truss on ls reveals that lstat() is returning EBADF on the
offending files (which doesn't make any sense as there is no file
descriptor involved; EIO might be better).

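The lstat() failures can also be tallied without truss. What follows is
only a minimal sketch, assuming a POSIX sh with find, ls, wc and tr; the
function name count_stat_errors is hypothetical and not part of the
original report. It counts the diagnostic lines that find and ls write
to stderr while walking a tree, so the result is an indicator of
trouble, not an exact count of damaged files.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): walk a directory
# tree, force an lstat() of every entry via "ls -ld", and print how many
# diagnostic lines were written to stderr along the way.
count_stat_errors() {
    # "2>&1 >/dev/null" sends stderr into the pipe and discards the
    # normal listing, so wc sees only the error messages.
    # tr strips the padding that some wc implementations add.
    find "$1" -exec ls -ld {} + 2>&1 >/dev/null | wc -l | tr -d ' '
}
```

Against the mounted snapshot this would be run as
"count_stat_errors /mnt/md1"; a healthy tree should print 0.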
Also, umounting and then fscking /dev/md1 produces a cornucopia of
errors, including as a representative sample:

PARTIALLY TRUNCATED INODE I=70662
3689066227402421815 BAD I=70662
4121129229942796344 BAD I=70662
3833180345978203193 BAD I=70662
4051046384641915184 BAD I=70662
3688509874569295664 BAD I=70662
3472592161990062385 BAD I=70662
3906084542581519160 BAD I=70662
4049637910162848049 BAD I=70662
4123381021216356400 BAD I=70662
3979273551213759020 BAD I=70662
4051327820913194809 BAD I=70662
EXCESSIVE BAD BLKS I=70662
INCORRECT BLOCK COUNT I=70662 (960 should be 736)
PARTIALLY TRUNCATED INODE I=70719
UNALLOCATED I=23552 OWNER=nate MODE=0
DIRECTORY CORRUPTED I=70660 OWNER=nate MODE=40755
MISSING '.' I=71129 OWNER=nate MODE=40755 SIZE=1536 MTIME=Aug 30 06:27 2005
UNREF DIR I=117760 OWNER=nate MODE=40755 SIZE=512 MTIME=Aug 30 06:27 2005
LINK COUNT DIR I=2 OWNER=root MODE=40755 SIZE=512 MTIME=Dec 16 10:34 2005 COUNT 4 SHOULD BE 3

The original filesystem /dev/md0 apparently remains okay and fsck
reports no errors for it.

There are no kernel error messages this time, though a previous attempt
(when the snapshot was on /dev/md0) yielded

/mnt/md0: bad dir ino 3182535 at offset 0: mangled entry
/mnt/md0: bad dir ino 2953 at offset 0: mangled entry
...4 or 5 more...

Also at that time there were directories which changed to files of size
1 which dumped many, many bytes of garbage when cat'ted.

>Fix:
Unknown.

Thanks!
>Release-Note:
>Audit-Trail:
>Unformatted: