Date:      Fri, 16 Dec 2005 11:16:10 -0800 (PST)
From:      Nate Eldredge <nge@cs.hmc.edu>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/90512: Snapshot corruption after fs activity
Message-ID:  <Pine.GSO.4.63.0512161112560.13263@turing>
Resent-Message-ID: <200512161920.jBGJK4nJ080461@freefall.freebsd.org>


>Number:         90512
>Category:       kern
>Synopsis:       Snapshot corruption after fs activity
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Dec 16 19:20:03 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Nate Eldredge
>Release:        FreeBSD 6.0-RELEASE amd64
>Organization:
>Environment:
System: FreeBSD vulcan.lan 6.0-RELEASE FreeBSD 6.0-RELEASE #0: Wed Dec 14 20:08:57 PST 2005 nate@vulcan.lan:/usr/obj/usr/src/sys/VULCAN amd64



>Description:
When mksnap_ffs is used to take a snapshot of a filesystem, and a large
number of files are then deleted and re-created on that filesystem, the
snapshot becomes corrupt.

I think this is fairly serious since snapshots may be used for backup
purposes.  That's how I originally discovered the problem; I made a
snapshot on /usr before making a bunch of changes, during which I
accidentally moved most of /usr/local to another partition :).  I moved
it back but wanted to verify that everything was back as it was,
which is when I discovered my snapshot was no good.

Note this is on amd64.  I have not tried i386.
>How-To-Repeat:
# dd if=/dev/zero of=snaptest.img bs=1024k count=1000
# mdconfig -a -t vnode -f snaptest.img
md0
# newfs /dev/md0
# mount /dev/md0 /mnt/md0
# cd /mnt/md0
# tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2 
# mksnap_ffs /mnt/md0 /mnt/md0/.snap/snap1
# mdconfig -a -t vnode -f .snap/snap1
WARNING: opening backing store: /mnt/md0/.snap/snap1 readonly
md1
# mount -r /dev/md1 /mnt/md1
###### inspecting /mnt/md1 reveals the snapshot is apparently okay
# rm -r gap4r4
###### snapshot still apparently okay
# !tar
tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2
# ls -l /mnt/md1/gap4r4
ls: Makefile.in: Bad file descriptor
ls: bin: Bad file descriptor
ls: cnf: Bad file descriptor
ls: configure: Bad file descriptor
ls: doc: Bad file descriptor
ls: etc: Bad file descriptor
ls: gap.shi: Bad file descriptor
ls: grp: Bad file descriptor
ls: pkg: Bad file descriptor
ls: prim: Bad file descriptor
ls: small: Bad file descriptor
ls: src: Bad file descriptor
ls: sysinfo.in: Bad file descriptor
ls: trans: Bad file descriptor
ls: tst: Bad file descriptor
total 38
-rw-r--r--  1 nate  nate   4782 Aug 29 06:19 README
-rw-r--r--  1 nate  nate   9725 May 11  2005 description4r4p5
-rw-r--r--  1 nate  nate  11660 Aug 29 06:05 description4r4p6
drwxr-xr-x  2 nate  nate   9728 Aug 30 06:27 lib


Running truss on ls shows that lstat() returns EBADF for the offending
files (which makes little sense, since no file descriptor is involved;
EIO might be more appropriate).  Also, unmounting /dev/md1 and then running
fsck on it produces a large number of errors, of which the following is a
representative sample:

PARTIALLY TRUNCATED INODE I=70662
3689066227402421815 BAD I=70662
4121129229942796344 BAD I=70662
3833180345978203193 BAD I=70662
4051046384641915184 BAD I=70662
3688509874569295664 BAD I=70662
3472592161990062385 BAD I=70662
3906084542581519160 BAD I=70662
4049637910162848049 BAD I=70662
4123381021216356400 BAD I=70662
3979273551213759020 BAD I=70662
4051327820913194809 BAD I=70662
EXCESSIVE BAD BLKS I=70662
INCORRECT BLOCK COUNT I=70662 (960 should be 736)
PARTIALLY TRUNCATED INODE I=70719
UNALLOCATED  I=23552  OWNER=nate MODE=0
DIRECTORY CORRUPTED  I=70660  OWNER=nate MODE=40755
MISSING '.'  I=71129  OWNER=nate MODE=40755
SIZE=1536 MTIME=Aug 30 06:27 2005 
UNREF DIR  I=117760  OWNER=nate MODE=40755
SIZE=512 MTIME=Aug 30 06:27 2005 
LINK COUNT DIR I=2  OWNER=root MODE=40755
SIZE=512 MTIME=Dec 16 10:34 2005  COUNT 4 SHOULD BE 3

The original filesystem /dev/md0 apparently
remains okay and fsck reports no errors for it.
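
The lstat() failures described above could be surveyed across the whole
mounted snapshot, rather than one directory at a time, with something
like the following.  This is only a rough sketch and was not part of the
original test: scan_unstatable is a hypothetical helper name, and it
assumes that "ls -ld" fails whenever the entry's metadata cannot be read.
It reports any such failure without distinguishing EBADF from other errors.

```sh
#!/bin/sh
# Hypothetical helper, not from the original report: print every entry
# under a directory whose metadata cannot be read.
# The function body runs in a subshell so that the variables stay local
# across recursive calls.
scan_unstatable() (
    dir=$1
    # Directory entries come from globbing, so a name is still seen
    # even when a later stat of that name fails.
    for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # Skip glob patterns that matched nothing and were left as-is.
        case $entry in
            "$dir/*" | "$dir/.[!.]*" | "$dir/..?*") continue ;;
        esac
        if ! ls -ld -- "$entry" >/dev/null 2>&1; then
            # Assumption: ls fails here because it could not stat the entry.
            printf 'cannot stat: %s\n' "$entry"
        elif [ -d "$entry" ] && [ ! -L "$entry" ]; then
            # Recurse into real directories only, never through symlinks.
            scan_unstatable "$entry"
        fi
    done
)
```

With the snapshot mounted as in the steps above, this would be invoked as
"scan_unstatable /mnt/md1"; an intact tree should produce no output.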

There are no kernel error messages this time, though a previous attempt
(when the snapshot was on /dev/md0) yielded

/mnt/md0: bad dir ino 3182535 at offset 0: mangled entry
/mnt/md0: bad dir ino 2953 at offset 0: mangled entry
...4 or 5 more...

Also at that time, some directories had turned into files of size 1,
which nevertheless dumped many, many bytes of garbage when cat'ted.

>Fix:
Unknown.

Thanks!

>Release-Note:
>Audit-Trail:
>Unformatted:


