Date:      Sun, 21 May 2006 14:22:05 +0400
From:      Yar Tikhiy <yar@comp.chem.msu.su>
To:        freebsd-current@freebsd.org
Subject:   Re: Root FS corruption
Message-ID:  <20060521102204.GB78879@comp.chem.msu.su>
In-Reply-To: <20060519085408.GB51604@comp.chem.msu.su>
References:  <20060518151232.GA37743@comp.chem.msu.su> <200605181819.k4IIJHL7001150@hardy.tmseck.homedns.org> <20060519085408.GB51604@comp.chem.msu.su>

On Fri, May 19, 2006 at 12:54:08PM +0400, Yar Tikhiy wrote:
> On Thu, May 18, 2006 at 08:19:17PM +0200, Thomas-Martin Seck wrote:
> > * Yar Tikhiy <yar@comp.chem.msu.su>:
> > 
> > > I saw the following / corruption in a fresh CURRENT when using
> > > nextboot.  Of course, it wasn't the fault of nextboot itself;
> > > nextboot simply was the only utility to modify / in my case.
> > > 
> > > I found the contents of nextboot.conf once in my custom /root/supfile
> > > and another time in the stock /etc/protocols.  /etc/protocols was
> > > large enough to show how the corruption had happened: the first
> > > fragment (2048 bytes) of the file had been replaced by the contents
> > > of nextboot.conf, zero-padded.
> > >
> > > The / was an ordinary UFS2 file system with 2048-byte fragments and
> > > 16384-byte blocks, without soft updates.  The kernel was GENERIC.
> > > A forced fsck reported no problems at all.  The / had never been
> > > dirty because I used nextboot to boot single-user with all file
> > > systems read-only to investigate a panic unrelated to the FS.
> > > 
> > > Has anyone seen a similar problem of fragment misallocation?
> > 
> > I experienced exactly the same corruption some months ago on a RELENG_6
> > test system I update regularly.  Unfortunately, it happened only once;
> > I have never been able to reproduce it since.
> > 
> > The kernel is a stripped down GENERIC, /root is a 2048/16384 UFS2 fs.
> 
> Thank you for your reply!  Apropos, today /boot/kernel/ng_fec.ko
> fell victim to the corruption in exactly the same way: its first
> fragment was replaced by the nextboot.conf contents.  The system
> was last updated the day before yesterday.
> 
> Of course, there is likely more corruption on /; the nextboot.conf
> case is just easy to detect.  Thank Daemon, it's a test machine and
> not a production server.  I'm still trying to find a pattern in the
> corruption.

I've just tried to reproduce the corruption and succeeded immediately.
The victim is ng_fec.ko again.  Perhaps the file lies at a vulnerable
spot.  Attached is the typescript of my post-facto debug session
involving ls -i and fsdb.  A point to note is the proximity of
ng_fec.ko and nextboot.conf: they occupy adjacent fragments, 129599
and 129600, which lie in adjacent blocks.  Looks like an off-by-one
bug, doesn't it?  However, I can't imagine how the nextboot data gets
written to _both_ /boot/nextboot.conf and the first fragment of
ng_fec.ko.  The fragment appears to have been written twice, to
adjacent yet different locations.
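To make the adjacency concrete, here is the fragment arithmetic as a small Python sketch. It assumes, as is usual for UFS, that fsdb prints direct block addresses in fragment units, and it uses the 2048-byte fragment and 16384-byte block sizes of this /; `locate` is a hypothetical helper, not part of any FreeBSD tool. Under those assumptions nextboot.conf sits in the last fragment (index 7) of the block starting at fragment 129592, while ng_fec.ko starts the next block at fragment 129600, so an address off by one fragment would land exactly on ng_fec.ko's first fragment.

```python
# Geometry of the affected / as given in the report:
# UFS2 with 16384-byte blocks and 2048-byte fragments.
BLOCK_SIZE = 16384
FRAG_SIZE = 2048
FRAGS_PER_BLOCK = BLOCK_SIZE // FRAG_SIZE  # 8 fragments per block


def locate(frag):
    """Return (first fragment of the containing block, index within block,
    byte offset from the start of the partition) for a fragment address.

    Assumes `frag` is in fragment units, as fsdb's "blocks" output is
    taken to be here.
    """
    index = frag % FRAGS_PER_BLOCK
    return frag - index, index, frag * FRAG_SIZE


# Direct block addresses printed by fsdb in the typescript.
NEXTBOOT_CONF = 129599  # inode 25610, 1 frag
NG_FEC_KO = 129600      # inode 25693, 7 frags

for name, frag in (("nextboot.conf", NEXTBOOT_CONF), ("ng_fec.ko", NG_FEC_KO)):
    start, index, offset = locate(frag)
    print(f"{name}: frag {frag}, block at frag {start}, "
          f"index {index}, byte offset {offset}")
```

Because the offsets are relative to the start of /dev/ad0s3a, the same numbers can be used to dump a fragment straight off the device, e.g. `dd if=/dev/ad0s3a bs=2048 skip=129600 count=1 | hd`.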

-- 
Yar

# hd /boot/kernel/ng_fec.ko|head
00000000  6e 65 78 74 62 6f 6f 74  5f 65 6e 61 62 6c 65 3d  |nextboot_enable=|
00000010  22 4e 4f 22 20 0a 6b 65  72 6e 65 6c 3d 22 54 45  |"NO" .kernel="TE|
00000020  53 54 22 0a 6b 65 72 6e  65 6c 5f 6f 70 74 69 6f  |ST".kernel_optio|
00000030  6e 73 3d 22 2d 73 22 0a  00 00 00 00 00 00 00 00  |ns="-s".........|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000800  54 47 52 41 50 48 5f 4d  53 47 00 6d 6f 64 75 6c  |TGRAPH_MSG.modul|
00000810  65 5f 72 65 67 69 73 74  65 72 5f 69 6e 69 74 00  |e_register_init.|
00000820  62 7a 65 72 6f 00 6e 67  5f 66 72 65 65 5f 69 74  |bzero.ng_free_it|
00000830  65 6d 00 69 66 5f 66 72  65 65 00 69 66 75 6e 69  |em.if_free.ifuni|
# fsck -n /
** /dev/ad0s3a (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1522 files, 53031 used, 73808 free (1016 frags, 9099 blocks, 0.8% fragmentation)
# ls -li1 /boot/nextboot.conf /boot/kernel/ng_fec.ko
25693 /boot/kernel/ng_fec.ko
25610 /boot/nextboot.conf
# fsdb -r /dev/ad0s3a
** /dev/ad0s3a (NO WRITE)
Examining file system `/dev/ad0s3a'
Last Mounted on /
current inode: directory
I=2 MODE=40755 SIZE=512
        MTIME=May 19 17:44:35 2006 [0 nsec]
        CTIME=May 19 17:44:35 2006 [0 nsec]
        ATIME=May 21 13:36:54 2006 [0 nsec]
OWNER=root GRP=wheel LINKCNT=20 FLAGS=0 BLKCNT=4 GEN=6c8c4dd5
fsdb (inum: 2)> inode 25610
current inode: regular file
I=25610 MODE=100644 SIZE=56
        MTIME=May 21 13:38:02 2006 [0 nsec]
        CTIME=May 21 13:38:02 2006 [0 nsec]
        ATIME=May 21 13:38:02 2006 [0 nsec]
OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=4 GEN=7f9710d0
fsdb (inum: 25610)> blocks
Blocks for inode 25610:
Direct blocks:
129599 (1 frag)
fsdb (inum: 25610)> inode 25693
current inode: regular file
I=25693 MODE=100555 SIZE=12922
        MTIME=May 19 13:36:50 2006 [0 nsec]
        CTIME=May 19 13:36:50 2006 [0 nsec]
        ATIME=May 21 13:36:56 2006 [0 nsec]
OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=1c GEN=8b771c4
fsdb (inum: 25693)> blocks
Blocks for inode 25693:
Direct blocks:
129600 (7 frags)
fsdb (inum: 25693)> q
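The damage has the same shape in every case reported in this thread: the victim's first 2048-byte fragment holds the 56-byte nextboot.conf followed by zero padding, as the hd output above shows. That makes it searchable. The following is a minimal sketch, not a tool from the thread: the script name `findclobber.py` and its functions `looks_clobbered` and `scan` are hypothetical, it assumes the 2048-byte fragment size of this file system, and it only catches victims at least one fragment long.

```python
import os
import sys

FRAG_SIZE = 2048  # fragment size of the affected / (2048/16384 UFS2)


def looks_clobbered(head, conf, frag_size=FRAG_SIZE):
    """Return True if `head`, the first bytes of a file, is exactly `conf`
    zero-padded to one full fragment (the pattern seen in ng_fec.ko).

    Files shorter than one fragment never match, since `head` is then
    shorter than the padded pattern.
    """
    if len(conf) > frag_size:
        return False
    return head[:frag_size] == conf + b"\0" * (frag_size - len(conf))


def scan(root, conf_path):
    """Yield regular files under `root` whose first fragment matches `conf_path`."""
    with open(conf_path, "rb") as f:
        conf = f.read()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    head = f.read(FRAG_SIZE)
            except OSError:
                continue  # unreadable; ignore rather than abort the scan
            if looks_clobbered(head, conf):
                yield path


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 findclobber.py /boot /boot/nextboot.conf
    for hit in scan(sys.argv[1], sys.argv[2]):
        print(hit)
```

Comparing against the live nextboot.conf only finds victims of its current contents; a copy saved before each nextboot run would be needed to catch earlier ones.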


