Date: Sun, 21 May 2006 14:22:05 +0400 From: Yar Tikhiy <yar@comp.chem.msu.su> To: freebsd-current@freebsd.org Subject: Re: Root FS corruption Message-ID: <20060521102204.GB78879@comp.chem.msu.su> In-Reply-To: <20060519085408.GB51604@comp.chem.msu.su> References: <20060518151232.GA37743@comp.chem.msu.su> <200605181819.k4IIJHL7001150@hardy.tmseck.homedns.org> <20060519085408.GB51604@comp.chem.msu.su>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, May 19, 2006 at 12:54:08PM +0400, Yar Tikhiy wrote: > On Thu, May 18, 2006 at 08:19:17PM +0200, Thomas-Martin Seck wrote: > > * Yar Tikhiy <yar@comp.chem.msu.su>: > > > > > I saw the following / corruption in a fresh CURRENT when using > > > nextboot. Of course, it wasn't the fault of nextboot itself, > > > nextboot simply was the only utility to modify / in my case. > > > > > > I found the contents of nextboot.conf once in my custom /root/supfile, > > > the other time in the stock /etc/protocols. /etc/protocols was > > > large enough to see how the corruption had happened: the first > > > fragment, 2048 bytes, of the file was replaced by the contents of > > > nextboot.conf, zero padded. > > > > > > The / was a usual 2048/16384 UFS2 without soft-updates. The kernel > > > was GENERIC. Forced fsck reported no problems at all. The / had > > > never been dirty because I used nextboot to boot single-user with > > > all FSen read-only and investigate a panic unrelated to FS. > > > > > > Did any one see a similar problem of fragment mis-allocation? > > > > I experienced the exact same corruption some months ago with a RELENG_6 > > test system I update regularly. Unfortunately, this corruption happened > > only once, I was never able to reproduce it since. > > > > The kernel is a stripped down GENERIC, /root is a 2048/16384 UFS2 fs. > > Thank you for your reply! Apropos, today /boot/kernel/ng_fec.ko > fell a victim to the corruption in exactly the same way: its first > fragment was replaced by the nextboot.conf contents. The system > was updated last time on the day before yesterday. > > Of course, more / corruption is likely. The case of nextboot.conf > is just detectable easily. Thank Daemon, it's a test machine and > not a production server. I'm still trying to find a pattern in the > corruption. I've just tried to and reproduced the corruption immediately. The victim is ng_fec.ko again. Perhaps the file lies at a vulnerable spot. Attached is the typescript of my post-facto debug session involving ls -i and fsdb. A point to note is the proximity of ng_fec.ko and nextboot.conf: They are at adjacent blocks. Looks like an off-by-one bug, doesn't it? However, I can't imagine how the nextboot stuff gets written to _both_ /boot/nextboot.conf and the first fragment of ng_fec.ko. The fragment appears to have been written twice, to adjacent, yet different, locations. -- Yar # hd /boot/kernel/ng_fec.ko|head 00000000 6e 65 78 74 62 6f 6f 74 5f 65 6e 61 62 6c 65 3d |nextboot_enable=| 00000010 22 4e 4f 22 20 0a 6b 65 72 6e 65 6c 3d 22 54 45 |"NO" .kernel="TE| 00000020 53 54 22 0a 6b 65 72 6e 65 6c 5f 6f 70 74 69 6f |ST".kernel_optio| 00000030 6e 73 3d 22 2d 73 22 0a 00 00 00 00 00 00 00 00 |ns="-s".........| 00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00000800 54 47 52 41 50 48 5f 4d 53 47 00 6d 6f 64 75 6c |TGRAPH_MSG.modul| 00000810 65 5f 72 65 67 69 73 74 65 72 5f 69 6e 69 74 00 |e_register_init.| 00000820 62 7a 65 72 6f 00 6e 67 5f 66 72 65 65 5f 69 74 |bzero.ng_free_it| 00000830 65 6d 00 69 66 5f 66 72 65 65 00 69 66 75 6e 69 |em.if_free.ifuni| # fsck -n / ** /dev/ad0s3a (NO WRITE) ** Last Mounted on / ** Root file system ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 1522 files, 53031 used, 73808 free (1016 frags, 9099 blocks, 0.8% fragmentation) # ls -li1 /boot/nextboot.conf /boot/kernel/ng_fec.ko 25693 /boot/kernel/ng_fec.ko 25610 /boot/nextboot.conf # fsdb -r /dev/ad0s3a ** /dev/ad0s3a (NO WRITE) Examining file system `/dev/ad0s3a' Last Mounted on / current inode: directory I=2 MODE=40755 SIZE=512 MTIME=May 19 17:44:35 2006 [0 nsec] CTIME=May 19 17:44:35 2006 [0 nsec] ATIME=May 21 13:36:54 2006 [0 nsec] OWNER=root GRP=wheel LINKCNT=20 FLAGS=0 BLKCNT=4 GEN=6c8c4dd5 fsdb (inum: 2)> inode 25610 current inode: regular file I=25610 MODE=100644 SIZE=56 MTIME=May 21 13:38:02 2006 [0 nsec] CTIME=May 21 13:38:02 2006 [0 nsec] ATIME=May 21 13:38:02 2006 [0 nsec] OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=4 GEN=7f9710d0 fsdb (inum: 25610)> blocks Blocks for inode 25610: Direct blocks: 129599 (1 frag) fsdb (inum: 25610)> inode 25693 current inode: regular file I=25693 MODE=100555 SIZE=12922 MTIME=May 19 13:36:50 2006 [0 nsec] CTIME=May 19 13:36:50 2006 [0 nsec] ATIME=May 21 13:36:56 2006 [0 nsec] OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=1c GEN=8b771c4 fsdb (inum: 25693)> blocks Blocks for inode 25693: Direct blocks: 129600 (7 frags) fsdb (inum: 25693)> q
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060521102204.GB78879>