Date: Sat, 27 May 2006 22:43:07 -0700
From: Julian Elischer <julian@elischer.org>
To: Yar Tikhiy <yar@comp.chem.msu.su>
Cc: freebsd-current@freebsd.org
Subject: Re: Root FS corruption
Message-ID: <4479386B.6030008@elischer.org>
In-Reply-To: <20060527110415.GA63440@comp.chem.msu.su>
References: <20060518151232.GA37743@comp.chem.msu.su> <200605181819.k4IIJHL7001150@hardy.tmseck.homedns.org> <20060519085408.GB51604@comp.chem.msu.su> <20060521102204.GB78879@comp.chem.msu.su> <20060526072458.GA47499@comp.chem.msu.su> <20060527110415.GA63440@comp.chem.msu.su>
Yar Tikhiy wrote:
> On Fri, May 26, 2006 at 11:24:58AM +0400, Yar Tikhiy wrote:
>
>>I still can damage a file on the root FS by running nextboot. This
>>seems very reproducible. A subsequent reboot is needed for the
>>damage to happen actually. The pattern is the same: A fragment
>>is allocated to nextboot.conf in the block immediately preceding
>>another file's block. The nextboot.conf contents are written out
>>later (when syncing disks before the reboot?) to the neighbour
>>file's first fragment. Nextboot.conf itself has correct contents,
>>which means that the contents are written out twice for some reason.
>>
>>Nextboot is a simple shell script just writing out nextboot.conf,
>>which means that any file write following the same scenario (creat
>>and write a small file, then reboot) should result in damage to
>>another file on the same FS. Of course, the FS fill pattern may
>>affect this. In my case, the FS is only half full, which apparently
>>allows for allocating a new block to the small file, not a fragment
>>in a partially occupied block.
>
> Folks, I have good news for all of us: This kind of corruption
> isn't done by the kernel. Thanks to Ian Dowse, I found out that
> /boot/loader would rewrite nextboot.conf through libufs or whatever.
> This is done in support.4th, the word is rewrite_nextboot_file.
> Initially I missed a clear sign of the problem being caused by the
> loader: The corrupted data started with `nextboot_enable="NO" \n',
> which is the string written from support.4th. The actual bug must
> be hiding in libufs, or whatever loader uses to access UFS.
>
> Recent technical details of my investigation have been filed
> in PR bin/98005:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=98005
>
> The conclusion is: Avoid nextboot(8) for now.
>

The current nextboot fails to provide all the designed functionality
of the previous nextboot (which is why we still use the old one at
IronPort). One day I'll get around to reimplementing the old one.
The design criteria were:

 - Store the nextboot info "not in a filesystem" (the filesystem may be
   corrupt, or there may be several types of filesystem available).
 - Change that info from boot0 without writing to a filesystem (to note
   that it was used).
 - Be able to store different stuff on different disks at the same time.
 - Be able to specify how many times the information is used before
   falling back to something else.
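As a rough illustration of the last two criteria, here is a minimal sketch of a use-counted boot record that lives in a raw, reserved sector rather than in a filesystem, with one such sector per disk. The record layout, the magic value, and the names nb_record, nb_init and nb_select are all hypothetical; this is NOT the on-disk format of the old nextboot, and the raw sector read/write is left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record; NOT the format used by the old nextboot. */
#define NB_MAGIC 0x4e425431u /* arbitrary magic ("NBT1") */

struct nb_record {
    uint32_t magic;
    uint8_t  remaining;   /* boots left before falling back */
    uint8_t  pad[3];
    char     kernel[56];  /* kernel to try while remaining > 0 */
};

/* Fill in a fresh record that requests `uses` boots of `kernel`. */
static void
nb_init(struct nb_record *rec, const char *kernel, uint8_t uses)
{
    memset(rec, 0, sizeof(*rec));
    rec->magic = NB_MAGIC;
    rec->remaining = uses;
    strncpy(rec->kernel, kernel, sizeof(rec->kernel) - 1);
}

/*
 * Decide which kernel to boot from an in-memory copy of the raw sector.
 * Each successful selection consumes one use; once the count reaches
 * zero (or the sector holds no valid record) the fallback is returned.
 * If *dirty is set, the caller must write the sector back to the disk.
 */
static const char *
nb_select(struct nb_record *rec, const char *fallback, int *dirty)
{
    *dirty = 0;
    if (rec->magic != NB_MAGIC || rec->remaining == 0)
        return fallback;
    rec->kernel[sizeof(rec->kernel) - 1] = '\0'; /* never trust disk data */
    rec->remaining--;
    *dirty = 1;
    return rec->kernel;
}
```

For example, a record initialized with two uses selects the test kernel on the next two boots and then silently falls back. Because only a fixed-size sector is rewritten, a boot block such as boot0 could in principle update the counter without any filesystem code.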