Date: Sat, 27 May 2006 15:04:16 +0400
From: Yar Tikhiy
To: freebsd-current@freebsd.org
Subject: Re: Root FS corruption
Message-ID: <20060527110415.GA63440@comp.chem.msu.su>
In-Reply-To: <20060526072458.GA47499@comp.chem.msu.su>

On Fri, May 26, 2006 at 11:24:58AM +0400, Yar Tikhiy wrote:
>
> I still can damage a file on the root FS by running nextboot.  This
> seems very reproducible.
> A subsequent reboot is needed for the damage to actually happen.
> The pattern is the same: A fragment is allocated to nextboot.conf
> in the block immediately preceding another file's block.  The
> nextboot.conf contents are written out later (when syncing disks
> before the reboot?) to the neighbour file's first fragment.
> Nextboot.conf itself has correct contents, which means that the
> contents are written out twice for some reason.
>
> Nextboot is a simple shell script just writing out nextboot.conf,
> which means that any file write following the same scenario (creat
> and write a small file, then reboot) should result in damage to
> another file on the same FS.  Of course, the FS fill pattern may
> affect this.  In my case, the FS is only half full, which apparently
> allows for allocating a new block to the small file, not a fragment
> in a partially occupied block.

Folks, I have good news for all of us: This kind of corruption isn't
caused by the kernel.  Thanks to Ian Dowse, I found out that
/boot/loader rewrites nextboot.conf through libufs or whatever it
uses.  This is done in support.4th; the word is rewrite_nextboot_file.
Initially I missed a clear sign of the problem being caused by the
loader: The corrupted data started with `nextboot_enable="NO" \n',
which is the string written from support.4th.  The actual bug must be
hiding in libufs, or whatever the loader uses to access UFS.

The latest technical details of my investigation have been filed in
PR bin/98005:

http://www.freebsd.org/cgi/query-pr.cgi?pr=98005

The conclusion is: Avoid nextboot(8) for now.

-- 
Yar
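
P.S. For anyone who has already used nextboot and wants to look for
damaged files, here is a rough sketch of a check based on the sign
described above: a damaged file starts with the string the loader
writes.  The function names (looks_clobbered, find_suspects) and the
MARKER constant are made up for illustration.  Only the stable prefix
nextboot_enable="NO" is matched, since I'd rather not assume the exact
trailing whitespace.  Treat any hit as a hint to inspect the file, not
as proof of damage.

```python
import os

# Prefix of the string written from support.4th, as quoted in this message.
# Assumption: matching only this prefix is enough to flag a suspect file.
MARKER = b'nextboot_enable="NO"'


def looks_clobbered(path, marker=MARKER):
    """Return True if the file at `path` starts with the loader's marker."""
    try:
        with open(path, "rb") as f:
            return f.read(len(marker)) == marker
    except OSError:
        # Unreadable or vanished files are simply not reported.
        return False


def _same_device(path, dev):
    """Return True if `path` lives on the device identified by `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_suspects(root, skip_names=("nextboot.conf",)):
    """List regular files under `root` (same filesystem only) that start
    with the marker, ignoring files named in `skip_names`."""
    suspects = []
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories on other filesystems so only `root`'s FS is scanned.
        dirnames[:] = [d for d in dirnames
                       if _same_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            if name in skip_names:
                continue
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if looks_clobbered(path):
                suspects.append(path)
    return sorted(suspects)
```

Calling find_suspects("/") as root and printing the result should list
candidate files on the root FS.  Legitimate copies of nextboot.conf
under other names will show up too, so check each hit by hand.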