Date: Sat, 28 Jun 2003 15:12:05 -0400
From: Bill Moran
To: John Ekins
cc: questions@freebsd.org
Subject: Re: Softupdates: df, du, sync and fsck [quite long]

John Ekins wrote:
> Hello Bill,
>
> On Fri, 27 Jun 2003 23:53:30 -0400
> Bill Moran wrote:
>
> -> I don't know what's wrong, but does unmounting and remounting the partition
> -> reclaim the lost space?
>
> Alas, I can't umount the partition, my guess is because it is unable to sync
> (nothing to do with open files, and no error message saying "device busy").
> The command just doesn't return after I've issued it.

Hmmm ... not good.  A little more research might qualify this problem for
a PR.

> -> If there's a LOT of inodes with problems, it could easily take a while to fix.
> -> Also, if you run fsck without specifying a filesystem to fix, it exhaustively
> -> checks all filesystems.  So even if the problem is on /var, it might spend a
> -> long time checking /usr as well.  You can work around this by calling fsck
> -> with the filesystem to check.
>
> I don't think it's to do with inodes or block size, etc. There's about 2M inodes
> on /var. A manual fsck on a dirty shutdown on this partition (ignoring the problem
> in hand) takes a couple of minutes.

Hmmm ...

> -> If these are production boxes, I'd recommend turning it off until you resolve
> -> the problem.
>
> Indeed, I tried that last night on one machine and it put the load through the
> roof(48).

Yikes!  Is the machine still responsive?  Sometimes you can put the load
that high and still have a functional box.

I'm guessing by the way the conversation is going that you're able to grab
one of these boxes and make some tweaks.  Possibly try putting the spool
directory on a dedicated partition and mounting it async?  If the box shuts
down dirty, you'll probably have to newfs the partition before you can use
it again.  At least make sure the spool partition is separate from your log
partition; that should help to mitigate the problem (although you may
already have done that).

> -> I don't know if this would qualify as "advice", but since nobody else
> -> seems to have any suggestions, I figured I'd throw my thoughts in.
>
> -> Are you using ATA or SCSI drives?
>
> SCSI.
>
> -> Does issuing a manual "sync" once you've stopped the spooling process help
> -> any?
>
> No. I'd already tried numerous syncs, and of course a clean shutdown tries that
> too.

I was wondering if maybe the syncs were taking longer than the shutdown
process was willing to wait.
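
A rough way to check whether a sync changes anything is to compare what the
filesystem itself reports as used (the df view) with what a walk of the tree
adds up to (the du view), once before and once after the sync.  The sketch
below is only an illustration and not something from this thread: Python is
an arbitrary choice, the function names df_used_bytes, du_used_bytes and
space_gap are hypothetical, and it assumes a POSIX system where st_blocks
counts 512-byte units.

```python
import os


def df_used_bytes(mount_point):
    """Bytes in use according to the filesystem itself (the df view)."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def du_used_bytes(mount_point):
    """Bytes reachable by walking the tree (roughly the du -x view)."""
    root = os.lstat(mount_point)
    total = root.st_blocks * 512
    seen = {(root.st_dev, root.st_ino)}  # count hard-linked files once
    stack = [mount_point]
    while stack:
        try:
            entries = list(os.scandir(stack.pop()))
        except OSError:
            continue  # unreadable, or removed while walking
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue
            key = (st.st_dev, st.st_ino)
            # Skip other filesystems mounted below, and repeated hard links.
            if st.st_dev != root.st_dev or key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512  # assumption: 512-byte units
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
    return total


def space_gap(mount_point):
    """Space df counts as used that du cannot find under the mount point."""
    return df_used_bytes(mount_point) - du_used_bytes(mount_point)
```

Calling space_gap("/var"), issuing a sync, and calling it again a little
later shows whether the sync freed anything.  Some gap is normal, since df
also counts filesystem metadata, and the walk needs enough privilege to read
every directory.  A gap that stays large or keeps growing suggests blocks
that have been released by name but not yet returned to the free list.
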
> -> Are these all identical mobos ... possibly a BIOS update available?
>
> Haven't looked for an update, but I think they're all identical.

Hmmm ... but the fact that you're using SCSI makes this less of an issue,
unless it's onboard SCSI.  Possibly an update to the SCSI BIOS?

> -> These aren't IBM ATA drives are they?  I had one of those give me grief for
> -> months (if you look in the archives, you should be able to find details on
> -> which drives caused problems).
>
> Alas not! They're straightforward Seagates, which in other machines we use (much
> lighter load) don't have this problem.
>
> -> Have you tried updating one of the machines to 4.8 to see if the problem
> -> has been fixed?
>
> I haven't tried that yet but will do so. I'm also going to test a 5.1R machine,
> perhaps the background fsck will help when I alas come to reboot.

It may save you some time to look in CVS at the files for the SCSI subsystem
drivers, as well as the drivers for your specific cards, to see if any commit
messages talk about fixing problems like this.

My experience with background fsck is that the machine is slow as hell while
the background fsck is running.  Whether this is better or worse than what
you're experiencing with 4.7 is a question only you can answer.

> -> Like I said, not good advice, just some ideas for you.
>
> All advice and ideas are welcome.

Well ... I'm really shooting in the dark with these suggestions, but hopefully
there will be something useful.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com