Date: Fri, 27 Jun 2003 22:00:33 +0100
From: John Ekins <john.ekins@brightview.com>
To: questions@freebsd.org
Subject: Softupdates: df, du, sync and fsck [quite long]
Message-ID: <20030627220033.5586e86b.john.ekins@brightview.com>
Hello,

I've a couple of questions about soft updates. I've Googled heavily on this but not really found a satisfactory answer.

The story: I'm running numerous FreeBSD 4.7 SMP machines as primary MX machines. The mail is not stored on the FreeBSD machines but on NetApps via NFS. However, the mail is temporarily spooled on the FreeBSD machines during normal MTA handling and while being passed to an anti-virus scanner.

I have one large partition, /var, on each machine where basically all the work and temporary/transient files for the MTA and AV scanner take place. These machines are heavily utilised, running quite "hot" with a load average of anything from 2 to 8. Many thousands of temporary files are thus created and deleted a minute. I have no problem with this, as nearly all email is delivered in under 1 minute regardless.

I notice that after a while the space usage shown by df differs considerably from that shown by a du on /var. I'm aware of why this happens with soft updates, but that's not the whole story. If I turn off incoming email on a machine, the space does not seem to sync back to what it should be. No matter how long I leave the MTA turned off, the space is simply not returned, and df/du show differences of about 5:1. Nothing else is writing to or holding open files on that partition (I even turned off syslog, cron, etc. and checked using lsof).

In comparison, if, for example, on my normal desktop machine I create a 500MB file and then delete it, the space is returned to me shortly afterwards when I run df.

The only way I've been able to recover this space to what it should be is to reboot the machine. Which brings me to the next problem...

As an example, here is a snippet from the console from when I rebooted an affected machine:

    boot() called on cpu#2
    Waiting (max 60 seconds) for system process `vnlru' to stop...stopped
    Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped
    Waiting (max 60 seconds) for system process `syncer' to stop...timed out
    syncing disks...
    22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22
    giving up on 22 buffers
    Uptime: 27d23h1m27s
    Rebooting...

As you can see, the file system is unable to sync. When the machine reboots it literally takes hours to fsck the /var partition (only about 15GB). And the fsck output is full of messages like this:

    UNEXPECTED SOFT UPDATE INCONSISTENCY

Now, is there a problem here with soft updates "losing track" of what is going on on this busy partition? It would appear so, as quietening the machine does not lead to a proper sync.

Secondly, why does the fsck take such an inordinate amount of time for a smallish partition?

I really like the performance benefits of soft updates, but it seems that I'm going to have to turn it off on /var because of the problems that eventually occur.

If anyone has some advice I'd be grateful.

Cheers,
John.
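For anyone wanting to put a number on the df/du gap described above, here is a minimal sketch in Python. It is not part of the original report and is only an approximation of what df and du do: `df_used_bytes`, `du_used_bytes` and `discrepancy_ratio` are hypothetical helper names, the walk assumes a Unix-like system where `os.statvfs` and `st_blocks` are available, and symlinks themselves are not counted.

```python
import os


def df_used_bytes(path):
    """Used space the way df derives it: total blocks minus free blocks."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def _on_device(path, dev):
    """True if path lives on the device identified by dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def du_used_bytes(path):
    """Space reachable by walking the tree, roughly what du -x would sum."""
    root_dev = os.lstat(path).st_dev
    seen = set()   # (device, inode) pairs, so hard links are counted once
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        entries = [dirpath] + [os.path.join(dirpath, n) for n in filenames]
        for entry in entries:
            try:
                st = os.lstat(entry)
            except OSError:
                continue  # file vanished mid-walk, common on a busy spool
            key = (st.st_dev, st.st_ino)
            if st.st_dev != root_dev or key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units
        # Do not descend into other mounted filesystems.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
    return total


def discrepancy_ratio(df_bytes, du_bytes):
    """Ratio of df-style usage to du-style usage (inf if du sees nothing)."""
    if du_bytes == 0:
        return float("inf")
    return df_bytes / du_bytes
```

Run as root against the mount point, `discrepancy_ratio(df_used_bytes("/var"), du_used_bytes("/var"))` would give a figure comparable to the roughly 5:1 difference mentioned above. A small gap is normal (filesystem metadata, files deleted while still open); the sketch only helps track whether the gap shrinks once the machine is quiet.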