Date: Mon, 17 Oct 2005 03:26:30 +0200
From: Brad Knowles <brad@stop.mail-abuse.org>
To: "Will Saxon" <WillS@housing.ufl.edu>
Cc: stable@freebsd.org
Subject: Re: Disk 100% busy
Message-ID: <p06200716bf78aa876114@[10.0.1.210]>
In-Reply-To: <0E972CEE334BFE4291CD07E056C76ED807738005@bragi.housing.ufl.edu>
References: <0E972CEE334BFE4291CD07E056C76ED807738005@bragi.housing.ufl.edu>
At 9:16 AM -0400 2005-10-16, Will Saxon wrote:

>  In this case, my mail gateway is a dual 3.06GHz Xeon with 1GB of RAM
>  and two 36GB 15krpm drives in a RAID-1 on a Smart Array 6i (cciss)
>  controller. I am running FreeBSD 5.4-RELEASE-p1.
>
>  Systat -vmstat reports the disk mirror is 100% busy at all times on this
>  machine, with an average of around 300 tps at 15KB/t.

	Note that RAID-1 is the second-worst case for mail server
performance: it accelerates reads (if you have mirror load-balancing),
but every write has to be held until it has completed on both disks.
The only worse case would be RAID-5, where every write means writing
(or re-writing) an entire RAID block, plus the parity information.

	For mail servers, you really want to watch your synchronous
metadata updates.  FreeBSD is a good choice here, if you've got Soft
Updates enabled (I think that FreeBSD 5.x does that by default).

	But you also want to watch your directory sizes.  If a directory
gets too large, then it takes too long to lock the directory against
any other updates, scan through the entire directory to make sure
there aren't any collisions, create or delete the file, and then
unlock the directory, and that whole process has to be repeated every
time a file is created or deleted.  This is why most modern mail
servers use a "hashed queue" scheme, spreading the queue files across
many subdirectories, so that you greatly increase the chances of
multiple processes working simultaneously without stepping all over
each other's toes.

	However, with regard to directory size, keep in mind that even
if a directory does not currently have 100,000 files in it, if it ever
had 100k files in it in the past, it still has all those empty
directory slots lying around, and that still slows things down a lot.
If you suspect that this may have happened, you need to stop the
offending program, move the old directories aside, create new
directories with the same ownership and permissions, then restart the
program.
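	For illustration, here is a minimal Python sketch of the hashed
queue idea.  It assumes a hypothetical queue root and an MD5-based
mapping; real MTAs and scanners each have their own naming scheme, so
this is not how sendmail (or any particular anti-virus product) lays
out its queue.  The first hex digits of a hash of the file name pick
nested subdirectories, so no single directory grows very large.

```python
import hashlib
import os


def hashed_queue_path(queue_root: str, queue_id: str, depth: int = 2) -> str:
    """Map a queue file name to a path under queue_root (hypothetical scheme).

    Each subdirectory level is named by one hex digit of the MD5 digest of
    the file name, so depth=2 spreads files over 16 * 16 = 256 leaf
    directories.  The same name always maps to the same path.
    """
    if not 0 <= depth <= 32:
        raise ValueError("depth must be between 0 and 32 (MD5 has 32 hex digits)")
    digest = hashlib.md5(queue_id.encode("utf-8")).hexdigest()
    levels = list(digest[:depth])  # one hex digit per directory level
    return os.path.join(queue_root, *levels, queue_id)


if __name__ == "__main__":
    # "/var/spool/scanq" is a made-up example path.
    print(hashed_queue_path("/var/spool/scanq", "qfABC123"))
```

	With depth=2, 100,000 queue files average roughly 390 entries per
leaf directory instead of 100,000 in one, which keeps each lock, scan,
and unlock cycle short.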
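	The move-aside-and-recreate procedure could be sketched in Python
roughly as follows.  This assumes the program using the directory has
already been stopped, that the script runs as root (or as the
directory's owner) so the original ownership can be copied, and that
the ".old" suffix is not already in use; rebuild_directory is a
hypothetical helper, not part of any MTA.

```python
import os
import stat


def rebuild_directory(path: str) -> str:
    """Move a bloated directory aside and recreate it empty (hypothetical helper).

    Assumes the program using `path` has been stopped and that the caller
    may set the original owner and group.  Returns the path the old
    directory was moved to.
    """
    st = os.stat(path)
    aside = path + ".old"
    if os.path.lexists(aside):
        # Refuse to clobber an earlier set-aside directory that may still
        # hold unprocessed files.
        raise FileExistsError(aside)

    # Renaming keeps every existing entry, just under the new name, so
    # nothing is lost; the program simply stops using the bloated directory.
    os.rename(path, aside)

    # A freshly created directory starts out small, without the empty
    # slots accumulated by the old one.
    os.mkdir(path)
    os.chown(path, st.st_uid, st.st_gid)
    # mkdir's mode is filtered through the umask, so copy the bits explicitly.
    os.chmod(path, stat.S_IMODE(st.st_mode))
    return aside
```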
	And don't forget to clean out the old directories you moved
aside, either by starting some manual queue runners on them, or
however else works for you.

	In your case, while the MTA may be configured in a way that
avoids most of these issues, the anti-virus scanning solution may not
be.  So, you may need to find a way to go in and deal with that, too.

	If you want to find out how all these issues affect the MTA, you
need to read the book "sendmail Performance Tuning" by Nick
Christenson (see <http://www.jetcafe.org/npc/book/sendmail/>).  Once
you have read it, you will hopefully have a better idea of how these
same issues may affect your anti-virus scanning solution, and what you
may need to do about it.

	I also recommend the slides from Nick's "Performance Tuning
Sendmail Systems" paper at
<http://www.jetcafe.org/npc/doc/performance_tuning.pdf>, as well as my
own slides on the same general subject at
<http://www.shub-internet.org/brad/papers/sendmail-tuning/>.

>  This seems wrong to me, as these numbers are maintained even when the
>  system doesn't otherwise appear busy. We don't seem to be swamped by
>  log writes.

	How can you be sure?  How are you logging information today?  Is
it being logged to a separate filesystem on a separate disk system?

>  How can I tell what's generating these disk writes? At the moment the
>  100% disk utilization is the only thing I can see that would cause the
>  scanning delay. The machine overall is sluggish with file operations.

	You have a certain amount of information available to you from
tools like vmstat and iostat, as well as systat.  However, to
understand how to use them to see where your problems really lie, you
need the kind of background provided in Nick's book.

	You should also read other books on overall system performance
tuning.  The O'Reilly book on this subject (see
<http://www.oreilly.com/catalog/spt2/>) is a good start, even though
it is a few years old.
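	To put the systat figures from the original question into
perspective, here is a small Python sketch of the arithmetic.  It
assumes a simple two-way mirror model and a write fraction that you
supply, since the actual read/write mix was not reported; the function
names are made up for illustration.

```python
def throughput_kb_per_s(tps: float, kb_per_transfer: float) -> float:
    """Logical throughput implied by systat's tps and KB/t figures."""
    return tps * kb_per_transfer


def mirror_busiest_disk_iops(tps: float, write_fraction: float,
                             balanced_reads: bool = True) -> float:
    """I/Os per second landing on the busier disk of a two-way mirror.

    Every logical write must be performed on both disks, so each disk sees
    all of the writes.  Reads are split evenly between the disks only if
    the mirror load-balances them; otherwise one disk serves all of them.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    writes = tps * write_fraction
    reads = tps - writes
    return writes + (reads / 2 if balanced_reads else reads)


if __name__ == "__main__":
    # 300 tps at 15 KB/t, as reported in the original question.
    print(f"{throughput_kb_per_s(300, 15):.0f} KB/s")
    for wf in (0.0, 0.5, 1.0):  # the real read/write mix is unknown
        iops = mirror_busiest_disk_iops(300, wf)
        print(f"write fraction {wf:.1f}: {iops:.0f} I/O per second per disk")
```

	With 300 tps at 15 KB/t, that works out to about 4,500 KB/s.  If
the load were all writes, each disk would still have to absorb the
full 300 operations per second, whereas a balanced all-read load would
put about 150 on each disk.  That is why a write-heavy mail or scanning
queue gets no relief from the second half of the mirror.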
-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

    -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
    Assembly to the Governor, November 11, 1755

SAGE member since 1995.  See <http://www.sage.org/> for more info.