Date: Sun, 01 Jan 2006 15:10:36 -0500
From: Francisco Reyes <lists@stringsutils.com>
To: Marc G. Fournier <scrappy@hub.org>
Cc: freebsd-questions@freebsd.org
Subject: Re: "Load Balancing": How Busy are the servers?
Message-ID: <cone.1136146236.889316.12360.1000@zoraida.natserv.net>
References: <20051227211433.J1087@ganymede.hub.org> <cone.1136049494.118589.27817.1000@zoraida.natserv.net> <20060101145325.X1088@ganymede.hub.org>
Marc G. Fournier writes:

> For all the technology, I was kinda hoping for some 'scientific formula'
> :)

There are..

> Now, I really hate to ask, but how do you use vmstat to get a feel for how
> busy the disk subsystem is?

For me, reading "Absolute BSD" by Michael Lucas was very helpful, in particular Chapter 18, System performance.

The three columns I look at in vmstat are "r" and "b" on the left, and "fault". "r" shows how many processes are waiting for CPU, "b" shows how many processes are waiting for disk. The fault column(s) show how badly your system is accessing swap.

Quick example:

 r b w
 2 5 0
 1 5 0
 2 4 0
 2 5 0
 3 4 0
 1 5 0
 1 5 0

That's from my home machine as I am doing some backups. The machine at this point is more disk bound than CPU bound, with 4 to 5 processes waiting for disk access at any point in time. I am also falling behind in CPU, but not as badly.

On the far right of vmstat you also have CPU stats.. in my case the vmstat output for the lines above showed 70% to 90% idle, which confirmed I was disk bound at that point.

The fault column shows you how actively you are using swap. The lines above had between 30 and 200 approximately. If you look at swapinfo and you have a large amount of swap in use, and then you see a high number in vmstat for fault, the machine is short on RAM for the load you have on it.

So far in my experience nothing hurts a machine as badly as hitting swap (given that you have adequate CPU/disks). Once you start to hit swap heavily you need to do something (if you can...) such as moving services to another machine or putting in more memory.

Instead of looking for fixed numbers, I think that relative figures are more important.. like looking at your machines at their lowest usage and then at their busiest.. or at spikes.. If at slow times of activity the machines are already falling behind on "b" and "r" in vmstat.. then that machine is overloaded.
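To make the "r" versus "b" comparison concrete, here is a minimal Python sketch that summarizes those two columns from captured vmstat text. It assumes the data lines start with the "r" and "b" values as in the example above; the function names and the summary format are made up for illustration, not taken from any existing tool.

```python
# Sketch: summarize the "r" and "b" columns from captured vmstat output.
# Assumption: each data line starts with the r and b values, as in the
# "r b w" example from this message. Names here are hypothetical.

def parse_procs(vmstat_text):
    """Return a list of (r, b) tuples from vmstat text output."""
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Skip header and blank lines; data lines start with two integers.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        samples.append((int(fields[0]), int(fields[1])))
    return samples


def summarize(samples):
    """Average and peak for r (waiting on CPU) and b (waiting on disk)."""
    if not samples:
        raise ValueError("no vmstat samples found")
    rs = [r for r, _ in samples]
    bs = [b for _, b in samples]
    return {
        "r_avg": sum(rs) / len(rs),
        "r_max": max(rs),
        "b_avg": sum(bs) / len(bs),
        "b_max": max(bs),
    }


# The sample lines quoted in this message.
SAMPLE = """\
 r b w
 2 5 0
 1 5 0
 2 4 0
 2 5 0
 3 4 0
 1 5 0
 1 5 0
"""

if __name__ == "__main__":
    stats = summarize(parse_procs(SAMPLE))
    print(stats)
    if stats["b_avg"] > stats["r_avg"]:
        print("more processes waiting on disk than on CPU")
```

Running the same summary on a capture from a quiet period and one from a busy period gives the kind of relative figures described above.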
One possible quick way to start benchmarking your machines, until you can do something better, is to capture snapshots of vmstat every 15 to 30 minutes and take a look.. perhaps even write a short script to summarize it. On my list of things to do.. is a simple setup of that nature.. just because it would be easy to set up and can provide very valuable information until you set up something more feature rich.

"top" in the 5.X branch and up is also very useful. If you hit "m" it shows you disk processes, so you can see what programs are doing the most I/O. One thing to watch out for in top when using 'm' is seeing all low numbers (hit 'o' to sort and then type 'total').. you may have lots of programs each doing little I/O, but their combined load is a problem for your disk subsystem.... like having 200+ IMAP connections. Each single IMAP connection may not be doing more than a handful of transactions per second, but all of them combined can give a disk subsystem a pretty good workout.

The load averages from 'w' are also good figures for comparative tests. I started to work on a script (it needs more work) that dumps 'w' and 'vmstat'.. next I have to work on parsing them and giving summaries. In particular one wants to know peak times.. since that is the best time to determine if the machine can handle its load.. and, more importantly, spikes. If a machine is usually under 2.. and it spikes at 5+.. that machine is possibly able to do "normal" loads, but may not be able to handle spikes in traffic (i.e. a customer doing a mailing list, or a site that just got press.. and there are more people than usual going to their URL).

I still think I have MUCH, MUCH to learn.. but I would be glad to expand on anything mentioned above.. or anything else.
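As a rough illustration of the "dump 'w' and summarize" idea, the following Python sketch pulls the load averages out of the first line of 'w' output and flags samples that stand well above the machine's typical value. The regular expression, the helper names, and the 2.5x threshold are assumptions chosen for the example, not figures from this thread; the sample loads are invented to mirror the "usually under 2, spikes at 5+" case.

```python
import re
import statistics
import subprocess

# Matches both "load averages: 0.25, 0.31, 0.30" and "load average: ...".
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def capture(cmd):
    """Run a command such as ["w"] and return its text output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_load(line):
    """Return the (1, 5, 15 minute) load averages found in a line of text."""
    m = LOAD_RE.search(line)
    if m is None:
        raise ValueError("no load averages in: %r" % line)
    return tuple(float(x) for x in m.groups())


def find_spikes(loads, factor=2.5):
    """Return (index, load) pairs at or above factor times the median load."""
    typical = statistics.median(loads)
    if typical <= 0:
        return []
    return [(i, v) for i, v in enumerate(loads) if v >= factor * typical]


if __name__ == "__main__":
    # Invented 1-minute load samples, one per snapshot.
    history = [1.2, 1.8, 1.5, 5.4, 1.6]
    print(find_spikes(history))
```

A cron job could append the output of `capture(["w"])` to a log every 15 to 30 minutes, and the parsing functions can then be run over that log later.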
Ultimately each machine/company is unique enough that absolute numbers from other people (i.e. what is a good value for 'r' and 'b' to be around most of the time) may be less important than learning what the figures are for each of your own machines under "normal" operation.
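One way to express figures relative to a machine's own "normal" is a simple ratio against a recorded baseline. The Python sketch below assumes you have already collected per-machine averages; the metric names and all numbers are hypothetical.

```python
def relative_to_baseline(current, baseline):
    """Return current/baseline for each metric; None where no usable baseline exists."""
    result = {}
    for name, value in current.items():
        base = baseline.get(name)
        # A missing or zero baseline cannot give a meaningful ratio.
        result[name] = value / base if base else None
    return result


if __name__ == "__main__":
    # Hypothetical averages for one machine: quiet period vs. right now.
    normal = {"r": 1.0, "b": 2.0, "load": 1.5}
    now = {"r": 2.0, "b": 5.0, "load": 5.4}
    for name, ratio in relative_to_baseline(now, normal).items():
        print(name, ratio)
```

A ratio well above 1 on "b" or "r" for a given machine says more than comparing its raw numbers against someone else's hardware.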