From owner-freebsd-questions@FreeBSD.ORG  Mon Jan  2 05:40:41 2006
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6DE4816A41F
	for <freebsd-questions@freebsd.org>;
	Mon,  2 Jan 2006 05:40:41 +0000 (GMT) (envelope-from scrappy@hub.org)
Received: from hub.org (hub.org [200.46.204.220])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 83DAF43D49
	for <freebsd-questions@freebsd.org>;
	Mon,  2 Jan 2006 05:40:40 +0000 (GMT) (envelope-from scrappy@hub.org)
Received: from localhost (unknown [200.46.204.144])
	by hub.org (Postfix) with ESMTP id 97F5A62C87E;
	Mon,  2 Jan 2006 01:40:39 -0400 (AST)
Received: from hub.org ([200.46.204.220])
	by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
	with ESMTP id 26302-10; Mon,  2 Jan 2006 01:40:39 -0400 (AST)
Received: from ganymede.hub.org (blk-222-82-85.eastlink.ca [24.222.82.85])
	by hub.org (Postfix) with ESMTP id 001D262C844;
	Mon,  2 Jan 2006 01:40:38 -0400 (AST)
Received: by ganymede.hub.org (Postfix, from userid 1000)
	id 39FFA394CF; Mon,  2 Jan 2006 01:40:38 -0400 (AST)
Received: from localhost (localhost [127.0.0.1])
	by ganymede.hub.org (Postfix) with ESMTP id 39025394AA;
	Mon,  2 Jan 2006 01:40:38 -0400 (AST)
Date: Mon, 2 Jan 2006 01:40:38 -0400 (AST)
From: "Marc G. Fournier" <scrappy@hub.org>
To: Francisco Reyes <lists@stringsutils.com>
In-Reply-To: <cone.1136146236.889316.12360.1000@zoraida.natserv.net>
Message-ID: <20060102013941.A1088@ganymede.hub.org>
References: <20051227211433.J1087@ganymede.hub.org>
	<cone.1136049494.118589.27817.1000@zoraida.natserv.net>
	<20060101145325.X1088@ganymede.hub.org>
	<cone.1136146236.889316.12360.1000@zoraida.natserv.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Virus-Scanned: by amavisd-new at hub.org
Cc: freebsd-questions@freebsd.org
Subject: Re: "Load Balancing": How Busy are the servers?
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jan 2006 05:40:41 -0000


I just installed cacti, which seems fairly useful for 'long term views' of 
how a server is doing ... now I have to figure out which SNMP MIBs relate 
to all of the "important things" :(


On Sun, 1 Jan 2006, Francisco Reyes wrote:

> Marc G. Fournier writes:
>
>> For all the technology, I was kinda hoping for some 'scientific formula' :)
>
> There are..
>
>> Now, I really hate to ask, but how do you use vmstat to get a feel for how 
>> busy the disk subsystem is?
>
> For me, reading "Absolute BSD" by Michael Lucas was very helpful.
> In particular Chapter 18, System performance.
>
> The three vmstat columns I look at are "r" and "b" on the left, and 
> "fault".
>
> "r" shows how many processes are waiting for CPU, "b" shows how many 
> processes are waiting for disk. The fault column(s) show how heavily your 
> system is accessing swap.
>
> Quick example:
> r b w
> 2 5 0
> 1 5 0
> 2 4 0
> 2 5 0
> 3 4 0
> 1 5 0
> 1 5 0
>
>
> That's from my home machine as I am doing some backups.
> The machine at this point is more disk bound than CPU bound, with 4 to 5 
> processes waiting for disk access at any point in time.
>
> I am also falling behind in CPU, but not as badly.
>
> On the far right of vmstat you also have CPU stats. In my case the vmstat 
> output for the above lines showed 70% to 90% idle, which confirmed I was 
> disk bound at that point.
>
> The fault column shows you how actively you are using swap. The lines above 
> had between 30 and 200, approximately. If you look at swapinfo and you have 
> a large amount of swap in use, and then you see a high number in vmstat for 
> fault, the machine is short on RAM for the load you have on it.
>
> So far in my experience nothing hurts a machine as badly as hitting swap 
> (given that you have adequate CPU/disks). Once you start to hit swap heavily 
> you need to do something (if you can...) such as moving services to another 
> machine or putting in more memory.
>
> Instead of looking for fixed numbers, I think that relative figures are more 
> important: look at your machines at their lowest usage and then at 
> their busiest, or at spikes. If at slow times of activity a machine is 
> already falling behind on "b" and "r" in vmstat, then that machine is 
> overloaded.
>
> One possible quick way to start benchmarking your machines, until you can do 
> something better, is to capture snapshots of vmstat every 15 to 30 minutes 
> and take a look, perhaps even write a short script to summarize them. On my 
> list of things to do is a simple setup of that nature, just because it 
> would be easy to set up and can provide very valuable information until you 
> set up something more feature rich.
>
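
The "short script to summarize" idea above could look something like the 
sketch below. It is only an illustration, not the script mentioned in the 
thread: the function name summarize_vmstat is made up, and it assumes the 
first two columns of each data row are "r" and "b", as in the quick 
example quoted earlier.

```python
from statistics import mean


def summarize_vmstat(text):
    """Return mean and peak of the 'r' and 'b' columns from vmstat output.

    Assumption: data rows start with two integers (r, then b); header
    rows such as "procs ..." or "r b w" are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1])))
    if not rows:
        raise ValueError("no vmstat data rows found")
    r_values = [r for r, _ in rows]
    b_values = [b for _, b in rows]
    return {
        "samples": len(rows),
        "r_mean": mean(r_values),
        "r_max": max(r_values),
        "b_mean": mean(b_values),
        "b_max": max(b_values),
    }


# The "quick example" rows from the message above.
sample = """\
r b w
2 5 0
1 5 0
2 4 0
2 5 0
3 4 0
1 5 0
1 5 0
"""

if __name__ == "__main__":
    print(summarize_vmstat(sample))
```

Run against saved snapshots from quiet and busy periods, the two summaries 
can then be compared side by side, which is the relative view suggested 
above rather than a fixed threshold.
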
> "top" in the 5.X branch and up is also very useful. If you hit "m" it shows 
> you per-process disk I/O so you can see what programs are doing the most I/O.
>
> One thing to watch out for in top when using 'm': if you see all low 
> numbers (hit 'o' to sort and then type 'total'), you may still have lots 
> of programs each doing little I/O whose combined load is a problem for your 
> disk subsystem, like having 200+ IMAP connections. Each single IMAP 
> connection may not be doing more than a handful of transactions per second, 
> but all of them combined can give a disk subsystem a pretty good workout.
>
> The load averages from 'w' are also good figures for comparative tests. I 
> started to work on a script (it needs more work) that dumps 'w' and 'vmstat'; 
> next I have to work on parsing them and giving summaries. In particular one 
> wants to know peak times, since that is the best time to determine if the 
> machine can handle its load, and more importantly spikes. If a machine is 
> usually under 2 and it spikes at 5+, that machine is possibly able to do 
> "normal" loads, but may not be able to handle spikes in traffic (e.g. a 
> customer doing a mailing list, or a site that just got press, so a larger 
> number of people than usual are going to their URL).
>
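
A rough sketch of the load-average comparison described above, under a few 
assumptions: the names parse_load and find_spikes are invented for 
illustration, the header line of 'w' is assumed to end in 
"load averages: a, b, c", and the 2.5x factor is an arbitrary threshold 
chosen to match the "usually under 2, spikes at 5+" example. The sample 
values are made up.

```python
import re
from statistics import median

# Matches "load averages: 1.80, 1.60, 1.50" and the singular
# "load average: ..." form printed by some systems.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load(line):
    """Extract the 1-, 5- and 15-minute load averages from a 'w' header line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load averages in line: %r" % line)
    return tuple(float(x) for x in match.groups())


def find_spikes(one_minute_loads, factor=2.5):
    """Return (typical, spikes).

    typical is the median of the samples; spikes are the samples at or
    above factor * typical. The factor is a hypothetical threshold.
    """
    typical = median(one_minute_loads)
    spikes = [x for x in one_minute_loads
              if typical > 0 and x >= factor * typical]
    return typical, spikes


if __name__ == "__main__":
    # Made-up header line in the assumed format.
    header = " 1:40AM  up 12 days,  3:02, 2 users, load averages: 1.80, 1.60, 1.50"
    print(parse_load(header))
    # Made-up 1-minute samples collected over a day.
    print(find_spikes([1.8, 1.6, 1.9, 5.2, 1.7]))
```

Using the median as the "usual" figure keeps a few spikes from dragging 
the baseline upward, which a plain average would do.
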
> I still think I have MUCH, MUCH to learn, but I would be glad to expand on 
> anything mentioned above, or anything else. Ultimately each machine/company 
> is unique enough that absolute numbers from other people (i.e. what is a good 
> value for 'r' and 'b' to be around most of the time) may be less important 
> than learning what the different figures are for your different machines 
> under "normal" operation.
>
>

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664