Date: Mon, 2 Jan 2006 01:40:38 -0400 (AST)
From: "Marc G. Fournier"
To: Francisco Reyes
Cc: freebsd-questions@freebsd.org
Message-ID: <20060102013941.A1088@ganymede.hub.org>
References: <20051227211433.J1087@ganymede.hub.org> <20060101145325.X1088@ganymede.hub.org>
Subject: Re: "Load Balancing": How Busy are the servers?

I just installed cacti, which seems fairly useful for 'long term views' of
how a server is doing ... now I have to figure out which SNMP MIBs relate to
all of the "important things" :(

On Sun, 1 Jan 2006, Francisco Reyes wrote:

> Marc G. Fournier writes:
>
>> For all the technology, I was kinda hoping for some 'scientific formula' :)
>
> There are..
>
>> Now, I really hate to ask, but how do you use vmstat to get a feel for how
>> busy the disk subsystem is?
>
> For me, reading "Absolute BSD" by Michael Lucas was very helpful.
> In particular Chapter 18, System performance.
>
> The three columns I look at in vmstat are "r" and "b" on the left, and
> "fault".
>
> "r" shows how many processes are waiting for CPU, "b" shows how many
> processes are waiting for disk. The fault column(s) show how badly your
> system is accessing swap.
>
> Quick example:
>  r b w
>  2 5 0
>  1 5 0
>  2 4 0
>  2 5 0
>  3 4 0
>  1 5 0
>  1 5 0
>
> That's from my home machine as I am doing some backups.
> The machine at this point is more disk bound than CPU bound, with 4 to 5
> processes waiting for disk access at any point in time.
>
> I am also falling behind in CPU, but not as badly.
>
> On the far right of vmstat you also have CPU stats.. in my case the vmstat
> from the above lines showed 70% to 90% idle, which confirmed I was disk bound
> at that point.
> The fault column shows you how actively you are using swap. The lines above
> had between 30 and 200 approximately. If you look at swapinfo and you have a
> large amount of swap in use and then you see a high number in vmstat for
> fault, the machine is short on RAM for the load you have on it.
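
The "r" versus "b" reading described in the quoted text above can be
sketched in a few lines of Python. This is only an illustration under some
assumptions: the function names parse_procs and summarize are made up for
this example, the sample rows are the ones from the quoted vmstat excerpt,
and the rule "average b greater than average r means disk-bound" is a
simplification of the reasoning in the message, not a formula from vmstat
or from "Absolute BSD".

```python
# Sample rows copied from the quoted vmstat excerpt (columns: r, b, w).
SAMPLE = """\
 r b w
 2 5 0
 1 5 0
 2 4 0
 2 5 0
 3 4 0
 1 5 0
 1 5 0
"""


def parse_procs(text):
    """Return a list of (r, b) pairs from vmstat-style rows.

    Only the first two columns are read. Header and blank lines are
    skipped because their first two fields are not plain integers.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def summarize(pairs):
    """Average r and b and apply a rough (assumed) comparison rule."""
    if not pairs:
        raise ValueError("no vmstat data rows found")
    avg_r = sum(r for r, _ in pairs) / len(pairs)
    avg_b = sum(b for _, b in pairs) / len(pairs)
    if avg_b > avg_r:
        verdict = "disk-bound"
    elif avg_r > avg_b:
        verdict = "cpu-bound"
    else:
        verdict = "balanced"
    return avg_r, avg_b, verdict


if __name__ == "__main__":
    avg_r, avg_b, verdict = summarize(parse_procs(SAMPLE))
    print(f"avg r={avg_r:.2f}  avg b={avg_b:.2f}  -> {verdict}")
```

For the sample rows this reports an average "r" of about 1.71 and an average
"b" of about 4.71, which matches the "more disk bound than CPU bound" reading
in the message. The CPU idle percentage would still need to be checked
separately, as the message does.
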
>
> So far in my experience nothing hurts a machine as badly as hitting swap
> (given that you have adequate CPU/disks). Once you start to hit swap heavily
> you need to do something (if you can...) such as moving services to another
> machine or putting in more memory.
>
> Instead of looking for fixed numbers I think that relative figures are more
> important.. like looking at your machines at their lowest usage and then at
> their busiest.. or at spikes.. If at slow times of activity the machines are
> already falling behind on "b", "r" in vmstat.. then that machine is
> overloaded.
>
> One possible quick way to start benchmarking your machines, until you can do
> something better, is to capture snapshots of vmstat every 15 to 30 minutes and
> take a look.. perhaps even write a short script to summarize it. On my list
> of things to do.. is to do a simple setup of that nature.. just because it
> would be easy to set up and can provide very valuable information until you
> set up something more feature rich.
>
> "top" in the 5.X branch and up is also very useful. If you hit "m" it shows
> you per-process disk I/O so you can see what programs are doing the most I/O.
>
> One thing to watch out for in top when using 'm' is if you see all low
> numbers (hit 'o' to sort and then type 'total').. is that you may have lots
> of programs doing little I/O, but their combined load is a problem for your
> disk subsystem.... like having 200+ IMAP connections. Each single IMAP
> connection may not be doing more than a handful of transactions per second,
> but all of them combined can give a disk subsystem a pretty good workout.
>
> The load averages from 'w' are also good figures to do comparative tests. I
> started to work on a script (but it needs more work) that dumps 'w' and
> 'vmstat' .. next I have to work on parsing them and giving summaries. In
> particular one wants to know peak times.. since that is the best time to
> determine if the machine can handle its load..
> and more importantly spikes. If a machine is
> usually under 2.. and it spikes at 5+.. that machine is possibly able to do
> "normal" loads, but may not be able to handle spikes in traffic (i.e. a
> customer doing a mailing list, or a site that just got press.. and there are
> a larger number than usual of people going to their URL).
>
> I still think I have MUCH, MUCH to learn.. but I would be glad to expand on
> anything mentioned above.. or anything else. Ultimately each machine/company
> is unique enough that absolute numbers from other people (i.e. what is a good
> value for 'r' and 'b' to be around most of the time) may be less important
> than learning what the different figures are for your different machines
> under "normal" operation.
>

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
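
The idea in the quoted message of comparing "normal" load against spikes
can be sketched roughly as below. Everything here is an assumption made for
illustration: the names parse_load and find_spikes, the sample uptime line,
the sample readings, and the choice of "more than twice the median" as the
definition of a spike. The regular expression expects the "load averages:"
text that 'w' and 'uptime' print on the first line; the exact wording can
differ between systems, so it may need adjusting.

```python
import re
from statistics import median

# Matches "load averages: 1.50, 1.20, 0.90" (also the singular "load average:").
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load(line):
    """Return the (1, 5, 15)-minute load averages from a w/uptime header line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load averages found in: %r" % line)
    return tuple(float(value) for value in match.groups())


def find_spikes(samples, factor=2.0):
    """Split samples into a baseline and the readings that exceed it.

    samples is a list of (label, one_minute_load) pairs. The baseline is
    the median load; a spike is any reading above factor * baseline.
    Both choices are assumptions, not a standard definition.
    """
    if not samples:
        raise ValueError("no samples given")
    baseline = median(load for _, load in samples)
    spikes = [(label, load) for label, load in samples
              if load > factor * baseline]
    return baseline, spikes


if __name__ == "__main__":
    # Hypothetical header line and readings, invented for this example.
    line = " 1:40AM  up 12 days,  3:02, 2 users, load averages: 1.50, 1.20, 0.90"
    print(parse_load(line))

    readings = [("09:00", 1.2), ("09:30", 1.8), ("10:00", 5.4),
                ("10:30", 1.5), ("11:00", 1.1)]
    baseline, spikes = find_spikes(readings)
    print("baseline:", baseline, "spikes:", spikes)
```

With these made-up readings the baseline comes out at 1.5 and only the 10:00
sample (5.4) is reported as a spike, which is the "usually under 2, spikes
at 5+" pattern described above. Readings collected by cron every 15 to 30
minutes could be fed in the same way.
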