Date: Mon, 7 Jul 2003 23:49:13 -0500
From: Dan Nelson
To: Andy Farkas
Cc: freebsd-current@freebsd.org, freebsd-smp@freebsd.org
Subject: Re: whats going on with the scheduler?
Message-ID: <20030708044912.GF87950@dan.emsphone.com>
In-Reply-To: <20030708135908.I6312-100000@hewey.af.speednet.com.au>
References: <20030708035309.GE87950@dan.emsphone.com> <20030708135908.I6312-100000@hewey.af.speednet.com.au>

In the last episode (Jul 08), Andy Farkas said:
> On Mon, 7 Jul 2003, Dan Nelson wrote:
>
> > > I bet those *Giants have something to do with it...
> >
> > Most likely.  That means they're waiting for some other process to
> > release the big Giant kernel lock.  Paste in top's header so we can
> > see how many processes are locked, and what the system cpu
> > percentage is.
>
> This is what top looks like (up to the 1st 0.00% process) when sitting
> idle* with 3 setiathomes:
>
>  97 processes: 9 running, 71 sleeping, 4 zombie, 12 waiting, 1 lock
> CPU states:  4.0% user, 72.0% nice,  4.6% system,  0.7% interrupt, 18.8% idle
>
>   PID USERNAME   PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 42946 setiathome 139   15 16552K 15984K RUN    0  43.8H 98.00% 98.00% setiathome
> 42945 setiathome 139   15 16944K 15732K CPU1   1  43.0H 97.56% 97.56% setiathome
> 42947 setiathome 139   15 15524K 14956K CPU0   2  42.9H 94.14% 94.14% setiathome
>
> Note how the seti procs are getting 94-98% cpu time.
>
> When I do my scp thing, top looks like this:
>
>  98 processes: 8 running, 71 sleeping, 4 zombie, 12 waiting, 3 lock
> CPU states:  1.7% user, 33.7% nice, 20.1% system,  0.6% interrupt, 43.9% idle
>
>   PID USERNAME   PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 42946 setiathome 139   15 16552K 15984K CPU3   2  44.0H 68.41% 68.41% setiathome
> 50296 andyf      125    0  3084K  2176K RUN    2   7:55 64.21% 64.21% ssh
>    12 root       -16    0     0K    12K CPU2   2 153.6H 48.78% 48.78% idle: cpu2
>    11 root       -16    0     0K    12K CPU3   3 153.6H 48.63% 48.63% idle: cpu3
>    13 root       -16    0     0K    12K RUN    1 150.2H 48.44% 48.44% idle: cpu1
>    14 root       -16    0     0K    12K RUN    0 144.8H 45.31% 45.31% idle: cpu0
> 42947 setiathome 130   15 15524K 14956K RUN    2  43.1H 28.56% 28.56% setiathome
> 42945 setiathome 125   15 15916K 14700K RUN    0  43.2H 25.05% 25.05% setiathome
>
> Notice how 'nice' has gone to 33.7% and 'idle' to 43.9%, and the seti
> procs have dropped well below 94%.
>
> > A truss of one of the seti processes may be useful too.  setiathome
> > really shouldn't be doing many syscalls at all.
>
> If setiathome is making lots of syscalls, then running the 3 instances
> should already show a problem, no?

Not if it's ssh that's holding Giant for longer than it should.  The
setiathome processes may be calling some really fast syscall 500 times
a second, which causes no problem on its own; the trouble starts when
ssh comes along and calls some other syscall that takes 0.1 ms to
return but holds Giant long enough for all the other processes to back
up behind it.
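As a rough illustration, here's a userland toy (a sketch only: a
pthread mutex stands in for Giant, and the thread counts and timings
are invented to mimic the numbers above, so don't read too much into
the exact figures):

/*
 * Toy model of the effect described above.  This is userland pthreads
 * code, not kernel code.  Three "fast" threads take a shared lock
 * (standing in for Giant) about 500 times a second with a near-zero
 * hold time; one "slow" thread holds it for ~0.1 ms per pass.  Watch
 * how much total time the fast threads end up waiting.
 *
 * Build: cc -o giant_toy giant_toy.c -lpthread
 */
#include <sys/time.h>

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;

static double
now(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (tv.tv_sec + tv.tv_usec / 1e6);
}

/* Stand-in for setiathome: many very cheap lock acquisitions. */
static void *
fast_thread(void *arg)
{
	double t, waited;
	int i;

	waited = 0.0;
	for (i = 0; i < 2500; i++) {
		t = now();
		pthread_mutex_lock(&giant);
		waited += now() - t;
		pthread_mutex_unlock(&giant);	/* near-zero hold time */
		usleep(2000);			/* ~500 acquisitions/sec */
	}
	printf("fast thread %ld waited %.3f sec total for the lock\n",
	    (long)arg, waited);
	return (NULL);
}

/* Stand-in for ssh: holds the lock for at least 0.1 ms per pass. */
static void *
slow_thread(void *arg)
{
	int i;

	for (i = 0; i < 2500; i++) {
		pthread_mutex_lock(&giant);
		usleep(100);			/* hold "Giant" ~0.1 ms */
		pthread_mutex_unlock(&giant);
		usleep(1000);
	}
	return (arg);
}

int
main(void)
{
	pthread_t fast[3], slow;
	long i;

	for (i = 0; i < 3; i++)
		pthread_create(&fast[i], NULL, fast_thread, (void *)i);
	pthread_create(&slow, NULL, slow_thread, NULL);
	for (i = 0; i < 3; i++)
		pthread_join(fast[i], NULL);
	pthread_join(slow, NULL);
	return (0);
}

Each "fast" thread holds the lock for essentially no time at all, yet
once the "slow" thread starts taking it for ~0.1 ms at a stretch, the
fast threads' total wait time balloons.  And a few seconds of
"truss -p 42946" (or any of the other seti pids from your top output)
would show whether those processes really are entering the kernel that
often.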
-- 
	Dan Nelson
	dnelson@allantgroup.com