Date: Sat, 24 Jun 2006 22:56:08 +0300
From: Kostik Belousov
To: "Marc G. Fournier"
Cc: freebsd-stable@freebsd.org, Dmitry Morozovsky
Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
Message-ID: <20060624195608.GE79678@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060624164343.V1114@ganymede.hub.org>

On Sat, Jun 24, 2006 at 04:45:49PM -0300, Marc G. Fournier wrote:
> On Sat, 24 Jun 2006, Kostik Belousov wrote:
>
> >On Sat, Jun 24, 2006 at 09:52:03PM +0300, Kostik Belousov wrote:
> >>On Sat, Jun 24, 2006 at 02:57:27PM -0300, Marc G. Fournier wrote:
> >>>On Sat, 24 Jun 2006, Kostik Belousov wrote:
> >>>
> >>>>On Sat, Jun 24, 2006 at 11:55:26AM +0400, Dmitry Morozovsky wrote:
> >>>>>On Sat, 24 Jun 2006, Marc G. Fournier wrote:
> >>>>>
> >>>>>MGF> > 'b' stands for "blocked", not "busy". Judging by your page fault rate
> >>>>>MGF> > and the high number of frees and pages being scanned, you're probably
> >>>>>MGF> > swapping tasks in and out and are waiting on disk. Take a look at
> >>>>>MGF> > "vmstat -s", and consider adding more RAM if this is correct...
> >>>>>MGF>
> >>>>>MGF> is there a way of finding out what processes are blocked?
> >>>>>
> >>>>>Aren't they in 'D' status by ps?
> >>>>Use ps axlww. In this way, at least actual blocking points are shown.
> >>>
> >>>'k, stupid question then ... what am I searching for?
> >>>
> >>># ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
> >>> 654 select
> >>> 230 lockf
> >>> 166 wait
> >>>  85 -
> >>>  80 piperd
> >>>  71 nanslp
> >>>  33 kserel
> >>>  22 user
> >>>  10 pause
> >>>   9 ttyin
> >>>   5 sbwait
> >>>   3 psleep
> >>>   3 accept
> >>>   2 kqread
> >>>   2 Giant
> >>>   1 vlruwt
> >>>   1 syncer
> >>>   1 sdflus
> >>>   1 ppwait
> >>>   1 ktrace
> >>>   1 MWCHAN
> >>>
> >>>According to vmstat, I'm holding at '4 blocked' for the most part ...
> >>>sbwwait is socket related, not disk ... and none of the others look right
> >>>...
> >>I would say, using big magic cristall ball, that you problems are
> >>not kernel-related. I see only too suspicious points:
> >>
> >>1. high number of pipe readers and waiters for file locks. It may be
> >>normal for your load.
> >>
> >>2. 2 Giant holders/lockers. Is it constant ? Are the processes
> >>holding/waiting for Giant are the same ?
> >>
> >>Anyway, being in your shoes, I would start looking at applications.
> >>
> >>Ah, and does dmesg show anything ?
> >
> >And another question: what are the processes in the state "user" ?
> >I never see that state. More, search thru the sources does not show
> >what this could be.
>
> Odd, I'm not finding any, but, I did get a Giant on a grep of the ps
> listing::
>
> pluto# ps axlww | grep " user "
>     0 93055 46540   0  96  0   348   212 Giant  L+    p4    0:00.00 grep  user
>
> Not sure where those 'user' came from though ... just ran the above again:
>
> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
>  603 select
>  231 lockf
>   71 nanslp
>   33 -
>   30 kserel
>   23 wait
>    9 ttyin
>    9 sbwait
>    7 pause
>    6 accept
>    4 piperd
>    3 psleep
>    3 kqread
>    3 Giant
>    1 syncer
>    1 sdflus
>    1 ppwait
>    1 pgzero
>    1 ktrace
>    1 MWCHAN
>
> And nothing ...
>
> Got a Giant lock on sshd too?
>
> pluto# ps axlww | grep Giant
>     0   693   556   1  96  0  6096  2080 Giant  Ls    ??    0:02.18 sshd: root@ttyp0 (sshd)
>     0 94334 46540   0  96  0   348   208 -      R+    p4    0:00.00 grep Giant

Everything looks normal. Transient Giant acquisition/contention is quite
normal, especially when several parts of the kernel are still Giant-locked.

I strongly suggest moving the investigation to the application(s)
themselves. The kernel seems to be innocent.

[A deadlock caused by a disk driver, Giant, or the filesystem shows up
immediately as a HUGE number of processes in D state, with a completely
different set of wait states. All your processes are in select, waiting
for a file lock, reading from a pipe, or doing something threaded.]
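
The two checks discussed in this thread (tallying wait channels, and looking for processes stuck in D state) can be sketched as a pair of small shell helpers. This is only an illustration: the function names wchan_counts and dstate_procs are made up here, and the sketch assumes that ps prints a header row containing a MWCHAN (or WCHAN) column and a STAT column, as in the listings above. Looking the column up by name avoids hard-coding $9, and skipping the header row keeps "MWCHAN" itself out of the tally.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from any FreeBSD tool.

# Tally wait channels from `ps axlww`-style output on stdin.
# The column is located by its header (MWCHAN or WCHAN) instead of assuming $9,
# and the header row is skipped so it is not counted as a wait channel.
wchan_counts() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == "MWCHAN" || $i == "WCHAN") col = i
            next
        }
        col { print $col }
    ' | sort | uniq -c | sort -nr
}

# Print the header plus only those rows whose STAT field starts with "D"
# (disk or other short-term uninterruptible wait).
dstate_procs() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == "STAT") col = i
            print
            next
        }
        col && $col ~ /^D/
    '
}

# Usage on a live system (assumes a ps that accepts these flags):
#   ps axlww | wchan_counts
#   ps axlww | dstate_procs
```

If dstate_procs prints little more than the header while wchan_counts is dominated by select, lockf, piperd and similar, that matches the "kernel is innocent" picture described above; a long D-state list would point the other way.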