Date: Sat, 24 Jun 2006 22:56:08 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: "Marc G. Fournier" <scrappy@hub.org>
Cc: freebsd-stable@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>
Subject: Re: vmstat 'b' (disk busy?) field keeps climbing ...
Message-ID: <20060624195608.GE79678@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060624164343.V1114@ganymede.hub.org>
References: <20060623172557.H1114@ganymede.hub.org> <261AD16B-C3FE-4671-996E-563053508CE8@mac.com> <20060624022227.X1114@ganymede.hub.org> <20060624115505.E14669@woozle.rinet.ru> <20060624090656.GB79678@deviant.kiev.zoral.com.ua> <20060624145432.A1114@ganymede.hub.org> <20060624185203.GC79678@deviant.kiev.zoral.com.ua> <20060624190912.GD79678@deviant.kiev.zoral.com.ua> <20060624164343.V1114@ganymede.hub.org>
On Sat, Jun 24, 2006 at 04:45:49PM -0300, Marc G. Fournier wrote:
> On Sat, 24 Jun 2006, Kostik Belousov wrote:
>
> >On Sat, Jun 24, 2006 at 09:52:03PM +0300, Kostik Belousov wrote:
> >>On Sat, Jun 24, 2006 at 02:57:27PM -0300, Marc G. Fournier wrote:
> >>>On Sat, 24 Jun 2006, Kostik Belousov wrote:
> >>>
> >>>>On Sat, Jun 24, 2006 at 11:55:26AM +0400, Dmitry Morozovsky wrote:
> >>>>>On Sat, 24 Jun 2006, Marc G. Fournier wrote:
> >>>>>
> >>>>>MGF> > 'b' stands for "blocked", not "busy". Judging by your page
> >>>>>MGF> > fault rate and the high number of frees and pages being
> >>>>>MGF> > scanned, you're probably swapping tasks in and out and are
> >>>>>MGF> > waiting on disk. Take a look at "vmstat -s", and consider
> >>>>>MGF> > adding more RAM if this is correct...
> >>>>>MGF>
> >>>>>MGF> is there a way of finding out what processes are blocked?
> >>>>>
> >>>>>Aren't they in 'D' status by ps?
> >>>>
> >>>>Use ps axlww. That way, at least the actual blocking points are shown.
> >>>
> >>>'k, stupid question then ... what am I searching for?
> >>>
> >>># ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
> >>>  654 select
> >>>  230 lockf
> >>>  166 wait
> >>>   85 -
> >>>   80 piperd
> >>>   71 nanslp
> >>>   33 kserel
> >>>   22 user
> >>>   10 pause
> >>>    9 ttyin
> >>>    5 sbwait
> >>>    3 psleep
> >>>    3 accept
> >>>    2 kqread
> >>>    2 Giant
> >>>    1 vlruwt
> >>>    1 syncer
> >>>    1 sdflus
> >>>    1 ppwait
> >>>    1 ktrace
> >>>    1 MWCHAN
> >>>
> >>>According to vmstat, I'm holding at '4 blocked' for the most part ...
> >>>sbwait is socket related, not disk ... and none of the others look
> >>>right ...
> >>
> >>I would say, using a big magic crystal ball, that your problems are
> >>not kernel-related. I see only two suspicious points:
> >>
> >>1. A high number of pipe readers and waiters for file locks. It may be
> >>normal for your load.
> >>
> >>2. Two Giant holders/lockers. Is it constant? Are the processes
> >>holding/waiting for Giant the same ones each time?
> >>
> >>Anyway, being in your shoes, I would start by looking at the
> >>applications.
> >>
> >>Ah, and does dmesg show anything?
> >
> >And another question: what are the processes in the state "user"?
> >I have never seen that state. Moreover, a search through the sources
> >does not show what it could be.
>
> Odd, I'm not finding any, but I did get a Giant on a grep of the ps
> listing:
>
> pluto# ps axlww | grep " user "
>     0 93055 46540   0  96  0   348  212 Giant  L+   p4   0:00.00 grep user
>
> Not sure where those 'user' came from though ... just ran the above again:
>
> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
>   603 select
>   231 lockf
>    71 nanslp
>    33 -
>    30 kserel
>    23 wait
>     9 ttyin
>     9 sbwait
>     7 pause
>     6 accept
>     4 piperd
>     3 psleep
>     3 kqread
>     3 Giant
>     1 syncer
>     1 sdflus
>     1 ppwait
>     1 pgzero
>     1 ktrace
>     1 MWCHAN
>
> And nothing ...
>
> Got a Giant lock on sshd too?
>
> pluto# ps axlww | grep Giant
>     0   693   556   1  96  0  6096 2080 Giant  Ls   ??   0:02.18 sshd: root@ttyp0 (sshd)
>     0 94334 46540   0  96  0   348  208 -      R+   p4   0:00.00 grep Giant

Everything looks normal; transient Giant acquisition/contention is quite
normal, especially when you have several Giant-locked kernel parts. I
strongly suggest moving the point of investigation to the application(s)
themselves. The kernel seems to be innocent.

[A deadlock caused by a disk driver, Giant, or the filesystem would
immediately show up as a HUGE number of processes in the D state, with a
completely different set of wait channels. All your processes are doing
select, waiting for file locks, reading from pipes, or something threaded.]
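The diagnostic the thread converges on can be sketched as a small shell
pipeline. This is an illustrative sketch only: the sample `ps` lines below
are fabricated, and the field positions assume FreeBSD's `ps axlww` layout,
where field 9 is the wait channel (MWCHAN) and field 10 is the process
state (STAT).

```shell
# Fabricated stand-in for real `ps axlww` output (FreeBSD column layout).
sample='0  100  1 0 96 0  348  212 select Ss ?? 0:00.00 daemon
0  101  1 0 96 0  348  212 select Ss ?? 0:00.00 daemon2
0  102  1 0 96 0 6096 2080 biord  D  ?? 0:02.18 dd'

# Tally wait channels (field 9), most common first -- the survey run
# several times in the thread:
printf '%s\n' "$sample" | awk '{print $9}' | sort | uniq -c | sort -rn

# Show only processes in disk wait (STAT beginning with "D") -- the
# processes that vmstat counts in its "b" column:
printf '%s\n' "$sample" | awk '$10 ~ /^D/'
```

On a live FreeBSD box one would drop the fabricated sample and pipe
`ps axlww` straight into the same `awk` filters: a kernel-side deadlock
would show many D-state lines sharing a few wait channels, while an
application-level problem leaves processes scattered across select, lockf,
and piperd, as in the listings above.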