Date:      Sat, 24 Jun 2006 22:56:08 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        "Marc G. Fournier" <scrappy@hub.org>
Cc:        freebsd-stable@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>
Subject:   Re: vmstat 'b' (disk busy?) field keeps climbing ...
Message-ID:  <20060624195608.GE79678@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060624164343.V1114@ganymede.hub.org>
References:  <20060623172557.H1114@ganymede.hub.org> <261AD16B-C3FE-4671-996E-563053508CE8@mac.com> <20060624022227.X1114@ganymede.hub.org> <20060624115505.E14669@woozle.rinet.ru> <20060624090656.GB79678@deviant.kiev.zoral.com.ua> <20060624145432.A1114@ganymede.hub.org> <20060624185203.GC79678@deviant.kiev.zoral.com.ua> <20060624190912.GD79678@deviant.kiev.zoral.com.ua> <20060624164343.V1114@ganymede.hub.org>



On Sat, Jun 24, 2006 at 04:45:49PM -0300, Marc G. Fournier wrote:
> On Sat, 24 Jun 2006, Kostik Belousov wrote:
>
> >On Sat, Jun 24, 2006 at 09:52:03PM +0300, Kostik Belousov wrote:
> >>On Sat, Jun 24, 2006 at 02:57:27PM -0300, Marc G. Fournier wrote:
> >>>On Sat, 24 Jun 2006, Kostik Belousov wrote:
> >>>
> >>>>On Sat, Jun 24, 2006 at 11:55:26AM +0400, Dmitry Morozovsky wrote:
> >>>>>On Sat, 24 Jun 2006, Marc G. Fournier wrote:
> >>>>>
> >>>>>MGF> > 'b' stands for "blocked", not "busy".  Judging by your page fault
> >>>>>MGF> > rate and the high number of frees and pages being scanned, you're
> >>>>>MGF> > probably swapping tasks in and out and are waiting on disk.  Take a
> >>>>>MGF> > look at "vmstat -s", and consider adding more RAM if this is correct...
> >>>>>MGF>
> >>>>>MGF> is there a way of finding out what processes are blocked?
> >>>>>
> >>>>>Aren't they shown in the 'D' state by ps?
> >>>>Use ps axlww; that way at least the actual blocking points (wait channels) are shown.
> >>>
> >>>'k, stupid question then ... what am I searching for?
> >>>
> >>># ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
> >>> 654 select
> >>> 230 lockf
> >>> 166 wait
> >>>  85 -
> >>>  80 piperd
> >>>  71 nanslp
> >>>  33 kserel
> >>>  22 user
> >>>  10 pause
> >>>   9 ttyin
> >>>   5 sbwait
> >>>   3 psleep
> >>>   3 accept
> >>>   2 kqread
> >>>   2 Giant
> >>>   1 vlruwt
> >>>   1 syncer
> >>>   1 sdflus
> >>>   1 ppwait
> >>>   1 ktrace
> >>>   1 MWCHAN
> >>>
> >>>According to vmstat, I'm holding at '4 blocked' for the most part ...
> >>>sbwait is socket related, not disk ... and none of the others look right
> >>>...
> >>I would say, using my big magic crystal ball, that your problems are
> >>not kernel-related. I see only two suspicious points:
> >>
> >>1. A high number of pipe readers and waiters for file locks. This may
> >>be normal for your load.
> >>
> >>2. Two Giant holders/lockers. Is it constant? Are the processes
> >>holding/waiting for Giant the same ones?
> >>
> >>Anyway, if I were in your shoes, I would start looking at the applications.
> >>
> >>Ah, and does dmesg show anything?
> >
> >And another question: what are the processes in the "user" state?
> >I have never seen that state; moreover, a search through the sources
> >does not show what it could be.
>
> Odd, I'm not finding any, but I did get a Giant on a grep of the ps
> listing:
>
> pluto# ps axlww | grep " user "
>     0 93055 46540   0  96  0   348   212 Giant  L+    p4    0:00.00 grep user
>=20
> Not sure where those 'user' came from though ... just ran the above again:
>=20
> # ps axlww | awk '{print $9}' | sort | uniq -c | sort -nr
>  603 select
>  231 lockf
>   71 nanslp
>   33 -
>   30 kserel
>   23 wait
>    9 ttyin
>    9 sbwait
>    7 pause
>    6 accept
>    4 piperd
>    3 psleep
>    3 kqread
>    3 Giant
>    1 syncer
>    1 sdflus
>    1 ppwait
>    1 pgzero
>    1 ktrace
>    1 MWCHAN
>=20
> And nothing ...
>=20
> Got a Giant lock on sshd too?
>=20
> pluto# ps axlww | grep Giant
>     0   693   556   1  96  0  6096  2080 Giant  Ls    ??    0:02.18 sshd: root@ttyp0 (sshd)
>     0 94334 46540   0  96  0   348   208 -      R+    p4    0:00.00 grep Giant
Everything looks normal; transient Giant acquisition/contention is quite
normal, especially while several kernel subsystems are still Giant-locked.
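Whether the Giant contention is transient or persistent can be checked by
comparing the set of Giant holders/waiters across snapshots. A minimal
sketch, using embedded sample lines in place of live output so it runs
anywhere (on a real FreeBSD box, each snapshot would be a fresh `ps axlww`
run a few seconds apart; the column positions are assumptions based on the
default `ps axlww` layout):

```shell
# Two hypothetical snapshots of `ps axlww` output, a few seconds apart.
# On FreeBSD you would capture these with:  snap1=$(ps axlww)
snap1='  0   693   556   1  96  0  6096  2080 Giant  Ls    ??    0:02.18 sshd
  0 94334 46540   0  96  0   348   208 select R+    p4    0:00.00 grep'
snap2='  0   693   556   1  96  0  6096  2080 Giant  Ls    ??    0:02.18 sshd'

# MWCHAN is the 9th column of `ps axlww`; print PIDs waiting on/holding Giant.
giant_pids() { awk '$9 == "Giant" {print $2}'; }

echo "$snap1" | giant_pids   # -> 693
echo "$snap2" | giant_pids   # -> 693: same PID again, so not transient
```

If the same PIDs keep appearing across snapshots, the contention is
persistent and worth a closer look; if the PIDs churn, it is the normal
transient case described above.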

I strongly suggest moving the investigation to the application(s)
themselves. The kernel seems to be innocent.

[A deadlock in a disk driver/Giant/filesystem immediately shows up as a
HUGE number of processes in the D state with a completely different set
of wait channels. All of your processes are in select, waiting for a
file lock, reading from a pipe, or doing something threaded.]
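That D-state check can be scripted as a one-liner. A minimal sketch, run
here against an embedded sample listing so it is self-contained (on
FreeBSD, pipe live `ps axlww` output through the same awk filter; the
column numbers are assumptions based on the default `ps axlww` layout,
where MWCHAN is column 9 and STAT is column 10):

```shell
# Hypothetical `ps axlww`-style lines; only the dd process is in disk wait.
sample='  0   123     1   0  96  0  6096  2080 select Ss    ??    0:02.18 sshd
  0   456   123   0   4  0  1024   512 biord  D     ??    0:00.10 dd
  0   789   123   0  96  0   348   212 wait   Is    p4    0:00.00 sh'

# Print PID and wait channel for processes blocked in disk wait (STAT "D").
echo "$sample" | awk '$10 ~ /^D/ {print $2, $9}'   # -> 456 biord
```

A sudden flood of lines from this filter, with varied wait channels, would
point at the driver/Giant/filesystem deadlock described above; an empty or
near-empty result points back at the applications.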



