Date:      Fri, 28 Jan 2011 17:10:30 +0100
From:      Ivan Voras <ivoras@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Namecache lock contention?
Message-ID:  <AANLkTimyFXopbVvJuTYH0Ck2Z4ze5s8F_nb1KFn00FnG@mail.gmail.com>
In-Reply-To: <201101281015.36218.jhb@freebsd.org>
References:  <ihuhav$qso$1@dough.gmane.org> <201101281015.36218.jhb@freebsd.org>

On 28 January 2011 16:15, John Baldwin <jhb@freebsd.org> wrote:
> On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote:
>> I have this situation on a PHP server:
>>
>> 36623 www         1  76    0   237M 30600K *Name   6   0:14 47.27% php-cgi
>> 36638 www         1  76    0   237M 30600K *Name   3   0:14 46.97% php-cgi
>> 36628 www         1 105    0   237M 30600K *Name   2   0:14 46.88% php-cgi
>> 36627 www         1 105    0   237M 30600K *Name   0   0:14 46.78% php-cgi
>> 36639 www         1 105    0   237M 30600K *Name   5   0:14 46.58% php-cgi
>> 36643 www         1 105    0   237M 30600K *Name   7   0:14 46.39% php-cgi
>> 36629 www         1  76    0   237M 30600K *Name   1   0:14 46.39% php-cgi
>> 36642 www         1 105    0   237M 30600K *Name   2   0:14 46.39% php-cgi
>> 36626 www         1 105    0   237M 30600K *Name   5   0:14 46.19% php-cgi
>> 36654 www         1 105    0   237M 30600K *Name   7   0:13 46.19% php-cgi
>> 36645 www         1 105    0   237M 30600K *Name   1   0:14 45.75% php-cgi
>> 36625 www         1 105    0   237M 30600K *Name   0   0:14 45.56% php-cgi
>> 36624 www         1 105    0   237M 30600K *Name   6   0:14 45.56% php-cgi
>> 36630 www         1  76    0   237M 30600K *Name   7   0:14 45.17% php-cgi
>> 36631 www         1 105    0   237M 30600K RUN     4   0:14 45.17% php-cgi
>> 36636 www         1 105    0   237M 30600K *Name   3   0:14 44.87% php-cgi
>>
>> It looks like periodically most or all of the php-cgi processes are
>> blocked in "*Name" for long enough that "top" notices, then continue,
>> probably in a "thundering herd" way. From grepping inside /sys the most
>> likely suspect seems to be something in the namecache, but I can't find
>> exactly a symbol named "Name" or string beginning with "Name" that would
>> be connected to a lock.
>
> In vfs_cache.c:
>
> static struct rwlock cache_lock;
> RW_SYSINIT(vfscache, &cache_lock, "Name Cache");

You're right, I misread it as SYSCTL at a glance.

> What are the php scripts doing?  Do they all try to create and delete files at
> the same time (or do renames)?

Right again - they do simultaneously create session files and on rare
occasions (1%) delete them. These are "sharded" into a two-level
directory structure by single letter (/storage/a/b/file, i.e. 32^2
directories); dirhash is large enough.
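
For readers unfamiliar with this kind of layout, the two-level sharding described above can be sketched roughly as follows. This is only an illustration: the message does not say how the two letters are chosen or which 32 characters are used, so the alphabet, the SHA-1 hashing step and the function name `shard_path` are all assumptions made for the example.

```python
import hashlib

# Hypothetical 32-symbol alphabet; the message only says "single letter"
# and "32^2 directories", not which characters are actually used.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"


def shard_path(session_id: str, root: str = "/storage") -> str:
    """Map a session file name to root/<x>/<y>/<session_id>.

    Assumption: the two directory letters come from a hash of the name.
    256 is a multiple of 32, so each digest byte maps evenly onto the
    alphabet and files spread evenly over the 32 * 32 directories.
    """
    digest = hashlib.sha1(session_id.encode("utf-8")).digest()
    first = ALPHABET[digest[0] % len(ALPHABET)]
    second = ALPHABET[digest[1] % len(ALPHABET)]
    return f"{root}/{first}/{second}/{session_id}"


if __name__ == "__main__":
    # The same session name always lands in the same directory.
    print(shard_path("sess_example"))
```

With a layout like this each directory stays small, but every create or delete still goes through a path lookup and a name cache update.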

During all this, the web server served around 60 PHP pages per second,
so it doesn't look to me like there should be such noticeable contention
(i.e. at most 60 files/s are created and on average 0.6 files/s are
deleted). The file system is on softupdates, there's only light IO.

Typical vmstat is:

 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in     sy    cs us sy id
17 0 0   8730M  1240M     3   0   0   0   206   0   1   0 1948 266928 15079 65 34  1
19 0 0   8730M  1240M     0   0   0   0   290   0   1  24 1835 260618 15132 63 35  2
 7 0 0   8730M  1239M     0   0   0   0   200   0   0   0 1822 260783 14851 63 35  2
16 0 0   8730M  1239M     0   0   0   0   199   0 788   0 2744 259902 20465 61 37  2
16 0 0   8730M  1239M     0   0   0   0   210   0   0   0 1755 265081 17564 61 37  2

(8 cores; around 35% sys load across them - I'm trying to find out why).
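
As a rough cross-check of the numbers above, the five vmstat samples can be averaged and set against the stated rate of about 60 pages per second. The sketch below is only illustrative: the column names `syscalls` and `cpu_sy` are labels chosen here to tell apart the two "sy" columns (system calls under "faults", system time under "cpu"), and the per-page figure assumes that essentially all system calls on the box come from serving those pages, which the message does not state.

```python
# The five vmstat samples quoted in the message, one per line.
SAMPLES = """\
17 0 0 8730M 1240M 3 0 0 0 206 0 1 0 1948 266928 15079 65 34 1
19 0 0 8730M 1240M 0 0 0 0 290 0 1 24 1835 260618 15132 63 35 2
7 0 0 8730M 1239M 0 0 0 0 200 0 0 0 1822 260783 14851 63 35 2
16 0 0 8730M 1239M 0 0 0 0 199 0 788 0 2744 259902 20465 61 37 2
16 0 0 8730M 1239M 0 0 0 0 210 0 0 0 1755 265081 17564 61 37 2
"""

# Column labels; "syscalls" and "cpu_sy" are names picked for this sketch.
FIELDS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
          "da0", "da1", "in", "syscalls", "cs", "cpu_us", "cpu_sy", "cpu_id"]

PAGES_PER_SEC = 60      # "around 60 PHP pages per second"
DELETE_FRACTION = 0.01  # deletes happen in about 1% of requests


def parse(text):
    """Turn each whitespace-separated vmstat row into a dict keyed by FIELDS."""
    return [dict(zip(FIELDS, line.split())) for line in text.splitlines() if line]


def mean(rows, key):
    """Average an integer-valued column over all rows."""
    return sum(int(row[key]) for row in rows) / len(rows)


rows = parse(SAMPLES)
mean_sys_cpu = mean(rows, "cpu_sy")
mean_syscalls = mean(rows, "syscalls")
mean_ctxsw = mean(rows, "cs")
syscalls_per_page = mean_syscalls / PAGES_PER_SEC
deletes_per_sec = PAGES_PER_SEC * DELETE_FRACTION

if __name__ == "__main__":
    print(f"mean sys CPU:       {mean_sys_cpu:.1f}%")
    print(f"mean syscalls/s:    {mean_syscalls:.0f}")
    print(f"mean ctx switches:  {mean_ctxsw:.0f}")
    print(f"syscalls per page:  {syscalls_per_page:.0f}")
    print(f"deletes per second: {deletes_per_sec:.1f}")
```

Under those assumptions the samples average roughly 260,000 system calls per second, i.e. several thousand per page, which suggests the lock is exercised far more often than the 60 creates/s figure alone would imply.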


