Date:      Tue, 18 Dec 2007 23:36:50 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-net@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: Packet loss every 30.999 seconds
Message-ID:  <20071218233644.U756@besplex.bde.org>
In-Reply-To: <47676E96.4030708@samsco.org>
References:  <D50B5BA8-5A80-4370-8F20-6B3A531C2E9B@eng.oar.net> <20071217103936.GR25053@tnn.dglawrence.com> <20071218170133.X32807@delplex.bde.org> <47676E96.4030708@samsco.org>

On Mon, 17 Dec 2007, Scott Long wrote:

> Bruce Evans wrote:
>> On Mon, 17 Dec 2007, David G Lawrence wrote:
>> 
>>>   One more comment on my last email... The patch that I included is not
>>> meant as a real fix - it is just a bandaid. The real problem appears to
>>> be that a very large number of vnodes (all of them?) are getting synced
>>> (i.e. calling ffs_syncvnode()) every time. This should normally only
>>> happen for dirty vnodes. I suspect that something is broken with this
>>> check:
>>> 
>>>        if (vp->v_type == VNON || ((ip->i_flag &
>>>            (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
>>>             vp->v_bufobj.bo_dirty.bv_cnt == 0)) {
>>>                VI_UNLOCK(vp);
>>>                continue;
>>>        }
>> 
>> Isn't it just the O(N) algorithm with N quite large?  Under ~5.2, on

> Right, it's a non-optimal loop when N is very large, and that's a fairly
> well understood problem.  I think what DG was getting at, though, is
> that this massive flush happens every time the syncer runs, which
> doesn't seem correct.  Sure, maybe you just rsynced 100,000 files 20
> seconds ago, so the upcoming flush is going to be expensive.  But the
> next flush 30 seconds after that shouldn't be just as expensive, yet it
> appears to be so.

I'm sure it doesn't cause many bogus flushes.  iostat shows zero writes
caused by calling this incessantly using "while :; do sync; done".
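The same repeated-sync test can be roughly timed from a C program
instead of the shell.  This is only a sketch: it assumes nothing beyond
sync(2) and clock_gettime(2), the helper names are made up for
illustration, and it measures wall-clock time per call rather than an
in-kernel profile.

```c
#define _XOPEN_SOURCE 700   /* declares sync(2) and clock_gettime(2) */
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Read the monotonic clock and return it in nanoseconds. */
int64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: call sync(2) `calls` times back to back and
 * return the mean wall-clock time per call in nanoseconds.
 */
int64_t mean_sync_ns(int calls)
{
    int64_t start;
    int i;

    assert(calls > 0);
    start = now_ns();
    for (i = 0; i < calls; i++)
        sync();
    return (now_ns() - start) / calls;
}
```

Calling mean_sync_ns(100) right after boot and again after a large
"find" would show whether the per-call cost grows with the number of
cached vnodes, while iostat shows whether any writes are issued.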

> This is further supported by the original poster's
> claim that it takes many hours of uptime before the problem becomes
> noticeable.  If vnodes are never truly getting cleaned, or never getting
> their flags cleared so that this loop knows that they are clean, then
> it's feasible that they'll accumulate over time, keep on getting flushed
> every 30 seconds, keep on bogging down the loop, and so on.

I used "find / >/dev/null" to grow the problem and make it bad after a
few seconds of uptime, then profiled a single sync(2) call.  The profile
shows that nothing much is done except the loop containing the above:

Under ~5.2, on a 2.2GHz A64 UP in i386 mode:

after booting, with about 700 vnodes:

%   %   cumulative   self              self     total 
%  time   seconds   seconds    calls  ns/call  ns/call  name 
%  30.8      0.000    0.000        0  100.00%           mcount [4]
%  14.9      0.001    0.000        0  100.00%           mexitcount [5]
%   5.5      0.001    0.000        0  100.00%           cputime [16]
%   5.0      0.001    0.000        6    13312    13312  vfs_msync [18]
%   4.3      0.001    0.000        0  100.00%           user [21]
%   3.5      0.001    0.000        5    11321    11993  ffs_sync [23]

after "find / >/dev/null" was stopped after saturating at 64000 vnodes
(desiredvnodes is 70240):

%   %   cumulative   self              self     total 
%  time   seconds   seconds    calls  ns/call  ns/call  name 
%  50.7      0.008    0.008        5  1666427  1667246  ffs_sync [5]
%  38.0      0.015    0.006        6  1041217  1041217  vfs_msync [6]
%   3.1      0.015    0.001        0  100.00%           mcount [7]
%   1.5      0.015    0.000        0  100.00%           mexitcount [8]
%   0.6      0.015    0.000        0  100.00%           cputime [22]
%   0.6      0.016    0.000       34     2660     2660  generic_bcopy [24]
%   0.5      0.016    0.000        0  100.00%           user [26]

vfs_msync() is a problem too.  It uses an almost identical loop for
the case where the vnode is not dirty (but has a different condition
for being dirty).  ffs_sync() is called 5 times because there are 5
ffs file systems mounted r/w.  There is another ffs file system mounted
r/o and that combined with a missing r/o optimization might give the
extra call to vfs_msync().  With 64000 vnodes, the calls take 1-2 ms
each.  That is already quite a lot, and there are many calls.  Each
call only looks at vnodes under the mount point so the number of mounted
file systems doesn't affect the total time much.
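
A user-space model of the scan may make the cost argument concrete.
This is a sketch under stated assumptions, not the kernel code: the
struct and function names are hypothetical, `flags` stands in for the
IN_* bits, `dirty_bufs` stands in for bo_dirty.bv_cnt, and the VNON
test and all locking are left out.  It shows that every vnode on a
mount is visited even when none is dirty, and that the total number of
visits is the total vnode count however the vnodes are split across
mounts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the quoted check examines. */
struct model_vnode {
    unsigned flags;      /* nonzero models IN_ACCESS | IN_CHANGE | ... */
    int dirty_bufs;      /* models vp->v_bufobj.bo_dirty.bv_cnt */
};

/* Hypothetical stand-in for one mounted file system's vnode list. */
struct model_mount {
    const struct model_vnode *vnodes;
    size_t nvnodes;
};

struct scan_stats {
    size_t visited;      /* vnodes examined by the loop */
    size_t flushed;      /* vnodes that would actually be synced */
};

/* One sync pass over one mount: every vnode is visited, clean ones skipped. */
struct scan_stats scan_mount(const struct model_mount *mp)
{
    struct scan_stats s = { 0, 0 };
    size_t i;

    for (i = 0; i < mp->nvnodes; i++) {
        const struct model_vnode *vp = &mp->vnodes[i];

        s.visited++;
        if (vp->flags == 0 && vp->dirty_bufs == 0)
            continue;    /* clean: nothing written, but the visit is paid for */
        s.flushed++;
    }
    return s;
}

/* One sync(2): one pass per mount, with the per-mount counts summed. */
struct scan_stats scan_all(const struct model_mount *mounts, size_t nmounts)
{
    struct scan_stats total = { 0, 0 };
    size_t i;

    for (i = 0; i < nmounts; i++) {
        struct scan_stats s = scan_mount(&mounts[i]);

        total.visited += s.visited;
        total.flushed += s.flushed;
    }
    return total;
}
```

With all vnodes clean, `flushed` stays at 0 (matching the zero writes
seen in iostat) while `visited` still equals the number of vnodes.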

ffs_sync() is taking 125 ns per vnode.  That is more than I would have
expected.
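
The per-vnode figure can be rechecked from the second profile.  The
sketch below is only arithmetic on the numbers in that table; the
function name is hypothetical, and it assumes the 64000 vnodes are
spread over the mounted file systems so that all calls together visit
each vnode once.  The 125 ns figure appears to correspond to the
rounded 0.008 s of self time divided by 64000; the unrounded ns/call
column gives about 130 ns for ffs_sync() and about 98 ns for
vfs_msync().

```c
#include <assert.h>

/*
 * Mean cost per vnode in nanoseconds, given the time per call, the
 * number of calls, and the total number of vnodes visited by all calls.
 */
double ns_per_vnode(double ns_per_call, int calls, long nvnodes)
{
    assert(calls > 0 && nvnodes > 0);
    return ns_per_call * calls / (double)nvnodes;
}
```

For example, ns_per_vnode(1666427, 5, 64000) covers the five
ffs_sync() calls, and ns_per_vnode(1041217, 6, 64000) covers the six
vfs_msync() calls.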

Bruce


