Date: Wed, 21 Mar 2012 22:38:28 +0200
From: Konstantin Belousov
To: Svatopluk Kraus
Cc: hackers@freebsd.org
Subject: Re: [vfs] buf_daemon() slows down write() severely on low-speed CPU
Message-ID: <20120321203828.GW2358@deviant.kiev.zoral.com.ua>
References: <20120312181921.GF75778@deviant.kiev.zoral.com.ua> <20120315112959.GP75778@deviant.kiev.zoral.com.ua>

On Thu, Mar 15, 2012 at 08:00:41PM +0100, Svatopluk Kraus wrote:
> 2012/3/15 Konstantin Belousov :
> > On Tue, Mar 13, 2012 at 01:54:38PM +0100, Svatopluk Kraus wrote:
> >> On Mon, Mar 12, 2012 at 7:19 PM, Konstantin Belousov
> >> wrote:
> >> > On Mon, Mar 12, 2012 at 04:00:58PM +0100, Svatopluk Kraus wrote:
> >> >> Hi,
> >> >>
> >> >>   I have solved a following problem. If a big file (according to
> >> >> 'hidirtybuffers') is being written, the write speed is very poor.
> >> >>
> >> >>   It's observed on system with elan 486 and 32MB RAM (i.e., low speed
> >> >> CPU and not too much memory) running FreeBSD-9.
> >> >>
> >> >>   Analysis: A file is being written. All or almost all dirty buffers
> >> >> belong to the file. The file vnode is almost all time locked by
> >> >> writing process. The buf_daemon() can not flush any dirty buffer as a
> >> >> chance to acquire the file vnode lock is very low. A number of dirty
> >> >> buffers grows up very slow and with each new dirty buffer slower,
> >> >> because buf_daemon() eats more and more CPU time by looping on dirty
> >> >> buffers queue (with very low or no effect).
> >> >>
> >> >>   This slowing down effect is started by buf_daemon() itself, when
> >> >> 'numdirtybuffers' reaches 'lodirtybuffers' threshold and buf_daemon()
> >> >> is waked up by own timeout. The timeout fires at 'hz' period, but
> >> >> starts to fire at 'hz/10' immediately as buf_daemon() fails to reach
> >> >> 'lodirtybuffers' threshold.
> >> >> When 'numdirtybuffers' (now slowly)
> >> >> reaches ((lodirtybuffers + hidirtybuffers) / 2) threshold, the
> >> >> buf_daemon() can be waked up within bdwrite() too and it's much worse.
> >> >> Finally and with very slow speed, the 'hidirtybuffers' or
> >> >> 'dirtybufthresh' is reached, the dirty buffers are flushed, and
> >> >> everything starts from beginning...
> >> > Note that for some time, bufdaemon work is distributed among bufdaemon
> >> > thread itself and any thread that fails to allocate a buffer, esp.
> >> > a thread that owns vnode lock and covers long queue of dirty buffers.
> >>
> >> However, the problem starts when numdirtybuffers reaches
> >> lodirtybuffers count and ends around hidirtybuffers count. There are
> >> still plenty of free buffers in system.
> >>
> >> >>
> >> >>   On the system, a buffer size is 512 bytes and the default
> >> >> thresholds are following:
> >> >>
> >> >>   vfs.hidirtybuffers = 134
> >> >>   vfs.lodirtybuffers = 67
> >> >>   vfs.dirtybufthresh = 120
> >> >>
> >> >>   For example, a 2MB file is copied into flash disk in about 3
> >> >> minutes and 15 second. If dirtybufthresh is set to 40, the copy time
> >> >> is about 20 seconds.
> >> >>
> >> >>   My solution is a mix of three things:
> >> >>   1. Suppresion of buf_daemon() wakeup by setting bd_request to 1 in
> >> >> the main buf_daemon() loop.
> >> > I cannot understand this. Please provide a patch that shows what do
> >> > you mean there.
> >> >
> >>       curthread->td_pflags |= TDP_NORUNNINGBUF | TDP_BUFNEED;
> >>       mtx_lock(&bdlock);
> >>       for (;;) {
> >> -             bd_request = 0;
> >> +             bd_request = 1;
> >>               mtx_unlock(&bdlock);
> > Is this a complete patch ? The change just causes lost wakeups for bufdaemon,
> > nothing more.
> Yes, it's a complete patch. And exactly, it causes lost wakeups which are:
> 1. !! UNREASONABLE !!, because bufdaemon is not sleeping,
> 2. not wanted, because it looks that it's correct behaviour for the
> sleep with hz/10 period. However, if the sleep with hz/10 period is
> expected to be waked up by bd_wakeup(), then bd_request should be set
> to 0 just before sleep() call, and then bufdaemon behaviour will be
> clear.

No, your description is wrong.

If bufdaemon is unable to flush enough buffers and numdirtybuffers is
still greater than lodirtybuffers, then bufdaemon enters the qsleep state
without resetting bd_request, with timeouts of one tenth of a second.
Your patch will cause all wakeups for this case to be lost. This is
exactly the situation when we want bufdaemon to run harder to avoid
possible deadlocks, not to slow down.

>
> All stuff around bd_request and bufdaemon sleep is under bd_lock, so
> if bd_request is 0 and bufdaemon is not sleeping, then all wakeups are
> unreasonable! The patch is about that mainly.

Wakeups themselves are very cheap for the running process. Mostly, it
comes down to locking the sleepq and waking all threads that are present
in the sleepq blocked queue. If there are no threads in the queue,
nothing is done.

>
> >
> >>
> >> I read description of bd_request variable. However, bd_request should
> >> serve as an indicator that buf_daemon() is in sleep. I.e., the
> >> following paradigma should be used:
> >>
> >> mtx_lock(&bdlock);
> >> bd_request = 0;    /* now, it's only time when wakeup() will be meaningful */
> >> sleep(&bd_request, ..., hz/10);
> >> bd_request = 1;   /* in case of timeout, we must set it (bd_wakeup()
> >> already set it) */
> >> mtx_unlock(&bdlock);
> >>
> >> My patch follows the paradigma. What happens without the patch in
> >> described problem: buf_daemon() fails in its job and goes to sleep
> >> with hz/10 period. It supposes that next early wakeup will do nothing
> >> too. bd_request is untouched but buf_daemon() doesn't know if its last
> >> wakeup was made by bd_wakeup() or by timeout.
> >> So, bd_request could be
> >> 0 and buf_daemon() can be waked up before hz/10 just by bd_wakeup().
> >> Moreover, setting bd_request to 0 when buf_daemon() is not in sleep
> >> can cause time consuming and useless wakeup() calls without effect.
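
To make the disagreement above concrete, here is a minimal user-space
sketch in C of the flag logic being argued about. It is not the kernel
code. The names struct bd_model, model_bd_wakeup(), model_failed_pass()
and wakeups_during_qsleep() are hypothetical, the dirty-buffer threshold
check is assumed to have passed, and bd_wakeup() is modeled only as this
thread describes it: it requests a wakeup when bd_request is 0 and does
nothing otherwise. The sketch also assumes no request arrives while the
daemon is still in its flush pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct bd_model {
	int  bd_request;  /* mirrors the flag discussed in the thread */
	bool sleeping;    /* true while the modeled daemon sits in qsleep */
	int  delivered;   /* wakeups that actually ended a sleep */
	int  skipped;     /* calls that returned without waking anyone */
};

/*
 * Model of bd_wakeup() as described in the thread: act only when no
 * request is pending. The dirty-buffer threshold test is assumed true.
 */
void
model_bd_wakeup(struct bd_model *m)
{
	assert(m != NULL);
	if (m->bd_request != 0) {
		m->skipped++;
		return;
	}
	m->bd_request = 1;
	if (m->sleeping) {
		m->sleeping = false;
		m->delivered++;
	}
}

/*
 * One daemon pass that fails to flush: the flag is set at the top of the
 * loop (0 in the stock code, 1 with the proposed patch) and the daemon
 * then enters qsleep without touching the flag again.
 */
void
model_failed_pass(struct bd_model *m, int loop_top_value)
{
	assert(m != NULL);
	m->bd_request = loop_top_value;
	m->sleeping = true;
}

/* Count how many of 'callers' wakeup attempts end the qsleep early. */
int
wakeups_during_qsleep(int loop_top_value, int callers)
{
	struct bd_model m = { 0, false, 0, 0 };

	model_failed_pass(&m, loop_top_value);
	for (int i = 0; i < callers; i++)
		model_bd_wakeup(&m);
	return (m.delivered);
}
```

Under these assumptions, wakeups_during_qsleep(0, 3) returns 1: the
first caller ends the sleep early and the other two are skipped. With
the patched value, wakeups_during_qsleep(1, 3) returns 0, so the modeled
daemon can only leave qsleep when its timeout expires.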