From: Svatopluk Kraus
To: Konstantin Belousov
Cc: hackers@freebsd.org
Date: Thu, 15 Mar 2012 20:00:41 +0100
Subject: Re: [vfs] buf_daemon() slows down write() severely on low-speed CPU

2012/3/15 Konstantin Belousov:
> On Tue, Mar 13, 2012 at 01:54:38PM +0100, Svatopluk Kraus wrote:
>> On Mon, Mar 12, 2012 at 7:19 PM, Konstantin Belousov wrote:
>> > On Mon, Mar 12, 2012 at 04:00:58PM +0100, Svatopluk Kraus wrote:
>> >> Hi,
>> >>
>> >>    I have solved the following problem. If a big file (relative to
>> >> 'hidirtybuffers') is being written, the write speed is very poor.
>> >>
>> >>    It's observed on a system with an Elan 486 and 32MB RAM (i.e., a
>> >> low-speed CPU and not much memory) running FreeBSD-9.
>> >>
>> >>    Analysis: A file is being written. All or almost all dirty buffers
>> >> belong to the file. The file vnode is locked by the writing process
>> >> almost all the time. buf_daemon() cannot flush any dirty buffer because
>> >> its chance of acquiring the file vnode lock is very low. The number of
>> >> dirty buffers grows very slowly, and more slowly with each new dirty
>> >> buffer, because buf_daemon() eats more and more CPU time looping over
>> >> the dirty buffers queue (with very little or no effect).
>> >>
>> >>    This slowdown is started by buf_daemon() itself, when
>> >> 'numdirtybuffers' reaches the 'lodirtybuffers' threshold and
>> >> buf_daemon() is woken up by its own timeout. The timeout fires at a
>> >> 'hz' period, but starts to fire at 'hz/10' as soon as buf_daemon()
>> >> fails to reach the 'lodirtybuffers' threshold. When 'numdirtybuffers'
>> >> (now slowly) reaches the ((lodirtybuffers + hidirtybuffers) / 2)
>> >> threshold, buf_daemon() can be woken up from within bdwrite() too, and
>> >> that's much worse. Finally, and very slowly, 'hidirtybuffers' or
>> >> 'dirtybufthresh' is reached, the dirty buffers are flushed, and
>> >> everything starts from the beginning...
>> > Note that for some time, bufdaemon work is distributed among bufdaemon
>> > thread itself and any thread that fails to allocate a buffer, esp.
>> > a thread that owns vnode lock and covers long queue of dirty buffers.
>>
>> However, the problem starts when numdirtybuffers reaches the
>> lodirtybuffers count and ends around the hidirtybuffers count. There are
>> still plenty of free buffers in the system.
>>
>> >>
>> >>    On the system, the buffer size is 512 bytes and the default
>> >> thresholds are the following:
>> >>
>> >>    vfs.hidirtybuffers = 134
>> >>    vfs.lodirtybuffers = 67
>> >>    vfs.dirtybufthresh = 120
>> >>
>> >>    For example, a 2MB file is copied to a flash disk in about 3
>> >> minutes and 15 seconds. If dirtybufthresh is set to 40, the copy time
>> >> is about 20 seconds.
>> >>
>> >>    My solution is a mix of three things:
>> >>    1. Suppression of buf_daemon() wakeup by setting bd_request to 1 in
>> >> the main buf_daemon() loop.
>> > I cannot understand this. Please provide a patch that shows what do
>> > you mean there.
>> >
>>       curthread->td_pflags |= TDP_NORUNNINGBUF | TDP_BUFNEED;
>>       mtx_lock(&bdlock);
>>       for (;;) {
>> -             bd_request = 0;
>> +             bd_request = 1;
>>               mtx_unlock(&bdlock);
> Is this a complete patch ? The change just causes lost wakeups for bufdaemon,
> nothing more.

Yes, it's a complete patch. And exactly, it causes lost wakeups, which are:
1. !! UNREASONABLE !!, because bufdaemon is not sleeping,
2. not wanted, because it looks like that is the correct behaviour for the
sleep with the hz/10 period. However, if the sleep with the hz/10 period is
expected to be woken up by bd_wakeup(), then bd_request should be set to 0
just before the sleep() call, and then bufdaemon behaviour will be clear.
All the stuff around bd_request and the bufdaemon sleep is under bdlock, so
if bd_request is 0 and bufdaemon is not sleeping, then all wakeups are
unreasonable! The patch is mainly about that.

>
>>
>> I read the description of the bd_request variable. However, bd_request
>> should serve as an indicator that buf_daemon() is asleep. I.e., the
>> following paradigm should be used:
>>
>> mtx_lock(&bdlock);
>> bd_request = 0;    /* now is the only time when wakeup() is meaningful */
>> sleep(&bd_request, ..., hz/10);
>> bd_request = 1;   /* in case of timeout, we must set it (bd_wakeup()
>> has already set it otherwise) */
>> mtx_unlock(&bdlock);
>>
>> My patch follows the paradigm. What happens without the patch in the
>> described problem: buf_daemon() fails at its job and goes to sleep
>> with the hz/10 period. It supposes that the next early wakeup will do
>> nothing either. bd_request is untouched, but buf_daemon() doesn't know
>> whether its last wakeup was made by bd_wakeup() or by the timeout. So
>> bd_request could be 0, and buf_daemon() can be woken up before hz/10 just
>> by bd_wakeup(). Moreover, setting bd_request to 0 when buf_daemon() is not
>> asleep can cause time-consuming and useless wakeup() calls without effect.
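
The difference between the two conventions for bd_request can be sketched
with a small userspace model. This is only an illustration under stated
assumptions, not kernel code: struct bd_model, model_bd_wakeup() and
model_pass() are hypothetical names, locking and the real flush work are left
out, and the only thing modelled is the "act only if bd_request is 0" check
that the mail attributes to bd_wakeup().

```c
#include <stdbool.h>

/* Hypothetical userspace model; these names are not the kernel's. */
struct bd_model {
    int  bd_request;      /* 0 = the next request will issue a wakeup() */
    bool sleeping;        /* true only while the daemon is in its timed sleep */
    int  wakeups_issued;  /* total wakeup() calls made by writers */
    int  wakeups_useless; /* wakeup() calls made while the daemon was awake */
};

/* Same shape as the check described for bd_wakeup(): act only if no
 * request is pending. */
static void model_bd_wakeup(struct bd_model *m)
{
    if (m->bd_request != 0)
        return;
    m->bd_request = 1;
    m->wakeups_issued++;
    if (m->sleeping)
        m->sleeping = false;      /* the sleeper is woken early */
    else
        m->wakeups_useless++;     /* nobody was sleeping on the channel */
}

/*
 * One daemon pass: busy_calls writers call in while the daemon scans the
 * dirty queue, then sleep_calls writers call in during the timed sleep.
 * If flag_means_sleeping is true, the flag is 1 at the top of the loop and
 * cleared just before the sleep (the paradigm from the mail); otherwise it
 * is cleared at the top of the loop and left untouched before the sleep.
 * On return, sleeping == true means the sleep ran until its timeout.
 */
static struct bd_model model_pass(bool flag_means_sleeping,
                                  int busy_calls, int sleep_calls)
{
    struct bd_model m = { 0, false, 0, 0 };
    int i;

    m.bd_request = flag_means_sleeping ? 1 : 0;   /* top of the loop */
    for (i = 0; i < busy_calls; i++)
        model_bd_wakeup(&m);                      /* daemon is awake */

    if (flag_means_sleeping)
        m.bd_request = 0;                         /* just before the sleep */
    m.sleeping = true;
    for (i = 0; i < sleep_calls; i++)
        model_bd_wakeup(&m);                      /* daemon is asleep */

    return m;
}
```

With three requests during the scan and two during the sleep,
model_pass(false, 3, 2) spends its single wakeup() while the daemon is awake
and the sleep then runs to its timeout, whereas model_pass(true, 3, 2) issues
its single wakeup() only once the daemon is really asleep. With no requests
during the scan, model_pass(false, 0, 2) is woken early, which is the
"daemon doesn't know how it was last woken" ambiguity described above.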