From: Svatopluk Kraus
To: Konstantin Belousov
Cc: hackers@freebsd.org
Date: Tue, 13 Mar 2012 13:54:38 +0100
Subject: Re: [vfs] buf_daemon() slows down write() severely on low-speed CPU
In-Reply-To: <20120312181921.GF75778@deviant.kiev.zoral.com.ua>

On Mon, Mar 12, 2012 at 7:19 PM, Konstantin Belousov wrote:
> On Mon, Mar 12, 2012 at 04:00:58PM +0100, Svatopluk Kraus wrote:
>> Hi,
>>
>>    I have solved a following problem. If a big file (according to
>> 'hidirtybuffers') is being written, the write speed is very poor.
>>
>>    It's observed on system with elan 486 and 32MB RAM (i.e., low speed
>> CPU and not too much memory) running FreeBSD-9.
>>
>>    Analysis: A file is being written. All or almost all dirty buffers
>> belong to the file. The file vnode is almost all time locked by
>> writing process. The buf_daemon() can not flush any dirty buffer as a
>> chance to acquire the file vnode lock is very low. A number of dirty
>> buffers grows up very slow and with each new dirty buffer slower,
>> because buf_daemon() eats more and more CPU time by looping on dirty
>> buffers queue (with very low or no effect).
>>
>>    This slowing down effect is started by buf_daemon() itself, when
>> 'numdirtybuffers' reaches 'lodirtybuffers' threshold and buf_daemon()
>> is waked up by own timeout. The timeout fires at 'hz' period, but
>> starts to fire at 'hz/10' immediately as buf_daemon() fails to reach
>> 'lodirtybuffers' threshold. When 'numdirtybuffers' (now slowly)
>> reaches ((lodirtybuffers + hidirtybuffers) / 2) threshold, the
>> buf_daemon() can be waked up within bdwrite() too and it's much worse.
>> Finally and with very slow speed, the 'hidirtybuffers' or
>> 'dirtybufthresh' is reached, the dirty buffers are flushed, and
>> everything starts from beginning...
> Note that for some time, bufdaemon work is distributed among bufdaemon
> thread itself and any thread that fails to allocate a buffer, esp.
> a thread that owns vnode lock and covers long queue of dirty buffers.

However, the problem starts when numdirtybuffers reaches lodirtybuffers
count and ends around hidirtybuffers count.
There are still plenty of free buffers in the system.

>>
>>    On the system, a buffer size is 512 bytes and the default
>> thresholds are following:
>>
>>    vfs.hidirtybuffers = 134
>>    vfs.lodirtybuffers = 67
>>    vfs.dirtybufthresh = 120
>>
>>    For example, a 2MB file is copied into flash disk in about 3
>> minutes and 15 seconds. If dirtybufthresh is set to 40, the copy time
>> is about 20 seconds.
>>
>>    My solution is a mix of three things:
>>    1. Suppression of buf_daemon() wakeup by setting bd_request to 1 in
>> the main buf_daemon() loop.
> I cannot understand this. Please provide a patch that shows what do
> you mean there.
>

        curthread->td_pflags |= TDP_NORUNNINGBUF | TDP_BUFNEED;
        mtx_lock(&bdlock);
        for (;;) {
-               bd_request = 0;
+               bd_request = 1;
                mtx_unlock(&bdlock);

I have read the description of the bd_request variable. However,
bd_request should serve as an indicator that buf_daemon() is asleep.
That is, the following paradigm should be used:

        mtx_lock(&bdlock);
        bd_request = 0;    /* this is the only time when wakeup() is meaningful */
        sleep(&bd_request, ..., hz/10);
        bd_request = 1;    /* on timeout we must set it ourselves
                              (otherwise bd_wakeup() already did) */
        mtx_unlock(&bdlock);

My patch follows this paradigm. Here is what happens without the patch
in the described problem: buf_daemon() fails in its job and goes to
sleep with the hz/10 period. It supposes that the next early wakeup
will do nothing either. bd_request is left untouched, but buf_daemon()
doesn't know whether its last wakeup came from bd_wakeup() or from the
timeout. So bd_request could be 0, and buf_daemon() can be woken up
before hz/10 has elapsed just by bd_wakeup(). Moreover, setting
bd_request to 0 when buf_daemon() is not asleep can cause
time-consuming and useless wakeup() calls that have no effect.

>>    2. Increment of buf_daemon() fast timeout from hz/10 to hz/4.
>>    3. Tuning dirtybufthresh to (((lodirtybuffers + hidirtybuffers) /
>> 2) - 15) magic.
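
To make the numbers in item 3 concrete, here is a small arithmetic sketch. The helper names are hypothetical (they are not kernel symbols), and it assumes the thresholds are plain C ints, so the division truncates.

```c
/* Hypothetical helpers, for illustration only; not kernel code. */

/* Midpoint of the two thresholds: the level at which, per the analysis
 * above, bdwrite() may also start waking buf_daemon(). */
static int wake_midpoint(int lodirtybuffers, int hidirtybuffers)
{
    return (lodirtybuffers + hidirtybuffers) / 2;   /* integer division */
}

/* dirtybufthresh as tuned in item 3; 15 is the "magic" offset quoted there. */
static int tuned_dirtybufthresh(int lodirtybuffers, int hidirtybuffers)
{
    return wake_midpoint(lodirtybuffers, hidirtybuffers) - 15;
}
```

With the defaults quoted above (lodirtybuffers = 67, hidirtybuffers = 134) the midpoint is 100 and the tuned dirtybufthresh is 85, compared with the default of 120.
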
> Even hz / 10 is awfully long time on modern hardware.
> The dirtybufthresh is already the sysctl that you can change.

Yes, and I did mention that this is a low-speed CPU. Don't forget that
even if buf_daemon() sleeps for the hz/4 period (which is expected to
be a rare case), dirtybufthresh still works and helps. And I'm not
pushing the changes (except, a little, the bd_request one). I'm just
sharing my experience.

> The 32MB is indeed around the lowest amount of memory where recent
> FreeBSD can make an illusion of being useful. I am not sure how much
> should the system be tuned by default for such configuration.

Even recent FreeBSD is pretty useful on this configuration. Of course,
file operations are not the main concern ... IMHO, it's always good to
know how the system (and its parts) works in various configurations.

>>
>>    The mentioned copy time is about 30 seconds now.
>>
>>    The described problem is just for information to anyone who can be
>> interested in. Comments are welcome. However, the bd_request thing is
>> more general.
>>
>>    bd_request (despite its description) should be 0 only when
>> buf_daemon() is in sleep(). Otherwise, wakeup() on &bd_request channel
>> is useless. Therefore, setting bd_request to 1 in the main
>> buf_daemon() loop is correct and better as it saves time spent by
>> wakeup() on not existing channel.

Thanks for your comments,
Svata
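
The bd_request argument can be illustrated with a small userspace toy model. This is only a sketch, not kernel code: the struct and function names are hypothetical, it assumes (as the thread implies) that the writer side issues wakeup() only when the flag is 0 and then sets it to 1, and it ignores writers that run while the daemon sleeps. All it does is count wakeup() calls issued while nobody is sleeping on the channel.

```c
#include <stdbool.h>

/* Toy model only; every name below is hypothetical, not a kernel symbol. */
struct bd_model {
    int  request;         /* stands in for bd_request */
    bool sleeping;        /* true only while the modelled daemon sleeps */
    int  wasted_wakeups;  /* wakeup() calls made while nobody was sleeping */
};

/* Writer side. Assumption: wakeup() is issued only when the flag is 0,
 * and the flag is then set to 1. */
static void model_bd_wakeup(struct bd_model *m)
{
    if (m->request == 0) {
        m->request = 1;
        if (!m->sleeping)
            m->wasted_wakeups++;
    }
}

/* One daemon iteration: a flush pass during which writers invoke the
 * wakeup check `writes` times, then a sleep that ends by timeout.
 * reset_in_loop == true  models "bd_request = 0" at the top of the loop;
 * reset_in_loop == false models "bd_request = 1" there, with the flag
 * cleared only around the sleep. */
static void model_iteration(struct bd_model *m, bool reset_in_loop, int writes)
{
    m->request = reset_in_loop ? 0 : 1;
    for (int i = 0; i < writes; i++)
        model_bd_wakeup(m);          /* daemon is busy, not sleeping */

    if (!reset_in_loop)
        m->request = 0;              /* a wakeup is meaningful from here on */
    m->sleeping = true;              /* sleep starts; no writer runs in this model */
    m->sleeping = false;             /* sleep timed out */
    if (!reset_in_loop)
        m->request = 1;              /* timeout path sets the flag itself */
}

/* Run `iterations` daemon iterations and return the wasted wakeup count. */
static int run_model(bool reset_in_loop, int iterations, int writes)
{
    struct bd_model m = { 0, false, 0 };
    for (int i = 0; i < iterations; i++)
        model_iteration(&m, reset_in_loop, writes);
    return m.wasted_wakeups;
}
```

In this model, run_model(true, 10, 3) counts one wasted wakeup per iteration, while run_model(false, 10, 3) counts none. It says nothing about how expensive a real wakeup() on an empty channel is; that cost is what the measurements in the thread are about.
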