From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 14 06:19:59 2010
Date: Tue, 13 Jul 2010 23:19:55 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <201007140619.o6E6JtSe012902@apollo.backplane.com>
To: Jerry Toung
Cc: freebsd-hackers@freebsd.org
Subject: Re: disk I/O, VFS hirunningspace

:void
:waitrunningbufspace(void)
:{
:/*
:	mtx_lock(&rbreqlock);
:	while (runningbufspace > hirunningspace) {
:		++runningbufreq;
:		msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
:	}
:	mtx_unlock(&rbreqlock);
:*/
:}
:
:so far, I can't observe any side effects of not running it. Am I on a time
:bomb?
:
:Thank you,
:Jerry

You can bump up the related sysctl for hirunningspace if it helps you; no kernel code modification is needed.  I recommend setting it to at least 8MB (8388608).

	sysctl vfs.hirunningspace=8388608
	sysctl vfs.lorunningspace=1048576

The waitrunningbufspace() code is designed to protect the system from several degenerate situations and should be left in place.

One is where a large backlog of issued WRITE BIOs can accumulate on block devices.  Because the related buffers are locked during the I/O, any attempt to access the data via the buffer cache will unnecessarily stall the thread trying to access it.  Without a limit, several seconds' worth of BIOs can accumulate (sometimes tens of seconds' worth if the I/O is non-linear).  Both accesses to file data and accesses to meta-data can wind up stalling, reducing filesystem performance.

A second issue is that the system's buffer cache algorithms become severely inefficient if too much of the buffer cache is held in a locked state.

That said, the defaults in bufinit() (lines 623 and 624) are a bit too low for today's high-speed I/O subsystems.  They appear to be set to fixed assignments of 512K for lo and 1MB for hi.

Even though the defaults are too low, they still ought to be enough to maintain maximum I/O throughput, since WRITE BIOs usually complete very quickly (they just go into the target device's own write cache and complete).  The pipeline should be maintained if the hysteresis is working properly.  Perhaps something else is broken and is causing the hysteresis to not work properly.

					-Matt
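
The sysctl settings suggested above can also be made persistent across reboots by placing them in /etc/sysctl.conf, which is applied at boot; the values below are simply the ones recommended in the reply:

	# /etc/sysctl.conf
	vfs.hirunningspace=8388608
	vfs.lorunningspace=1048576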
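
To make the hi/lo hysteresis being discussed concrete, here is a minimal userland sketch of the pattern.  It is an illustration only, not the vfs_bio.c code: the kernel does the equivalent with msleep()/wakeup() under rbreqlock, as the quoted waitrunningbufspace() shows, and the helper names bio_issued()/bio_done()/throttle_writer() are invented for the sketch.

	#include <pthread.h>
	#include <stdio.h>

	static pthread_mutex_t rblock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t  rbcond = PTHREAD_COND_INITIALIZER;

	static long runningbufspace;		/* bytes of write I/O in flight */
	static int  runningbufreq;		/* writers currently sleeping */
	static long hirunningspace = 8388608;	/* throttle writers above this */
	static long lorunningspace = 1048576;	/* wake writers below this */

	/* Account for a write that has been handed to the device. */
	static void
	bio_issued(long bytes)
	{
		pthread_mutex_lock(&rblock);
		runningbufspace += bytes;
		pthread_mutex_unlock(&rblock);
	}

	/* Writer-side throttle: sleep while too much write I/O is in flight. */
	static void
	throttle_writer(void)
	{
		pthread_mutex_lock(&rblock);
		while (runningbufspace > hirunningspace) {
			++runningbufreq;
			pthread_cond_wait(&rbcond, &rblock);
			--runningbufreq;
		}
		pthread_mutex_unlock(&rblock);
	}

	/*
	 * Completion side: drain the accounting and wake throttled writers
	 * only once the backlog has fallen below the low-water mark.  The
	 * gap between hi and lo is the hysteresis that keeps the pipeline
	 * full instead of bouncing writers awake on every completion.
	 */
	static void
	bio_done(long bytes)
	{
		pthread_mutex_lock(&rblock);
		runningbufspace -= bytes;
		if (runningbufreq && runningbufspace < lorunningspace)
			pthread_cond_broadcast(&rbcond);
		pthread_mutex_unlock(&rblock);
	}

	int
	main(void)
	{
		bio_issued(65536);	/* a 64K write goes out to the device */
		throttle_writer();	/* far below hirunningspace, so no sleep */
		bio_done(65536);	/* completion drains the accounting */
		printf("runningbufspace = %ld\n", runningbufspace);
		return (0);
	}

In these terms, raising vfs.hirunningspace widens the window before writers are throttled, and vfs.lorunningspace controls how far the backlog must drain before they are released again.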