From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 14 06:19:59 2010
Date: Tue, 13 Jul 2010 23:19:55 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <201007140619.o6E6JtSe012902@apollo.backplane.com>
To: Jerry Toung
Cc: freebsd-hackers@freebsd.org
Subject: Re: disk I/O, VFS hirunningspace

:void
:waitrunningbufspace(void)
:{
:/*
:	mtx_lock(&rbreqlock);
:	while (runningbufspace > hirunningspace) {
:		++runningbufreq;
:		msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
:	}
:	mtx_unlock(&rbreqlock);
:*/
:}
:
:so far, I can't observe any side effects of not running it. Am I on a time
:bomb?
:
:Thank you,
:Jerry

You can bump up the related sysctl for hirunningspace if it helps you; no kernel code modification is needed.  I recommend setting it to at least 8MB (8388608).

	sysctl vfs.hirunningspace=8388608
	sysctl vfs.lorunningspace=1048576

The waitrunningbufspace() code is designed to protect the system from several degenerate situations and should be left in place.

One is where a large backlog of issued WRITE BIOs can accumulate on block devices.  Because the related buffers are locked during the I/O, any attempt to access the data via the buffer cache will unnecessarily stall the thread trying to access it.  Without a limit, several seconds' worth of BIOs can accumulate (sometimes tens of seconds' worth if the I/O is non-linear).  Both accesses to file data and accesses to meta-data can wind up stalling, reducing filesystem performance.

A second issue is that the system's buffer cache algorithms become severely inefficient if too much of the buffer cache is held in a locked state.

That said, the defaults in bufinit() (lines 623 and 624) are a bit too low for today's high-speed I/O subsystems.  They appear to be set to fixed assignments of 512K for lo and 1MB for hi.

Even though the defaults are too low, they still ought to be enough to maintain maximum I/O throughput, since WRITE BIOs usually complete very quickly (they just go into the target device's own write cache and complete).  The pipeline should be maintained if the hysteresis is working properly.  Perhaps something else is broken and is causing the hysteresis to not work properly.

					-Matt
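
The sysctl settings suggested above can also be made persistent across reboots by placing them in /etc/sysctl.conf, which is applied at boot; the values below are simply the ones recommended in the reply:

	# /etc/sysctl.conf
	vfs.hirunningspace=8388608
	vfs.lorunningspace=1048576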
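
To make the hi/lo hysteresis being discussed concrete, here is a minimal userland sketch of the pattern.  It is an illustration only, not the vfs_bio.c code: the kernel does the equivalent with msleep()/wakeup() under rbreqlock, as the quoted waitrunningbufspace() shows, and the helper names bio_issued()/bio_done()/throttle_writer() are invented for the sketch.

	#include <pthread.h>
	#include <stdio.h>

	static pthread_mutex_t rblock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t  rbcond = PTHREAD_COND_INITIALIZER;

	static long runningbufspace;		/* bytes of write I/O in flight */
	static int  runningbufreq;		/* writers currently sleeping */
	static long hirunningspace = 8388608;	/* throttle writers above this */
	static long lorunningspace = 1048576;	/* wake writers below this */

	/* Account for a write that has been handed to the device. */
	static void
	bio_issued(long bytes)
	{
		pthread_mutex_lock(&rblock);
		runningbufspace += bytes;
		pthread_mutex_unlock(&rblock);
	}

	/* Writer-side throttle: sleep while too much write I/O is in flight. */
	static void
	throttle_writer(void)
	{
		pthread_mutex_lock(&rblock);
		while (runningbufspace > hirunningspace) {
			++runningbufreq;
			pthread_cond_wait(&rbcond, &rblock);
			--runningbufreq;
		}
		pthread_mutex_unlock(&rblock);
	}

	/*
	 * Completion side: drain the accounting and wake throttled writers
	 * only once the backlog has fallen below the low-water mark.  The
	 * gap between hi and lo is the hysteresis that keeps the pipeline
	 * full instead of bouncing writers awake on every completion.
	 */
	static void
	bio_done(long bytes)
	{
		pthread_mutex_lock(&rblock);
		runningbufspace -= bytes;
		if (runningbufreq && runningbufspace < lorunningspace)
			pthread_cond_broadcast(&rbcond);
		pthread_mutex_unlock(&rblock);
	}

	int
	main(void)
	{
		bio_issued(65536);	/* a 64K write goes out to the device */
		throttle_writer();	/* far below hirunningspace, so no sleep */
		bio_done(65536);	/* completion drains the accounting */
		printf("runningbufspace = %ld\n", runningbufspace);
		return (0);
	}

In these terms, raising vfs.hirunningspace widens the window before writers are throttled, and vfs.lorunningspace controls how far the backlog must drain before they are released again.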