Date: Tue, 13 Jul 2010 23:19:55 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Jerry Toung <jrytoung@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: disk I/O, VFS hirunningspace
Message-ID: <201007140619.o6E6JtSe012902@apollo.backplane.com>
References: <AANLkTinm3kFm7pF_cxoNz1Cgyd5UvnmgZzCpbjak-zzy@mail.gmail.com>
:void
:waitrunningbufspace(void)
:{
:/*
:        mtx_lock(&rbreqlock);
:        while (runningbufspace > hirunningspace) {
:                ++runningbufreq;
:                msleep(&runningbufreq, &rbreqlock, PVM, "wdrain", 0);
:        }
:        mtx_unlock(&rbreqlock);
:*/
:}
:
:so far, I can't observe any side effects of not running it. Am I on a time
:bomb?
:
:Thank you,
:Jerry

    You can bump up the related sysctl for hirunningspace if it helps you;
    no kernel code modification is needed.  I recommend setting it to at
    least 8MB (8388608).

    sysctl vfs.hirunningspace=8388608
    sysctl vfs.lorunningspace=1048576

    The waitrunningbufspace() code is designed to protect the system from
    several degenerate situations and should be left in place.

    One is where a large backlog of issued WRITE BIOs can accumulate on
    block devices.  Because the related buffers are locked during the I/O,
    any attempt to access the data via the buffer cache will unnecessarily
    stall the thread trying to access it.  Without a limit, several seconds
    worth of BIOs can accumulate (sometimes tens of seconds worth if the
    I/O is non-linear).  Both accesses to file data and accesses to
    meta-data can wind up stalling, reducing filesystem performance.

    A second issue is that the system buffer cache algorithms will become
    severely inefficient if too much of the buffer cache is held in a
    locked state.

    That said, the defaults in bufinit() (lines 623 and 624) are a bit too
    low for today's high-speed I/O subsystems.  They appear to be set to
    fixed assignments of 512K for lo and 1MB for hi.

    Even though the defaults are too low, they still ought to be enough to
    maintain maximum I/O throughput, since WRITE BIOs usually complete very
    quickly (they just go into the target device's own write cache and
    complete).  The pipeline should be maintained if the hysteresis is
    working properly.  Perhaps there is something else broken that is
    causing the hysteresis to not work properly.

                                                -Matt
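
    For readers unfamiliar with the hi/lo hysteresis mentioned above, here
    is a minimal userland sketch of the general watermark pattern.  It is
    not the kernel's code: struct wr_throttle and the wr_*() helpers are
    hypothetical names made up for illustration, and there is no locking
    or sleeping.  It only models the accounting: writers stall once the
    in-flight byte count exceeds hi and are released only after
    completions drain it back down to lo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model of hi/lo watermark throttling. */
struct wr_throttle {
	size_t running;   /* bytes of write I/O currently in flight */
	size_t lo;        /* resume threshold (cf. vfs.lorunningspace) */
	size_t hi;        /* stall threshold (cf. vfs.hirunningspace) */
	bool   stalled;   /* true while new writers should wait */
};

static void
wr_init(struct wr_throttle *t, size_t lo, size_t hi)
{
	assert(lo < hi);  /* hysteresis needs a gap between the marks */
	t->running = 0;
	t->lo = lo;
	t->hi = hi;
	t->stalled = false;
}

/* Account for an issued write; returns true if writers should now wait. */
static bool
wr_issue(struct wr_throttle *t, size_t bytes)
{
	t->running += bytes;
	if (t->running > t->hi)
		t->stalled = true;
	return t->stalled;
}

/* Account for a completed write; returns true if writers may proceed. */
static bool
wr_complete(struct wr_throttle *t, size_t bytes)
{
	/* Clamp at zero so a bad caller cannot underflow the counter. */
	t->running -= (bytes <= t->running) ? bytes : t->running;
	/* Release only at the low mark, not as soon as we dip under hi. */
	if (t->stalled && t->running <= t->lo)
		t->stalled = false;
	return !t->stalled;
}
```

    With lo=1048576 and hi=8388608, a stalled writer is not released when
    the backlog merely drops under 8MB; it waits until the backlog is at
    or below 1MB.  The gap between the two marks is what keeps writers
    from flapping between stalled and running on every completion.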