Date: Tue, 21 Mar 2000 22:17:52 -0800 (PST)
From: Matthew Dillon
Message-Id: <200003220617.WAA86154@apollo.backplane.com>
To: Paul Richards
Cc: Richard Wendland, Alfred Perlstein, Poul-Henning Kamp, current@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: FreeBSD random I/O performance issues
References: <200003220022.AAA28786@ns0.netcraft.com> <38D833BC.A082DF09@originative.co.uk>

:written immediately which is 8750/10000 writes.
:
:When the write size drops below the filesystem block size then the
:clustering code never gets called because the buffers are just marked
:dirty and cached.
:
:I think if we fixed the issue of writing out full blocks this behaviour
:would stop but I also think the clustering code could do with a fix. It
:should at least check to see if there is a cluster being built when the
:blockno is 0 and push it out. Possibly though it'd be better to not push
:out clusters of only one block and just leave them in the cache.

Hmm. Your analysis is correct, but I don't think it's worth fixing the
block-is-0 case. It may be worth revisiting the write-behind code to give
it a better ability to distinguish random I/O from sequential I/O (e.g.
perhaps it should ignore unaligned full blocks).

It is perfectly OK for dirty blocks to remain in the buffer cache. In
fact, it's *optimal* to leave them there as long as the buffer cache does
not become saturated with them.
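The idea of letting write-behind tell random I/O from sequential I/O, and of ignoring unaligned full blocks, can be sketched as a small decision function. This is a hypothetical illustration, not FreeBSD's actual heuristic: `struct wb_state` and `wb_should_write_behind()` are invented names, and the policy (a write must continue exactly where the previous one ended, and must complete a block) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-file write state (not a real FreeBSD structure).
 * It remembers where the previous write ended so the next write can be
 * classified as a sequential continuation or as random access.
 */
struct wb_state {
    int64_t next_off;   /* offset just past the previous write */
};

/*
 * Hypothetical policy: return true if a write of len bytes at off should
 * start write-behind (asynchronous I/O on the block it just completed).
 *   - Non-sequential writes are treated as random and left dirty.
 *   - A full-block-sized write that is not block-aligned is also treated
 *     as random I/O and left dirty in the cache.
 *   - Otherwise, write-behind fires only when the write reaches or
 *     crosses a block boundary, i.e. it completes a block.
 */
static bool
wb_should_write_behind(struct wb_state *st, int64_t off, int64_t len,
    int64_t bsize)
{
    assert(bsize > 0 && len > 0);

    bool sequential = (off == st->next_off);
    bool unaligned_full = (len >= bsize) && (off % bsize != 0);

    st->next_off = off + len;

    if (!sequential || unaligned_full)
        return false;
    return (off + len) / bsize > off / bsize;
}
```

With an 8192-byte block, two sequential 4096-byte writes from offset 0 trigger write-behind only on the second one, while a sequential 8192-byte write starting mid-block is left in the cache as if it were random.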
The buffer cache is perfectly capable of clustering delayed writes. Also,
the filesystem syncer comes along every 30 seconds or so anyway and
flushes everything out.

What the write-behind code tries to do is prevent the buffer cache from
becoming saturated with dirty buffers and smooth out disk write I/O. It
assumes that write-behind data is not typically accessed by the program
immediately after being written. That assumption turns out to be wrong in
the DBM case you tested, and the result is stalls, because the buffer and
its VM pages are locked while the write I/O is in progress. The stalls
are *not* due to the I/O itself but to side effects of the I/O being in
progress.

If a user program doesn't access any of the data it recently wrote, the
whole mechanism ends up operating asynchronously in the background. If it
does, the write-behind mechanism breaks down and you get a stall.

The most common dirty-data case the filesystem has to deal with is
appending to a file, that is, doing piecemeal sequential writes. There
are virtually no other cases capable of saturating the buffer cache. This
is why the write-behind code only tries to handle the piecemeal-write,
flush-full-blocks case.

-Matt
Matthew Dillon
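The piecemeal-append case, where full blocks are flushed and only the partial tail stays dirty, can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: `struct append_model` and `append_write()` are hypothetical names, and the model ignores clustering, the syncer, and buffer locking entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (hypothetical, not FreeBSD kernel code) of a program that
 * appends small records to a file.  Bytes accumulate in a dirty tail;
 * each time a block fills it is handed to write-behind, and only the
 * partial tail remains dirty in the cache for the syncer to flush later.
 */
struct append_model {
    int64_t bsize;          /* filesystem block size in bytes */
    int64_t tail_dirty;     /* dirty bytes in the current, partial block */
    int64_t blocks_flushed; /* full blocks handed to write-behind so far */
};

static void
append_write(struct append_model *m, int64_t len)
{
    assert(m->bsize > 0 && len >= 0);
    m->tail_dirty += len;
    /* Every block completed by this append is pushed out immediately... */
    m->blocks_flushed += m->tail_dirty / m->bsize;
    /* ...and only the remainder lingers as a dirty buffer. */
    m->tail_dirty %= m->bsize;
}
```

Appending 100 records of 1000 bytes with an 8192-byte block size flushes 12 full blocks and leaves 1696 dirty bytes. However much is appended, the dirty data held in this model never exceeds one partial block, which is why handling this one case keeps the cache from saturating.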