From owner-freebsd-hackers Wed Mar 6 9:26:40 2002
Delivered-To: freebsd-hackers@freebsd.org
Date: Wed, 06 Mar 2002 10:26:04 -0700
Subject: Re: A weird disk behaviour
From: Ian
Cc: Zhihui Zhang
In-Reply-To: <3C8648F5.1EC1E4EE@openet-telecom.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG

> Zhihui Zhang wrote:
>
>> ... I also do not read anything during the partial block write,
>> and I think the disk controller should not do that either.
>
> If you do a partial block write, surely at some point the block must be read
> in order to preserve that segment of data you are _not_ overwriting?

This was *exactly* my experience in FreeBSD 3.2, which was the last time I looked into this in detail. The performance of writing full blocks instead of partial blocks was at least an order of magnitude better. (By "blocks" here I mean the size the filesystem was formatted with, the -b parameter to newfs.) I found that a filesystem formatted as -b8192 -f8192 performed so much faster than the usual -b8192 -f1024 that it was well worth taking the hit in wasted allocation space for small files.

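
As a rough illustration of why partial writes are costly, the sketch below works out which blocks a write only partly covers; each of those would, in principle, have to be read before it can be written back. This is a simplified model, not the instrumentation described in this message: the function name partial_blocks and the 8192-byte default are assumptions chosen to match the -b8192 example, and it ignores caching and blocks that lie past the end of the file.

```python
def partial_blocks(offset, length, block_size=8192):
    """Return indices of blocks that a write covers only partially.

    Hypothetical helper for illustration: a partially covered block
    must be read first so the bytes that are NOT overwritten survive.
    """
    if length <= 0:
        return []
    end = offset + length                 # one past the last byte written
    first = offset // block_size          # first block touched
    last = (end - 1) // block_size        # last block touched
    partial = []
    for blk in range(first, last + 1):
        blk_start = blk * block_size
        blk_end = blk_start + block_size
        # The block is partial if the write starts after the block's
        # first byte or stops before the block's last byte.
        if offset > blk_start or end < blk_end:
            partial.append(blk)
    return partial


if __name__ == "__main__":
    print(partial_blocks(0, 8192))        # aligned full-block write
    print(partial_blocks(0, 1024))        # small write inside one block
    print(partial_blocks(4096, 8192))     # unaligned write straddling two blocks
```

Under this model an aligned 8192-byte write needs no prior read, a 1024-byte write needs one, and an unaligned 8192-byte write needs two, one for each block it straddles.
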
When I instrumented code in various places to try to track down why there was such a huge difference when fragsize != blocksize, I found that the killer was repeated read-modify-write cycles, especially on filesystem metadata. Creating a file and writing a few bytes to it could result in dozens of blocks being read and then written, and some of the blocks got re-read several times in the process. It was always a mystery to me why the same sectors would get read over and over again (isn't that what buffer and filesystem caches are for?). But I know for certain the physical reads were happening, because the instrumentation for that was in a custom raid driver of our own.

But FreeBSD 3.2 is ancient history now, and I have no idea whether filesystem performance is still this bad (and surely softupdates would ameliorate this problem anyway). Also, this may not be relevant to Zhihui Zhang's situation, because filesystem behavior is probably different than working directly with the /dev/daxxxx device. (Or maybe not; I guess there must be an implied blocksize from an incore disklabel or something.)

It would be interesting to see if formatting a filesystem with blocksize == fragsize still makes a big difference in performance these days, but I remember all the instrumentation I had to do to prove the read-modify-write was happening last time being a BIG hassle, and nobody is paying me to do it anymore. :-)

-- Ian