Date: Wed, 6 Mar 2002 12:51:48 -0500 (EST)
From: Zhihui Zhang
To: Ian
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: A weird disk behaviour

On Wed, 6 Mar 2002, Ian wrote:

> >
> > Zhihui Zhang wrote:
> >
> >> ... I also do not read anything during the partial block write,
> >> and I think the disk controller should not do that either.
> >
> > If you do a partial block write, surely at some point the block must be read
> > in order to preserve that segment of data you are _not_ overwriting?
>
> This was *exactly* my experience in FreeBSD 3.2, which was the last time I
> looked into this in detail. The performance of writing full blocks instead
> of partial blocks was at least an order of magnitude better. (By "blocks"
> here I mean the size the filesystem was formatted with, the -b parameter to
> newfs.) I found that a filesystem formatted as -b8192 -f8192 performed so
> much faster than the usual -b8192 -f1024 that it was well worth taking the
> hit in wasted allocation space for small files.
>
> When I instrumented code in various places to try to track down why there
> was such a huge difference when fragsize != blocksize I found that the
> killer was repeated read-modify-write cycles, especially on filesystem
> metadata. Creating a file and writing a few bytes to it could result in
> dozens of blocks read then written, and some of the blocks got re-read
> several times in the process. It was always a mystery to me why the same
> sectors would get read over and over again (isn't that what buffer and
> filesystem caches are for?) But I know for certain the physical reads were
> happening because the instrumentation for that was in a custom raid driver
> of our own.

Could you tell me where your custom RAID driver is? I mean, is it part of
the operating system or inside the disk controller?

> But, FreeBSD 3.2 is ancient history now, I have no idea whether filesystem
> performance is still this bad (and surely softupdates would ameliorate this
> problem anyway). Also, this may not be relevant to Zhihui Zhang's situation
> because filesystem behavior is probably different than working directly with
> the /dev/daxxxx device. (Or maybe not, I guess there must be an implied
> blocksize from an incore disklabel or something.)

I feel that the slowness of the file system is due to its somewhat
out-of-date on-disk structures. Many modern file systems use B+trees
nowadays. Soft updates help a lot, but they cannot solve the problem
completely.

> It would be interesting to see if formatting a filesystem with blocksize ==
> fragsize still makes a big difference in performance these days, but I
> remember all the instrumentation I had to do to prove the read-modify-write
> was happening last time being a BIG hassle, and nobody is paying me to do it
> anymore. :-)
>
> -- Ian

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
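
For readers following the thread, the read-modify-write cycle discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not FreeBSD code: `CountingDisk` and `write_bytes` are hypothetical names, the "disk" is an in-memory list that can only transfer whole blocks, and the 8192-byte block size is borrowed from the -b8192 example. It shows why a write that covers a whole block needs no read, while a write that covers only part of a block forces one.

```python
BLOCK_SIZE = 8192  # assumption: taken from the -b8192 example in the thread


class CountingDisk:
    """Hypothetical in-memory device that only transfers whole blocks."""

    def __init__(self, nblocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(nblocks)]
        self.reads = 0   # number of whole-block reads issued
        self.writes = 0  # number of whole-block writes issued

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == self.block_size
        self.writes += 1
        self.blocks[n] = bytes(data)


def write_bytes(disk, offset, data):
    """Write data at a byte offset; read a block only if it is partly overwritten."""
    bs = disk.block_size
    pos = 0
    while pos < len(data):
        block_no, start = divmod(offset + pos, bs)
        chunk = data[pos:pos + bs - start]
        if len(chunk) == bs:
            # The whole block is replaced: nothing to preserve, no read needed.
            disk.write_block(block_no, chunk)
        else:
            # Partial block: read it, patch the affected bytes, write it back.
            buf = bytearray(disk.read_block(block_no))
            buf[start:start + len(chunk)] = chunk
            disk.write_block(block_no, buf)
        pos += len(chunk)


full = CountingDisk(4)
write_bytes(full, 0, b"x" * BLOCK_SIZE)   # aligned, exactly one block
partial = CountingDisk(4)
write_bytes(partial, 100, b"x" * 1024)    # 1 KB in the middle of a block
print("full block:   ", full.reads, "reads,", full.writes, "writes")
print("partial block:", partial.reads, "reads,", partial.writes, "writes")
```

The model ignores caching entirely, so it says nothing about why the same sectors were re-read several times in the measurements quoted above; it only shows where the first read comes from.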