Date: Sun, 11 Dec 2011 12:10:32 -0800 (PST)
From: "Pedro F. Giffuni" <giffunip@tutopia.com>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: current@freebsd.org
Subject: Re: calling all fs experts
Message-ID: <1323634232.36004.YahooMailClassic@web113503.mail.gq1.yahoo.com>

--- On Sun 11/12/11, Kostik Belousov <kostikbel@gmail.com> wrote:

>
> If you wanted to get responses from experts only, sorry in
> advance.
>

I am no fs expert but just thought I'd mention some things
based on my playing with the BSD ext2fs ...

> The fs (AKA UFS) uses clustering provided by the block cache.
> The clustering code, mainly located in the kern/vfs_cluster.c,
> coalesces sequence of reads or writes that are targeting the
> consecutive blocks, into single physical read or write of the
> maximal size of MAXPHYS. Current definition of MAXPHYS is 128KB.
>

The clustering code is really cool and the idea is that it
gives UFS the advantages of an extent-based fs.
I haven't seen benchmarks for UFS2, but on ext2 it didn't
seem to work as it should.

One issue is that ext2 doesn't support fragments and, as
a consequence, ext2 will not use big block sizes. This is a
limitation in the ext2 design that UFS doesn't have, but
still Linux's ext2fs outperforms UFS in async mode (we do
shine in sync mode).

It was never clear exactly why this happens, but it would
appear there is a bottleneck in geom that is not good at
writing many contiguous blocks.

> Clustering allows filesystem to improve the layout of the files
> by calling VOP_REALLOCBLKS() to redo the allocation to make the
> writing sequence of blocks sequential if it is not.
>
> Even if file is not laid out ideally, or the i/o pattern is
> random, most writes scheduled are asynchronous, and for reads,
> the system tries to schedule read-aheads for some limited number
> of blocks. This allows the lower layers, i.e. geom and disk
> drivers, to optimize the i/o queue to coalesce requests that are
> consecutive on disk, but not on the queue.
>
> BTW, some time ago I was interested in the effect on the
> fragmentation on UFS, due to some semi-abandoned patch, which
> could make the fragmentation worse. I wrote the tool that
> calculated the percentage of non-consecutive spots in the whole
> filesystem. Apparently, even under the hard load consisting of
> writing a lot of files under the megabytes in size, UFS managed
> to keep the number of spots under 2-3% on sufficiently free
> volume.
>

Yes, the realloc_blk code is very efficient in that. In fact
it is so good it actually hides some inefficient operations
in UFS. Bruce had a patch for this that I cc'd to Kirk, but
the difference was not big because the realloc_blk code does
its job in memory.

Zheng Liu did the reallocation thing for ext2fs and it gave
better results than preallocation, but the results are not
as spectacular as in UFS (the UFS code takes advantage of
fragments there too). I do expect to commit it (kern/159233)
once my mentor reviews and approves it.

cheers,

Pedro.
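
For readers who want to see the two ideas from the quoted text in
miniature, here is a rough sketch. It is not the code in
kern/vfs_cluster.c and not the fragmentation tool mentioned above,
whose exact metric is not given in the thread. The function names,
the 16KB block size and the "percentage of non-consecutive
transitions" definition are all assumptions made for illustration;
only the 128KB MAXPHYS value comes from the thread.

```python
MAXPHYS = 128 * 1024      # 128KB, the value quoted in the thread
BLOCK_SIZE = 16 * 1024    # assumed block size, for illustration only


def coalesce(blocks, block_size=BLOCK_SIZE, maxphys=MAXPHYS):
    """Merge a sequence of block numbers into (start, count) runs.

    Only blocks that directly follow the previous one are merged, and
    a run never grows beyond maxphys bytes. Hypothetical helper, not a
    kernel interface.
    """
    max_blocks = max(1, maxphys // block_size)
    runs = []
    for blk in blocks:
        if runs:
            start, count = runs[-1]
            # Extend the current run if blk is the next block on disk
            # and the run still fits into one physical transfer.
            if blk == start + count and count < max_blocks:
                runs[-1] = (start, count + 1)
                continue
        runs.append((blk, 1))
    return runs


def nonconsecutive_percent(blocks):
    """Percentage of block-to-block transitions that are not consecutive.

    An assumed definition of "non-consecutive spots" for one file's
    block list.
    """
    if len(blocks) < 2:
        return 0.0
    breaks = sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)
    return 100.0 * breaks / (len(blocks) - 1)
```

With these assumed sizes a run holds at most 8 blocks, so
coalesce(range(10)) yields two transfers, (0, 8) and (8, 2), while
a block list such as [10, 11, 12, 13, 20, 21, 30] yields three runs
and has 2 breaks in 6 transitions.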