Date:      Tue, 7 Jul 2009 15:15:35 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Alexander Motin <mav@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: DFLTPHYS vs MAXPHYS
Message-ID:  <200907072215.n67MFZeM092096@apollo.backplane.com>
References:  <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com> <4A53931D.6040307@FreeBSD.org> <200907071902.n67J2Dcm090246@apollo.backplane.com> <20090708062346.G1555@besplex.bde.org>

:I will disagree with most of this
:- the amount of read-ahead/clustering is not very important.  fs's already
:   depend on the drive doing significant buffering, so that when the fs gets
:   things and seeks around a lot, not all the seeks are physical.  Locality
:   is much more important.

    Yes, I agree with you there to a point, but drive cache performance
    tails off very quickly if things are not exactly sequential in each
    zone being read, and it is fairly difficult to achieve exact
    sequentiality in the filesystem layout.  Command latency also really
    starts to interfere if you have to go to the drive every few name
    lookups / stats / whatever:  those operations only take a few
    microseconds when the data is sitting in the buffer cache, whereas
    every trip to the drive pays the command latency even if the request
    is satisfied from the HD's on-drive cache.

    The cluster code fixes both the command latency issue and the problem
    of slight non-sequentialities in the access pattern (in each zone being
    seek-read).  Without it, performance numbers wind up all over the map.
    That makes it fairly important.
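
    Conceptually the command-count side of this is easy to see even from
    userland.  The sketch below (just an illustration of the idea, not
    the actual cluster code) reads the same amount of data off the raw
    device first as many small commands and then as fewer, larger ones;
    the difference in elapsed time is mostly per-command overhead:

        /*
         * Rough userland illustration of the command-count effect (not
         * the kernel cluster code, just the idea):  read 8MB off the
         * raw device once as 16K commands and once as 128K commands and
         * compare the elapsed times.  Different windows are used so the
         * second pass does not simply hit data the first pass already
         * pulled into the drive cache.
         */
        #include <err.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/time.h>
        #include <unistd.h>

        #define SPAN    (8 * 1024 * 1024)

        static void
        pass(int fd, off_t base, size_t bufsize)
        {
                char *buf = malloc(bufsize);
                struct timeval t1, t2;
                off_t off;

                gettimeofday(&t1, NULL);
                for (off = 0; off < SPAN; off += bufsize) {
                        if (pread(fd, buf, bufsize, base + off) !=
                            (ssize_t)bufsize)
                                err(1, "pread");
                }
                gettimeofday(&t2, NULL);
                printf("%4zuK commands: %.3f sec\n", bufsize / 1024,
                    (t2.tv_sec - t1.tv_sec) +
                    (t2.tv_usec - t1.tv_usec) / 1e6);
                free(buf);
        }

        int
        main(int ac, char **av)
        {
                int fd = open(ac > 1 ? av[1] : "/dev/da0", O_RDONLY);

                if (fd < 0)
                        err(1, "open");
                pass(fd, 0, 16 * 1024);         /* many small commands */
                pass(fd, 2 * SPAN, 128 * 1024); /* fewer, larger commands */
                return (0);
        }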

    I have a nifty program to test the access pattern side of this:

	fetch http://apollo.backplane.com/DFlyMisc/zoneread.c
	cc ...
	(^C to stop test, use iostat to see the results)
	./zr /dev/da0 16 16 1024 1
	./zr /dev/da0 16 16 1024 2
	./zr /dev/da0 16 16 1024 3
	./zr /dev/da0 16 16 1024 4
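
    The idea is just to split the device into N zones and round-robin
    across them, reading each zone either 100% sequentially forward or
    with some perturbation thrown in.  The sketch below is a stripped-down
    illustration of that idea, not the actual zoneread.c source:

        /*
         * Stripped-down illustration of the zone-read idea (not the
         * actual zoneread.c).  The device is split into nzones zones
         * spaced zonemb megabytes apart and we round-robin across
         * them, reading blksize bytes at a time.  mode 1 reads each
         * zone 100% sequentially forward, mode 2 stays forward overall
         * but swaps each pair of blocks (assumes a power-of-2 block
         * size), anything else reads each zone in reverse.  ^C to
         * stop, watch iostat while it runs.
         */
        #include <err.h>
        #include <fcntl.h>
        #include <stdlib.h>
        #include <unistd.h>

        int
        main(int ac, char **av)
        {
                const char *dev = (ac > 1) ? av[1] : "/dev/da0";
                int nzones = (ac > 2) ? atoi(av[2]) : 16;
                size_t blksize = ((ac > 3) ? atoi(av[3]) : 16) * 1024;
                off_t zonespace = (off_t)((ac > 4) ? atoi(av[4]) : 1024) *
                    1024 * 1024;
                int mode = (ac > 5) ? atoi(av[5]) : 1;
                int fd = open(dev, O_RDONLY);
                char *buf = malloc(blksize);
                off_t *pos = calloc(nzones, sizeof(*pos));
                int i;

                if (fd < 0)
                        err(1, "%s", dev);
                for (;;) {
                        for (i = 0; i < nzones; ++i) {
                                off_t base = (off_t)i * zonespace;
                                off_t off;

                                switch (mode) {
                                case 1: /* pure forward sequential */
                                        off = base + pos[i];
                                        break;
                                case 2: /* swap each pair of blocks */
                                        off = base +
                                            (pos[i] ^ (off_t)blksize);
                                        break;
                                default: /* reverse within the zone */
                                        off = base + zonespace -
                                            (off_t)blksize - pos[i];
                                        break;
                                }
                                if (pread(fd, buf, blksize, off) < 0)
                                        err(1, "pread");
                                pos[i] += blksize;
                                if (pos[i] + (off_t)blksize > zonespace)
                                        pos[i] = 0;
                        }
                }
        }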


    If you play with it you will find that most drives can track around
    16 zones as long as the reads in each zone are 100% sequential and
    forward.  Any other access pattern severely degrades performance.
    For example, if you read the data in reverse you can kiss performance
    goodbye.  If you introduce slight non-linearities into the access
    pattern, performance degrades very rapidly even though the seeks are
    within 16-32K of each other.

    This is what I mean by drives not doing sane caching.  It was less of
    a problem with smaller drives, where the non-linearities ran up against
    the need to do an actual head seek anyway, but the caches in today's
    huge drives are just not tuned very well.

    UFS does have a bit of an advantage here, but HAMMER does a fairly
    good job too.  The problem HAMMER has is with its initial layout due
    to B-Tree node splits (which mess up linearity in the B-Tree).  Once
    the reblocker cleans up the B-Tree, performance is recovered.  The
    B-Tree is the biggest problem, but I can't fix the initial layout
    without making incompatible media changes, so I'm holding off on
    doing that for now.

						-Matt



