Date:      Tue, 7 Jul 2009 12:02:13 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: DFLTPHYS vs MAXPHYS
Message-ID:  <200907071902.n67J2Dcm090246@apollo.backplane.com>
References:  <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com> <4A53931D.6040307@FreeBSD.org>


:All I wanted to say, is that it is FS privilege to decide how much data 
:it needs. But when it really needs a lot of data, they should be better 
:transferred with smaller number of bigger transactions, without strict 
:MAXPHYS limitation.
:
:-- 
:Alexander Motin

    We are in agreement.  That's essentially what I mean by all my
    cluster_read() comments.  What matters most is how much read-ahead
    the cluster code does and how well that read-ahead is matched to
    reducing future transactions, not so much anything else (such as
    cpu caches).

    The cluster heuristics are pretty good but they do break down under
    certain circumstances.  For example, for UFS they break down when the
    file data adjacency is between different inodes.  That is often why one
    sees the KB/t sizes go down (and the TPS rate goes up) when tar'ing up a
    large number of small files.  Tar'ing up /usr/src is a good example of
    this.  KB/t can drop all the way down to 8K and performance is noticeably
    degraded.

    The cluster heuristic also tends to break down on the initial read() from
    a newly constituted vnode, because it has no prior history to work with
    and so does not immediately issue a read-ahead even though the I/O may
    end up being linear.
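
    Roughly speaking, the heuristic amounts to something like the sketch
    below.  This is not the actual cluster_read() code -- the structure,
    names, and thresholds are made up for illustration -- but it shows why
    a vnode with no read history gets no read-ahead on its first read even
    when the access pattern turns out to be linear:

        #include <sys/types.h>
        #include <stddef.h>

        /*
         * Simplified per-vnode read history (illustrative only; the real
         * cluster code keeps equivalent sequential-access state on the
         * vnode itself).
         */
        struct rd_hist {
                off_t   last_end;       /* end offset of the previous read */
                int     seqcount;       /* consecutive sequential reads seen */
        };

        static size_t
        readahead_bytes(struct rd_hist *h, off_t off, size_t len, size_t maxra)
        {
                size_t ra = 0;

                if (h->seqcount > 0 && off == h->last_end) {
                        /* Sequential history: ramp the read-ahead up. */
                        if (h->seqcount < 16)
                                h->seqcount++;
                        ra = (size_t)h->seqcount * len;
                        if (ra > maxra)
                                ra = maxra;
                } else if (off == h->last_end) {
                        /* First adjacency observed: start counting. */
                        h->seqcount = 1;
                } else {
                        /* Random access or a freshly constituted vnode. */
                        h->seqcount = 0;
                }
                h->last_end = off + len;
                return (ra);
        }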

    --

    For command latency issues Julian pointed out a very interesting contrast
    between a HD and a (SATA) SSD.  With no seek times to speak of, command
    overhead becomes a bigger deal when trying to maximize the performance
    of an SSD.  I would guess that larger DMA transactions (from the point of
    view of the host cpu anyhow) become more desirable once we start
    hitting bandwidth ceilings of 300 MBytes/sec for SATA II and
    600 MBytes/sec beyond that.

    If, in my example, the bandwidth ceiling for a HD capable of doing 60MB/s
    is hit at the 8K mark, then presumably the block size needed to hit the
    bandwidth ceiling for a HD or SSD capable of 200MB/s, or 300MB/s, or
    higher, will also have to be larger: 16K, 32K, etc.  That is fast
    approaching the 64K mark people are arguing about.
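
    To put rough numbers on that scaling (a back-of-the-envelope sketch
    only, assuming the per-command overhead -- and thus the achievable
    command rate -- stays about the same as in the 60MB/s-at-8K example
    above):

        #include <stdio.h>

        int
        main(void)
        {
                /* 60 MB/s saturating at 8K implies roughly 7300 commands/sec. */
                const double cmds_per_sec = 60.0e6 / 8192.0;
                const double targets[] = { 200.0e6, 300.0e6, 600.0e6 };
                size_t i;

                for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++) {
                        /* Transaction size needed to reach the target bandwidth. */
                        double blksize = targets[i] / cmds_per_sec;
                        printf("%3.0f MB/s needs ~%.0f KB transactions\n",
                            targets[i] / 1.0e6, blksize / 1024.0);
                }
                return (0);
        }

    That works out to roughly 27K, 40K, and 80K respectively, which is the
    same neighborhood as the 16K/32K/64K figures above.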

    In any case, the main reason I posted is to try to correct people's
    assumptions about the importance of various parameters, particularly the
    irrelevancy of cpu caches in the bigger picture.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>


