Date:      Tue, 07 Jul 2009 16:26:29 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: DFLTPHYS vs MAXPHYS
Message-ID:  <4A534D05.1040709@FreeBSD.org>
In-Reply-To: <1246915383.00136290.1246904409@10.7.7.3>
References:  <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3>

Matthew Dillon wrote:
>       tty             da0             cpu
>  tin tout  KB/t tps   MB/s  us ni sy in id
>    0   11   0.50 17511  8.55   0  0 15  0 85            bs=512
>    0   11   1.00 16108 15.73   0  0 12  0 87            bs=1024
>    0   11   2.00 14758 28.82   0  0 11  0 89            bs=2048
>    0   11   4.00 12195 47.64   0  0  7  0 93            bs=4096
>    0   11   8.00  8026 62.70   0  0  5  0 95            bs=8192 << MB/s breakpt
>    0   11  16.00  4018 62.78   0  0  4  0 96            bs=16384
>    0   11  32.00  2025 63.28   0  0  2  0 98            bs=32768 << id breakpt
>    0   11  64.00  1004 62.75   0  0  1  0 99            bs=65536
>    0   11 128.00   506 63.25   0  0  1  0 99            bs=131072

As I have written before, my SSD continues to improve speed up to a
512KB transaction size, and maybe further; I haven't tested beyond that.
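
For reference, here is a minimal sketch of the kind of raw-device
block-size sweep behind tables like the above; the device path,
per-pass size, and loop bounds are illustrative assumptions, not the
exact test either of us ran:

#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	struct timeval t0, t1;
	char *buf;
	double secs;
	ssize_t n;
	size_t bs, done, total = 64 * 1024 * 1024;	/* 64MB per pass */
	int fd = open("/dev/da0", O_RDONLY);	/* device from the tables */

	if (fd < 0) {
		perror("open");
		return (1);
	}
	for (bs = 512; bs <= 512 * 1024; bs *= 2) {
		if ((buf = malloc(bs)) == NULL)
			break;
		lseek(fd, 0, SEEK_SET);		/* sequential from the start */
		gettimeofday(&t0, NULL);
		for (done = 0; done < total; done += n) {
			if ((n = read(fd, buf, bs)) <= 0)
				break;
		}
		gettimeofday(&t1, NULL);
		secs = (t1.tv_sec - t0.tv_sec) +
		    (t1.tv_usec - t0.tv_usec) / 1e6;
		printf("bs=%7zu: %8.2f MB/s\n", bs, done / secs / 1048576);
		free(buf);
	}
	close(fd);
	return (0);
}

Reading the raw device bypasses the buffer cache, so each read() of bs
bytes becomes at most one disk transaction; requests larger than the
driver's limit are split by physio(9) into MAXPHYS-sized pieces, which
is exactly the limit under discussion.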

> Random seek/read
> 
>       tty              da0             cpu
>  tin tout   KB/t tps    MB/s  us ni sy in id
>    0   11   0.50  189   0.09   0  0  0  0 100	bs=512
>    0   11   1.00  184   0.18   0  0  0  0 100	bs=1024
>    0   11   2.00  177   0.35   0  0  0  0 100	bs=2048
>    0   11   4.00  175   0.68   0  0  0  0 100	bs=4096
>    0   11   8.00  172   1.34   0  0  0  0 100	bs=8192
>    0   11  16.00  166   2.59   0  0  0  0 100	bs=16384
>    0   11  32.00  159   4.97   0  0  1  0 99	bs=32768
>    0   11  64.00  142   8.87   0  0  0  0 100	bs=65536
>    0   11 128.00  117  14.62   0  0  0  0 100	bs=131072
> 		  ^^^   ^^^
> 		  note TPS rate and MB/s
> 
>     Which is the more important tuning variable?  Efficiency of linear
>     reads or saving re-seeks by buffering more data?  If you didn't choose
>     saving re-seeks you lose.
> 
>     To go from 16K to 32K requires saving 5% of future re-seeks to break-even.
>     To go from 32K to 64K requires saving 11% of future re-seeks.
>     To go from 64K to 128K requires saving 18% of future re-seeks.
>     (at least with this particular disk)
> 
>     At the point where the block size exceeds 32768, if you aren't saving
>     re-seeks with locality of reference from the additional cached data,
>     you lose.  If you are saving re-seeks you win.  CPU caches do not enter
>     into the equation at all.
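
To make the arithmetic behind those percentages explicit: each doubled
transaction takes 1/tps_big seconds instead of 1/tps_small, so a
fraction 1 - tps_big/tps_small of the original seeks must be eliminated
just to break even.  A small sketch of that computation, using the
measured tps values from the random-read table above:

#include <stdio.h>

/*
 * Fraction of future re-seeks that must be saved so that fewer,
 * slower transactions cost no more total time than the smaller
 * ones did: each access now takes 1/tps_big instead of 1/tps_small.
 */
static double
breakeven(double tps_small, double tps_big)
{
	return (1.0 - tps_big / tps_small);
}

int
main(void)
{
	printf("16K->32K:  %.2f\n", breakeven(166, 159));	/* the ~5% */
	printf("32K->64K:  %.2f\n", breakeven(159, 142));	/* the ~11% */
	printf("64K->128K: %.2f\n", breakeven(142, 117));	/* the ~18% */
	return (0);
}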
> 
>     For most filesystems the re-seeks being saved depend on the access
>     pattern.  For example, if you are doing a ls -lR or a find the re-seek
>     pattern will be related to inode and directory lookups.  The number of
>     inodes which fit in a cluster_read(), assuming reasonable locality of
>     reference, will wind up determining the performance.
> 
>     However, as the buffer size grows the total number of bytes you are
>     able to cache becomes the dominant factor in calculating the re-seek
>     efficiency.  I don't have a graph for that, but ultimately it means
>     that reading very large blocks (e.g. 1MB) with a non-linear access
>     pattern is bad because most of the additional data cached will never
>     be used before the memory winds up being re-used to cache some other
>     cluster.

You are mixing completely different things. I was never talking about
file system block size. I am not disputing that a 16/32K file system
block size may be quite effective in most cases. I was speaking about
the maximum _disk_transaction_ size. It is not the same thing.

When the file system needs a small amount of data, or there is just a
small file, there is definitely no need to read/write more than one
small FS block. But when the file system predicts an effective large
read-ahead, or it has a lot of write-back data, there is no reason not
to transfer the contiguous blocks in one big disk transaction. Splitting
it just increases command overhead at every layer and makes it possible
for the drive to be interrupted between those operations to do some
very long seek.
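
To make the overhead argument concrete, here is a toy model of that
splitting cost; the 100us per-command overhead and ~63MB/s media rate
are illustrative assumptions, not measurements:

#include <stdio.h>

/*
 * Time to move 'total' bytes when the transfer is split into commands
 * of at most 'maxphys' bytes each.
 */
static double
xfer_time(size_t total, size_t maxphys, double t_cmd, double media_bps)
{
	size_t ncmds = (total + maxphys - 1) / maxphys;

	/* Every command pays the fixed overhead; the data itself
	 * moves at the sustained media rate either way. */
	return (ncmds * t_cmd + total / media_bps);
}

int
main(void)
{
	size_t mb = 1048576;
	double rate = 63 * 1048576.0;	/* ~63MB/s, as in the table above */
	double t_cmd = 100e-6;		/* assumed 100us per command */

	printf("1MB in one command:   %.1f ms\n",
	    1e3 * xfer_time(mb, mb, t_cmd, rate));
	printf("1MB as 16 64K pieces: %.1f ms\n",
	    1e3 * xfer_time(mb, 64 * 1024, t_cmd, rate));
	return (0);
}

Even this simplistic model, which ignores queueing and any long seek
the drive may slip in between the pieces, shows the merged transaction
winning whenever the data was going to be transferred anyway.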

-- 
Alexander Motin


