Date: Tue, 7 Jul 2009 12:02:13 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Alexander Motin <mav@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org
Subject: Re: DFLTPHYS vs MAXPHYS
Message-ID: <200907071902.n67J2Dcm090246@apollo.backplane.com>
References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3>
 <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3>
 <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3>
 <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3>
 <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3>
 <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3>
 <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com>
 <4A53931D.6040307@FreeBSD.org>
:All I wanted to say, is that it is FS privilege to decide how much data
:it needs. But when it really needs a lot of data, they should be better
:transferred with smaller number of bigger transactions, without strict
:MAXPHYS limitation.
:
:--
:Alexander Motin

    We are in agreement.  That's essentially what I mean by all my
    cluster_read() comments.  What matters the most is how much read-ahead
    the cluster code does, and how well matched the read-ahead is to
    reducing future transactions, and not so much anything else (such as
    cpu caches).

    The cluster heuristics are pretty good but they do break down under
    certain circumstances.  For example, for UFS they break down when
    there is file data adjacency between different inodes.  That is often
    why one sees the KB/t sizes go down (and the TPS rate go up) when
    tar'ing up a large number of small files.  Tar'ing up /usr/src is a
    good example of this.  KB/t can drop all the way down to 8K and
    performance is noticeably degraded.

    The cluster heuristic also tends to break down on the initial read()
    from a newly constituted vnode, because it has no prior history to
    work with and so does not immediately issue a read-ahead even though
    the I/O may end up being linear.

    --

    For command latency issues Julian pointed out a very interesting
    contrast between a HD and a (SATA) SSD.  With no seek times to speak
    of, command overhead becomes a bigger deal when trying to maximize
    the performance of a SSD.

    I would guess that larger DMA transactions (from the point of view of
    the host cpu anyhow) would be more highly desired once we start
    hitting bandwidth ceilings of 300 MBytes/sec for SATA II and 600
    MBytes/sec beyond that.  If in my example the bandwidth ceiling for a
    HD capable of doing 60MB/s is hit at the 8K mark, then presumably the
    block size needed to hit the bandwidth ceiling for a HD or SSD capable
    of 200MB/s, or 300MB/s, or higher, will also have to be larger.  16K,
    32K, etc.  This is fast approaching the 64K mark people are arguing
    about.
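    The read-ahead heuristic described above can be sketched roughly as
    follows.  This is an assumption-laden simplification, not the actual
    FreeBSD cluster_read() code: it just tracks the last end-offset per
    file and grows a read-ahead window only when reads turn out to be
    sequential, which is why a freshly constituted vnode (no history)
    gets no read-ahead on its first read():

```python
# Illustrative sketch of a sequential read-ahead heuristic.  The class
# name, field names, and window-growth policy are hypothetical, chosen
# only to demonstrate the general technique discussed in the text.

class ReadAheadState:
    def __init__(self):
        self.next_expected = None   # no history yet on a fresh vnode
        self.window = 0             # read-ahead window, in blocks

    def on_read(self, offset, length, max_window=32):
        """Record a read; return how many blocks of read-ahead to issue."""
        sequential = (offset == self.next_expected)
        self.next_expected = offset + length
        if sequential:
            # History says we are linear: grow the window geometrically.
            self.window = min(max(self.window * 2, 1), max_window)
        else:
            # First read, or a seek: no usable history, so no read-ahead.
            self.window = 0
        return self.window

ra = ReadAheadState()
print(ra.on_read(0, 8))    # -> 0  (initial read: no history, no read-ahead)
print(ra.on_read(8, 8))    # -> 1  (sequential: read-ahead begins)
print(ra.on_read(16, 8))   # -> 2  (still sequential: window grows)
```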
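    The block-size argument above can be made concrete with a
    back-of-envelope model (mine, not from the mail): if every transfer
    pays a fixed per-command overhead on top of the time spent actually
    moving data, effective throughput only approaches the media bandwidth
    as the transfer size grows.  The 100 microsecond overhead figure below
    is an assumed illustration, not a measured number:

```python
# Hypothetical model: throughput of a device with media bandwidth
# `media_mbps` when each transfer of `block_kb` KBytes also pays a fixed
# per-command overhead of `cmd_overhead_us` microseconds.

def effective_mbps(block_kb, media_mbps, cmd_overhead_us):
    """Effective MB/s when each transfer pays a fixed command cost."""
    mb = block_kb / 1024.0
    xfer_s = mb / media_mbps                 # time spent moving data
    total_s = xfer_s + cmd_overhead_us / 1e6 # plus fixed command overhead
    return mb / total_s

# A notional 300 MB/s SSD with 100 us of per-command overhead: small
# transfers are dominated by command cost, large ones approach 300 MB/s.
for kb in (8, 16, 32, 64, 128):
    print("%4dK -> %6.1f MB/s" % (kb, effective_mbps(kb, 300.0, 100.0)))
```

    With these assumed numbers, 8K transfers land near 60 MB/s while 64K
    transfers already exceed 200 MB/s, which is the shape of the argument
    for larger transactions on faster devices.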
    In any case, the main reason I posted is to try to correct people's
    assumptions about the importance of various parameters, particularly
    the irrelevancy of cpu caches in the bigger picture.

						-Matt
						Matthew Dillon
						<dillon@backplane.com>