Date:      Sun, 05 Jul 2009 22:16:27 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: DFLTPHYS vs MAXPHYS
Message-ID:  <4A50FC0B.9090601@FreeBSD.org>
In-Reply-To: <d763ac660907051158i256c0f93n4a895a992c2a8c34@mail.gmail.com>
References:  <4A4FAA2D.3020409@FreeBSD.org>	 <20090705100044.4053e2f9@ernst.jennejohn.org>	 <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org>	 <4A50BA9A.9080005@FreeBSD.org> <20090706005851.L1439@besplex.bde.org>	 <4A50DEE8.6080406@FreeBSD.org> <20090706034250.C2240@besplex.bde.org>	 <4A50F619.4020101@FreeBSD.org> <d763ac660907051158i256c0f93n4a895a992c2a8c34@mail.gmail.com>

Adrian Chadd wrote:
> 2009/7/6 Alexander Motin <mav@freebsd.org>:
> 
>> In these tests you got almost only the negative side of the effect, as you
>> said, due to cache misses. Do you really have a CPU with such a small L2
>> cache? Some kind of P3 or old Celeron? But with 64K MAXPHYS you just didn't
>> get any benefit from using a bigger block size.
> 
> All the world isn't your current desktop box with only SATA devices :)

This is a laptop, and what do you mean by "only SATA"? Do you know of any 
storage whose performance degrades with big transactions?

> There have been and will be plenty of little embedded CPUs with tiny
> amounts of cache for quite some time to come.

Fine, let's set it to 8K on ARM. What do you want to say by that?

> You're also doing simple stream IO tests. Please re-think the thought
> experiment with a whole lot of parallel IO going on rather than just
> straight single stream IO.

Please don't. Parallel access with big blocks just becomes more linear as 
the block length grows. For modern drives with >100MB/s speeds and 10ms 
access times it is simply madness to transfer less than 1MB per 
transaction under random access.
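
To put rough numbers on that (a back-of-the-envelope sketch only; the 
100MB/s and 10ms figures are the assumptions above, not measurements from 
any particular drive):

/*
 * Back-of-the-envelope only: effective random-access throughput for a
 * drive doing ~100MB/s sequentially with ~10ms average access time.
 * The figures are assumptions, not measurements.
 */
#include <stdio.h>

int
main(void)
{
	const double media_mbps = 100.0;	/* MB/s sequential */
	const double seek_ms = 10.0;		/* average access time */
	const double sizes_kb[] = { 64.0, 128.0, 256.0, 512.0, 1024.0 };
	size_t i;

	for (i = 0; i < sizeof(sizes_kb) / sizeof(sizes_kb[0]); i++) {
		double xfer_ms = sizes_kb[i] / 1024.0 / media_mbps * 1000.0;
		double eff = (sizes_kb[i] / 1024.0) /
		    ((seek_ms + xfer_ms) / 1000.0);
		printf("%6.0fKB: %.2fms transfer + %.0fms seek -> %5.1f MB/s\n",
		    sizes_kb[i], xfer_ms, seek_ms, eff);
	}
	return (0);
}

With 64K transactions such a drive delivers about 6MB/s under random 
load; with 1MB transactions it is closer to 50MB/s.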

> Also, please realise that part of having your cache thrashed is what
> it does to the performance of -other- code. dd may be fast, but if
> you're constantly purging your caches by copying around all of that
> data, subsequent code has to go and freshen the cache again. On older
> and anaemic embedded/low power boxes the cost of a cache miss vs cache
> hit can still be quite expensive.

I think that anaemic embedded/low-power boxes would prefer to have the 
chipset hardware handle as much of the operation as possible without 
interrupting the CPU.

Also, please read one of my previous posts. I don't see why, with a 1M 
user-level buffer for example, buffer-cache-backed access split into many 
small disk transactions would trash the CPU cache any less. It transfers 
the same amount of data into the same buffer cache memory addresses. It 
is not the disk transaction's DMA size that trashes the cache. If you 
want to fight that, fine, but not there.
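
As a sketch of what I mean (hypothetical names, not the actual physio() 
or clustering code): however the kernel chops a 1M request up, the same 
megabyte moves through the same buffer addresses, so the cache pressure 
from the copies is identical; only the number of commands differs.

/*
 * Sketch only: splitting one large I/O into MAXPHYS-sized chunks.
 * Hypothetical stand-in code, not the real physio()/clustering path.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define	BUFLEN	(1024 * 1024)		/* 1M user buffer */

static size_t
simulate_read(char *buf, size_t len, size_t maxio)
{
	size_t off, n, ncmds = 0;

	for (off = 0; off < len; off += n) {
		n = len - off;
		if (n > maxio)
			n = maxio;
		/* Stand-in for the DMA/copy into the same buffer. */
		memset(buf + off, 0xa5, n);
		ncmds++;
	}
	return (ncmds);
}

int
main(void)
{
	char *buf = malloc(BUFLEN);

	printf("64K chunks: %zu commands, %d bytes touched\n",
	    simulate_read(buf, BUFLEN, 64 * 1024), BUFLEN);
	printf("1M chunks:  %zu commands, %d bytes touched\n",
	    simulate_read(buf, BUFLEN, 1024 * 1024), BUFLEN);
	free(buf);
	return (0);
}

Either way the same BUFLEN bytes pass through the cache; the 64K case 
just issues sixteen commands instead of one.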

-- 
Alexander Motin


