From owner-freebsd-arch@FreeBSD.ORG  Sun Jul  5 17:12:16 2009
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Message-ID: <4A50DEE8.6080406@FreeBSD.org>
Date: Sun, 05 Jul 2009 20:12:08 +0300
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.21 (X11/20090405)
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <4A4FAA2D.3020409@FreeBSD.org>
	<20090705100044.4053e2f9@ernst.jennejohn.org>
	<4A50667F.7080608@FreeBSD.org>
	<20090705223126.I42918@delplex.bde.org>
	<4A50BA9A.9080005@FreeBSD.org>
	<20090706005851.L1439@besplex.bde.org>
In-Reply-To: <20090706005851.L1439@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-arch@FreeBSD.org
Subject: Re: DFLTPHYS vs MAXPHYS

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> Bruce Evans wrote:
>>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>> 64K is large enough to bust modern L1 caches and old L2 caches.  Make the
>>> size bigger to bust modern L2 caches too.  Interrupt rates don't matter
>>> when you are transferring 64K items per interrupt.
>>
>> How is cache size related to this, if DMA transfers data directly to RAM?
>> Sure, the CPU will invalidate the related cache lines, but why should it
>> invalidate everything?
> 
> I was thinking more of transfers to userland.  Increasing user buffer
> sizes above about half the L2 cache size guarantees busting the L2
> cache, if the application actually looks at all of its data.  If the
> data is read using read(), then the L2 cache will be busted twice (or
> a bit less with nontemporal copying), first by copying out the data
> and then by looking at it.  If the data is read using mmap(), then the
> L2 cache will only be busted once.  This effect has always been very
> noticeable using dd.  Larger buffer sizes are also bad for latency.
> 
>> Small transfers give more work to all levels, from GEOM down to
>> CAM/ATA, controllers and drives. It is not just context switching.
> 
> Yes, I can't see any cache busting below the level of copyout().  Also,
> after you convert all applications to use mmap() instead of read(),
> the cache busting should become per-CPU.

Since file data usually passes through the buffer cache, it will be read 
into different memory areas and copied out from them anyway. So I don't 
see much difference there between doing a single big transaction and 
several small ones. Cache thrashing at user level will also depend only 
on the user-level application's buffer size, not on the kernel's.

How can I reproduce that dd experiment? My system is running with 
MAXPHYS of 512K, and here is what I get:

# dd if=/dev/ada0 of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=256k count=2000
2000+0 records in
2000+0 records out
524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=128k count=4000
4000+0 records in
4000+0 records out
524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=64k count=8000
8000+0 records in
8000+0 records out
524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)
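
To make the runs above easier to compare, here is a small Python sketch 
that recomputes throughput from the printed byte count and elapsed times 
and shows each block size relative to the 64k run. The names 
(TOTAL_BYTES, elapsed, throughput) are only illustrative. The results 
will differ slightly from dd's own figures, since dd divides by an 
unrounded elapsed time.

```python
# Figures copied from the dd runs above: block size -> elapsed seconds.
TOTAL_BYTES = 524288000
elapsed = {
    "512k": 2.471564,
    "256k": 2.666643,
    "128k": 2.759498,
    "64k": 2.718900,
}


def throughput(total_bytes, seconds):
    """Return bytes per second for one run."""
    return total_bytes / seconds


rates = {bs: throughput(TOTAL_BYTES, secs) for bs, secs in elapsed.items()}
baseline = rates["64k"]

for bs, rate in rates.items():
    # Relative difference against the 64k run, in percent.
    gain = (rate / baseline - 1.0) * 100.0
    print(f"bs={bs:>4}: {rate / 1e6:7.1f} MB/s ({gain:+.1f}% vs 64k)")
```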

Instead, CPU load grows from 10% at 512K to 15% at 64K. Maybe the 
thrashing effect only becomes noticeable at block sizes comparable to 
the cache size, but modern CPUs have megabytes of cache.

-- 
Alexander Motin