From: Alexander Motin
Date: Sun, 05 Jul 2009 20:12:08 +0300
To: Bruce Evans
Cc: freebsd-arch@FreeBSD.org
Subject: Re: DFLTPHYS vs MAXPHYS
Message-ID: <4A50DEE8.6080406@FreeBSD.org>
In-Reply-To: <20090706005851.L1439@besplex.bde.org>

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> Bruce Evans wrote:
>>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>> 64K is large enough to bust modern L1 caches and old L2 caches.  Make the
>>> size bigger to bust modern L2 caches too.  Interrupt rates don't matter
>>> when you are transfering 64K items per interrupt.
>>
>> How cache size related to it, if DMA transfers data directly to RAM?
>> Sure, CPU will invalidate related cache lines, but why it should
>> invalidate everything?
>
> I was thinking more of transfers to userland.  Increasing user buffer
> sizes above about half the L2 cache size guarantees busting the L2
> cache, if the application actually looks at all of its data.  If the
> data is read using read(), then the L2 cache will be busted twice (or
> a bit less with nontemporal copying), first by copying out the data
> and then by looking at it.  If the data is read using mmap(), then the
> L2 cache will only be busted once.  This effect has always been very
> noticeable using dd.  Larger buffer sizes are also bad for latency.
>
>> Small transfers give more work to all levels from GEOM down to
>> CAM/ATA, controllers and drives. It is not just a context switching.
>
> Yes, I can't see any cache busting below the level of copyout().  Also,
> after you convert all applications to use mmap() instead of read(),
> the cache busting should become per-CPU.

Since file data usually passes through the buffer cache, it will in any
case be read into different memory areas and copied out from them. So I
don't see much difference there between doing a single big transaction
and several small ones. Cache thrashing at user level will also depend
only on the user-level application's buffer size, not on the kernel's.

How can I reproduce that dd experiment?
I have my system running with MAXPHYS of 512K, and here is what I get:

# dd if=/dev/ada0 of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=256k count=2000
2000+0 records in
2000+0 records out
524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=128k count=4000
4000+0 records in
4000+0 records out
524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=64k count=8000
8000+0 records in
8000+0 records out
524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)

CPU load, however, grows from 10% at 512K to 15% at 64K. Maybe the
thrashing effect will only be noticeable at block sizes comparable to
the cache size, but modern CPUs have megabytes of cache.

-- 
Alexander Motin
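
For anyone wanting to repeat the block-size sweep, here is a minimal sketch
of how the four dd runs could be scripted. The helper names count_for and
sweep and the DEV variable are hypothetical and not part of the original
message; the device /dev/ada0, the block sizes, and the 524288000-byte total
are taken from the runs in this message. The script does not measure CPU
load; that would have to be watched separately.

```shell
#!/bin/sh
# Sketch only: sweep dd block sizes while keeping the total transfer fixed.
# count_for, sweep and DEV are illustrative names, not from the original mail.

TOTAL=$((1000 * 512 * 1024))   # 524288000 bytes, same total as the runs above
DEV=${DEV:-/dev/ada0}          # assumption: raw disk device used in the runs

# Print how many blocks of $1 KiB are needed to transfer TOTAL bytes.
count_for() {
    echo $((TOTAL / ($1 * 1024)))
}

# Run dd once per block size; dd prints its own timing summary.
sweep() {
    for kb in 512 256 128 64; do
        echo "# bs=${kb}k count=$(count_for "$kb")"
        dd if="$DEV" of=/dev/null bs=$((kb * 1024)) count="$(count_for "$kb")"
    done
}

# Only touch the disk when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    sweep
fi
```

Saved under a name of your choice and invoked with the single argument
"run" as root, it issues the same four reads; setting DEV selects another
device.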