In-Reply-To: <20110531004247.C4034@besplex.bde.org>
References: <201105131848.p4DIm1j7079495@svn.freebsd.org> <201105282103.43370.pieter@degoeje.nl> <20110531004247.C4034@besplex.bde.org>
Date: Mon, 30 May 2011 08:49:15 -0700
From: mdf@FreeBSD.org
To: Bruce Evans
Cc: svn-src-head@freebsd.org, Pieter de Goeje, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r221853 - in head/sys: dev/md dev/null sys vm

On Mon, May 30, 2011 at 8:25 AM, Bruce Evans wrote:
> On Sat, 28 May 2011 mdf@FreeBSD.org wrote:
>
>> On Sat, May 28, 2011 at 12:03 PM, Pieter de Goeje wrote:
>>>
>>> On Friday 13 May 2011 20:48:01 Matthew D Fleming wrote:
>>>>
>>>> Author: mdf
>>>> Date: Fri May 13 18:48:00 2011
>>>> New Revision: 221853
>>>> URL: http://svn.freebsd.org/changeset/base/221853
>>>>
>>>> Log:
>>>>   Usa a globally visible region of zeros for both /dev/zero and the md
>>>>   device.  There are likely other kernel uses of "blob of zeros" than can
>>>>   be converted.
>>>>
>>>>   Reviewed by:        alc
>>>>   MFC after:  1 week
>>>
>>> This change seems to reduce /dev/zero performance by 68% as measured by
>>> this command: dd if=/dev/zero of=/dev/null bs=64k count=100000.
>>>
>>> x dd-8-stable
>>> + dd-9-current
>>>
>>> +-------------------------------------------------------------------------+
>>> |+
>>>  |
>
> Argh, hard \xa0.
>
> [...binary garbage deleted]
>
>>> This particular measurement was against 8-stable but the results are the
>>> same for -current just before this commit. Basically througput drops from
>>> ~13GB/sec to 4GB/sec.
>>>
>>> Hardware is a Phenom II X4 945 with 8GB of 800Mhz DDR2 memory.
>>> FreeBSD/amd64 is installed. This processor has 6MB of L3 cache.
>>>
>>> To me it looks like it's not able to cache the zeroes anymore. Is this
>>> intentional?
>>> I tried to change ZERO_REGION_SIZE back to 64K but that didn't
>>> help.
>>
>> Hmm.  I don't have access to my FreeBSD box over the weekend, but I'll
>> run this on my box when I get back to work.
>>
>> Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
>> think that will restore things to the original performance.
>
> Using /dev/zero always thrashes caches by the amount <source buffer
> size> + <target buffer size> (unless the arch uses nontemporal memory
> accesses for uiomove, which none do AFAIK).  So a large source buffer
> is always just a pessimization.  A large target buffer size is also a
> pessimization, but for the target buffer a fairly large size is needed
> to amortize the large syscall costs.  In this PR, the target buffer
> size is 64K.  ZERO_REGION_SIZE is 64K on i386 and 2M on amd64.  64K+64K
> on i386 is good for thrashing the L1 cache.

That depends -- is the cache virtually or physically addressed?  The
zero_region only has 4k (PAGE_SIZE) of unique physical addresses.  So
if the cache is physically addressed, most of the cache thrashing is
due to the user-space buffer.

> It will only have a
> noticeable impact on a current L2 cache in competition with other
> threads.  It is hard to fit everything in the L1 cache even with
> non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0
> for the source (D)cache and 4K for the target cache might work).  On
> amd64, 2M+2M is good for thrashing most L2 caches.  In this PR, the
> thrashing is limited by the target buffer size to about 64K+64K, up
> from 4K+64K, and it is marginal whether the extra thrashing from the
> larger source buffer makes much difference.
>
> The old zbuf source buffer size of PAGE_SIZE was already too large.

Wouldn't this depend on how far down from the use of the buffer the
actual copy happens?  Another advantage to a large virtual buffer is
that it reduces the number of times the copy loop in uiomove has to
return up to the device layer that initiated the copy.
This is all pretty fast, but again, assuming a physically addressed
cache, fewer trips are better.

Thanks,
matthew

> The source buffer size only needs to be large enough to amortize
> loop overhead.  1 cache line is enough in most cases.  uiomove()
> and copyout() unfortunately don't support copying from register
> space, so there must be a source buffer.  This may limit the bandwidth
> by a factor of 2 in some cases, since most modern CPUs can execute
> either 2 64-bit stores or 1 64-bit store and 1 64-bit load per cycle
> if everything is already in the L1 cache.  However, target buffers
> for /dev/zero (or any user i/o) probably need to be larger than the
> L1 cache to amortize the syscall overhead, so there are usually plenty
> of cycles to spare for the unnecessary loads while the stores wait for
> caches.
>
> This behaviour is easy to see for regular files too (regular files get
> copied out from the buffer cache).  You have limited control on the
> amount of thrashing by changing the target buffer size, and can determine
> cache sizes by looking at throughputs.
>
> Bruce
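
The two positions in this thread can be put into numbers with a rough
model.  The sketch below is plain Python, not kernel code, and the names
cache_footprint and device_layer_trips are made up for illustration.  It
assumes 4K pages and a zero region backed by a single physical page (as
described above), and it ignores associativity, line size and every
other real cache effect.

```python
PAGE_SIZE = 4 * 1024  # assumption: 4K pages, as in the thread


def cache_footprint(source_size, target_size, physically_addressed):
    """Rough number of distinct cache bytes touched by one copy.

    Hypothetical helper: models only the two cases argued in the thread.
    """
    if physically_addressed:
        # The zero region has only PAGE_SIZE of unique physical
        # addresses, so that is all the source can occupy in the cache.
        source_size = min(source_size, PAGE_SIZE)
    return source_size + target_size


def device_layer_trips(request_size, region_size):
    """How many times the copy loop returns to the device layer
    for one read of request_size bytes (integer ceiling division)."""
    return (request_size + region_size - 1) // region_size


K = 1024
print(cache_footprint(64 * K, 64 * K, physically_addressed=False))  # 131072
print(cache_footprint(64 * K, 64 * K, physically_addressed=True))   # 69632
print(device_layer_trips(64 * K, PAGE_SIZE))                        # 16
print(device_layer_trips(64 * K, 2 * K * K))                        # 1
```

With bs=64k the model gives 64K+64K for a virtually addressed cache and
4K+64K for a physically addressed one, and 16 trips with a PAGE_SIZE
source buffer against 1 with a 64K or 2M region.  It says nothing about
which effect dominates the measured throughput.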