Date: Fri, 21 Jun 2013 08:45:32 +1000 (EST)
From: Bruce Evans
To: Bruce Evans
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Konstantin Belousov
Subject: Re: svn commit: r252032 - head/sys/amd64/include
In-Reply-To: <20130621065839.J916@besplex.bde.org>
Message-ID: <20130621081116.E1151@besplex.bde.org>
References: <201306201430.r5KEU4G5049115@svn.freebsd.org> <20130621065839.J916@besplex.bde.org>

On Fri, 21 Jun 2013, I wrote:

> On Thu, 20 Jun 2013, Konstantin Belousov wrote:
>> ...
>> @@ -44,7 +44,7 @@ counter_u64_add(counter_u64_t c, int64_t
>> ...
> The i386 version of the counter asm doesn't support the immediate
> constraint for technical reasons.  64 bit counters are too large and
> slow to use on i386, especially when they are implemented as they are
> without races.

Actual testing showed that it is only about twice as slow as a direct
increment.  With the enclosed test program (a userland version hacked
on a bit to avoid pcpu), on ref10-i386 the times are:
- loop overhead:                                         1 cycle
- direct unlocked increment of a uint32_t:               6 cycles
- direct unlocked increment of a uint64_t:               7 cycles
- non-inline function unlocked increment of a uint64_t:  7.5 cycles
- counter_u64_add():                                     14 cycles
- non-inline counter_u64_add():                          18 cycles
Add many more when critical_enter()/exit() is needed.

I thought that a direct increment of a uint32_t took only 3 cycles.  This
is the documented time for i486.  4 cycles latency is documented for
AthlonXP/64.  The carry check for incrementing a uint64_t is pipelined
on most modern i386, so it adds very little to this.

Nevertheless, the correct implementation of counters, once you have the
complexity of a counter API and can't just do counter++, is to use small
counters and run a daemon to accumulate them in larger counters before
they overflow.  pcpu accesses should allow simple counter++ accesses to
work for the smaller counters (except their address is in pcpu space).
But I don't see how sysctl accesses can work without lots of context
switches to reach strictly per-CPU context.  The current accumulation
of pcpu counters in places like vcnt() doesn't do that -- it accesses
pcpu counters for other CPUs, so has races.  The races are more serious
for accumulating counters into larger ones.  Then the smaller ones need
to be cleared atomically with copying them.  The accumulation daemon(s)
can run per-CPU more easily than sysctls, since the daemons don't need
to run as frequently as sysctls might.

Bruce
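
A minimal userland sketch of the small-counter scheme described above,
assuming C11 atomics as a stand-in for pcpu accesses.  The names NCPU,
small_cnt, total, small_counter_add() and accumulate() are hypothetical;
this is not the enclosed test program and not the counter(9) code.  It
only illustrates the point that each small counter has to be cleared in
the same atomic step that copies it out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPU	4	/* hypothetical CPU count, for the sketch only */

/* Small per-CPU counters; in a kernel these would live in pcpu space. */
static _Atomic uint32_t small_cnt[NCPU];

/* Wide total, written only by the accumulation pass. */
static uint64_t total;

/* Fast path: bump the small counter that belongs to "cpu". */
static void
small_counter_add(int cpu, uint32_t inc)
{
	assert(cpu >= 0 && cpu < NCPU);
	atomic_fetch_add_explicit(&small_cnt[cpu], inc,
	    memory_order_relaxed);
}

/*
 * Accumulation pass (the "daemon" body): read and clear each small
 * counter in one atomic exchange, so that an increment arriving between
 * the read and the clear is neither lost nor counted twice, then fold
 * the harvested value into the 64-bit total.
 */
static void
accumulate(void)
{
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++)
		total += atomic_exchange_explicit(&small_cnt[cpu], 0,
		    memory_order_relaxed);
}
```

The pass has to run often enough that no 32-bit counter wraps between
two harvests.  A real per-CPU version could presumably avoid the atomic
read-modify-write on the fast path by running the harvest on the owning
CPU, which is the per-CPU daemon arrangement suggested above; that part
is not modelled here.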