From: Bruce Evans
Date: Thu, 27 Jun 2013 00:24:53 +1000 (EST)
To: Gleb Smirnoff
Cc: Konstantin Belousov, svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans
Subject: Re: svn commit: r252032 - head/sys/amd64/include
In-Reply-To: <20130626091055.GU1214@FreeBSD.org>
Message-ID: <20130626233533.H2933@besplex.bde.org>

On Wed, 26 Jun 2013, Gleb Smirnoff wrote:

> On Wed, Jun 26, 2013 at 11:42:39AM +1000, Bruce Evans wrote:
> B> > Anyway, as Gleb said, there is no point in
> B> > optimizing the i386 kernel.
> B>
> B> I said that there is every point in optimizing the i386 kernel. This
> B> applies even more to other 32-bit arches. Some CPUs are much slower
> B> than modern x86's. They shouldn't be slowed down more by inefficient
> B> KPIs.
>
> I didn't mean that i386 arch is a relic and should be ignored at all.
>
> What I actually meant, is that the problem of performance drop due to
> cache poisoning and loss of statistics with simple "+=" operation can
> be observed only at extremely high event rates, with multiple processors
> involved.

I think you already fixed cache poisoning, and it has nothing to do with
whether the access is a simple "+=" operation. amd64 still uses a simple
"+=" operation (written in asm so that it uses the right instructions),
so the slow cmpxchg8b used on i386 can't be doing anything good for the
cache.

> The counter(9) is solution for these conditions. Thus we are interested
> in optimising amd64, not i386. The latter isn't affected neither positively
> nor negatively with these changes, just because last i386 CPUs can't reach
> the event rates where need for counter(9) arises. Yes, you can tweak
> implementation and obtain better results with microbenchmarks, but I bet
> that any change in counter(9) implementation won't affect packet forwarding
> rate on any i386. What we claim for i386 (and all other arches) that
> counter(9) is lossless, and that's all.
>
> I second to Konstantin, that we don't have objections in any changes to
> i386 part of counter, including a daemon, but the changes shouldn't affect
> amd64.

amd64 should be changed too, to use 32-bit pcpu counters to avoid ifdefs
and to use less cache.

You can't reach event rates that overflow 32-bit counters faster than a
daemon can accumulate them. For example, with 10 Gbps ethernet the maximum
packet rate is about 14 Mpps. Suppose that is all handled and counted on
1 CPU, which isn't possible yet. The daemon must run once every 286
seconds to keep up with that. Byte counts are more interesting. Counting
1 G/second of anything requires running the daemon every 4 seconds.

I don't remember how you distributed the counters to avoid cache
poisoning. Is it one pcpu counter per cache line, so that counters never
poison nor benefit from caching for other counters for the same CPU? Or
is it just separate cache lines for each CPU? I think the latter. So
there can be several 64-bit counters per cache line, or twice as many
32-bit counters.

Bruce