From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Bruce Evans
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Konstantin Belousov
Subject: Re: svn commit: r252032 - head/sys/amd64/include
Date: Fri, 21 Jun 2013 10:49:01 +0400
Message-ID: <20130621064901.GS1214@FreeBSD.org>
In-Reply-To: <20130621090207.F1318@besplex.bde.org>
References: <201306201430.r5KEU4G5049115@svn.freebsd.org> <20130621065839.J916@besplex.bde.org> <20130621081116.E1151@besplex.bde.org> <20130621090207.F1318@besplex.bde.org>

Bruce,

On Fri, Jun 21, 2013 at 09:04:34AM +1000, Bruce Evans wrote:
B> >> The i386 version of the counter asm
B> >> doesn't support the immediate constraint for technical reasons.
B> >> 64 bit counters are too large and slow to use on i386, especially
B> >> when they are implemented as they are without races.
B> >
B> > Actual testing showed that it is only about twice as slow as a direct
B> > increment.  With the enclosed test program (a userland version hacked
B> > on a bit to avoid pcpu), on ref10-i386 the times are:
B> > - loop overhead:                                         1 cycle
B> > - direct unlocked increment of a uint32_t:               6 cycles
B> > - direct unlocked increment of a uint64_t:               7 cycles
B> > - non-inline function unlocked increment of a uint64_t:  7.5 cycles
B> > - counter_u64_add():                                    14 cycles
B> > - non-inline counter_u64_add():                         18 cycles
B> > ...
B>
B> Actually enclosing the test program:
B>
B> % #include <stdio.h>
B> % #include <stdint.h>
B> %
B> % static inline void
B> % counter_64_inc_8b(volatile uint64_t *p, int64_t inc)
B> % {
B> %
B> % 	__asm __volatile(
B> % 	"movl	%%ds:(%%esi),%%eax\n\t"
B> % 	"movl	%%ds:4(%%esi),%%edx\n"
B> % "1:\n\t"
B> % 	"movl	%%eax,%%ebx\n\t"
B> % 	"movl	%%edx,%%ecx\n\t"
B> % 	"addl	(%%edi),%%ebx\n\t"
B> % 	"adcl	4(%%edi),%%ecx\n\t"
B> % 	"cmpxchg8b %%ds:(%%esi)\n\t"
B> % 	"jnz	1b"
B> % 	:
B> % 	: "S" (p), "D" (&inc)
B> % 	: "memory", "cc", "eax", "edx", "ebx", "ecx");
B> % }
B> %
B> % uint32_t cpu_feature = 1;
B> %
B> % typedef volatile uint64_t *counter_u64_t;
B> %
B> % static void
B> % #if 1
B> % inline
B> % #else
B> % __noinline
B> % #endif
B> % counter_u64_add(counter_u64_t c, int64_t inc)
B> % {
B> %
B> % #if 1
B> % 	if ((cpu_feature & 1) == 1) {
B> % 		counter_64_inc_8b(c, inc);
B> % 	}
B> % #elif 0
B> % 	if ((cpu_feature & 1) == 1) {
B> % 		*c += inc;
B> % 	}
B> % #else
B> % 	*c += inc;
B> % #endif
B> % }
B> %
B> % uint64_t mycounter[1];
B> %
B> % int
B> % main(void)
B> % {
B> % 	unsigned i;
B> %
B> % 	for (i = 0; i < 1861955704; i++)	/* sysctl -n machdep.tsc_freq */
B> % 		counter_u64_add(mycounter, 1);
B> % 	printf("%ju\n", (uintmax_t)mycounter[0]);
B> % }

Yes, for a single-threaded userland program, using
"+=" is faster than all the magic that counter(9) does. But when multiple threads need to access one counter "+=" fails both with precision and with performance. Using "+=" upon a per-CPU counter is racy, since += is compiled into "load", "increment", "store" sequence and if we are not in a critical section, then this is racy. We might be removed from CPU between load and store. Entering critical section means modifying curthread, which is again a %gs based load & store. Exiting critical section has the same cost. Thus, we assume that doing a direct %gs based update on the counter is cheaper than critical_enter(); counter += foo; critical_exit(); -- Totus tuus, Glebius.