From: Bruce Evans
Date: Fri, 8 Jun 2012 19:16:07 +1000 (EST)
To: Konstantin Belousov
Cc: Dag-Erling Smørgrav, freebsd-arch@freebsd.org
Subject: Re: Fast vs slow syscalls (Re: Fwd: [RFC] Kernel shared variables)
Message-ID: <20120608185204.T1708@besplex.bde.org>
In-Reply-To: <20120607100401.GW85127@deviant.kiev.zoral.com.ua>

On Thu, 7 Jun 2012, Konstantin Belousov wrote:

> On Thu, Jun 07, 2012 at 10:26:10AM +0200, Dag-Erling Smørgrav wrote:
>> Bruce Evans writes:
>>> Now 2.44 nsec/call makes
>>> sense, but you really should add some volatiles
>>> here to ensure that getpid() is not optimized away.
>>
>> As you can see from the disassembly I provided, it isn't.
>>
>>> SO it loops OK, but we can't see what getpid() does.  It must not be
>>> doing much.
>>
>> Umm, yes, that's the whole point of this conversation.  Linux's getpid()
>> is not a syscall, but a library function that returns a constant from a
>> page shared by the kernel.

Of course, but we're down to nearly single-cycle times, so the difference
between the library function using 1 or 2 instructions to load the value
may be significant.

>>> 5.4104 nsec/call for gettimeofday() is impossible if there is any
>>> rdtsc() hardware call or much layering.
>>
>> It's gettimeofday(0, 0), actually, so it doesn't need to read the clock.
>> If I pass a struct timeval as the first argument - so it *does* need to
>> read the clock - it's a little bit slower but still faster than an
>> actual system call.  Here's another run that demonstrates this - a
>> little bit slower than previous runs because I have other processes
>> running:
>>
>> getpid(): 10,000,000 iterations in 30,377 us
>> gettimeofday(0, 0): 10,000,000 iterations in 55,571 us
>> gettimeofday(&tv, 0): 10,000,000 iterations in 302,634 us
> So this timing seems to be approximately same by the order of magnitude
> as the times I get for the patch, around 25 vs. 30ns/per gettimeofday()
> call.

Not great.  I get 6.97 nsec for a slightly reduced version of FreeBSD's
1998 version of microtime(), which was written in i386 asm.  (This depends
on rdtsc taking only 6.5 cycles = 3.25 nsec on the test CPU (Athlon64).)

From rev.1.40 of microtime.s:

% #include
%
% ENTRY(microtime)
%       movl    tsc_freq, %ecx
%       testl   %ecx, %ecx
%       je      i8254_microtime

This branch is predicted perfectly but costs 0.24 nsec (0.5 cycles).

%       rdtsc
%       subl    tsc_bias, %eax
%       mull    tsc_multiplier
%       movl    %edx, %eax
%       addl    timeoff+4, %eax /* usec += time.tv_sec */
%       movl    timeoff, %edx   /* sec = time.tv_sec */

Similar to binuptime().  To convert from the old microtime.s, I just
converted the variable names from aout to elf (and supplied dummy
variables), and removed the locking instructions (which were
pushfl/cli/popfl).

%
%       cmpl    $1000000, %eax  /* usec valid? */
%       jb      1f
%       subl    $1000000, %eax  /* adjust usec */
%       incl    %edx            /* bump sec */

Probably faster with bintimes (can be branch-free then (?)), but by
converting directly to the final format we avoid a scaling step.  The
branch here is predicted too perfectly because of my dummy variables.

% 1:
%       movl    4(%esp), %ecx   /* load timeval pointer arg */
%       movl    %edx, (%ecx)    /* tvp->tv_sec = sec */
%       movl    %eax, 4(%ecx)   /* tvp->tv_usec = usec */
%
%       ret
%
% i8254_microtime:
%       ret                     /* XXX garbage */

>
> Linux seems slower probably due to slower CPU ? Mine is 3.4Ghz, while
> des used 3.1Ghz for Linux box.

If it is a different CPU model, then the speed of rdtsc can vary a lot.

Bruce
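
For readers who do not read i386 asm, the arithmetic in the microtime
listing above can be roughly sketched in C.  This is only an illustration
under stated assumptions, not the FreeBSD code: the struct and function
names are hypothetical, the TSC value is passed in as a parameter instead
of being read with rdtsc, and the way the multiplier is derived from the
TSC frequency is an assumption (the listing only shows that the high 32
bits of the product are kept).

```c
#include <stdint.h>

/* Hypothetical stand-in for the timeval written by the asm. */
struct tv_sketch {
    uint32_t tv_sec;
    uint32_t tv_usec;
};

/* Hypothetical bundle of the variables the listing reads. */
struct tsc_clock_sketch {
    uint32_t tsc_bias;        /* low TSC word at the last clock update */
    uint32_t tsc_multiplier;  /* usec per tick as a 0.32 fixed-point value */
    struct tv_sketch timeoff; /* time at the last clock update */
};

/*
 * ASSUMPTION: multiplier = 2^32 * 1e6 / tsc_freq, so that the high word
 * of (ticks * multiplier) is the elapsed time in microseconds.
 * Only meaningful for tsc_freq_hz > 1000000.
 */
static uint32_t
usec_multiplier_sketch(uint32_t tsc_freq_hz)
{
    return (uint32_t)((UINT64_C(1000000) << 32) / tsc_freq_hz);
}

static void
microtime_sketch(const struct tsc_clock_sketch *c, uint32_t tsc_low,
    struct tv_sketch *tvp)
{
    /* subl tsc_bias, %eax: ticks since the last update (mod 2^32) */
    uint32_t delta = tsc_low - c->tsc_bias;

    /* mull tsc_multiplier; movl %edx, %eax: keep the high 32 bits */
    uint32_t usec = (uint32_t)(((uint64_t)delta * c->tsc_multiplier) >> 32);

    /* addl timeoff+4, %eax; movl timeoff, %edx */
    usec += c->timeoff.tv_usec;
    uint32_t sec = c->timeoff.tv_sec;

    /* cmpl/jb/subl/incl: at most one carry from usec into sec */
    if (usec >= 1000000) {
        usec -= 1000000;
        sec++;
    }

    tvp->tv_sec = sec;
    tvp->tv_usec = usec;
}
```

With an assumed 2 GHz TSC, 5000 elapsed ticks are 2.5 usec, which the
truncating multiply reports as 2 usec.  The single compare-and-subtract
is enough only if less than one second of ticks has accumulated since
the last update of timeoff.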