Date: Wed, 2 Jan 2008 09:27:20 +1100 (EST)
From: Bruce Evans
To: Jeff Roberson
Cc: Gergely CZUCZY, Kris Kennaway, freebsd-performance@FreeBSD.org, Ivan Voras
Subject: Re: mysql scaling questions

On Tue, 1 Jan 2008, Jeff Roberson wrote:

> On Tue, 1 Jan 2008, Gergely CZUCZY wrote:
>> There's this SYSCALL CPU extension with
>> the SYSENTER/SYSEXIT features. IIRC
>> Linux takes advantage of this, while FreeBSD doesn't. I might be wrong
>> here, of course.
>
> This is true on 32bit x86 and not true on amd64/x86_64. On 32bit x86
> platforms our syscalls cost about 750 cycles more due to using int0x80.
> Various patches have been around for a while to implement sysenter/sysexit
> support but it's difficult to get compatibility right and probably not worth
> it now that everyone is moving to 64bit.

No, syscalls on i386 UP take about 65 cycles _less_ than on amd64, due
mainly to 64-bit code and data being larger. A syscall takes about 385
cycles on an A64 running i386 UP (0.17us @ 2.205GHz), so it can't possibly
take 750 cycles more than on the same A64 running amd64 UP (0.20us @
2.205GHz). I think SYSENTER/SYSEXIT saves more like 7.5 or 75 cycles and
thus compensates for some of the 64-bit overhead, else amd64 would be
even slower. I don't have documents or measurements for current int0x80
or SYS* times -- on i486, int0x80 takes about 80 cycles and iret takes
about the same, so the total overhead from the bad hardware interface is
about half of the total syscall overhead.

The times 0.17us and 0.20us are from lmbench2 doing a COMPAT_43 getppid().
As is well known, getppid() is a better benchmark than getpid() since it
is much harder for libraries to cache (since the parent may change to
init at any time). In FreeBSD, it always does proc locking, while
getpid() only does proc locking if COMPAT_43. But the overhead for
uncontested locking on UP is in the noise -- it is about 5-10 cycles on
this hardware.

lmbench2 is not up to date enough to report things with nanoseconds
resolution. I have more accurate measurements for clock_gettime().
After some optimizations, clock_gettime() timing itself takes an average
of 233ns in my version of 5.2 and 250-260ns in -current, both on i386 UP
@2.205GHz.
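
As a rough check on the arithmetic above, the quoted times can be converted to cycles at the stated 2.205GHz clock. This is only a sketch: the helper name `us_to_cycles` is made up for illustration, and the half-step rounding error assumes lmbench2 reports to the nearest 0.01us, which is inferred from the two-decimal figures rather than stated.

```python
def us_to_cycles(time_us: float, clock_ghz: float) -> float:
    """Convert a duration in microseconds to CPU cycles.

    1 GHz is 1000 cycles per microsecond, so cycles = us * GHz * 1000.
    (Hypothetical helper, not from the original message.)
    """
    return time_us * clock_ghz * 1000.0


CLOCK_GHZ = 2.205  # clock rate quoted for the A64

i386 = us_to_cycles(0.17, CLOCK_GHZ)    # getppid() on i386 UP
amd64 = us_to_cycles(0.20, CLOCK_GHZ)   # getppid() on amd64 UP

# Assumed reporting resolution of 0.01us gives a +-0.005us rounding error.
rounding = us_to_cycles(0.005, CLOCK_GHZ)

print(f"i386 UP : {i386:.0f} cycles (+-{rounding:.0f})")
print(f"amd64 UP: {amd64:.0f} cycles (+-{rounding:.0f})")
print(f"amd64 - i386: {amd64 - i386:.0f} cycles")

# clock_gettime() at 233ns, converted the same way.
print(f"clock_gettime(): {us_to_cycles(0.233, CLOCK_GHZ):.0f} cycles")
```

With these inputs the i386 figure comes out near 375 cycles and the amd64 figure near 441, a difference of about 66 cycles, which agrees with the "about 65 cycles" and "about 385 cycles" estimates to within the rounding of the reported times.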

Linux-2.6.10 i386 UP takes 0.13us for getpid() on slightly different
hardware (AXP 2.223GHz) where FreeBSD i386 UP takes slightly longer than
on the A64 (0.17-0.18us). Not a big difference.

The difference is more interesting for the even-more-bogus "null I/O"
micro-benchmark. This writes 1 byte to /dev/null. Linux used to be 4-5
times faster on this (on the AXP, in 0.16us in Linux-2.3.99 vs 0.90us in
FreeBSD-~5.2), but Linux has been speeded down (0.19us in Linux-2.6.10)
and FreeBSD has been speeded up (0.33us on the A64 in -current). I
consider the speedups bogus since they consist of combining/avoiding vfs
layers for devices only. The usual case of (cached) file i/o remains
unnecessarily slow. (For most devices, and for uncached file i/o, the
hardware part is necessarily slow, so optimization of the software hardly
matters.)

Bruce