Date: Wed, 2 Jan 2008 13:58:10 -1000 (HST)
From: Jeff Roberson <jroberson@chesapeake.net>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Gergely CZUCZY <phoemix@harmless.hu>, Kris Kennaway <kris@freebsd.org>, freebsd-performance@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: mysql scaling questions
Message-ID: <20080102135221.E957@desktop>
In-Reply-To: <20080102084139.X12725@delplex.bde.org>
References: <20071204130810.GA77186@harmless.hu> <47779AA7.2060801@FreeBSD.org> <20071230132451.GA61295@harmless.hu> <47779EBC.5020900@FreeBSD.org> <20071230134354.GA63555@harmless.hu> <4777A65C.8020406@FreeBSD.org> <20071230141118.GA67574@harmless.hu> <4777AB9C.1010003@FreeBSD.org> <flb6bp$8kq$1@ger.gmane.org> <4779BBE8.2050608@FreeBSD.org> <20080101122249.GA81405@harmless.hu> <20080101022655.S957@desktop> <20080102084139.X12725@delplex.bde.org>
On Wed, 2 Jan 2008, Bruce Evans wrote:

> On Tue, 1 Jan 2008, Jeff Roberson wrote:
>
>> On Tue, 1 Jan 2008, Gergely CZUCZY wrote:
>
>>> There's this SYSCALL CPU extension with the SYSENTER/SYSEXIT features.
>>> IIRC Linux takes advantage of this, while FreeBSD doesn't. I might be
>>> wrong here, of course.
>>
>> This is true on 32bit x86 and not true on amd64/x86_64. On 32bit x86
>> platforms our syscalls cost about 750 cycles more due to using int0x80.
>> Various patches have been around for a while to implement sysenter/sysexit
>> support but it's difficult to get compatibility right and probably not
>> worth it now that everyone is moving to 64bit.
>
> No, syscalls on i386 UP take about 65 cycles _less_ than on amd64, due

It is true that we are slower on i386 by not using sysenter and on par on
64bit amd64.

> mainly to 64-bit code and data being larger. A syscall takes about
> 385 cycles on an A64 running i386 UP (0.17us @ 2.205GHz), so it can't
> possibly take 750 cycles more than on the same A64 running amd64 UP
> (0.20us @ 2.205GHz). I think SYSENTER/SYSEXIT saves more like 7.5 or
> 75 cycles and thus compensates for some of the 64-bit overhead, else
> amd64 would be even slower. I don't have documents or measurements
> for current int0x80 or SYS* times -- on i486, int0x80 takes about 80
> cycles and iret takes about the same, so the total overhead from the
> bad hardware interface is about half of the total syscall overhead.

I have not benchmarked since the P4 days so my data must be grossly out of
date. At the time I had a small operating system that I used for
benchmarking processor features. I also tested call gates, task gates,
etc. I might be confusing the results of one of these tests.

Thanks,
Jeff

>
> The times 0.17us and 0.20us are from lmbench2 doing a COMPAT_43 getppid().
> As is well known, getppid() is a better benchmark than getpid() since it
> is much harder for libraries to cache (since the parent may change to
> init at any time). In FreeBSD, it always does proc locking, while getpid()
> only does proc locking if COMPAT_43. But the overhead for uncontested
> locking on UP is in the noise -- it is about 5-10 cycles on this hardware.
>
> lmbench2 is not up to date enough to report things with nanoseconds
> resolution. I have more accurate measurements for clock_gettime().
> After some optimizations, clock_gettime() timing itself takes an average
> of 233ns in my version of 5.2 and 250-260ns in -current, both on i386 UP
> @2.205GHz.
>
> Linux-2.6.10 i386 UP takes 0.13us for getpid() on slightly different
> hardware (AXP 2.223GHz) where FreeBSD i386 UP takes slightly longer
> than on the A64 (0.17-0.18us). Not a big difference. The difference
> is more interesting for the even-more-bogus "null I/O" micro-benchmark.
> This writes 1 byte to /dev/null. Linux used to be 4-5 times faster on
> this (on the AXP, in 0.16us in Linux-2.3.99 vs 0.90us in FreeBSD-~5.2),
> but Linux has been speeded down (0.19us in Linux-2.6.10) and FreeBSD
> has been speeded up (0.33us on the A64 in -current). I consider the
> speedups bogus since they consist of combining/avoiding vfs layers for
> devices only. The usual case of (cached) file i/o remains unnecessarily
> slow. (For most devices, and for uncached file i/o, the hardware part is
> necessarily slow, so optimization of the software hardly matters.)
>
> Bruce
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe@freebsd.org"
>