From: "Poul-Henning Kamp"
To: Robert Watson
Cc: David Xu, Marian Hettwer, current@freebsd.org
Subject: Re: MySQL Performance 6.0rc1
Date: Thu, 27 Oct 2005 16:27:46 +0200
Message-ID: <23153.1130423266@critter.freebsd.dk>
In-Reply-To: Your message of "Thu, 27 Oct 2005 14:04:05 BST." <20051027140031.L32255@fledge.watson.org>

In message <20051027140031.L32255@fledge.watson.org>, Robert Watson writes:

There are several things we can do to speed up our timekeeping code
without affecting its integrity. For instance:

  * Userland-only timestamp facility, provided the hardware is
    available from userland (TSC is, i8254 isn't, ACPI normally isn't
    and HPET will be, so it's roughly a 50% hit there).

  * Additional CLOCK_FOO values for various degraded but fast
    timestamps.

Unfortunately, they either force intense versioning of libc or
application source-code changes, so neither is very desirable.

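To get a feel for what a single timestamp costs on a given machine, the
per-call overhead of the existing clock IDs can be measured from userland.
The following is only a minimal sketch, assuming a POSIX system that
provides clock_gettime(); the helper names ts_to_ns() and
avg_read_cost_ns() are made up for illustration, and only the standard
CLOCK_REALTIME and CLOCK_MONOTONIC IDs are assumed to exist (no CLOCK_FOO
value is implied).

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to a single nanosecond count. */
static int64_t
ts_to_ns(const struct timespec *ts)
{
	return ((int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec);
}

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * clock_gettime(id) call, measured over `calls` back-to-back calls.
 * The measurement itself is bracketed with CLOCK_MONOTONIC.
 * Returns -1 if any clock_gettime() call fails.
 */
static int64_t
avg_read_cost_ns(clockid_t id, int calls)
{
	struct timespec start, end, scratch;
	int i;

	assert(calls > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return (-1);
	for (i = 0; i < calls; i++) {
		if (clock_gettime(id, &scratch) != 0)
			return (-1);
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return (-1);
	return ((ts_to_ns(&end) - ts_to_ns(&start)) / calls);
}
```

Calling avg_read_cost_ns(CLOCK_REALTIME, 100000) and
avg_read_cost_ns(CLOCK_MONOTONIC, 100000) and comparing the results gives
a rough idea of how much a cheaper clock ID could save. The numbers depend
entirely on the timecounter hardware in use and include loop overhead.
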
In addition to this there are a couple of kernel-only optimizations
I have always tried to avoid:

  * Inline assembler for timecounter math. The C language is
    notoriously bad at expressing the simple concept of a carry, and
    some of the multiplications could be truncated intelligently, but
    I far prefer simple and portable C to complex assembly.

  * Cache+reuse of timestamps in the kernel. It's very hard to
    cheaply determine when the cached timestamp is "too old", and it
    may require locking to work in the first place because per-CPU
    caches would probably not give enough hits to be worth it.

Before we go any further, let me remind you that our current
timecounter code does not use intra-CPU locks, provided the hardware
does not need locks. Many if not most of the more radical ideas,
TSC-based two-clock interpolation for instance, would require
intra-CPU locks to protect against time travel and excessive jitter.

It is also important to remember that no matter what we do, a
significant part of the overhead will still be the "read the
hardware" step. For instance, I just benchmarked the state-of-the-art
HPET facility on the two of my machines that have it, and found that
it took 500 and 1400 nsec respectively to read it.

(HPET timecounter code will arrive in -current RSN.)

>Sadly, POSIX doesn't say anything about how applications can express
>preferences about the cost and granularity of time measurement.

Yes, in addition to their other deficiencies [1], the APIs are somewhat
limited in what they can express.

I've often thought about inventing a new API to solve these problems;
it doesn't take much to do it right, but I have never carried through
on it because adding yet another "FreeBSD-proprietary" API is not the
solution we're looking for.

Poul-Henning

[1] a) Totally bogus leap second handling.
    b) PseudoQuasiDecimal formats (also on binary computers).
    c) Lack of traceability and quality information.

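The remark above about C and the carry can be made concrete with a small
example. This is only a sketch of the general technique, assuming a time
value stored as whole seconds plus a 64-bit binary fraction; the struct
fixtime type and the fixtime_add_frac() function are hypothetical names
used for illustration. Because C has no carry flag, the overflow of the
fraction has to be detected by comparing the sum against the old value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point time: seconds plus a fraction in 2^-64 s. */
struct fixtime {
	int64_t		sec;
	uint64_t	frac;
};

/*
 * Add a fraction of a second to *t.  Unsigned addition wraps modulo
 * 2^64, so a result smaller than the old value means the addition
 * carried out of the fraction and one second must be added.
 */
static void
fixtime_add_frac(struct fixtime *t, uint64_t x)
{
	uint64_t old;

	assert(t != NULL);
	old = t->frac;
	t->frac += x;
	if (t->frac < old)
		t->sec++;
}
```

In assembler the same operation is an add followed by an add-with-carry;
in portable C it costs an extra compare, which is the kind of overhead
the inline-assembler option would remove.
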
-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.