From: "Poul-Henning Kamp"
To: Robert Watson
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/sys time.h src/sys/kern kern_time.c
In-Reply-To: Your message of "Sun, 27 Nov 2005 01:03:59 GMT." <20051127005622.H81764@fledge.watson.org>
Date: Mon, 28 Nov 2005 19:51:48 +0100
Message-ID: <5744.1133203908@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
List-Id: CVS commit messages for the entire tree

This is a joint reply to all that has piled up in my mail-box on this
topic while I was being Robert Watson at EuroBSDcon2005[1].

First Bruce:

>(1) tc_windup() has no explicit locking, so it can run concurrently
>    on any number of CPUs, with N-1 of the CPUs calling it via
>    clock_settime() and 1 calling it via hardclock (this one may also
>    be the same as one already in it).  I doubt that the generation
>    stuff is enough to prevent problems here, especially with bug (2).

It is not as severe as you try to make it sound, but a mutex would
probably be in order at some point.

>(2) The generation count stuff depends on writes as seen by other CPUs
>    being ordered.  This is not true on many current CPUs.  E.g., on
>    amd64, writes are ordered as seen by readers on the current CPU,
>    but they may be held in a buffer and I think the buffer can be
>    written in any order to main memory.  I think this only gives a
>    tiny race window.  There is a mutex lock in all (?) execution paths
>    soon after tc_windup() returns, and this serves to synchronize writes.

Yes, a write barrier has been on my todo list for some time here.

Your observations about how out of whac^H^H^H^Hstep things are these
days are seconded, apart from the bit about deliberately making it worse.

The fact that your own fix cost 8% in performance very much supports
my opinion that any attempt to speed it up by adding complexity is
doomed from the start.

Then Robert (on programs with event engines):

Yes, event engines have an issue here, and yes, a fast 1/HZ clock would
be nice, but if we also move in the direction of a precise timeout
using HPET-like hardware for deadline interrupting, then 1/HZ will
probably be lowered significantly and it will almost certainly no
longer be the number we are looking for.

That is why I clipped the get*time() family to aim for "up to 1 ms"
precision.

> BTW, simple loopback network testing seems to dramatically confirm that
> the impact of time measurement and context switching is quite significant.

This is why I decided a long time ago to implement timestamps in a way
that would not require or trigger context switches.  Getting timestamps
is a lock-less process, provided you have non-neandertal hardware
(i.e., almost anything but the i8254 timecounter).

With respect to the timekeeping inherent in the context switch, I think
we have a consensus on redefining CPU seconds in times(2) to something
sensible when faced with variable CPU clock rate, and that should
hopefully lower the cost of context switches.  I hope to spew out a
proof of concept patch this week.

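
For reference, the generation-count scheme and the write barrier
discussed under (2) can be sketched in user space roughly as follows.
This is only a minimal sketch using C11 atomics, not the kernel's
tc_windup() code and not the proof of concept patch mentioned above.
The names struct ts_snapshot, ts_init(), ts_publish(), ts_read() and
ts_writer_lock are hypothetical, and a simple spin lock stands in for
the mutex suggested under (1).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical record for illustration; NOT the kernel's timehands. */
struct ts_snapshot {
	atomic_uint      gen;	/* 0 = update in progress */
	_Atomic uint64_t sec;
	_Atomic uint64_t nsec;
};

/* Serializes writers; stands in for the mutex suggested under (1). */
static atomic_flag ts_writer_lock = ATOMIC_FLAG_INIT;

void
ts_init(struct ts_snapshot *s)
{
	atomic_init(&s->sec, 0);
	atomic_init(&s->nsec, 0);
	atomic_init(&s->gen, 1);	/* any non-zero value means "valid" */
}

void
ts_publish(struct ts_snapshot *s, uint64_t sec, uint64_t nsec)
{
	unsigned g;

	assert(nsec < 1000000000u);
	while (atomic_flag_test_and_set_explicit(&ts_writer_lock,
	    memory_order_acquire))
		;	/* spin until no other writer is active */

	g = atomic_load_explicit(&s->gen, memory_order_relaxed);
	/* Mark the record invalid before touching the payload. */
	atomic_store_explicit(&s->gen, 0, memory_order_relaxed);
	/* Write barrier: the gen = 0 store is ordered before the payload. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&s->nsec, nsec, memory_order_relaxed);
	if (++g == 0)
		g = 1;	/* skip 0 when the counter wraps */
	/* Release store: the payload is visible before the new generation. */
	atomic_store_explicit(&s->gen, g, memory_order_release);

	atomic_flag_clear_explicit(&ts_writer_lock, memory_order_release);
}

void
ts_read(struct ts_snapshot *s, uint64_t *sec, uint64_t *nsec)
{
	unsigned g;

	/* Lock-less read: retry if an update overlapped the copy. */
	do {
		g = atomic_load_explicit(&s->gen, memory_order_acquire);
		*sec = atomic_load_explicit(&s->sec, memory_order_relaxed);
		*nsec = atomic_load_explicit(&s->nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (g == 0 ||
	    g != atomic_load_explicit(&s->gen, memory_order_relaxed));
}
```

A caller would run ts_init() once, call ts_publish() from whatever
updates the clock, and call ts_read() from any number of readers.
Readers never take the lock; they only loop again if the generation
was zero or changed while they were copying.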
Then Bruce on event engines:

>I can see a use for making a timestamp after select() returns, not for
>timeout purposes since the timeout should normally be for emergencies and
>it's relative so it doesn't need the current time, but just to record when
>things happen.

This is unfortunately too simplistic a view of event engines.  If
timeouts were uniformly long, we could ignore the runtime of the
program's event handlers, but this is not the case in practice.  I've
looked a lot at this in the ISC eventlib (bind8), but there is no way
to save one timestamp per iteration without getting creeping
imprecision in the timer-controlled events.

>The environment variable (or a sysctl/sysconf variable like
>vfs.timestamp_precision but per-process or per-user) is probably needed,
>since you don't want to teach all applications about unportable CLOCK_*.

This was my first suggestion as well.  I will however defer to anybody
who is going to actually fix the ports.

Poul-Henning

[1] Yes, great conference, you missed out.  We beat OpenBSD approx 2:1
in the beer drinking contest and it seems the only reason DF didn't
have an empty glass was a couple of "non-judgemental" participants who
did one for all of the five projects :-)

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.