Date: Thu, 1 Mar 2012 16:42:13 +1100 (EST)
From: Bruce Evans
To: Bruce Evans
Cc: arch@FreeBSD.org
Subject: Re: select/poll/usleep precision on FreeBSD vs Linux vs OSX
Message-ID: <20120301161011.A2654@besplex.bde.org>
In-Reply-To: <20120301143042.F2406@besplex.bde.org>

On Thu, 1 Mar 2012, Bruce Evans wrote:

> On Thu, 1 Mar 2012, Bruce Evans wrote:
>
>> ...
>> Bakul Shah confirmed that Linux now reprograms the timer. It has to,
>> for a tickless kernel. FreeBSD reprograms timers too.
>> I think you can set HZ large and only get timeout interrupts at that
>> frequency if there are active timeouts that need them. Timeout
>> granularity is still 1/HZ.
>
> I tried this in -current and in a 2008 -current with hz=10000. It worked
> mediocrely:
> - the 2008 version gave lapic cpuN: timer interrupts on all CPUs at
>   frequency of almost exactly 10 kHz. This is the behaviour before
>   FreeBSD reprogrammed timers (except the frequency is often off by
>   as much as 10% due to calibration bugs). There were many anomalies
>   in the results from the test program (like select() adding 199 usec
>   and usleep() adding 999 usec).
> - [... no surprises in -current]

I tried this in -current with hz=100000. This gives (some not very
surprising) behaviour:
- systat claims ~100% idle, but the ~100k interrupts on 1 CPU actually
  reduce performance by 33% (two CPUs take 30 seconds user time to do
  what can be done in 20 seconds user time with hz=100). This is a
  normal problem with fast interrupt handlers. They need a faster
  interrupt handler to account for them properly.
- ./prog 1 select works reasonably. It reports timeouts of 29-30 us.
  I expected 19-20.
- ./prog 1 poll is broken as we know. It asks for timeouts of 0 and
  takes 3 us.
- ./prog 1 usleep shows brokenness. It reports timeouts of 999 us.
  I think this is due to getnanouptime()'s brokenness.
  $(sysctl kern.timecounter.tick) is 100. This reduces getnanouptime()'s
  accuracy back to 1 msec, which explains the 999 us. But why doesn't
  select() have the same problem? select() uses getmicrouptime(), but
  it has the same brokenness. The sysctl is r/o, so I couldn't change it
  easily. I have changed tc_tick using ddb before, but don't want to
  risk reducing it by a factor of 100. The timecounter update algorithm
  depends on the timehands not being recycled too fast, and probably
  couldn't cope with recycling 100 times faster.
- ./prog 1000 select and ./prog 1000 poll take 20 us extra. I expected
  9-10 extra.
- ./prog 1000 usleep takes 619-693 us extra. Not the full extra 100 ticks
  from getnanouptime() fuzziness now.
- ./prog 500000 usleep takes 500026-500885 us. Even higher variance, which
  agrees with the fuzziness better. select and poll with this timeout
  are still accurate and have low variance (21-26 us extra).

The fuzzy versions are actually useful for optimization after all:
- for long timeouts, use the fuzzy versions and accept their inaccuracies.
  Sleep longer by the amount of fuzziness so that sleeps are never too
  short.
- for short timeouts, it seems necessary for the initial timestamp to be
  accurate. When checking if the timeout has expired, first try a fuzzy
  check. This is sufficient if the current fuzzy time is far from the
  expiry time.

Bruce
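
The two-stage expiry check suggested in the last list could be sketched
roughly as follows. This is only an illustration, not code from any kernel:
the names fuzzy_expiry_check(), padded_expiry() and the enum are
hypothetical, times are assumed to be in microseconds, and the fuzzy
timestamp is assumed to lag the true time by at most fuzz_us (1000 us in
the hz=100000 case above, where kern.timecounter.tick is 100).

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of a check that uses only the cheap, fuzzy timestamp. */
enum expiry_state { NOT_EXPIRED, EXPIRED, NEED_PRECISE };

/*
 * Hypothetical helper.  Assumes the true time lies somewhere in
 * [fuzzy_now_us, fuzzy_now_us + fuzz_us], i.e. the fuzzy clock only
 * ever lags.  A precise (more expensive) timestamp is needed only
 * when the expiry time falls inside that window.
 */
static enum expiry_state
fuzzy_expiry_check(int64_t fuzzy_now_us, int64_t expiry_us, int64_t fuzz_us)
{
	assert(fuzz_us >= 0);
	if (fuzzy_now_us >= expiry_us)
		return (EXPIRED);	/* even the lagging clock is past it */
	if (fuzzy_now_us + fuzz_us < expiry_us)
		return (NOT_EXPIRED);	/* far from expiry: fuzzy is enough */
	return (NEED_PRECISE);		/* too close to call */
}

/*
 * Hypothetical helper for long timeouts: start from a fuzzy timestamp
 * and pad the expiry by the fuzz so that the sleep is never too short.
 */
static int64_t
padded_expiry(int64_t fuzzy_start_us, int64_t timeout_us, int64_t fuzz_us)
{
	assert(fuzz_us >= 0);
	return (fuzzy_start_us + timeout_us + fuzz_us);
}
```

A caller would take the precise timestamp only on NEED_PRECISE, so a
500000 us sleep pays for the precise clock only in the last fuzz_us before
its expiry.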