From owner-freebsd-current@FreeBSD.ORG Sun Feb 8 01:45:48 2004
Date: Sun, 8 Feb 2004 01:45:37 -0800
From: David Schultz <das@FreeBSD.ORG>
To: Poul-Henning Kamp
Cc: Jun Su, tjr@FreeBSD.ORG, current@FreeBSD.ORG, jhb@FreeBSD.ORG
Subject: Re: PID Allocator Performance Results (was: Re: [UPDATE] new pid alloc...)

On Sun, Feb 08, 2004, Poul-Henning Kamp wrote:
> In message <20040208080630.GA14364@VARK.homeunix.com>, David Schultz writes:
> >I spent some time today benchmarking the various proposed pid
> >allocators.  The full results, along with pretty pictures and a
> >more complete analysis, are at:
> >
> >    http://people.freebsd.org/~das/pbench/pbench.html
>
> You _do_ realize that the difference between "tjr" and "net" in the
> bottom plot is not statistically significant?
>
> Stratification is visibly present from approx 1500 pids and up, and
> ends up being responsible for 1/3rd of the difference by the time
> you get to 5000 pids.
>
> (The tell-tale sign here is that the two data sets both fall on two
> mostly straight lines in a random-looking pattern, with practically
> no measurements hitting the interval between the two lines.)
>
> If we assume the stratification is linear in the number of pids,
> which I think looks reasonable, and we read the right-hand edge as
> half a second and the left-hand edge as zero, we find:
>
>             (.5 - 0) [second]
>   -------------------------------------- = 10 [nsec] / [iteration*pid]
>   10000 [iterations] * (5000 - 0) [pids]
>
> 10 nsec per operation is getting you into the territory of effective
> TSC-timecounter resolution, RAM access time, cache miss delays,
> and all sorts of other hardware effects.

To avoid jitter and timestamping overhead, I read the time only at
the start and end of the entire sequence of 10000 operations.  I
obtained the sample variance by running the entire test three times,
i.e.:

	for (pass = 0; pass < 3; pass++) {
		for (nprocs = 100; nprocs < 5000; nprocs++) {
			set up the test, fork the sleepers;
			take starting timestamp;
			for (iter = 0; iter < 10000; iter++)
				run test;
			take ending timestamp;
		}
	}
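For concreteness, the timing pattern looks roughly like the sketch
below.  This is not the actual benchmark code: the fork()/_exit()/
waitpid() body standing in for "run test", the ITERS constant, and
the omission of the sleeper setup are all simplifications of mine.

	/*
	 * Minimal sketch of the timing harness described above.
	 * Timestamps are taken only around the whole inner loop, so
	 * per-read clock overhead and jitter are amortized over
	 * ITERS operations instead of being paid on every one.
	 */
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <err.h>
	#include <stdio.h>
	#include <time.h>
	#include <unistd.h>

	#define ITERS	10000

	int
	main(void)
	{
		struct timespec t0, t1;
		double elapsed;
		pid_t pid;
		int iter;

		clock_gettime(CLOCK_MONOTONIC, &t0);	/* starting timestamp */
		for (iter = 0; iter < ITERS; iter++) {
			if ((pid = fork()) == 0)
				_exit(0);		/* child: exit at once */
			else if (pid > 0)
				waitpid(pid, NULL, 0);	/* parent: reap child */
			else
				err(1, "fork");
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);	/* ending timestamp */

		elapsed = (t1.tv_sec - t0.tv_sec) +
		    (t1.tv_nsec - t0.tv_nsec) / 1e9;
		printf("%d forks in %.9f seconds\n", ITERS, elapsed);
		return (0);
	}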
Nevertheless, you're definitely right about the stratification.  I'm
not sure how to explain it.  My best theory is that there is some
confounding factor with a constant overhead, such as the pageout
daemon waking up.  If you look at the original samples, there are
always two normal samples and one outlier.  More samples would
probably correct for this; if I were to run the benchmark again, I
would do more iterations of the outer loop in the pseudocode above
and fewer of the inner loop.

> So all in all, I would say that you have proven that "tjr" and "net"
> are better than "old", but not that there is any statistically
> significant performance difference between them.

Yes, I realize that.  I took 10 more samples of 10000 forks each,
with 5000 sleeping processes in the background, and got the
following (times in seconds):

tjr:	1.130558492	1.125901197	1.144079485	1.118981882
	1.131435699	1.123052511	1.133321135	1.121301171
	1.133015788	1.124377539

net:	1.116848091	1.119333603	1.117941526	1.121989527
	(got an outlier (2.547301682) here, so I reran this test)
	1.118023912	1.110658198	1.126021045	1.106436712
	1.116406694	1.100889638

These data show a difference at the 95% confidence level, namely,
that the NetBSD algorithm is about 1% faster on a system with 5000
processes (and only about 0.1% faster if you're looking at the total
overhead of fork() rather than vfork()).  I think that pretty much
rules out performance as the deciding factor between the two.
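For the record, I haven't said which test produced the 95% figure,
so take the following as a sketch of one plausible way to check it
(Welch's unequal-variance t-test over the ten samples of each
allocator above), not as the analysis actually used:

	/*
	 * Sketch of a Welch's two-sample t-test over the samples
	 * above.  (Which significance test was actually used is not
	 * stated; Welch's t-test is an assumption here.)
	 * Compile with -lm.
	 */
	#include <math.h>
	#include <stdio.h>

	static const double tjr[] = {
		1.130558492, 1.125901197, 1.144079485, 1.118981882,
		1.131435699, 1.123052511, 1.133321135, 1.121301171,
		1.133015788, 1.124377539
	};
	static const double net[] = {
		1.116848091, 1.119333603, 1.117941526, 1.121989527,
		1.118023912, 1.110658198, 1.126021045, 1.106436712,
		1.116406694, 1.100889638
	};
	#define	N	10

	static double
	mean(const double *x)
	{
		double s = 0;
		int i;

		for (i = 0; i < N; i++)
			s += x[i];
		return (s / N);
	}

	static double
	var(const double *x, double m)
	{
		double s = 0;
		int i;

		for (i = 0; i < N; i++)
			s += (x[i] - m) * (x[i] - m);
		return (s / (N - 1));	/* sample variance */
	}

	int
	main(void)
	{
		double m1 = mean(tjr), m2 = mean(net);
		double v1 = var(tjr, m1), v2 = var(net, m2);
		double se = sqrt(v1 / N + v2 / N);
		double t = (m1 - m2) / se;
		/* Welch-Satterthwaite degrees of freedom */
		double df = pow(v1 / N + v2 / N, 2) /
		    (pow(v1 / N, 2) / (N - 1) + pow(v2 / N, 2) / (N - 1));

		/*
		 * With roughly 18 degrees of freedom, the two-sided
		 * 95% critical value is about 2.10; |t| above that
		 * rejects "no difference between tjr and net".
		 */
		printf("tjr mean %.9f, net mean %.9f\n", m1, m2);
		printf("t = %.3f with %.1f degrees of freedom\n", t, df);
		return (0);
	}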