From: Thor Lancelot Simon <tls@rek.tjls.com>
To: current@freebsd.org, netbsd-users@NetBSD.org, stable@freebsd.org, kernel@crater.dragonflybsd.org
Date: Wed, 7 Apr 2004 19:46:33 -0400
Subject: Re: Benchmarking
Message-ID: <20040407234633.GA20155@panix.com>
In-Reply-To: <20040408001205.40c1b163@pheisar>
References: <40745C07.6030501@fer.hr> <20040408001205.40c1b163@pheisar>

On Thu, Apr 08, 2004 at 12:12:05AM +0100, goteki wrote:
> On Wed, 07 Apr 2004 21:52:39 +0200
> Ivan Voras wrote:
>
> > I've finished the article on benchmarking FreeBSD, NetBSD, DragonflyBSD and
> > Linux, it is available at:
> >
> > http://alfredo.cc.fer.hr/
>
> Why didn't you benchmarked netbsd-current?

Presumably because it is not a released version of the operating system; though, in that context, benchmarking "DragonflyBSD" seems rather odd, to say the least.

What is of much more concern to me, as someone who relies on high-quality benchmark numbers to guide his role in OS development, is the poor methodology of this study, particularly when compared to other recent studies such as Felix von Leitner's (http://bulk.fefe.de/scalability).

Honestly, this benchmark is not very good, for a number of reasons. Here are three of the most obvious:

1) The non-repeatability of results for some tests is merely mentioned in passing, rather than investigated and explained. (A sketch of what I mean follows this list.)

2) Of particular concern is the omission of rows from large tables of results because they were "too big" or "too small" to be interpreted meaningfully. The willingness to accept such results makes me seriously question whether any attention was given to appropriately sizing _any_ of the components of the benchmark so that they actually measure what they purport to measure.

3) The inclusion of tests which are intended to measure attributes of *the underlying hardware*, in what purports to be an OS benchmark, is indicative of poor benchmark design and analysis.
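To make that first point concrete, here is a rough sketch, in C, of the kind of harness I would expect any benchmark to use: it calibrates the iteration count until a single run is long enough to swamp timer granularity, then repeats the run and reports the spread, not just a mean. This is my own illustrative code, not anything from the article under discussion; the trivial summing workload, the one-second floor, and the 5% variance cutoff are all arbitrary placeholders.

/*
 * Rough sketch (my own throwaway code, not from the article): calibrate
 * a test so it runs long enough to swamp timer noise, then report
 * run-to-run variance instead of a single number.  The summing workload
 * is a placeholder, as are the one-second floor and 5% cutoff.
 */
#include <stdio.h>
#include <math.h>
#include <time.h>

#define TRIALS      10
#define MIN_SECONDS 1.0     /* floor so clock granularity is noise */

static volatile long sink;  /* defeat dead-code elimination */

static void
workload(long iters)
{
        long i, acc = 0;

        for (i = 0; i < iters; i++)
                acc += i;
        sink = acc;
}

static double
now(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (ts.tv_sec + ts.tv_nsec / 1e9);
}

int
main(void)
{
        double t0, mean, sd, sum = 0.0, sumsq = 0.0, times[TRIALS];
        long iters = 1000;
        int i;

        /* Calibrate: double the iteration count until one run takes
         * long enough that timer granularity and scheduling jitter
         * are small relative to the measurement. */
        for (;;) {
                t0 = now();
                workload(iters);
                if (now() - t0 >= MIN_SECONDS)
                        break;
                iters *= 2;
        }

        /* Measure: repeat the run and report the spread, not just
         * the mean. */
        for (i = 0; i < TRIALS; i++) {
                t0 = now();
                workload(iters);
                times[i] = now() - t0;
                sum += times[i];
        }
        mean = sum / TRIALS;
        for (i = 0; i < TRIALS; i++)
                sumsq += (times[i] - mean) * (times[i] - mean);
        sd = sqrt(sumsq / (TRIALS - 1));

        printf("iters=%ld mean=%.4fs sd=%.4fs cv=%.1f%%\n",
            iters, mean, sd, 100.0 * sd / mean);
        if (sd / mean > 0.05)
                printf("warning: >5%% variance; investigate before "
                    "publishing this number\n");
        return (0);
}

Any per-test result whose variance blows past a threshold like that should be investigated and explained, not silently dropped or reported as-is.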
On that third point in particular: synthetic benchmarks that measure "CPU speed" or "memory bandwidth" are wholly inappropriate in this context. The differences in their results across systems indicate two things: the poor quality of those benchmarks even for their original design purposes (by now well understood with regard to many of the tests in question), and the failure of this benchmark suite as a whole to adequately control, or even acknowledge, a number of variables which can cause what it _actually_ measures to differ from what it _purports_ to measure. Notable among these are the compiler, the system state at the start of and during each test, and the general "entropy" which results from running even good tests at too small a size (iteration count, memory footprint, and so on).

In general, though the effort is commendable, I think this "benchmark" shows more about how not to design an OS benchmark than it does about the performance of _any_ of the underlying operating systems.

Do note that NetBSD actually did somewhat better on this test than we initially did on Felix's; I'm not slagging this test because we did poorly, and in fact I'm not entirely displeased with how we did. The problem is that, like so many other benchmarks, this one doesn't actually measure what it claims to measure, and so, as an OS developer, I don't find it very useful.

Thor