From owner-freebsd-arch@FreeBSD.ORG Sun Jan 18 22:25:00 2009
Date: Sun, 18 Jan 2009 23:24:56 +0100
From: Peter Holm <pho@holm.cc>
To: Bakul Shah
Cc: Kostik Belousov, Dag-Erling Smørgrav, freebsd-arch@freebsd.org
Subject: Re: stress2 is now in projects
Message-ID: <20090118222456.GA42363@x2.osted.lan>
In-Reply-To: <20090118201202.674665B61@mail.bitblocks.com>
List-Id: Discussion related to FreeBSD architecture

On
Sun, Jan 18, 2009 at 12:12:02PM -0800, Bakul Shah wrote:
> On Sun, 18 Jan 2009 15:09:24 +0100 Peter Holm wrote:
> > On Sun, Jan 18, 2009 at 03:28:19PM +0200, Kostik Belousov wrote:
> > > On Sun, Jan 18, 2009 at 02:10:28PM +0100, Peter Holm wrote:
> > > > On Sun, Jan 18, 2009 at 01:11:25PM +0100, Dag-Erling Smørgrav wrote:
> > > > > Peter Holm writes:
> > > > > > The key functionality of this test suite is that it runs a random
> > > > > > number of test programs for a random period, in random incarnations
> > > > > > and in random sequence.
> > > > >
> > > > > In other words, it's non-deterministic and non-reproducible.
> > > >
> > > > Yes, by design.
> > > >
> > > > > You should at the very least allow the user to specify the random seed.
> > > >
> > > > Yes, it would be interesting to see if this is enough to reproduce a
> > > > problem in a deterministic way. I'll look into this.
> > >
> > > I shall state from my experience using it (or, rather, inspecting bug
> > > reports generated by stress2) that it is in fact quite repeatable.
> > > I.e., when looking into one area, you almost always get _that_ problem,
> > > together with 2-3 related issues.
> > >
> > > Due to the nature of the tests and nondeterministic kernel operations,
> > > I think that use of the same random seed gains nothing with regard to
> > > repeatability of the tests.
> >
> > It is an old issue that has come up many times: it would be so great
> > if it were possible to somehow record the exact sequence that led up
> > to a panic and play it back.
> >
> > But on the other hand, as you say, it *is* repeatable. The only
> > issue is that it may take 5 minutes or 5 hours.
> >
> > But I'm still game to see if it is possible at all (in single user
> > mode with no network activity etc.)
>
> Allowing a user to specify the random seed (and *always*
> reporting the random seed of every test) can't hurt and it
> may actually gain you repeatability in some cases.
> Most bugs
> are typically of garden variety, not dependent on some

Ah, yes, if only that were the case. But most of the problems I
encounter are lock related.

> complex interactions between parallel programs (or worse, on
> processor heisenbugs). You can always try repeating a failing
> test on a more deterministic setup like qemu etc.

Different hardware also seems to play a big role in finding bugs:
number of CPUs etc.

> One trick I have used in the past is to record "significant"
> events in one or more ring buffers using some cheap encoding.
> You then have access to the past N events during any post kernel
> crash analysis. This has far less overhead than debug
> printfs and you can even leave it enabled in production use.

-- 
Peter Holm