From owner-freebsd-arch@FreeBSD.ORG  Sun Jan 18 20:17:45 2009
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1253F1065670
	for <freebsd-arch@freebsd.org>; Sun, 18 Jan 2009 20:17:45 +0000 (UTC)
	(envelope-from bakul@bitblocks.com)
Received: from mail.bitblocks.com (bitblocks.com [64.142.15.60])
	by mx1.freebsd.org (Postfix) with ESMTP id A7F4A8FC1E
	for <freebsd-arch@freebsd.org>; Sun, 18 Jan 2009 20:17:44 +0000 (UTC)
	(envelope-from bakul@bitblocks.com)
Received: from bitblocks.com (localhost.bitblocks.com [127.0.0.1])
	by mail.bitblocks.com (Postfix) with ESMTP id 674665B61;
	Sun, 18 Jan 2009 12:12:02 -0800 (PST)
To: Peter Holm <pho@freebsd.org>
In-reply-to: Your message of "Sun, 18 Jan 2009 15:09:24 +0100."
	<20090118140924.GA27264@x2.osted.lan> 
References: <20090118082145.GA18067@x2.osted.lan> <86iqocstjm.fsf@ds4.des.no>
	<20090118131028.GA26179@x2.osted.lan>
	<20090118132819.GS48057@deviant.kiev.zoral.com.ua>
	<20090118140924.GA27264@x2.osted.lan>
Date: Sun, 18 Jan 2009 12:12:02 -0800
From: Bakul Shah <bakul@bitblocks.com>
Message-Id: <20090118201202.674665B61@mail.bitblocks.com>
Cc: Kostik Belousov <kostikbel@gmail.com>, Dag-Erling Smørgrav <des@des.no>,
	freebsd-arch@freebsd.org
Subject: Re: stress2 is now in projects 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 18 Jan 2009 20:17:45 -0000

On Sun, 18 Jan 2009 15:09:24 +0100 Peter Holm <pho@freebsd.org>  wrote:
> On Sun, Jan 18, 2009 at 03:28:19PM +0200, Kostik Belousov wrote:
> > On Sun, Jan 18, 2009 at 02:10:28PM +0100, Peter Holm wrote:
> > > On Sun, Jan 18, 2009 at 01:11:25PM +0100, Dag-Erling Smørgrav wrote:
> > > > Peter Holm <pho@freebsd.org> writes:
> > > > > The key functionality of this test suite is that it runs a random
> > > > > number of test programs for a random period, in random incarnations
> > > > > and in random sequence.
> > > > 
> > > > In other words, it's non-deterministic and non-reproducible.
> > > > 
> > > 
> > > Yes, by design.
> > > 
> > > > You should at the very least allow the user to specify the random seed.
> > > > 
> > > 
> > > Yes, it would be interesting to see if this is enough to reproduce a
> > > problem in a deterministic way. I'll look into this.
> > 
> > I shall state from my experience using it (or, rather, inspecting bug
> > reports generated by stress2), that in fact it is quite repeatable.
> > I.e., when  looking into one area, you almost always get _that_ problem,
> > together with 2-3 related issues.
> > 
> > Due to the nature of the tests and the kernel's nondeterministic
> > operations, I think that using the same random seed gains nothing
> > with regard to the repeatability of the tests.
> 
> It is an old issue that has come up many times: It would be so great 
> if it were possible to somehow record the exact sequence that led up 
> to a panic and play it back.
> 
> But on the other hand, as you say, it *is* repeatable. The only
> issue is that it may take 5 minutes or 5 hours.
> 
> But I'm still game to see if it is possible at all (in single user 
> mode with no network activity etc.)

Allowing a user to specify the random seed (and *always*
reporting the random seed of every test) can't hurt, and it
may actually gain you repeatability in some cases.  Most bugs
are of the garden variety, not dependent on some complex
interactions between parallel programs (or worse, on
processor heisenbugs).  You can always try repeating a failing
test on a more deterministic setup such as qemu.
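
To make the seed suggestion concrete, here is a minimal sketch in
C.  It is not taken from stress2; the names choose_seed() and
seed_test() are made up for illustration, and it uses the plain
ISO C rand()/srand() pair only to stay portable.  The point is
just that the seed can be supplied by the user and is always
reported.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Pick the seed for one test run.  If the caller supplies a valid
 * decimal string (hypothetical interface, e.g. taken from an option
 * or an environment variable), use it so a failing run can be
 * retried.  Otherwise derive a fresh seed from the time and the pid.
 */
unsigned int
choose_seed(const char *override)
{
	char *end;
	unsigned long v;

	if (override != NULL && *override != '\0') {
		errno = 0;
		v = strtoul(override, &end, 10);
		if (errno == 0 && *end == '\0' && v <= UINT_MAX)
			return ((unsigned int)v);
	}
	return ((unsigned int)time(NULL) ^ ((unsigned int)getpid() << 16));
}

/*
 * Seed the generator and always report the seed on stderr, whether
 * it was supplied by the user or chosen here.
 */
unsigned int
seed_test(const char *name, const char *override)
{
	unsigned int seed;

	assert(name != NULL);
	seed = choose_seed(override);
	fprintf(stderr, "%s: random seed %u\n", name, seed);
	srand(seed);
	return (seed);
}
```

A test program would call something like
seed_test(argv[0], getenv("STRESS_SEED")) before its first rand()
call, where STRESS_SEED is an assumed variable name, not an
existing stress2 knob.  This only fixes the sequence of random
choices inside one program; it does nothing about scheduling or
timing differences in the kernel.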

One trick I have used in the past is to record "significant"
events in one or more ring buffers using some cheap encoding.
You then have access to the past N events during any
post-crash analysis of the kernel.  This has far less overhead
than debug printfs, and you can even leave it enabled in
production use.
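
A rough sketch of the ring buffer idea in C follows.  The record
layout, the names (ev_ring, ev_log, ev_dump) and the buffer size
are all assumptions made for illustration, and the code is
single-threaded; a kernel version would presumably want one
buffer per CPU or atomic updates of the index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	EVLOG_SIZE	8u	/* number of slots; must be a power of two */

static_assert((EVLOG_SIZE & (EVLOG_SIZE - 1)) == 0,
    "EVLOG_SIZE must be a power of two");

/* One cheaply encoded event: a small code plus one argument. */
struct ev_entry {
	uint64_t seq;		/* position in the overall event stream */
	uint16_t code;		/* event type, meaning chosen by the caller */
	uint32_t arg;		/* event-specific value */
};

struct ev_ring {
	uint64_t next;		/* total number of events recorded so far */
	struct ev_entry slot[EVLOG_SIZE];
};

/* Record an event, overwriting the oldest slot once the ring is full. */
void
ev_log(struct ev_ring *r, uint16_t code, uint32_t arg)
{
	struct ev_entry *e = &r->slot[r->next & (EVLOG_SIZE - 1)];

	e->seq = r->next;
	e->code = code;
	e->arg = arg;
	r->next++;
}

/*
 * Copy out at most max of the most recent events, oldest first.
 * Returns the number of entries copied.
 */
size_t
ev_dump(const struct ev_ring *r, struct ev_entry *out, size_t max)
{
	uint64_t n = r->next < EVLOG_SIZE ? r->next : EVLOG_SIZE;
	uint64_t i;

	if (n > max)
		n = max;
	for (i = 0; i < n; i++)
		out[i] = r->slot[(r->next - n + i) & (EVLOG_SIZE - 1)];
	return ((size_t)n);
}
```

Recording an event is a few stores and an increment, which is why
it can stay enabled.  After a crash the same data could be read
straight out of the dump with a debugger instead of calling
ev_dump().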