From: Robert Watson
Date: Sat, 10 Jul 2004 01:18:06 -0400 (EDT)
To: "Marc G. Fournier"
cc: Taku YAMAMOTO
cc: freebsd-current@freebsd.org
cc: Steve Kargl
Subject: Re: Native preemption is the culprit [was Re: today's CURRENT lockups]

On Fri, 9 Jul 2004, Robert Watson wrote:

> I'm now experiencing extremely hard hangs in the following configurations:
>
>   SMP kernel running SCHED_ULE with hyperthreads
>   SMP kernel running SCHED_4BSD with hyperthreads
>
> To generate the load, I'm using the "supersmack" benchmark with the
> select-key.smack query set with 30 client workers and 10,000
> transactions.  I am able to reliably hang the system with one or two
> runs.
>
> By disabling the "#define PREEMPTION" entry in param.h with SCHED_4BSD,
> I'm able to complete the benchmark several times in a row without
> apparent problems.  However, I'll leave it running for a few more hours
> and see if I didn't just "get lucky".  I'll then try SCHED_ULE w/o
> PREEMPTION.
>
> By "extremely hard" I mean that I am unable to break into the debugger
> using a serial break on the serial console.  I have not yet been able to
> run the test on a system with easily accessible NMI but will attempt to
> do so in the next few days.
>
> I'll give UP a spin with various combinations next.

FYI, UP+SCHED_ULE with PREEMPTION hung within three seconds of starting
the benchmark.  Without PREEMPTION it seems to run fine.  So it looks like
either PREEMPTION has a problem, or it's triggering an existing problem
elsewhere.  If it's only one problem, it seems not to depend on either
SMP/UP or the scheduler choice.  If it's multiple problems, who knows :-).

As the MySQL test relies on threading, we could be looking at an edge case
involving threading and scheduling/preemption -- the other reports I've
seen mention X11/KDE, which would also involve threading.  On the other
hand, it could just be load.  Tomorrow I'll load up a box with
non-threaded apps and see what happens.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research