Date: Mon, 12 Jan 2009 14:55:23 +0000 (GMT)
From: Robert Watson
To: Garance A Drosihn
Cc: freebsd-stable@freebsd.org, Pete French, Robert Blayzor
Subject: Re: Big problems with 7.1 locking up :-(
References: <042FE04A-2F8D-47DD-8454-7BBA3791D7A8@inoc.net>

On Fri, 9 Jan 2009, Garance A Drosihn wrote:

> At 2:39 PM -0500 1/9/09, Robert Blayzor wrote:
>> On Jan 8, 2009, at 8:58 PM, Pete French wrote:
>>> I have a number of HP 1U servers, all of which were running 7.0 perfectly
>>> happily. I have been testing 7.1 in its various incarnations for the last
>>> couple of months on our test server and it has performed perfectly.
>>
>> I noticed a problem with 7.0 on a couple of Dell servers. [...] We've
>> since then compiled the kernel under the BSD scheduler to rule that out,
>> and so far so good.
>>
>> Since ULE is now default in 7.1 and not in 7.0, perhaps you can try that?
>
> FWIW, the other guy I know who is having this problem had already switched
> to using ULE under 7.0-release, and did not have any problems with it. So
> *his* problem was probably not related to SCHED_ULE, unless something has
> recently changed there.
>
> Turns out he hasn't reverted back to 7.0-release just yet, so he's going to
> try SCHED_4BSD and see if that helps his situation.

Scheduler changes always come with some risk of exposing bugs that have
existed in the code for a long time but never really manifested themselves.
ULE is well shaken-out, having been under development for at least five
years, but it is possible that some problems will become visible as a result
of the switch. I would encourage people to stick with ULE, but if you're
having a stability problem then experimenting with the scheduler as a
variable that could be triggering the problem may well be useful to help
track down the bug. Most of the time the bugs will not be in ULE itself but
will be triggered because ULE changes the ordering or balancing of work in
the system, so we should try to avoid situations where people switch to 4BSD
from ULE and stick with it rather than getting the underlying problem fixed!

Robert N M Watson
Computer Laboratory
University of Cambridge