From owner-freebsd-stable@FreeBSD.ORG  Mon Jan 12 21:04:17 2009
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2B3531065670
	for <freebsd-stable@FreeBSD.org>; Mon, 12 Jan 2009 21:04:17 +0000 (UTC)
	(envelope-from freebsd@max.af.czu.cz)
Received: from chinook.internetservice.cz (caesar.internetservice.cz
	[217.11.237.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 921598FC1B
	for <freebsd-stable@FreeBSD.org>; Mon, 12 Jan 2009 21:04:16 +0000 (UTC)
	(envelope-from freebsd@max.af.czu.cz)
Received: (qmail 45346 invoked by uid 89); 12 Jan 2009 20:37:34 -0000
Received: from unknown (HELO ares.internetservice.cz)
	(public@chinook.internetservice.cz@217.11.239.237)
	by chinook.internetservice.cz with ESMTPA; 12 Jan 2009 20:37:34 -0000
Message-ID: <496BA9AE.10801@max.af.czu.cz>
Date: Mon, 12 Jan 2009 21:35:58 +0100
From: Tomas Randa <freebsd@max.af.czu.cz>
User-Agent: Thunderbird 2.0.0.14 (X11/20080723)
MIME-Version: 1.0
To: Garance A Drosihn <drosih@rpi.edu>
References: <E1LL6dg-0007CN-DI@dilbert.ticketswitch.com>	<042FE04A-2F8D-47DD-8454-7BBA3791D7A8@inoc.net>	<p06240802c58db5953598@[128.113.24.47]>	<alpine.BSF.2.00.0901121453200.16794@fledge.watson.org>
	<p06240808c5911644dd11@[128.113.24.47]>
In-Reply-To: <p06240808c5911644dd11@[128.113.24.47]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-stable@FreeBSD.org, Robert Watson <rwatson@FreeBSD.org>
Subject: Re: Big problems with 7.1 locking up :-(
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jan 2009 21:04:17 -0000

Hello,

I have similar problems. The last "good" kernel I have is from the stable 
branch as of October 8. After the next upgrade, I saw big performance 
problems.
I tried ULE, 4BSD, etc., but nothing helps except downgrading the system.

Now I am trying 7.1-p1 and the problems are back. MySQL spends a lot of 
time waiting with the status "waiting for opening table" or "waiting for 
close tables".

I have 32-bit FreeBSD with PAE, 1x Xeon 5420, a Supermicro motherboard, 
and an Areca SATA controller. Could the problem be in the "da" device, 
for example?

Thanks,
Tomas Randa

Garance A Drosihn wrote:
> At 2:55 PM +0000 1/12/09, Robert Watson wrote:
>> On Fri, 9 Jan 2009, Garance A Drosihn wrote:
>>
>>> At 2:39 PM -0500 1/9/09, Robert Blayzor wrote:
>>>> On Jan 8, 2009, at 8:58 PM, Pete French wrote:
>>>>> I have a number of HP 1U servers, all of which were running 7.0 
>>>>> perfectly happily. I have been testing 7.1 in it's various 
>>>>> incarnations for the last couple of months on our test server and 
>>>>> it has performed perfectly.
>>>>
>>>> I noticed a problem with 7.0 on a couple of Dell servers.  [...] 
>>>> We've since then compiled the kernel under the BSD scheduler to 
>>>> rule that out, and so far so good.
>>>>
>>>> Since ULE is now default in 7.1 and not in 7.0, perhaps you can try 
>>>> that?
>>>
>>> FWIW, the other guy I know who is having this problem had already 
>>> switched to using ULE under 7.0-release, and did not have any 
>>> problems with it.  So *his* problem was probably not related to 
>>> SCHED_ULE, unless something has recently changed there.
>>>
>>> Turns out he hasn't reverted back to 7.0-release just yet, so he's 
>>> going to try SCHED_4BSD and see if that helps his situation.
>>
>> Scheduler changes always come with some risk of exposing bugs that 
>> have existed in the code for a long time but never really manifested 
>> themselves. ULE is well shaken-out, having been under development for 
>> at least five years, but it is possible that some problems will 
>> become visible as a result of the switch.  I would encourage people 
>> to stick with ULE, but if you're having a stability problem then 
>> experimenting with scheduler as a variable that could be triggering 
>> the problem may well be useful to help track down the bug.
>
> Just to followup on this:  My friend did switch back to a 7.1 kernel with
> SCHED_4BSD, and he still ran into problems.  The error messages weren't
> the same, but errors did happen in the same high disk-I/O situations as
> the lockup happened with SCHED_ULE.  At this point he's fallen back to
> the 7.0-kernel that he had been running (which also has SCHED_ULE), and
> all the problems have gone away.  So at the moment he's running with a
> 7.0-ish kernel and the 7.1-release userland, without the hanging 
> problems.
> So the problem is something in the kernel, but it is *NOT* the scheduler
> (at least, not in his case).
>
> He is not eager to do a whole lot of experiments to track down the
> problem, since this is happening on busy production machines and he
> can't afford to have a lot of downtime on them (especially now that the
> semester at RPI has started up).  The systems have some large (2 TB)
> filesystems on them, and the lockups occur in high disk-I/O situations.
> He's seeing the problem on one system which is a dual CPU quad-core
> xeon, and another which is a 64 bit P4 with hyperthreading.  The one
> thing in common between the two setups is that the boot drives + a
> 3ware controller (with its array of RAID disks) is moved from one
> machine to the other one:
>
>   "its a 3ware 9500 12 port model, the boot drive is connected to
>    an ICH6 in IDE mode, and yes, I've run it in single, single with
>    hyper threading, and 8 way mode.  All 64 bit."
>
> We still have no idea where the problem really is.  For all we know,
> someone spilled a Pepsi on it when he wasn't looking...
>