From owner-freebsd-questions@FreeBSD.ORG  Tue Jun  5 22:38:52 2007
Date: Tue, 5 Jun 2007 15:38:51 -0700 (PDT)
From: youshi10@u.washington.edu
To: "N. Harrington" <drumslayer2@yahoo.com>
In-Reply-To: <362995.35822.qm@web34505.mail.mud.yahoo.com>
Message-ID: <Pine.LNX.4.43.0706051538510.27212@hymn09.u.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: questions@freebsd.org
Subject: Re: How to solve mysterious system lockups?

On Tue, 5 Jun 2007, N. Harrington wrote:

>
> --- Garrett Cooper <youshi10@u.washington.edu> wrote:
>
>> N. Harrington wrote:
>>> --- Garrett Cooper <youshi10@u.washington.edu> wrote:
>>>
>>>> N. Harrington wrote:
>>>>
>>>>> Hello
>>>>>   I have several systems that are used as squid
>>>>> caching servers. Some use SCSI disks and some use
>>>>> SATA disks. They are identical in every way except
>>>>> for the SATA vs SCSI drives.
>>>>>
>>>>>  At random times, the SATA-based systems seem to be
>>>>> freezing. You can ping them and they respond, but you
>>>>> cannot log in. Nor are any logs processed during that
>>>>> time.
>>>>>
>>>>>  I figure it must be something to do with the disks,
>>>>> but I am not sure how to solve it. There seems to be
>>>>> little rhyme or reason. It does not necessarily happen
>>>>> during busy times. It can happen in the middle of the
>>>>> night.
>>>>>
>>>>>  Any pointers on how to track down the cause would be
>>>>> much appreciated.
>>>>>
>>>>>  Tyan S2881 Motherboard - 4gigs mem
>>>>>  Using 4 SATA (or SCSI) drives
>>>>>  FreeBSD amd64 6.2-STABLE.
>>>>>
>>>>>  Thanks!
>>>>>
>>>>>   Nicole
>>>>>
>>>>>
>>>> Nicole,
>>>>     What's the driver in use for the SATA and the
>>>> SCSI drives?
>>>> -Garrett
>>>>
>>>
>>>  Hi Garrett
>>>  Here is the driver info.
>>>
>>> -- SATA
>>>
>>> atapci0: <SiI 3114 SATA150 controller> port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
>>> ata2: <ATA channel 0> on atapci0
>>> ata3: <ATA channel 1> on atapci0
>>> ata4: <ATA channel 2> on atapci0
>>> ata5: <ATA channel 3> on atapci0
>>> pci3: <display, VGA> at device 6.0 (no driver attached)
>>> isab0: <PCI-ISA bridge> at device 7.0 on pci0
>>> isa0: <ISA bus> on isab0
>>> atapci1: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
>>> ata0: <ATA channel 0> on atapci1
>>> ata1: <ATA channel 1> on atapci1
>>> pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
>>> pci0: <bridge> at device 7.3 (no driver attached)
>>> pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
>>> pci2: <ACPI PCI bus> on pcib2
>>>
>>> -- SCSI
>>>
>>> ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8000-0x80ff,0x7800-0x78ff mem 0xfc89c000-0xfc89dfff irq 24 at device 10.0 on pci2
>>> ahd0: [GIANT-LOCKED]
>>> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
>>> ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8800-0x88ff,0x8400-0x84ff mem 0xfc89e000-0xfc89ffff irq 25 at device 10.1 on pci2
>>> ahd1: [GIANT-LOCKED]
>>> aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
>>> pci0: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
>>> pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
>>> pci1: <ACPI PCI bus> on pcib3
>>> pci0: <base peripheral, interrupt controller> at device 11.1 (no driver attached)
>>>
>>>
>>>
>>>  Thanks!
>>>
>>>   Nicole
>> Ok, so it's a SiI 3114 SATA controller (next to the
>> AMD 8111 I/O hub) versus an Adaptec onboard SCSI
>> controller.
>>
>> 1. What release / version of FreeBSD are you using?
>> You should upgrade to 6.2-STABLE because a variety of
>> issues present in earlier releases have been worked out.
>
> I have a range of versions, from 6.1-Pre to 6.2-STABLE
> as of a few months ago.
>
>> 2. Do you have any logs for activity during the
>> hours when it locks up
>> (in particular anything interesting / fishy popping
>> up)?
>
> Nope. That would make it too easy :)
> They commit suicide without a note.
>
>> 3. What scheduler are you using? 4BSD, ULE?
>
> 4BSD
>
>> 4. Does your machine (using the SATA controllers)
>> lock up under heavy
>> load? If so, you may have a northbridge cooling
>> issue that you need to
>> put a fan on. For instance, the motherboard that I
>> was using for a while
>> (ASUS P5N-E SLI) was really close to my CPU
>> heatsink, and there was a
>> lot of heat transfer between my northbridge and CPU
>> heatsink, which was
>> raising the onboard temperatures 5~10 degrees C. The
>> new motherboard
>> (ASUS P5B DLX) doesn't do that though.
>
> The lockups seem rather random. I have healthd
> running and the servers never seem to run very warm.
> The room is cold and the servers have great fans.
> Although healthd can seem wonky, as the CPU temp has
> actually gone below the minimum. Also the -2Volt line
> seems very low. But some servers run forever that way.
>
> At least with SCSI, since it seems to manage itself
> as another layer away from the system, you get some
> error messages. Sort of like Windows 3.1 dropping to
> DOS. Versus SATA issues, where it's just a blue screen
> of death but without even some debugging code.
>
> I am going to try the patch Chuck Swiger sent me and
> see how that affects things. Also try a few
> replacement SATA cards. Although that is always fun,
> especially in 1U servers. As well as seeing if using
> SAS drives may help, if I can find some cheap enough.
> Do you think that using the ULE scheduler could
> really help?

Don't try ULE in 6.x. It's not stable by any means.

7-CURRENT's getting a lot closer though, especially as of late (past week).
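
For reference, the scheduler is picked when the kernel is built, so trying a different one means rebuilding. A minimal sketch of the relevant lines, assuming a custom kernel config copied from GENERIC (the name MYKERNEL is only a placeholder, not something from this thread):

```
# /usr/src/sys/amd64/conf/MYKERNEL  (hypothetical config name)
options 	SCHED_4BSD	# default 6.x scheduler; the one to keep on 6.x
#options 	SCHED_ULE	# alternative scheduler; enable only one of the two
```

On a running box, `sysctl kern.sched.name` should report which scheduler the kernel was built with.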

-Garrett