Date:      Wed, 16 Nov 2005 09:10:46 -0700
From:      Scott Long <scottl@samsco.org>
To:        Joerg Pulz <Joerg.Pulz@frm2.tum.de>
Cc:        stable@freebsd.org
Subject:   Re: FreeBSD-6 amr and ahd trouble
Message-ID:  <437B5A06.6060804@samsco.org>
In-Reply-To: <20051115161253.F7025@hades.admin.frm2>
References:  <20051115161253.F7025@hades.admin.frm2>

Joerg Pulz wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> Hi guys,
> 
> I'm running a Fujitsu-Siemens Primergy RX300 server (dual Xeon,
> hyperthreading enabled) with an onboard LSI MegaRAID controller and an
> Adaptec 39320A Ultra320 dual-channel SCSI adapter. The LSI MegaRAID
> controller is configured as RAID1 with two disks and one hot spare.
> FreeBSD is installed on this array.
> Up to now, the system was running fine with FreeBSD-5.3 first and 
> FreeBSD-5.4 now.
> I tried to upgrade this beast to FreeBSD-6.0-RELEASE without success.
> The kernel boots and detects all devices correctly, but when it comes
> to reading from amr(4), the last thing I see is "GEOM: new disk amrd0".
> After that the system "hangs", and it's nearly impossible to scroll the
> kernel messages up or down (Scroll Lock pressed). Then, after a while,
> there are a lot of SCSI error messages about SCB timeouts coming from
> ahd(4).
> I then booted the old RELENG_5_4 kernel and cvsup'ed the sources to
> RELENG_6, but I got the same results. Booting from a FreeBSD-6.0-RELEASE
> bootonly CD-ROM gave the same results again.
> I searched Google about this and found a tunable sysctl/loader setting
> called hw.pci.do_powerstate. I tried it, but got the same result. Later
> I saw that in RELENG_6 this tunable has been renamed and is set to 0
> anyway.
> The next step was removing the Adaptec card to make sure it was not
> interfering with amr(4), but the only thing that changed was that the
> SCSI error messages went away, so this was not the problem.
> I decided to give CURRENT from today a try, and it worked without any
> problems. I tested CURRENT going back step by step until I hit 700003,
> dated "Sun Sep 18 05:12:39 2005 UTC", which is exactly the time the
> RELENG_6 branch was marked for 6.0-BETA5, and CURRENT worked at every
> point I checked out from CVS. Unfortunately, 6.0-BETA5 is NOT working.
> I checked out the sources for 6.0-BETA4 and it is working again. So
> somewhere between 6.0-BETA4 and 6.0-BETA5 the whole thing broke, at
> least for me and my hardware.
> I've seen some differences in sys/cam/cam_xpt.c; maybe these cause the
> trouble I have, but I'm not deep enough into the FreeBSD kernel code to
> be sure.
> 
> It would be nice if someone could take a look at this so it gets fixed
> in RELENG_6.
> Any patches to test are welcome.
> 
> regards
> Joerg
> 

This is almost certainly an interrupt routing bug.  Can you try booting 
with ACPI disabled?  Can you try building a 6.0 kernel without SMP and
the 'apic' devices?  From 5.4, can you send your system information?
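
For reference, a minimal sketch of the first two tests. It assumes an
i386 system with stock sources in /usr/src; the config name NOAPIC is
made up for illustration, and the exact lines in GENERIC may differ
slightly on a given branch.

```
# Test 1: boot once with ACPI disabled. At the loader prompt, enter:
set hint.acpi.0.disabled=1
boot

# Test 2: build a 6.0 kernel without SMP and the apic device.
# Copy the stock config (NOAPIC is a hypothetical name) ...
cd /usr/src/sys/i386/conf
cp GENERIC NOAPIC
# ... edit NOAPIC and comment out these two lines ...
#options 	SMP
#device		apic
# ... then build and install the new kernel.
cd /usr/src
make buildkernel KERNCONF=NOAPIC
make installkernel KERNCONF=NOAPIC
```

The loader setting applies to that one boot only, so it is an easy
first check before committing to a kernel rebuild.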

Scott


