Date: Wed, 16 Nov 2005 09:10:46 -0700
From: Scott Long <scottl@samsco.org>
To: Joerg Pulz <Joerg.Pulz@frm2.tum.de>
Cc: stable@freebsd.org
Subject: Re: FreeBSD-6 amr and ahd trouble
Message-ID: <437B5A06.6060804@samsco.org>
In-Reply-To: <20051115161253.F7025@hades.admin.frm2>
References: <20051115161253.F7025@hades.admin.frm2>

Joerg Pulz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Hi guys,
>
> I'm running an Fujitsu-Siemens Primergy RX300 dual-XEON hyperthreading
> enabled server with an onboard LSI MegaRAID controller and an Adaptec
> 39320A Ultra320 dual channel SCSI adapter. The LSI MegaRAID controller
> is configured to RAID1 with two disk and one hotspare. On this array
> FreeBSD is installed.
> Up to now, the system was running fine with FreeBSD-5.3 first and
> FreeBSD-5.4 now.
> I tried to upgrade this beast to FreeBSD-6.0-RELEASE without success.
> The kernel is booting and detects all devices correctly but when it
> comes to read from the amr(4) the last thing i see is "GEOM: new disk
> amrd0" after that the system "hangs" and its nearly impossible to scroll
> the kernel messages up or down (Scroll lock pressed). then after a while
> there are a lot of SCSI error messages about SCB timeouts coming from
> the ahd(4).
> I decided to boot the old RELENG_5_4 kernel and cvsup'ed the sources to
> RELENG_6 but i got the same results. booting from a FreeBSD-6.0-RELEASE
> bootonly CDRom got again the same results.
> I searched google about this, and found something about a tuneable
> sysctl/loader setting called hw.pci.do_powerstate and tried it, but the
> same result. later i saw, that in RELENG_6 this tuneable is renamed and
> set to 0 anyway.
> the next step was removing the Adaptec card to make sure this one is not
> interrupting the amr(4) but the only thing that happened was the SCSI
> error messages going away so this was not the problem.
> I decided to give CURRENT from today a try, and it was working without
> any problems. I have tested CURRENT some steps back until i hit 700003
> dated to "Sun Sep 18 05:12:39 2005 UTC" which is exactly the same time
> the RELENG_6 branch was marked for 6.0-BETA5 and CURRENT was working
> with every point i checked out from cvs. Unfortunately 6.0-BETA5 is NOT
> working.
> I checked out the sources for 6.0-BETA4 and it is working again. So
> somewhere between 6.0-BETA4 and 6.0-BETA5 the whole thing is broken, at
> least for me and my hardware.
> I've seen some differences in sys/cam/cam_xpt.c, maybe these cause the
> trouble i have, but I'm not so deep in the FreeBSD kernel code to make
> this sure.
>
> It would be nice if someone can take a look at this to get this fixed in
> RELENG_6.
> Any patches to test are welcome.
>
> regards
> Joerg
>

This is almost certainly an interrupt routing bug.  Can you try booting
with ACPI disabled?  Can you try building a 6.0 kernel without SMP and
the 'apic' devices?  From 5.4, can you send your system information?

Scott
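
For readers following along, the two experiments Scott asks for could look roughly like the sketch below. It is not part of the original message. It assumes an i386 install with kernel sources in /usr/src, and NOAPIC is a hypothetical kernel config name chosen here for illustration. The first part is typed at the boot loader prompt, the second in a root shell.

```
# Experiment 1: boot once with ACPI disabled.
# Typed at the loader prompt (the boot menu's ACPI-disabled entry has
# the same effect):
set hint.acpi.0.disabled="1"
boot

# Experiment 2: build a uniprocessor kernel without the I/O APIC.
# NOAPIC is a hypothetical config name; start from a copy of GENERIC.
cd /usr/src/sys/i386/conf
cp GENERIC NOAPIC
# Edit NOAPIC by hand: comment out "device apic" and make sure
# "options SMP" is not present.
cd /usr/src
make buildkernel KERNCONF=NOAPIC
make installkernel KERNCONF=NOAPIC
```

If the hang disappears in either case, that would be consistent with the interrupt routing theory; if it persists in both, the cause likely lies elsewhere.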