Date: Mon, 25 Nov 2002 01:26:37 -0400 (AST)
From: "Marc G. Fournier" <scrappy@hub.org>
To: "Moore, Eric Dean" <emoore@lsil.com>
Cc: freebsd-hardware@freebsd.org, <freebsd-smp@freebsd.org>, Mike Tancsa <mike@sentex.net>
Subject: SMP kernel hangs with latest MegaRAID firmware
Message-ID: <20021124235548.G16724-100000@hub.org>
In-Reply-To: <0E3FA95632D6D047BA649F95DAB60E57017EF2D3@EXA-ATLANTA.se.lsil.com>

Hi Eric ...

First and foremost, I don't believe it's the CAM support directly that is breaking things, as you may have seen in my other posts ... but I do believe it's related to the MegaRAID controller.

When we started this, the problem was that an Oct 28th kernel would work, but an Oct 29th kernel would hang while booting ... Oct 29th was when you updated the AMR driver code, and, as I recall, the changes touched enough files that it wasn't just camifying the code ...

Your suggestion was to upgrade the firmware on the controller itself, which made sense, so we scheduled it ... now, while waiting for that, I downgraded the server to RELENG_4_7, negating any work you did on the AMR driver, figuring it would give me some stability while waiting for the firmware upgrade ...

On Friday, as scheduled, Rackspace upgraded the firmware on the card, at which point all hell broke loose ... the RELENG_4_7 kernel could no longer boot up, and they had to bring it up on a GENERIC kernel ...

After futzing around for a period of time with the kernel configs (namely, what was different between a GENERIC kernel and my kernel), we determined that if we disable the SMP code, everything boots up great ... as soon as we enable the two options required for SMP, it hangs ... so, I added -v to /boot.config, figuring I should be able to get some better information around the hang ...

After scanning through the output a few times, I finally stumbled upon something that should have been more (or less) obvious:

    IOAPIC #0 intpin 2 -> irq 0
    Programming 16 pins in IOAPIC #1
    SMP: CPU0 apic_initialize():
         lint0: 0x00000700 lint1: 0x00010400 TPR: 0x00000010 SVR: 0x000001ff
    FreeBSD/SMP: Multiprocessor motherboard
     cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
     io0 (APIC): apic id: 4, version: 0x000f0011, at 0xfec00000
     io1 (APIC): apic id: 5, version: 0x000f0011, at 0xfec01000
    bios32: Found BIOS32 Service Directory header at 0xc00fdb90
    bios32: Entry = 0xfdba0 (c00fdba0) Rev = 0 Len = 1
    pcibios: PCI BIOS entry at 0xdbc1
    pnpbios: Found PnP BIOS data at 0xc00f4c50
    pnpbios: Entry = f0000:3954 Rev = 1.0
    Other BIOS signatures found:
    ACPI: 00000000

cpu1 is missing, which is why it's hanging while trying to start up CPU #1 ... so, I went back to Rackspace to have them take a look at the server and make sure that both CPUs are actually in the machine ... sure enough, they are, and the BIOS recognizes both ... but, just in case, they swapped both CPUs out ...

mptable shows:

    Processors:  APIC ID  Version  State        Family  Model  Step  Flags
                 0        0x11     BSP, usable  6       11     1     0x383fbff
                 1        0x11     AP, usable   6       11     1     0x383fbff

So, the machine has two CPUs in it that worked under RELENG_4_7 *before* the firmware upgrade, but fail to work after the firmware upgrade ... the operating system sees that there are, in fact, two CPUs in the machine ...

So, we have two changes to the MegaRAID card/driver that have succeeded in crippling SMP ... the motherboard is a Tyan LE-T with a 1.06 BIOS on it ... there are 7 18gig drives in a RAID5 configuration on the MegaRAID card ... the server was originally set up with (and ran on) a 300W power supply, which has since been upgraded to 400W ...
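
For anyone wanting to repeat the comparison between the verbose boot output and the mptable listing without eyeballing it, here is a rough Python sketch. It is only an illustration: the sample text is copied from the excerpts in this message, the line formats it matches are assumed from those excerpts alone, and the function names (`cpus_in_boot_log`, `cpus_in_mptable`, `missing_cpus`) are made up for this example.

```python
import re

# Sample text copied from the excerpts quoted in this message.
BOOT_LOG = """\
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 5, version: 0x000f0011, at 0xfec01000
"""

MPTABLE = """\
Processors: APIC ID Version State Family Model Step Flags
 0 0x11 BSP, usable 6 11 1 0x383fbff
 1 0x11 AP, usable 6 11 1 0x383fbff
"""


def cpus_in_boot_log(text):
    """APIC IDs of processors announced at boot ('cpuN (BSP|AP): apic id: X').

    Assumed format: taken from the boot excerpt above, not from kernel source.
    """
    pattern = re.compile(
        r"^\s*cpu\d+\s+\((?:BSP|AP)\):\s+apic id:\s*(\d+)", re.MULTILINE
    )
    return {int(m.group(1)) for m in pattern.finditer(text)}


def cpus_in_mptable(text):
    """APIC IDs of processor rows that mptable marks as usable.

    Assumed format: taken from the mptable excerpt above.
    """
    pattern = re.compile(
        r"^\s*(\d+)\s+0x[0-9a-fA-F]+\s+(?:BSP|AP),\s*usable", re.MULTILINE
    )
    return {int(m.group(1)) for m in pattern.finditer(text)}


def missing_cpus(boot_log, mptable):
    """APIC IDs listed as usable in the MP table but never announced at boot."""
    return sorted(cpus_in_mptable(mptable) - cpus_in_boot_log(boot_log))


if __name__ == "__main__":
    print("In mptable but not announced at boot:", missing_cpus(BOOT_LOG, MPTABLE))
```

With the excerpts above, the only APIC ID that shows up in the MP table but not in the boot messages is 1, which matches the missing cpu1 line.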

One person emailed me and suggested that they've seen something similar with an Adaptec RAID controller when a drive was bad, but as part of the firmware upgrade, Rackspace ran a consistency check, which I would assume would pick that up ...

The key thing to note right now is that since the firmware upgrade, neither a pre- nor a post-Oct 29th SMP kernel will work, while both pre- and post-Oct 29th non-SMP kernels do ...

Right now, I'm stumped, so if anyone else has any ideas, I'm all ears ...

Thanks ...

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hardware" in the body of the message