From owner-freebsd-current Wed Jan 15 9:32:32 2003
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP
    id 29AD137B401; Wed, 15 Jan 2003 09:32:29 -0800 (PST)
Received: from canning.wemm.org (canning.wemm.org [192.203.228.65])
    by mx1.FreeBSD.org (Postfix) with ESMTP
    id A9E3E43F65; Wed, 15 Jan 2003 09:32:28 -0800 (PST)
    (envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1])
    by canning.wemm.org (Postfix) with ESMTP
    id 65A372A8A0; Wed, 15 Jan 2003 09:32:23 -0800 (PST)
    (envelope-from peter@wemm.org)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: John Baldwin
Cc: "Cagle, John (ISS-Houston)", freebsd-current@freebsd.org
Subject: Re: SMP hang at boot on Compaq Proliant ML370
In-Reply-To:
Date: Wed, 15 Jan 2003 09:32:23 -0800
From: Peter Wemm
Message-Id: <20030115173223.65A372A8A0@canning.wemm.org>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

John Baldwin wrote:
>
> On 15-Jan-2003 Cagle, John (ISS-Houston) wrote:
> > That's a vicious rumor -- no operating system could work without clock
> > interrupts in SMP mode... Could it?
>
> I think it had something to do more with the RTC interrupt not being
> routed to the I/O APIC and FreeBSD currently can only handle the ISA
> timer interrupt being only routed to the 8259A PICs.  Thus, FreeBSD
> hangs because it doesn't get any interrupts from the RTC via the I/O
> APIC.  I couldn't remember the exact details earlier hence the vagueness
> of my message.

The same problem happens with a newer DL380 that we have here.  G3 I think,
but I can never keep track of all the new toys. :-)

Backgrounder:  There are two sources of clock in i386 systems: the 8254
timer and the RTC (real time clock).  FreeBSD uses both.
Linux and Windows use the 8254 primarily, but Linux can be configured to use
the RTC as well via the /dev/rtc device and a couple of ioctl() calls.
Linux normally wouldn't notice if the RTC device wasn't generating
interrupts.  VMware might, though, as it uses /dev/rtc.

In uniprocessor mode, both of these are used via the 8259 interrupt
controller (PIC).  But in SMP mode, neither the 8254 nor the RTC originated
interrupts show up on the APIC on their designated interrupt inputs.  I
really do not know why.

From what I have seen of the chipset specs (Serverworks CSB5 - Champion
South Bridge 5), the RTC interrupts are delivered to a single piece of
silicon via serial interrupt protocol along with the floppy controller and
COM ports etc.  It is supposedly an all-or-nothing deal.  Meanwhile the
8254, 8259 and IO APIC are all on the same silicon.  There shouldn't be
anything that can't be connected.

So all I can think of is a hardware quirk in the CSB5 (older rev 1.x in the
system I was playing with that isn't working, not CSB5 v2.0), a BIOS
programming issue (not setting up the interrupt routing correctly), or a bug
in FreeBSD.  I'm 99.9% convinced that it is not the latter.  I've looked at
register dumps of the CSB5 and I can't see anything obvious that is wrong
there. :-(

I have a brutal workaround (use a single 8254 clock and simulate the RTC
clock), but it breaks some things (e.g. high-res profiling).  I really don't
like it, and I'm working on a different possibility as well (keep the 8259
PIC alive and use it in ExtInt mode directly on LINT0 on the BSP, but this
is nastier than it sounds).

I'd really like to know if Linux can generate RTC PIE interrupts via the
IO APIC on this hardware.  I'm 99.9999% sure that it won't work either.  I
should try it; it would be nice to know for sure whether it is a
hardware/firmware bug.

> > Which generation of the ML370 is having this problem?  I had a similar
> > problem on another box that was corrected with a newer BIOS version.
> >
> > Thanks,
> > John
> >
> >> -----Original Message-----
> >> From: John Baldwin [mailto:jhb@FreeBSD.org]
> >> Sent: Wednesday, January 15, 2003 10:55 AM
> >> To: Nicolas Kowalski
> >> Cc: freebsd-current@FreeBSD.org
> >> Subject: Re: SMP hang at boot on Compaq Proliant ML370
> >>
> >>
> >>
> >> On 15-Jan-2003 Nicolas Kowalski wrote:
> >> > phk@FreeBSD.ORG writes:
> >> >
> >> >> I had a Compaq visit my lab recently.  Unless the aic driver were
> >> >> removed from the kernel (disabling it might have worked too) it would
> >> >> screw up the floppy driver.
> >> >>
> >> >> This sounds like black magic, but the explanation is that the aic
> >> >> driver has a very intrusive probe routine which sticks random bytes
> >> >> into whatever I/O locations it feels like and this apparently is not
> >> >> liked by certain machines.
> >> >>
> >> >> Compaqs with all their bells and whistles could be particularly
> >> >> sensitive to this, so try to disable the aic driver and see if it
> >> >> helps.
> >> >
> >> > I tried this.  Now there is only the ahc driver (the only one needed)
> >> > compiled in the kernel but this does not help, the server still hangs.
> >> > I also removed the ata driver, without success. :-(
> >>
> >> Is the ML370 a new box?  I've heard rumors recently that one
> >> of the recent Compaq boxes effectively doesn't generate clock
> >> interrupts in SMP mode and there isn't a workaround for that
> >> at the moment.
> >>
> >> --
> >>
> >> John Baldwin  <><  http://www.FreeBSD.org/~jhb/
> >> "Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
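
On the open question in the message (whether Linux can generate RTC
periodic (PIE) interrupts on this hardware), the /dev/rtc route it mentions
could be exercised roughly as below.  This is only a minimal sketch and not
something posted in the thread.  It assumes a Linux system with the classic
/dev/rtc driver and the x86 ioctl number encoding; the request numbers are
recomputed from the definitions in <linux/rtc.h> rather than imported, and
the helper names (is_valid_pie_rate, decode_rtc_read, count_pie_interrupts)
are made up for illustration.  It shows only whether periodic interrupts
arrive at all.  Which controller delivered them would still have to be
checked separately, for example by looking at the rtc entry in
/proc/interrupts.

```python
import fcntl
import os
import select
import struct
import time


def _ioc(direction, type_char, nr, size):
    # Generic Linux _IOC() encoding as used on x86 (assumption: some
    # other architectures lay out the direction/size bits differently).
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


_ULONG = struct.calcsize("L")               # sizeof(unsigned long)
RTC_PIE_ON = _ioc(0, "p", 0x05, 0)          # _IO('p', 0x05)
RTC_PIE_OFF = _ioc(0, "p", 0x06, 0)         # _IO('p', 0x06)
RTC_IRQP_SET = _ioc(1, "p", 0x0C, _ULONG)   # _IOW('p', 0x0c, unsigned long)


def is_valid_pie_rate(hz):
    """The RTC periodic rate must be a power of two from 2 to 8192 Hz."""
    return 2 <= hz <= 8192 and (hz & (hz - 1)) == 0


def decode_rtc_read(raw):
    """Split the unsigned long from read() into (flag byte, irq count)."""
    (value,) = struct.unpack("L", raw)
    return value & 0xFF, value >> 8


def count_pie_interrupts(path="/dev/rtc", hz=64, seconds=2.0):
    """Enable periodic interrupts and count how many arrive in `seconds`."""
    if not is_valid_pie_rate(hz):
        raise ValueError("rate must be a power of two between 2 and 8192")
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, RTC_IRQP_SET, hz)
        fcntl.ioctl(fd, RTC_PIE_ON)
        total = 0
        deadline = time.monotonic() + seconds
        try:
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                # select() with a timeout, so a machine that never
                # delivers the interrupt does not block us forever.
                ready, _, _ = select.select([fd], [], [], remaining)
                if not ready:
                    break
                _flags, count = decode_rtc_read(os.read(fd, _ULONG))
                total += count
        finally:
            fcntl.ioctl(fd, RTC_PIE_OFF)
        return total
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        seen = count_pie_interrupts()
    except OSError as exc:
        print(f"could not test RTC periodic interrupts: {exc}")
    else:
        print(f"saw {seen} periodic interrupts in 2 s at a 64 Hz setting")
```

A count of zero on an SMP kernel, with a healthy count on a uniprocessor
kernel on the same box, would point toward the hardware/firmware
explanation rather than a FreeBSD bug.  The default of 64 Hz is chosen
because higher rates normally need root on Linux.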