From: Stephan Uphoff
To: Doug White
cc: stable@freebsd.org
Subject: Re: panic: APIC: Previous IPI is stuck
Date: Wed, 24 Nov 2004 00:11:22 -0500
Message-Id: <1101273082.48967.57.camel@palm.tree.com>
In-Reply-To: <20041123191634.K90740@carver.gumbysoft.com>
References: <20041115045912.A79200@titus.hanley.stade.co.uk>
 <20041123191634.K90740@carver.gumbysoft.com>
List-Id: Production branch of FreeBSD source code

On Tue, 2004-11-23 at 22:19, Doug White wrote:
> On Mon, 15 Nov 2004, Adrian Wontroba wrote:
>
> > At work, I've just taken an old cast off NT server and used it as
> > a replacement for an equally elderly low end PC which performs an
> > important monitoring task.
> >
> > I took the opportunity to upgrade to 5.3 (5.3-RC2 now, yesterday's
> > 5.3-STABLE when I get to work again) rather than stay on 4.10-RELEASE.
> >
> > The rationale was this would be a nice resilient machine, demonstrating
> > how FreeBSD can extend the useful working life of aging hardware.
> >
> > The practice is that it has now crashed three times in a couple of
> > days with "panic: APIC: Previous IPI is stuck", the most recent one
> > dragging me out from home early on a Monday morning.
>
> Welcome to the club. This is a known problem which affects older, true 4
> proc machines. Stephan Uphoff (ups@tree.com) has posted a patch to
> -current that seems to help. I have a Dell PE6500 (4x500MHz) I'm trying to
> get to duplicate the problem (and compile world without resetting) before
> I try the patch. (Replacing a CPU has made it happy again, thankfully)
>
> Dual proc hyperthreaded machines don't seem to be affected, or at least
> not as frequently.
>
> I'd suggest trying the patch and see if that helps for you. It doesn't
> seem to be making things worse for people :)

The patch has a few testers, and so far no "APIC: Previous IPI is stuck"
panics have been reported with it.
I hope to get a new, optimized patch out in the next few days.
Once the new patch has had some testing it will go into -current
(and hopefully I can MFC it later).

> > Over in current there are a couple of threads starting in late September
> > where a few people are suffering this problem. Like them, I'm using an
> > old (1997) Pentium Pro multiprocessor, in my case a 4 way Fujitsu M700.
> >
> > The machine is running with the SMP kernel (ie GENERIC + SMP), 4BSD
> > scheduler, without preemption.
> >
> > I've set kern.sched.ipiwakeup.enabled=0 and crossed my fingers.
> >
> > I'm a SMP novice. Would the machine become stable if I switched to a
> > non-SMP kernel? Reliability is more important than speed in this case,
> > and the opportunity for experimentation close to zero. Credibility
> > has already been damaged by the gvinum RAID5 experience (8-(
> >
> > I'm not knocking 5.3 - in all other respects it seems wonderful.
> >
> > "me too" diagnostics:
> >
> > kern.sched.name: 4BSD
> > kern.sched.quantum: 100000
> > kern.sched.ipiwakeup.enabled: 1
> > kern.sched.ipiwakeup.requested: 858129
> > kern.sched.ipiwakeup.delivered: 858129
> > kern.sched.ipiwakeup.usemask: 1
> > kern.sched.ipiwakeup.useloop: 0
> > kern.sched.ipiwakeup.onecpu: 0
> > kern.sched.ipiwakeup.htt2: 0
> > kern.sched.followon: 0
> > kern.sched.pfollowons: 0
> > kern.sched.kgfollowons: 0
> > kern.sched.runq_fuzz: 1
> >
> > ============================================================================
> >
> > MPTable, version 2.0.15
> >
> > looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0008f000
> > searching CMOS 'top of mem' @ 0x0008ec00 (571K)
> > searching default 'top of mem' @ 0x0009fc00 (639K)
> > searching BIOS @ 0x000f0000
> >
> > MP FPS found in BIOS @ physical addr: 0x000fdc30
> >
> > ----------------------------------------------------------------------------
> >
> > MP Floating Pointer Structure:
> >
> >   location:                 BIOS
> >   physical address:         0x000fdc30
> >   signature:                '_MP_'
> >   length:                   16 bytes
> >   version:                  1.4
> >   checksum:                 0x56
> >   mode:                     Virtual Wire
> >
> > ----------------------------------------------------------------------------
> >
> > MP Config Table Header:
> >
> >   physical address:         0x0008f151
> >   signature:                'PCMP'
> >   base table length:        332
> >   version:                  1.4
> >   checksum:                 0x05
> >   OEM ID:                   'Fujitsu '
> >   Product ID:               'Pro Server '
> >   OEM table pointer:        0x00000000
> >   OEM table size:           0
> >   entry count:              30
> >   local APIC address:       0xfee00000
> >   extended table length:    0
> >   extended table checksum:  0
> >
> > ----------------------------------------------------------------------------
> >
> > MP Config Base Table Entries:
> >
> > --
> > Processors:  APIC ID  Version  State        Family  Model  Step  Flags
> >              3        0x11     BSP, usable  6       1      9     0xfbff
> >              0        0x11     AP, usable   6       1      9     0xfbff
> >              1        0x11     AP, usable   6       1      9     0xfbff
> >              2        0x11     AP, usable   6       1      9     0xfbff
> > --
> > Bus:         Bus ID  Type
> >              0       PCI
> >              1       PCI
> >              2       EISA
> > --
> > I/O APICs:   APIC ID  Version  State   Address
> >              8        0x11     usable  0xfec00000
> >              9        0x11     usable  0xfec0c000
> > --
> > I/O Ints:    Type    Polarity   Trigger   Bus ID  IRQ  APIC ID  PIN#
> >              ExtINT  active-hi  edge      2       0    8        0
> >              INT     conforms   conforms  2       1    8        1
> >              INT     conforms   conforms  2       2    8        2
> >              INT     conforms   conforms  2       3    8        3
> >              INT     conforms   conforms  2       4    8        4
> >              INT     conforms   conforms  2       5    8        5
> >              INT     conforms   conforms  2       6    8        6
> >              INT     conforms   conforms  2       7    8        7
> >              INT     conforms   conforms  2       8    8        8
> >              INT     conforms   conforms  2       9    8        9
> >              INT     conforms   conforms  2       10   8        10
> >              INT     conforms   conforms  2       11   8        11
> >              INT     conforms   conforms  2       12   8        12
> >              INT     conforms   conforms  2       13   8        13
> >              INT     conforms   conforms  2       14   8        14
> >              INT     conforms   conforms  2       15   8        15
> >              INT     active-lo  level     0       1:A  9        11
> >              INT     active-lo  level     1       1:A  9        12
> >              INT     active-lo  level     1       2:A  9        12
> > --
> > Local Ints:  Type    Polarity   Trigger   Bus ID  IRQ  APIC ID  PIN#
> >              ExtINT  active-hi  edge      0       0:A  255      0
> >              NMI     active-hi  edge      0       0:A  255      1
> >
> > ----------------------------------------------------------------------------
> >
> > dmesg output:
> >
> > Copyright (c) 1992-2004 The FreeBSD Project.
> > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > The Regents of the University of California. All rights reserved.
> > FreeBSD 5.3-RC2 #0: Thu Nov 4 03:48:56 GMT 2004
> >     toor@xjamesfriis.:/usr/src/sys/i386/compile/JAMESFRIIS
> > MPTable:
> > Timecounter "i8254" frequency 1193182 Hz quality 0
> > CPU: Pentium Pro (199.84-MHz 686-class CPU)
> >   Origin = "GenuineIntel"  Id = 0x619  Stepping = 9
> >   Features=0xfbff
> > real memory = 2147483648 (2048 MB)
> > avail memory = 2095947776 (1998 MB)
> > FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> >  cpu0 (BSP): APIC ID: 3
> >  cpu1 (AP): APIC ID: 0
> >  cpu2 (AP): APIC ID: 1
> >  cpu3 (AP): APIC ID: 2
> > ioapic0: Assuming intbase of 0
> > ioapic1: Assuming intbase of 16
> > ioapic0 irqs 0-15 on motherboard
> > ioapic1 irqs 16-31 on motherboard
> > npx0: [FAST]
> > npx0: on motherboard
> > npx0: INT 16 interface
> > pcib0: pcibus 0 on motherboard
> > pci0: on pcib0
> > fxp0: port 0xfce0-0xfcff mem
> > 0xfe900000-0xfe9fffff,0xfe8ff000-0xfe8fffff irq 27 at device 1.0 on pci0
> > miibus0: on fxp0
> > ukphy0: on miibus0
> > ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> > fxp0: Ethernet address: 00:10:a8:00:10:d6
> > pci0: at device 2.0 (no driver attached)
> > eisab0: at device 3.0 on pci0
> > eisa0: on eisab0
> > mainboard0: on eisa0 slot 0
> > isa0: on eisab0
> > pcib1: pcibus 1 on motherboard
> > pci1: on pcib1
> > ahc0: port 0xf800-0xf8ff mem
> > 0xfceef000-0xfceeffff irq 28 at device 1.0 on pci1
> > ahc0: [GIANT-LOCKED]
> > aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
> > ahc1: port 0xf400-0xf4ff mem
> > 0xfceee000-0xfceeefff irq 28 at device 2.0 on pci1
> > ahc1: [GIANT-LOCKED]
> > aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
> > pci1: at device 3.0 (no driver attached)
> > cpu0 on motherboard
> > cpu1 on motherboard
> > cpu2 on motherboard
> > cpu3 on motherboard
> > orm0: at iomem 0xc0000-0xc7fff on isa0
> > pmtimer0 on isa0
> > ata0 at port 0x3f6,0x1f0-0x1f7 irq 14 on isa0
> > ata1 at port 0x376,0x170-0x177 irq 15 on isa0
> > atkbdc0: at port 0x64,0x60 on isa0
> > atkbd0: irq 1 on atkbdc0
> > kbd0 at atkbd0
> > atkbd0: [GIANT-LOCKED]
> > psm0: irq 12 on atkbdc0
> > psm0: [GIANT-LOCKED]
> > psm0: model MouseMan+, device ID 0
> > fdc0: at port 0x3f0-0x3f5 irq 6 drq 2 on isa0
> > fdc0: [FAST]
> > fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> > ppc0: at port 0x378-0x37f irq 7 on isa0
> > ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
> > ppbus0: on ppc0
> > plip0: on ppbus0
> > lpt0: on ppbus0
> > lpt0: Interrupt-driven port
> > ppi0: on ppbus0
> > sc0: at flags 0x100 on isa0
> > sc0: VGA <16 virtual consoles, flags=0x300>
> > sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> > sio0: type 16550A
> > sio1 at port 0x2f8-0x2ff irq 3 on isa0
> > sio1: type 16550A
> > vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> > unknown: can't assign resources (port)
> > unknown: can't assign resources (port)
> > unknown: can't assign resources (port)
> > unknown: can't assign resources (port)
> > unknown: can't assign resources (port)
> > Timecounters tick every 10.000 msec
> > Waiting 15 seconds for SCSI devices to settle
> > (probe6:ahc0:0:6:0): AutoSense Failed
> > (probe5:ahc0:0:6:1): AutoSense Failed
> > (probe0:ahc0:0:6:2): AutoSense Failed
> > (probe5:ahc0:0:6:3): AutoSense Failed
> > (probe5:ahc0:0:6:4): AutoSense Failed
> > (probe0:ahc0:0:6:5): AutoSense Failed
> > (probe0:ahc0:0:6:6): AutoSense Failed
> > (probe0:ahc0:0:6:7): AutoSense Failed
> > (probe21:ahc1:0:6:0): AutoSense Failed
> > (probe1:ahc1:0:6:1): AutoSense Failed
> > (probe1:ahc1:0:6:2): AutoSense Failed
> > (probe1:ahc1:0:6:3): AutoSense Failed
> > (probe1:ahc1:0:6:4): AutoSense Failed
> > (probe1:ahc1:0:6:5): AutoSense Failed
> > (probe1:ahc1:0:6:6): AutoSense Failed
> > (probe1:ahc1:0:6:7): AutoSense Failed
> > sa0 at ahc0 bus 0 target 4 lun 0
> > sa0: Removable Sequential Access SCSI-2 device
> > sa0: 10.000MB/s transfers (10.000MHz, offset 15)
> > ses0 at ahc0 bus 0 target 6 lun 0
> > ses0: Fixed Processor SCSI-2 device
> > ses0: 3.300MB/s transfers
> > ses0: SAF-TE Compliant Device
> > ses1 at ahc1 bus 0 target 6 lun 0
> > ses1: Fixed Processor SCSI-2 device
> > ses1: 3.300MB/s transfers
> > ses1: SAF-TE Compliant Device
> > da0 at ahc0 bus 0 target 0 lun 0
> > da0: Fixed Direct Access SCSI-2 device
> > da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> > Enabled
> > da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> > da1 at ahc0 bus 0 target 1 lun 0
> > da1: Fixed Direct Access SCSI-2 device
> > da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> > Enabled
> > da1: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> > da2 at ahc0 bus 0 target 2 lun 0
> > da2: Fixed Direct Access SCSI-2 device
> > da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> > Enabled
> > da2: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> > da3 at ahc1 bus 0 target 0 lun 0
> > da3: Fixed Direct Access SCSI-2 device
> > da3: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> > Enabled
> > da3: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> > da4 at ahc1 bus 0 target 1 lun 0
> > da4: Fixed Direct Access SCSI-2 device
> > da4: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> > Enabled
> > da4: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> > da5 at ahc1 bus 0 target 2 lun 0
> > da5: Fixed Direct Access SCSI-2 device
> > da5: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> > Enabled
> > da5: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
> > cd0 at ahc0 bus 0 target 5 lun 0
> > cd0: Removable CD-ROM SCSI-2 device
> > cd0: 10.000MB/s transfers (10.000MHz, offset 15)
> > cd0: Attempt to query device size failed: NOT READY, Medium not present
> > GEOM_MIRROR: Device mirror0 created (id=138753045).
> > GEOM_MIRROR: Device mirror0: provider da0 detected.
> > GEOM_CONCAT: Device usr2 created (id=1051984440).
> > GEOM_CONCAT: Disk da1 attached to usr2.
> > GEOM_CONCAT: Disk da2 attached to usr2.
> > GEOM_MIRROR: Device mirror0: provider da3 detected.
> > GEOM_MIRROR: Device mirror0: provider da3 activated.
> > GEOM_MIRROR: Device mirror0: provider mirror/mirror0 launched.
> > GEOM_MIRROR: Device mirror0: rebuilding provider da0.
> > GEOM_CONCAT: Disk da4 attached to usr2.
> > GEOM_CONCAT: Disk da5 attached to usr2.
> > GEOM_CONCAT: Device usr2 activated.
> > SMP: AP CPU #3 Launched!
> > SMP: AP CPU #1 Launched!
> > SMP: AP CPU #2 Launched!
> > Mounting root from ufs:/dev/mirror/mirror0a
> > WARNING: / was not properly dismounted
> > WARNING: /var was not properly dismounted
> > WARNING: /usr was not properly dismounted
> > /usr: mount pending error: blocks 4 files 2
> > WARNING: /usr2 was not properly dismounted
> >
> >
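
For anyone comparing "me too" diagnostics, the quoted kern.sched.ipiwakeup
counters can be checked mechanically. The following is only a minimal
sketch, not part of the patch: it parses "name: value" lines as printed in
the report above (mail quote markers included) and subtracts the delivered
counter from the requested one. The helper names parse_sysctl and
undelivered_wakeups are hypothetical, and the sample text is copied from
the quoted output; what a non-zero difference would mean is not something
this thread establishes.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines into a dict; all-digit values become ints."""
    values = {}
    for line in text.splitlines():
        # Drop leading mail quote markers such as "> > ".
        line = line.lstrip("> ").strip()
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line
        value = value.strip()
        values[name.strip()] = int(value) if value.isdigit() else value
    return values


def undelivered_wakeups(values):
    """Requested minus delivered wakeup IPIs (hypothetical helper)."""
    return (values["kern.sched.ipiwakeup.requested"]
            - values["kern.sched.ipiwakeup.delivered"])


# Sample lines copied from the quoted report.
SAMPLE = """\
> > kern.sched.name: 4BSD
> > kern.sched.ipiwakeup.enabled: 1
> > kern.sched.ipiwakeup.requested: 858129
> > kern.sched.ipiwakeup.delivered: 858129
"""

values = parse_sysctl(SAMPLE)
print(values["kern.sched.name"], undelivered_wakeups(values))  # prints "4BSD 0"
```

In the quoted output both counters read 858129, so the difference is 0.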