From owner-freebsd-stable@FreeBSD.ORG Wed Nov 24 03:19:46 2004
Date: Tue, 23 Nov 2004 19:19:46 -0800 (PST)
From: Doug White
To: Adrian Wontroba
In-Reply-To: <20041115045912.A79200@titus.hanley.stade.co.uk>
Message-ID: <20041123191634.K90740@carver.gumbysoft.com>
References: <20041115045912.A79200@titus.hanley.stade.co.uk>
cc: stable@freebsd.org
Subject: Re: panic: APIC: Previous IPI is stuck
List-Id: Production branch of FreeBSD source code

On Mon, 15 Nov 2004, Adrian Wontroba wrote:

> At work, I've just taken an old cast off NT server and used it as
> a replacement for an equally elderly low end PC which performs an
> important monitoring task.
>
> I took the opportunity to upgrade to 5.3 (5.3-RC2 now, yesterday's
> 5.3-STABLE when I get to work again) rather than stay on 4.10-RELEASE.
>
> The rationale was this would be a nice resilient machine, demonstrating
> how FreeBSD can extend the useful working life of aging hardware.
>
> The practice is that it has now crashed three times in a couple of
> days with "panic: APIC: Previous IPI is stuck", the most recent one
> dragging me out from home early in a Monday morning.

Welcome to the club. This is a known problem that affects older, true
4-processor machines. Stephan Uphoff (ups@tree.com) has posted a patch to
-current that seems to help. I have a Dell PE6500 (4x500MHz) that I'm
trying to get to duplicate the problem (and compile world without
resetting) before I try the patch. (Replacing a CPU has made it happy
again, thankfully.)

Dual-processor hyperthreaded machines don't seem to be affected, or at
least not as frequently.

I'd suggest trying the patch to see if it helps for you. It doesn't seem
to be making things worse for people :)

> Over in current there are a couple of threads starting in late September
> where a few people are suffering this problem. Like them, I'm using an
> old (1997) Pentium Pro multiprocessor, in my case a 4 way Fujitsu M700.
>
> The machine is running with the SMP kernel (ie GENERIC + SMP), 4BSD
> scheduler, without preemption.
>
> I've set kern.sched.ipiwakeup.enabled=0 and crossed my fingers.
>
> I'm a SMP novice. Would the machine become stable if I switched to a
> non-SMP kernel? Reliability is more important than speed in this case,
> and the opportunity for experimentation close to zero. Creditability
> has already been damaged by the gvinum RAID5 experience (8-(
>
> I'm not knocking 5.3 - in all other respects it seems wonderful.
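
A minimal sketch of how the setting mentioned in the quoted text could be
made persistent, assuming the stock /etc/sysctl.conf mechanism. Nothing in
this thread establishes that it prevents the panic:

```
# /etc/sysctl.conf (read at boot; default file location assumed)
# Turn off IPI-driven wakeups in the 4BSD scheduler, as tried above.
kern.sched.ipiwakeup.enabled=0
```

The same value can be set on a running system, as root, with
`sysctl kern.sched.ipiwakeup.enabled=0`. Note that the diagnostics quoted
below still report kern.sched.ipiwakeup.enabled: 1, so they may predate
the change.
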
>
> "me too" diagnostics:
>
> kern.sched.name: 4BSD
> kern.sched.quantum: 100000
> kern.sched.ipiwakeup.enabled: 1
> kern.sched.ipiwakeup.requested: 858129
> kern.sched.ipiwakeup.delivered: 858129
> kern.sched.ipiwakeup.usemask: 1
> kern.sched.ipiwakeup.useloop: 0
> kern.sched.ipiwakeup.onecpu: 0
> kern.sched.ipiwakeup.htt2: 0
> kern.sched.followon: 0
> kern.sched.pfollowons: 0
> kern.sched.kgfollowons: 0
> kern.sched.runq_fuzz: 1
>
> ============================================================================
>
> MPTable, version 2.0.15
>
>  looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0008f000
>  searching CMOS 'top of mem' @ 0x0008ec00 (571K)
>  searching default 'top of mem' @ 0x0009fc00 (639K)
>  searching BIOS @ 0x000f0000
>
>  MP FPS found in BIOS @ physical addr: 0x000fdc30
>
> ----------------------------------------------------------------------------
>
> MP Floating Pointer Structure:
>
>   location:                 BIOS
>   physical address:         0x000fdc30
>   signature:                '_MP_'
>   length:                   16 bytes
>   version:                  1.4
>   checksum:                 0x56
>   mode:                     Virtual Wire
>
> ----------------------------------------------------------------------------
>
> MP Config Table Header:
>
>   physical address:         0x0008f151
>   signature:                'PCMP'
>   base table length:        332
>   version:                  1.4
>   checksum:                 0x05
>   OEM ID:                   'Fujitsu '
>   Product ID:               'Pro Server '
>   OEM table pointer:        0x00000000
>   OEM table size:           0
>   entry count:              30
>   local APIC address:       0xfee00000
>   extended table length:    0
>   extended table checksum:  0
>
> ----------------------------------------------------------------------------
>
> MP Config Base Table Entries:
>
> --
> Processors:  APIC ID  Version  State        Family  Model  Step  Flags
>              3        0x11     BSP, usable  6       1      9     0xfbff
>              0        0x11     AP, usable   6       1      9     0xfbff
>              1        0x11     AP, usable   6       1      9     0xfbff
>              2        0x11     AP, usable   6       1      9     0xfbff
> --
> Bus:         Bus ID  Type
>              0       PCI
>              1       PCI
>              2       EISA
> --
> I/O APICs:   APIC ID  Version  State   Address
>              8        0x11     usable  0xfec00000
>              9        0x11     usable  0xfec0c000
> --
> I/O Ints:    Type    Polarity   Trigger   Bus ID  IRQ  APIC ID  PIN#
>              ExtINT  active-hi  edge      2       0    8        0
>              INT     conforms   conforms  2       1    8        1
>              INT     conforms   conforms  2       2    8        2
>              INT     conforms   conforms  2       3    8        3
>              INT     conforms   conforms  2       4    8        4
>              INT     conforms   conforms  2       5    8        5
>              INT     conforms   conforms  2       6    8        6
>              INT     conforms   conforms  2       7    8        7
>              INT     conforms   conforms  2       8    8        8
>              INT     conforms   conforms  2       9    8        9
>              INT     conforms   conforms  2       10   8        10
>              INT     conforms   conforms  2       11   8        11
>              INT     conforms   conforms  2       12   8        12
>              INT     conforms   conforms  2       13   8        13
>              INT     conforms   conforms  2       14   8        14
>              INT     conforms   conforms  2       15   8        15
>              INT     active-lo  level     0       1:A  9        11
>              INT     active-lo  level     1       1:A  9        12
>              INT     active-lo  level     1       2:A  9        12
> --
> Local Ints:  Type    Polarity   Trigger   Bus ID  IRQ  APIC ID  PIN#
>              ExtINT  active-hi  edge      0       0:A  255      0
>              NMI     active-hi  edge      0       0:A  255      1
>
> ----------------------------------------------------------------------------
>
> dmesg output:
>
> Copyright (c) 1992-2004 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD 5.3-RC2 #0: Thu Nov 4 03:48:56 GMT 2004
>
> toor@xjamesfriis.:/usr/src/sys/i386/compile/JAMESFRIIS
> MPTable:
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Pentium Pro (199.84-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0x619 Stepping = 9
>
> Features=0xfbff V>
> real memory = 2147483648 (2048 MB)
> avail memory = 2095947776 (1998 MB)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> cpu0 (BSP): APIC ID: 3
> cpu1 (AP): APIC ID: 0
> cpu2 (AP): APIC ID: 1
> cpu3 (AP): APIC ID: 2
> ioapic0: Assuming intbase of 0
> ioapic1: Assuming intbase of 16
> ioapic0 irqs 0-15 on motherboard
> ioapic1 irqs 16-31 on motherboard
> npx0: [FAST]
> npx0: on motherboard
> npx0: INT 16 interface
> pcib0: pcibus 0 on motherboard
> pci0: on pcib0
> fxp0: port 0xfce0-0xfcff mem
> 0xfe900000-0xfe9fffff,0xfe8ff000-0xfe8fffff irq 27 at device 1.0 on pci0
> miibus0: on fxp0
> ukphy0: on miibus0
> ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fxp0: Ethernet address: 00:10:a8:00:10:d6
> pci0: at device 2.0 (no driver attached)
> eisab0: at device 3.0 on pci0
> eisa0: on eisab0
> mainboard0: on eisa0 slot 0
> isa0: on eisab0
> pcib1: pcibus 1 on motherboard
> pci1: on pcib1
> ahc0: port 0xf800-0xf8ff mem
> 0xfceef000-0xfceeffff irq 28 at device 1.0 on pci1
> ahc0: [GIANT-LOCKED]
> aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
> ahc1: port 0xf400-0xf4ff mem
> 0xfceee000-0xfceeefff irq 28 at device 2.0 on pci1
> ahc1: [GIANT-LOCKED]
> aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
> pci1: at device 3.0 (no driver attached)
> cpu0 on motherboard
> cpu1 on motherboard
> cpu2 on motherboard
> cpu3 on motherboard
> orm0: at iomem 0xc0000-0xc7fff on isa0
> pmtimer0 on isa0
> ata0 at port 0x3f6,0x1f0-0x1f7 irq 14 on isa0
> ata1 at port 0x376,0x170-0x177 irq 15 on isa0
> atkbdc0: at port 0x64,0x60 on isa0
> atkbd0: irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> psm0: irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: model MouseMan+, device ID 0
> fdc0: at port 0x3f0-0x3f5 irq 6 drq 2 on isa0
> fdc0: [FAST]
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> ppc0: at port 0x378-0x37f irq 7 on isa0
> ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
> ppbus0: on ppc0
> plip0: on ppbus0
> lpt0: on ppbus0
> lpt0: Interrupt-driven port
> ppi0: on ppbus0
> sc0: at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 16550A
> sio1 at port 0x2f8-0x2ff irq 3 on isa0
> sio1: type 16550A
> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> unknown: can't assign resources (port)
> unknown: can't assign resources (port)
> unknown: can't assign resources (port)
> unknown: can't assign resources (port)
> unknown: can't assign resources (port)
> Timecounters tick every 10.000 msec
> Waiting 15 seconds for SCSI devices to settle
> (probe6:ahc0:0:6:0): AutoSense Failed
> (probe5:ahc0:0:6:1): AutoSense Failed
> (probe0:ahc0:0:6:2): AutoSense Failed
> (probe5:ahc0:0:6:3): AutoSense Failed
> (probe5:ahc0:0:6:4): AutoSense Failed
> (probe0:ahc0:0:6:5): AutoSense Failed
> (probe0:ahc0:0:6:6): AutoSense Failed
> (probe0:ahc0:0:6:7): AutoSense Failed
> (probe21:ahc1:0:6:0): AutoSense Failed
> (probe1:ahc1:0:6:1): AutoSense Failed
> (probe1:ahc1:0:6:2): AutoSense Failed
> (probe1:ahc1:0:6:3): AutoSense Failed
> (probe1:ahc1:0:6:4): AutoSense Failed
> (probe1:ahc1:0:6:5): AutoSense Failed
> (probe1:ahc1:0:6:6): AutoSense Failed
> (probe1:ahc1:0:6:7): AutoSense Failed
> sa0 at ahc0 bus 0 target 4 lun 0
> sa0: Removable Sequential Access SCSI-2 device
> sa0: 10.000MB/s transfers (10.000MHz, offset 15)
> ses0 at ahc0 bus 0 target 6 lun 0
> ses0: Fixed Processor SCSI-2 device
> ses0: 3.300MB/s transfers
> ses0: SAF-TE Compliant Device
> ses1 at ahc1 bus 0 target 6 lun 0
> ses1: Fixed Processor SCSI-2 device
> ses1: 3.300MB/s transfers
> ses1: SAF-TE Compliant Device
> da0 at ahc0 bus 0 target 0 lun 0
> da0: Fixed Direct Access SCSI-2 device
> da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> da1 at ahc0 bus 0 target 1 lun 0
> da1: Fixed Direct Access SCSI-2 device
> da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da1: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> da2 at ahc0 bus 0 target 2 lun 0
> da2: Fixed Direct Access SCSI-2 device
> da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da2: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> da3 at ahc1 bus 0 target 0 lun 0
> da3: Fixed Direct Access SCSI-2 device
> da3: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da3: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> da4 at ahc1 bus 0 target 1 lun 0
> da4: Fixed Direct Access SCSI-2 device
> da4: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da4: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> da5 at ahc1 bus 0 target 2 lun 0
> da5: Fixed Direct Access SCSI-2 device
> da5: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing
> Enabled
> da5: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
> cd0 at ahc0 bus 0 target 5 lun 0
> cd0: Removable CD-ROM SCSI-2 device
> cd0: 10.000MB/s transfers (10.000MHz, offset 15)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> GEOM_MIRROR: Device mirror0 created (id=138753045).
> GEOM_MIRROR: Device mirror0: provider da0 detected.
> GEOM_CONCAT: Device usr2 created (id=1051984440).
> GEOM_CONCAT: Disk da1 attached to usr2.
> GEOM_CONCAT: Disk da2 attached to usr2.
> GEOM_MIRROR: Device mirror0: provider da3 detected.
> GEOM_MIRROR: Device mirror0: provider da3 activated.
> GEOM_MIRROR: Device mirror0: provider mirror/mirror0 launched.
> GEOM_MIRROR: Device mirror0: rebuilding provider da0.
> GEOM_CONCAT: Disk da4 attached to usr2.
> GEOM_CONCAT: Disk da5 attached to usr2.
> GEOM_CONCAT: Device usr2 activated.
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #2 Launched!
> Mounting root from ufs:/dev/mirror/mirror0a
> WARNING: / was not properly dismounted
> WARNING: /var was not properly dismounted
> WARNING: /usr was not properly dismounted
> /usr: mount pending error: blocks 4 files 2
> WARNING: /usr2 was not properly dismounted
>

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org