Date: Wed, 5 Sep 2001 13:30:36 +0200 (CEST)
From: "Hartmann, O." <ohartman@klima.physik.uni-mainz.de>
To: <freebsd-smp@freebsd.org>
Cc: <freebsd-stable@freebsd.org>
Subject: Spontanous reboot on SMP system FBSD 4.4-RC
Message-ID: <20010905130056.K26477-100000@klima.physik.uni-mainz.de>

Dear Sirs,

Once again I am seeing a problem on one of our SMP machines that I had not seen for a long time. We have four servers with dual Intel CPUs here, and one of them is equipped with a TYAN 2500 mainboard (ServerWorks IIIHE chipset), Slot 1, dual 866 MHz PIII, 2 GB ECC RAM. I had this problem in the early days of FBSD 4.3 as well, but it seemed to go away after a while. At the moment the servers run FBSD 4.4-RC, cvsupdated three days ago, so they are running a recent system.

This server has a 4-channel AMI Enterprise 1600 RAID controller with over 240 GB of hard disk space. Another server runs the 2-channel version of this controller in a 32-bit PCI slot without any problem. Earlier responses to my problem pointed at the AMI controller, but I do not think it is the cause.

The machine is in a big cabinet with two redundant 300 W quality power supplies and a lot of fans for cooling. The internal temperature never exceeds 38 degrees Celsius, and the server room is air conditioned, so I am sure that environmental factors (e.g. heat) are not the problem.

The kernel of this system is configured normally, except that I use the ISA option 'options AUTO_EOI_1'. The further option AUTO_EOI_2 also works, but only for a while; with that option the server can very likely be made to reboot spontaneously. I use AUTO_EOI_1 because I was told that this option increases performance (?). On all other systems (one is a dual PII 350 on a GigaByte GA686BXD, one a dual 600 MHz KATMAI on an ASUS P2B-D, and one a dual 800EB PIII on an ASUS CUV4X-D) the option AUTO_EOI_1 works fine, and these systems have never had these spontaneous reboots.

The odd thing is that the affected machine never reboots under heavy load or for some time after being under heavy load. I suspected faulty hardware, but I have never been able to track down a faulty component. This machine is used as a compute server for numerical solutions and, in addition, as an NFS server.

The longest uptime without a reboot was three weeks. The machine is rebooted so often because we run cvsup updates very frequently, almost every day. For several campaigns the duty cycle is one week, and the system ran stable during those periods, until yesterday. After one day of uptime it rebooted spontaneously at 10 o'clock in the morning, a time when it is not under load and no cron jobs are running.

This problem is very serious for me, because I cannot rely on this machine when we start a campaign next month in which we need it for several numerical simulations (small ones, but they run for a long time).

As I remember, FreeBSD has quite often suffered from spontaneous reboots on several hardware platforms in the past, and ServerWorks chipsets seemed to be candidates. But in my opinion expensive hardware should be more stable than cheaper hardware (that is why we spent a lot of money on these systems).

In addition, I am sending you the dmesg output of the running kernel and the mptable output. I hope someone can help me a little.

Many thanks in advance,
Oliver

-------------------------
- dmesg output -
-------------------------
Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.4-RC #168: Wed Sep 5 01:22:09 CEST 2001
    root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (868.57-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x683 Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory = 2147483648 (2097152K bytes)
avail memory = 2087907328 (2038972K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
Preloaded elf kernel "kernel" at 0xc03a0000.
Pentium Pro MTRR support enabled
Using $PIR table, 12 entries at 0xc00fdf00
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 13 -> irq 2
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 7 -> irq 17
pci0: <PCI bus> on pcib0
pcib3: <PCI to PCI bridge (vendor=1166 device=0005)> at device 0.1 on pci0
IOAPIC #1 intpin 1 -> irq 18
pci1: <PCI bus> on pcib3
pci1: <NVidia Riva TNT2 graphics accelerator> at 0.0 irq 18
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
sym1: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
pcib5: <DEC 21154 PCI-PCI bridge> at device 3.0 on pci0
IOAPIC #1 intpin 2 -> irq 19
pci2: <PCI bus> on pcib5
pcib6: <DEC 21154 PCI-PCI bridge> at device 0.0 on pci2
IOAPIC #1 intpin 0 -> irq 20
pci3: <PCI bus> on pcib6
amr0: <AMI MegaRAID> mem 0xf4000000-0xf7ffffff irq 20 at device 0.0 on pci3
amr0: <Series 471 40 Logical Drive Firmware> Firmware A159, BIOS 3.11, 64MB RAM
pci2: <unknown card> (vendor=0x1077, dev=0x1216) at 1.0 irq 18
pci2: <unknown card> (vendor=0x1077, dev=0x1216) at 2.0 irq 19
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xfcc0-0xfcff mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 17 at device 7.0 on pci0
fxp0: Ethernet address 00:e0:81:00:f0:d7
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <ServerWorks IB6566 PCI to ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pcib1: <ServerWorks NB6536 2.0HE host to PCI bridge> on motherboard
pci4: <PCI bus> on pcib1
pcib2: <ServerWorks host to PCI bridge> on motherboard
pci5: <PCI bus> on pcib2
pcib4: <ServerWorks host to PCI bridge> on motherboard
pci6: <PCI bus> on pcib4
orm0: <Option ROMs> at iomem 0xc0000-0xc9fff,0xca000-0xcdfff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <6 virtual consoles, flags=0x200>
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 flags 0x10 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 drq 1 flags 0x8 on isa0
ppc0: SMC-like chipset (ECP-only) in ECP mode
ppc0: FIFO with 16/16/8 bytes threshold
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
DUMMYNET initialized (010124)
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to deny, unlimited logging
IPsec: Initialized Security Association Processing.
Waiting 4 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
amrd0: <MegaRAID logical drive> on amr0
amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)
SMP: AP CPU #1 Launched!
sa0 at sym1 bus 0 target 5 lun 0
sa0: <HP C5713A H910> Removable Sequential Access SCSI-2 device
sa0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
Mounting root from ufs:/dev/amrd0s1a
ch0 at sym1 bus 0 target 5 lun 1
ch0: <HP C5713A H910> Removable Changer SCSI-2 device
ch0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0: 6 slots, 1 drive, 0 pickers, 0 portals
cd0 at sym1 bus 0 target 3 lun 0
cd0: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 16)
cd0: Attempt to query device size failed: NOT READY, Medium not present
link_elf: symbol splash_register undefined
fxp0: promiscuous mode enabled

-------------------------
- mptable output -
-------------------------
===============================================================================
MPTable, version 2.0.15
looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009ec00
searching CMOS 'top of mem' @ 0x0009e800 (634K)
searching default 'top of mem' @ 0x0009fc00 (639K)
searching BIOS @ 0x000f0000
MP FPS found in BIOS @ physical addr: 0x000f7450
-------------------------------------------------------------------------------
MP Floating Pointer Structure:
  location: BIOS
  physical address: 0x000f7450
  signature: '_MP_'
  length: 16 bytes
  version: 1.4
  checksum: 0x4a
  mode: Virtual Wire
-------------------------------------------------------------------------------
MP Config Table Header:
  physical address: 0x0009ed60
  signature: 'PCMP'
  base table length: 348
  version: 1.4
  checksum: 0xef
  OEM ID: 'INTRGRPH'
  Product ID: 'ZX10 '
  OEM table pointer: 0x00000000
  OEM table size: 0
  entry count: 35
  local APIC address: 0xfee00000
  extended table length: 148
  extended table checksum: 247
-------------------------------------------------------------------------------
MP Config Base Table Entries:
--
Processors:
  APIC ID  Version  State        Family  Model  Step  Flags
  1        0x11     BSP, usable  6       8      3     0x387fbff
  0        0x11     AP, usable   6       8      3     0x387fbff
--
Bus:
  Bus ID  Type
  0       PCI
  1       PCI
  2       PCI
  3       PCI
  4       PCI
  5       ISA
--
I/O APICs:
  APIC ID  Version  State   Address
  2        0x11     usable  0xfec00000
  3        0x11     usable  0xfec01000
--
I/O Ints:
  Type    Polarity   Trigger  Bus ID  IRQ  APIC ID  PIN#
  ExtINT  active-hi  edge     5       0    2        0
  INT     active-hi  edge     5       1    2        1
  INT     active-hi  edge     5       0    2        2
  INT     active-hi  edge     5       3    2        3
  INT     active-hi  edge     5       4    2        4
  INT     active-lo  edge     5       5    2        5
  INT     active-hi  edge     5       6    2        6
  INT     active-hi  edge     5       7    2        7
  INT     active-hi  edge     5       8    2        8
  INT     active-lo  level    5       9    2        9
  INT     active-lo  edge     5       10   2        10
  INT     active-lo  edge     5       11   2        11
  INT     active-hi  edge     5       12   2        12
  INT     active-hi  edge     5       13   2        13
  INT     active-hi  edge     5       14   2        14
  INT     active-hi  edge     5       15   2        15
  INT     active-lo  level    0       1:A  3        13
  INT     active-lo  level    0       1:B  3        12
  INT     active-lo  level    3       0:A  3        0
  INT     active-lo  level    2       1:A  3        1
  INT     active-lo  level    2       2:A  3        2
  INT     active-lo  level    0       7:A  3        7
  INT     active-lo  level    1       0:A  3        1
--
Local Ints:
  Type    Polarity   Trigger  Bus ID  IRQ  APIC ID  PIN#
  ExtINT  active-hi  edge     5       0    255      0
  NMI     active-hi  edge     0       0:A  255      1
-------------------------------------------------------------------------------
MP Config Extended Table Entries:
--
System Address Space
  bus ID: 0 address type: I/O address
  address base: 0x0
  address range: 0x10000
--
System Address Space
  bus ID: 0 address type: memory address
  address base: 0x80000000
  address range: 0x74000000
--
System Address Space
  bus ID: 0 address type: prefetch address
  address base: 0xf4000000
  address range: 0x8000000
--
System Address Space
  bus ID: 0 address type: memory address
  address base: 0xfc000000
  address range: 0x2e00000
--
System Address Space
  bus ID: 0 address type: memory address
  address base: 0xfee01000
  address range: 0x11ff000
--
System Address Space
  bus ID: 0 address type: memory address
  address base: 0xa0000
  address range: 0x20000
--
System Address Space
  bus ID: 0 address type: memory address
  address base: 0xd0000
  address range: 0x18000
--
Bus Heirarchy
  bus ID: 5 bus info: 0x01 parent bus ID: 0
===============================================================================

--
MfG
O. Hartmann
ohartman@klima.physik.uni-mainz.de
----------------------------------------------------------------
IT-Administration des Institutes fuer Physik der Atmosphaere (IPA)
----------------------------------------------------------------
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (Maschinenraum)
Tel: +496131/3924144
FAX: +496131/3923532

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message