From owner-freebsd-smp Wed Sep 5 4:31: 2 2001
Delivered-To: freebsd-smp@freebsd.org
Received: from klima.physik.uni-mainz.de (klima.Physik.Uni-Mainz.DE [134.93.180.162])
    by hub.freebsd.org (Postfix) with ESMTP
    id 2424837B406; Wed, 5 Sep 2001 04:30:38 -0700 (PDT)
Received: from klima.Physik.Uni-Mainz.DE (klima.Physik.Uni-Mainz.DE [134.93.180.162])
    by klima.physik.uni-mainz.de (8.11.6/8.11.4) with ESMTP id f85BUaA27598;
    Wed, 5 Sep 2001 13:30:37 +0200 (CEST)
    (envelope-from ohartman@klima.physik.uni-mainz.de)
Date: Wed, 5 Sep 2001 13:30:36 +0200 (CEST)
From: "Hartmann, O."
To:
Cc:
Subject: Spontanous reboot on SMP system FBSD 4.4-RC
Message-ID: <20010905130056.K26477-100000@klima.physik.uni-mainz.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

Dear Sirs,

A problem I had not seen for a long time has returned on one of our SMP
machines. We have four servers with dual Intel CPUs here, and one of them
is equipped with a TYAN 2500 mainboard (ServerWorks IIIHE chipset), Slot 1,
dual 866 MHz PIII, 2 GB ECC RAM. I had this problem in the early days of
FBSD 4.3 as well, but after a while it seemed to have gone away. At the
moment the servers run FBSD 4.4-RC, cvsupdated three days ago, so the
system is recent.

This server runs a 4-channel AMI Enterprise 1600 RAID controller with over
240 GB of hard disk space. Another server runs the 2-channel version of
this controller in a 32-bit PCI slot without a problem. Earlier responses
to my problem pointed at the AMI controller, but I do not think it is the
cause.

The machine is in a big cabinet with two redundant 300 W quality power
supplies and a lot of fans for cooling. The internal temperature is never
above 38 degrees Celsius, and the server room is air conditioned.
So I am sure that environmental conditions (e.g. heat) are not the problem.

The kernel of this system is configured 'normally', except that I use the
ISA option 'options AUTO_EOI_1'. The further option AUTO_EOI_2 also works,
but only for a while; with it the server is very likely to reboot
spontaneously. I use AUTO_EOI_1 because I was told that this option
increases performance (?). On all the other systems (one is a dual PII 350
on a GigaByte GA686BXD, one a dual 600 MHz Katmai on an ASUS P2B-D, and one
a dual 800EB PIII on an ASUS CUV4X-D) the option AUTO_EOI_1 works fine, and
these systems have never had these spontaneous reboots.

The odd thing is that the machine never reboots under heavy load, or
shortly after being under heavy load. I suspected faulty hardware, but I
was never able to track down the faulty components. This machine is used as
a computational system for numerical solutions and, in addition, as an NFS
server.

The longest uptime without a reboot was three weeks. The machine is
rebooted so often because we do cvsupdates very frequently, almost every
day. For several campaigns the duty cycle is one week, and the system ran
stably in the meantime, until yesterday. After one day of uptime it
rebooted spontaneously at 10 o'clock in the morning, a time when it is not
under load and no cron jobs are running.

This problem is very, very serious to me, because I cannot rely on this
machine when we start a campaign next month in which we need it for several
numerical simulations (small ones, but they run for a long time). As I
remember, FreeBSD has often suffered from spontaneous reboots on several
hardware platforms in the past, and ServerWorks chipsets seemed to be
candidates. But expensive hardware should be more stable than cheaper
hardware, in my opinion (that is the reason why we spent a lot of money on
those systems).

In addition, I am sending the dmesg output of the running kernel and the
mptable output. I hope someone can help me a little.

Many thanks in advance,
Oliver

-------------------------
- dmesg output -
-------------------------

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 4.4-RC #168: Wed Sep 5 01:22:09 CEST 2001
    root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (868.57-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x683 Stepping = 3
  Features=0x387fbff
real memory = 2147483648 (2097152K bytes)
avail memory = 2087907328 (2038972K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
Preloaded elf kernel "kernel" at 0xc03a0000.
Pentium Pro MTRR support enabled
Using $PIR table, 12 entries at 0xc00fdf00
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
IOAPIC #1 intpin 13 -> irq 2
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 7 -> irq 17
pci0: on pcib0
pcib3: at device 0.1 on pci0
IOAPIC #1 intpin 1 -> irq 18
pci1: on pcib3
pci1: at 0.0 irq 18
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
sym1: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
pcib5: at device 3.0 on pci0
IOAPIC #1 intpin 2 -> irq 19
pci2: on pcib5
pcib6: at device 0.0 on pci2
IOAPIC #1 intpin 0 -> irq 20
pci3: on pcib6
amr0: mem 0xf4000000-0xf7ffffff irq 20 at device 0.0 on pci3
amr0: Firmware A159, BIOS 3.11, 64MB RAM
pci2: (vendor=0x1077, dev=0x1216) at 1.0 irq 18
pci2: (vendor=0x1077, dev=0x1216) at 2.0 irq 19
fxp0: port 0xfcc0-0xfcff mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 17 at device 7.0 on pci0
fxp0: Ethernet address 00:e0:81:00:f0:d7
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: at device 15.0 on pci0
isa0: on isab0
pci0: at 15.1
pcib1: on motherboard
pci4: on pcib1
pcib2: on motherboard
pci5: on pcib2
pcib4: on motherboard
pci6: on pcib4
orm0: