From owner-freebsd-smp@FreeBSD.ORG Wed Aug 13 01:59:21 2003
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E79B237B401;
	Wed, 13 Aug 2003 01:59:21 -0700 (PDT)
Received: from klima.physik.uni-mainz.de (klima.Physik.Uni-Mainz.DE [134.93.180.162])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C09F743F3F;
	Wed, 13 Aug 2003 01:59:20 -0700 (PDT)
	(envelope-from ohartman@klima.physik.uni-mainz.de)
Received: from mail.physik.uni-mainz.de (mail.physik.uni-mainz.de [134.93.180.161])h7D8xJCl070715;
	Wed, 13 Aug 2003 10:59:19 +0200 (CEST)
	(envelope-from ohartman@klima.physik.uni-mainz.de)
Date: Wed, 13 Aug 2003 10:59:19 +0200 (CEST)
From: "Hartmann, O."
X-X-Sender: ohartman@mail.physik.uni-mainz.de
To: freebsd-smp@freebsd.org
Message-ID: <20030813103509.Q49991@mail.physik.uni-mainz.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-stable@freebsd.org
Subject: 5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: FreeBSD SMP implementation group
X-List-Received-Date: Wed, 13 Aug 2003 08:59:22 -0000

Dear Sirs,

This seems to me a never-ending story. We run a box with a TYAN Thunder
2500 dual-SMP mainboard, 2 GB of Tyan-certified ECC memory, an AMI
Enterprise 1600 RAID adapter and an additional Intel 1000/Pro
server-type (64-bit) GBit LAN NIC. With FreeBSD 4.8 this was stable, but
achieving that state was really hard! The story was similar when we
moved this machine to FreeBSD 5.1-RELEASE-p2. Stability seems to depend
heavily on which PCI slot each card is plugged into, so I report that
here as well.

Phenomenon: After the machine has been running for a while, the SMP
kernel reboots spontaneously.
This happens under heavy I/O, while compiling, or in the morning when
our department starts work and the staff connect to the Samba server.
Depending on which devices are switched on or off in the BIOS, the
kernel freezes at the stage where the amr0 RAID is recognized. I can
avoid this by enabling the built-in NIC (fxp0). I can force it by
putting the em0 NIC into another slot, for instance the one remaining
64-bit/66 MHz slot (which should be a separate bus).

This 'game' was identical to the one I had with FreeBSD 4.X - 4.8. I
found out that putting an additional NIC into PCI slot No. 2 (counted
from the AGP slot) cleared things up, but using both NICs together
(either an additional fxp0 or the new em0) leaves the system completely
unstable. In FreeBSD 5.1-RELEASE-p2, and especially in FreeBSD
5.1-CURRENT, this 'gambling' seems to reach its climax. My kernel is
built with SCHED_4BSD because a kernel with SCHED_ULE and
ADAPTIVE_MUTEXES crashes immediately in the same way as described
(running a while, then dumping core, or freezing at the stage after the
amr0 RAID has shown up in the kernel boot messages, see the dmesg output
below).

I'm not a hardware expert, but all this weird stuff looks to me like an
IRQ routing problem. I fiddled around with many hand-assigned IRQ
configurations, but nothing helped. Either the Intel 1000/Pro or the AMI
RAID is causing problems in the TYAN Thunder 2500 SMP environment. We
also have an SMP machine with similar hardware, based on an ASUS CV4X-D,
an AMI Elite 1600 RAID controller and the same Intel em0 1 GBit NIC. Its
OS is FreeBSD 4.8 and this system has never had any problem!

I feel a little bit helpless at the moment, because I think I have tried
every trick, and something seems to be wrong with the combination of
TYAN Thunder 2500 and FreeBSD 5.X SMP. It is also very curious that a
kernel without SMP/IO_APIC freezes after booting at the same place (amr0
RAID recognition).

Is there any help out there? I attach the kernel config file and the
dmesg output.

Please note: I disabled both serial ports, the parallel port, sound and
USB to free additional IRQs. But I have to enable the built-in NIC to
get a bootable, though unstable, FreeBSD 5.1-R box.

====================================
DMESG output
====================================

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
    root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 868644793 Hz
CPU: Intel Pentium III (868.64-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x683 Stepping = 3
  Features=0x387fbff
real memory = 2147483648 (2048 MB)
avail memory = 2085625856 (1989 MB)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
netsmb_dev: loaded
Pentium Pro MTRR support enabled
npx0: on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00fdf00
pcib0: at pcibus 0 on motherboard
pci0: on pcib0
IOAPIC #1 intpin 13 -> irq 2
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 2 -> irq 17
IOAPIC #1 intpin 7 -> irq 18
pcib1: at device 0.1 on pci0
pci1: on pcib1
IOAPIC #1 intpin 1 -> irq 19
pci1: at device 0.0 (no driver attached)
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
em0: port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
em0: Speed:1000 Mbps Duplex:Full
fxp0: port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
fxp0: Ethernet address 00:e0:81:00:f0:d7
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: port 0x500-0x50f at device 15.0 on pci0
isa0: on isab0
pci0: at device 15.1 (no driver attached)
pcib2: at pcibus 2 on motherboard
pci2: on pcib2
pcib3: at device 2.0 on pci2
pci3: on pcib3
IOAPIC #1 intpin 11 -> irq 20
IOAPIC #1 intpin 8 -> irq 21
pcib4: at device 0.0 on pci3
pci4: on pcib4
IOAPIC #1 intpin 10 -> irq 22
amr0: mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
amr0: Firmware G170, BIOS F316, 64MB RAM
pci3: at device 1.0 (no driver attached)
pci3: at device 2.0 (no driver attached)
orm0: