From owner-freebsd-stable@FreeBSD.ORG Thu Aug 14 03:45:24 2003
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id C135A37B401;
    Thu, 14 Aug 2003 03:45:24 -0700 (PDT)
Received: from klima.physik.uni-mainz.de (klima.Physik.Uni-Mainz.DE [134.93.180.162])
    by mx1.FreeBSD.org (Postfix) with ESMTP id C244A43FF9;
    Thu, 14 Aug 2003 03:45:17 -0700 (PDT)
    (envelope-from ohartman@klima.physik.uni-mainz.de)
Received: from mail.physik.uni-mainz.de (mail.physik.uni-mainz.de [134.93.180.161])h7EAj9Cl078944;
    Thu, 14 Aug 2003 12:45:09 +0200 (CEST)
    (envelope-from ohartman@klima.physik.uni-mainz.de)
Date: Thu, 14 Aug 2003 12:45:09 +0200 (CEST)
From: "Hartmann, O."
X-X-Sender: ohartman@mail.physik.uni-mainz.de
To: Charles Sprickman
In-Reply-To: <20030814103255.C77242@klima.physik.uni-mainz.de>
Message-ID: <20030814124145.W64942@mail.physik.uni-mainz.de>
References: <20030813103509.Q49991@mail.physik.uni-mainz.de>
    <20030814103255.C77242@klima.physik.uni-mainz.de>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-stable@freebsd.org
cc: freebsd-smp@freebsd.org
Subject: Re: 5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 14 Aug 2003 10:45:25 -0000

On Thu, 14 Aug 2003, Hartmann, O. wrote:

Another curiosity: changing these lines in the kernel config makes the
kernel freeze right after it shows the state of the amrd0: RAID array.
The kernel that 'sometimes' runs has these options enabled:

options AUTO_EOI_1
options PQ_CACHESIZE=256
options HZ=1000

I disabled these options because I thought they might be causing
problems. Now the kernel that used to run doesn't start anymore :-)
Is this really a power supply problem? I think not.
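When a kernel stops booting after a config change like the one described above, it can help to list exactly which 'options' lines differ between the last config that booted and the new one. The following is only a minimal Python sketch of that comparison. The function names (parse_options, diff_options) and the two sample configs are made up for illustration and are not taken from the poster's actual ATMOS config file; the parser only understands plain 'options NAME' and 'options NAME=VALUE' lines with '#' comments.

```python
def parse_options(config_text):
    """Return {name: value or None} for every 'options' line in a kernel config.

    Assumption: one option per line, '#' starts a comment, and quoted
    values containing '#' do not occur.
    """
    opts = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) != 2 or parts[0] != "options":
            continue
        name, _, value = parts[1].strip().partition("=")
        opts[name.strip()] = value.strip() or None
    return opts


def diff_options(old_text, new_text):
    """Return (removed, added, changed) option names, each as a sorted list."""
    old, new = parse_options(old_text), parse_options(new_text)
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return removed, added, changed


if __name__ == "__main__":
    # Hypothetical excerpts: the config that sometimes booted vs. the new one.
    booting = """
    options SCHED_4BSD
    options AUTO_EOI_1
    options PQ_CACHESIZE=256
    options HZ=1000
    """
    failing = """
    options SCHED_4BSD
    #options AUTO_EOI_1
    #options PQ_CACHESIZE=256
    #options HZ=1000
    """
    removed, added, changed = diff_options(booting, failing)
    print("removed:", removed)
    print("added:  ", added)
    print("changed:", changed)
```

Re-enabling the removed options one at a time and rebuilding would then show which single option the boot depends on, instead of toggling all three at once.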
:>On Wed, 13 Aug 2003, Charles Sprickman wrote:
:>
:>Hello.
:>
:>'healthd' is not working on the TYAN Thunder 2500 (LSI Logic 896 LVD version),
:>because the ServerWorks HE chipset does not seem to be supported - therefore
:>healthd does not report anything. I tried this in the past.
:>
:>I have had problems with this mainboard ever since we purchased it and started
:>running FreeBSD on it. Most of them were IRQ problems, I guess, and I solved
:>them by fiddling around with which PCI card goes in which PCI slot ...
:>
:>Our next step is to upgrade the power supplies to 400 or 500 W per unit. But
:>this is not the final solution in my opinion. While recovering from a disaster
:>I was forced to put an additional two-channel LVD2 controller into the board
:>(LSI Logic 24060??, an LSI 1010-33 64-bit based controller). With this
:>configuration, the system was totally unstable (I detached the tape unit from
:>power and disabled the built-in SCSI controller in the BIOS!).
:>
:>
:>:>Hi,
:>:>
:>:>Sadly, I can offer no help, but I do have a Thunder 2462 (SMP mobo), and I
:>:>currently have an Adaptec 2110S RAID card in it with no problems.
:>:>However, it will NOT work when installed in the riser card...
:>:>
:>:>I'm wondering if you have healthd running on that board and if it reports
:>:>valid data? I'm getting nothing from mine, and I think our boards share
:>:>the same Winbond chip...
:>:>
:>:>Thanks,
:>:>
:>:>Charles
:>:>
:>:>--
:>:>Charles Sprickman
:>:>spork@inch.com
:>:>
:>:>
:>:>On Wed, 13 Aug 2003, Hartmann, O. wrote:
:>:>
:>:>> Dear Sirs.
:>:>>
:>:>> This seems to me a never-ending story. We run a box with a TYAN Thunder
:>:>> 2500 dual SMP mainboard, 2 GB of Tyan-certified ECC memory, an AMI
:>:>> Enterprise 1600 RAID adapter and an additional Intel 1000/Pro server-type
:>:>> (64-bit) GBit LAN NIC. With FreeBSD 4.8 this was stable, but achieving
:>:>> that state was really hard! The story was similar to what happened when
:>:>> we moved this machine to FreeBSD 5.1-RELEASE-p2.
:>:>>
:>:>> Stability seems to depend heavily on which PCI slot each card is
:>:>> plugged into, so I will report that here as well.
:>:>>
:>:>> Phenomenon:
:>:>>
:>:>> After the machine has been running for a while, the SMP kernel reboots
:>:>> spontaneously. This happens under heavy I/O: when compiling, or in the
:>:>> morning when our department starts work and the staff connect to the
:>:>> Samba server.
:>:>>
:>:>> Depending on which devices are switched on or off in the BIOS, the kernel
:>:>> freezes at the stage where the amr0 RAID is recognized. I can avoid the
:>:>> freeze by enabling the built-in NIC (fxp0). I can force it by moving the
:>:>> em0 NIC into another slot, for instance the one remaining 64-bit/66 MHz
:>:>> slot (which should be a separate bus).
:>:>>
:>:>> This 'game' is identical to the one I had with FreeBSD 4.X - 4.8, where I
:>:>> found out that putting an additional NIC into PCI slot No. 2 (counted
:>:>> from the AGP slot) cleared things up, but using both NICs together
:>:>> (either the additional fxp0 or the new em0) leaves the system completely
:>:>> unstable.
:>:>>
:>:>> In FreeBSD 5.1-RELEASE-p2 and especially in FreeBSD 5.1-CURRENT this
:>:>> 'gambling' seems to reach its climax. My kernel is built with
:>:>> SCHED_4BSD because a kernel with SCHED_ULE and ADAPTIVE_MUTEXES crashes
:>:>> immediately in the way described (it runs for a while and then dumps
:>:>> core, or it freezes right after the amr0 RAID shows up in the kernel
:>:>> boot messages, see the dmesg output below).
:>:>>
:>:>> I'm not a hardware expert, but all this weird stuff looks to me like an
:>:>> IRQ routing problem. I fiddled around with many hand-assigned IRQ
:>:>> configurations, but nothing helped. Either the Intel 1000/Pro or the AMI
:>:>> RAID is causing problems in the TYAN Thunder 2500 SMP environment.
:>:>>
:>:>> We also have an SMP machine with similar hardware, based on an ASUS CV4X-D,
:>:>> an AMI Elite 1600 RAID controller and the same Intel em0 1GBit NIC.
:>:>> The OS is FreeBSD 4.8, and this system has never had any problems!
:>:>>
:>:>> I feel a little helpless at the moment, because I think I have tried every
:>:>> trick, and something seems to be wrong with the combination of the TYAN
:>:>> Thunder 2500 and FreeBSD 5.X SMP. It is also very curious that a kernel
:>:>> without SMP/IO_APIC freezes during boot at the same place (amr0 RAID
:>:>> recognition).
:>:>>
:>:>> Is there any help out there?
:>:>>
:>:>> I attach the kernel config file and the dmesg output. Please note: I
:>:>> disabled both serial ports, the parallel port, sound and USB to free up
:>:>> additional IRQs. But I have to enable the built-in NIC to get a bootable,
:>:>> though unstable, FreeBSD 5.1-R box.
:>:>>
:>:>> ====================================
:>:>> DMESG output
:>:>> ====================================
:>:>>
:>:>> Copyright (c) 1992-2003 The FreeBSD Project.
:>:>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
:>:>>     The Regents of the University of California. All rights reserved.
:>:>> FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
:>:>>     root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
:>:>> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
:>:>> Timecounter "i8254" frequency 1193182 Hz
:>:>> Timecounter "TSC" frequency 868644793 Hz
:>:>> CPU: Intel Pentium III (868.64-MHz 686-class CPU)
:>:>>   Origin = "GenuineIntel" Id = 0x683 Stepping = 3
:>:>>   Features=0x387fbff
:>:>> real memory = 2147483648 (2048 MB)
:>:>> avail memory = 2085625856 (1989 MB)
:>:>> Programming 16 pins in IOAPIC #0
:>:>> IOAPIC #0 intpin 2 -> irq 0
:>:>> Programming 16 pins in IOAPIC #1
:>:>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
:>:>>  cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
:>:>>  cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
:>:>>  io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
:>:>>  io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
:>:>> netsmb_dev: loaded
:>:>> Pentium Pro MTRR support enabled
:>:>> npx0: on motherboard
:>:>> npx0: INT 16 interface
:>:>> pcibios: BIOS version 2.10
:>:>> Using $PIR table, 12 entries at 0xc00fdf00
:>:>> pcib0: at pcibus 0 on motherboard
:>:>> pci0: on pcib0
:>:>> IOAPIC #1 intpin 13 -> irq 2
:>:>> IOAPIC #1 intpin 12 -> irq 16
:>:>> IOAPIC #1 intpin 2 -> irq 17
:>:>> IOAPIC #1 intpin 7 -> irq 18
:>:>> pcib1: at device 0.1 on pci0
:>:>> pci1: on pcib1
:>:>> IOAPIC #1 intpin 1 -> irq 19
:>:>> pci1: at device 0.0 (no driver attached)
:>:>> sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
:>:>> sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
:>:>> sym0: open drain IRQ line driver, using on-chip SRAM
:>:>> sym0: using LOAD/STORE-based firmware.
:>:>> sym0: handling phase mismatch from SCRIPTS.
:>:>> sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
:>:>> sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
:>:>> sym1: open drain IRQ line driver, using on-chip SRAM
:>:>> sym1: using LOAD/STORE-based firmware.
:>:>> sym1: handling phase mismatch from SCRIPTS.
:>:>> em0: port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
:>:>> em0: Speed:1000 Mbps Duplex:Full
:>:>> fxp0: port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
:>:>> fxp0: Ethernet address 00:e0:81:00:f0:d7
:>:>> miibus0: on fxp0
:>:>> inphy0: on miibus0
:>:>> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
:>:>> isab0: port 0x500-0x50f at device 15.0 on pci0
:>:>> isa0: on isab0
:>:>> pci0: at device 15.1 (no driver attached)
:>:>> pcib2: at pcibus 2 on motherboard
:>:>> pci2: on pcib2
:>:>> pcib3: at device 2.0 on pci2
:>:>> pci3: on pcib3
:>:>> IOAPIC #1 intpin 11 -> irq 20
:>:>> IOAPIC #1 intpin 8 -> irq 21
:>:>> pcib4: at device 0.0 on pci3
:>:>> pci4: on pcib4
:>:>> IOAPIC #1 intpin 10 -> irq 22
:>:>> amr0: mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
:>:>> amr0: Firmware G170, BIOS F316, 64MB RAM
:>:>> pci3: at device 1.0 (no driver attached)
:>:>> pci3: at device 2.0 (no driver attached)
:>:>> orm0: