From: Colin Faber
Organization: fpsn.net, Inc. (http://www.fpsn.net)
Date: Wed, 20 Aug 2003 12:17:54 -0600
To: ohartman@klima.physik.uni-mainz.de
Cc: freebsd-stable@freebsd.org, freebsd-smp@freebsd.org
Subject: (2) 5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro
Message-ID: <3F43BB52.5060503@fpsn.net>
In-Reply-To: <20030813103509.Q49991@mail.physik.uni-mainz.de>

Hi,

I've got nearly the same setup in a Dell 1600SC with a gig of RAM, a PERC4/Sc (LSI MegaRAID) card and dual 2.4GHz Xeon P4 HT CPUs. I've discovered that I can lock up FreeBSD 5.1-RELEASE-p2 on command simply by running something that quickly creates and removes a directory.
For example:

  perl -e 'for(my $i = 0 ; $i < 9999; $i++){ mkdir("abc"); rmdir("abc"); }'

Setting machdep.cpu_idle_hlt to 0 makes no difference.

Kernel:
FreeBSD 5.1-RELEASE-p2 FreeBSD 5.1-RELEASE-p2 #0: Mon Aug 11 21:40:47 MDT 2003 i386

RAID:
amr0: mem 0xfcd00000-0xfcd0ffff irq 3 at device 2.0 on pci1
amrd0: on amr0
amrd0: 34556MB (70770688 sectors) RAID 5 (optimal)

I suspect that your problem and mine are both related to the amr driver, and that the driver may be exposing some other problem within the kernel's filesystem locking. I don't think (as others have suggested) that your issue is power related, or related to the combination of hardware you're using (other than the fact that you've got a MegaRAID card).

The exact crash message I'm seeing is:

panic: lockmgr: locking against myself
cpuid = 0; lapic.id 00000000
boot() called on cpu#0
syncing disks, buffers remaining...
panic: ffs_copyonwrite: recursive call
cpuid = 0; lapic.id 00000000
boot() called on cpu#0
Uptime: 58s
pfs_vncache_unload(): 7 entries remaining
amr0: flushing cache...done
Terminate ACPI

Hartmann, O. wrote:
> Dear Sirs.
>
> It seems to me to be a never-ending story. We run a box with a TYAN
> Thunder 2500 Dual SMP mainboard, 2GB of ECC Tyan-certified memory, an
> AMI Enterprise 1600 RAID adapter and an additional Intel 1000/Pro
> server-type (64-bit) GBit LAN NIC. With FreeBSD 4.8 this was stable,
> but achieving that state was really hard! The story was similar when
> we changed to FreeBSD 5.1-RELEASE-p2 on this machine.
>
> It seems to depend heavily on which PCI slots the various cards are
> in, so I will report this here as well.
>
> Phenomenon:
>
> After the machine has been running for a while, the SMP kernel
> reboots spontaneously. This happens when heavy I/O is done, e.g.
> while compiling, or in the morning when our department gets going
> and our staff connect to the Samba server.
>
> Depending on which devices are switched on or off in the BIOS, the
> kernel freezes at the stage where the amr0 RAID is recognized.
> I can avoid this by enabling the built-in NIC (fxp0). I can force
> this by putting the em0 NIC into another slot, for instance the one
> remaining 64-bit/66MHz slot (which should be on a separate bus).
>
> This 'game' was identical to the one I had with FreeBSD 4.X - 4.8,
> and I found out that putting an additional NIC into PCI slot No. 2
> (counting from the AGP slot) made things work, but using both NICs
> together (either the additional fxp0 or the new em0) leaves the
> system completely unstable.
>
> In FreeBSD 5.1-RELEASE-p2, and especially in FreeBSD 5.1-CURRENT, this
> 'gambling' seems to reach its climax. My kernel is built with
> SCHED_4BSD because SCHED_ULE and ADAPTIVE_MUTEXES crash immediately
> in the same way as described (running for a while, then dumping core,
> or freezing at the stage after the amr0 RAID shows up in the kernel
> boot messages; see the dmesg output below).
>
> I'm not a hardware expert, but all this weird stuff looks to me like
> an IRQ routing problem. I fiddled around with many hand-assigned IRQ
> configurations, but nothing helped. Either the Intel 1000/Pro or the
> AMI RAID is causing problems in the TYAN Thunder 2500 SMP environment.
>
> We also have an SMP machine with similar hardware, based on an ASUS
> CV4X-D with an AMI Elite 1600 RAID controller and the same Intel em0
> 1GBit NIC. It runs FreeBSD 4.8 and has never had any problems!
>
> I feel a little helpless at the moment, because I think I have tried
> every trick, and something seems to be wrong with the combination of
> the TYAN Thunder 2500 and FreeBSD 5.X SMP. It is also very curious
> that a kernel without SMP/IO_APIC freezes during boot at the same
> place (amr0 RAID recognition).
>
> Is there any help out there?
>
> I have attached the kernel config file and the dmesg output. Please
> note: I disabled both serial ports, the parallel port, sound and USB
> to free up additional IRQs. But I have to enable the built-in NIC to
> get a bootable, though unstable, FreeBSD 5.1-R box.
>
> ====================================
> DMESG output
> ====================================
>
> Copyright (c) 1992-2003 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
> root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
> Timecounter "i8254" frequency 1193182 Hz
> Timecounter "TSC" frequency 868644793 Hz
> CPU: Intel Pentium III (868.64-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0x683 Stepping = 3
> Features=0x387fbff
> real memory = 2147483648 (2048 MB)
> avail memory = 2085625856 (1989 MB)
> Programming 16 pins in IOAPIC #0
> IOAPIC #0 intpin 2 -> irq 0
> Programming 16 pins in IOAPIC #1
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
> cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
> io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
> io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
> netsmb_dev: loaded
> Pentium Pro MTRR support enabled
> npx0: on motherboard
> npx0: INT 16 interface
> pcibios: BIOS version 2.10
> Using $PIR table, 12 entries at 0xc00fdf00
> pcib0: at pcibus 0 on motherboard
> pci0: on pcib0
> IOAPIC #1 intpin 13 -> irq 2
> IOAPIC #1 intpin 12 -> irq 16
> IOAPIC #1 intpin 2 -> irq 17
> IOAPIC #1 intpin 7 -> irq 18
> pcib1: at device 0.1 on pci0
> pci1: on pcib1
> IOAPIC #1 intpin 1 -> irq 19
> pci1: at device 0.0 (no driver attached)
> sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
> sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
> sym0: open drain IRQ line driver, using on-chip SRAM
> sym0: using LOAD/STORE-based firmware.
> sym0: handling phase mismatch from SCRIPTS.
> sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
> sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
> sym1: open drain IRQ line driver, using on-chip SRAM
> sym1: using LOAD/STORE-based firmware.
> sym1: handling phase mismatch from SCRIPTS.
> em0: port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
> em0: Speed:1000 Mbps Duplex:Full
> fxp0: port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
> fxp0: Ethernet address 00:e0:81:00:f0:d7
> miibus0: on fxp0
> inphy0: on miibus0
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> isab0: port 0x500-0x50f at device 15.0 on pci0
> isa0: on isab0
> pci0: at device 15.1 (no driver attached)
> pcib2: at pcibus 2 on motherboard
> pci2: on pcib2
> pcib3: at device 2.0 on pci2
> pci3: on pcib3
> IOAPIC #1 intpin 11 -> irq 20
> IOAPIC #1 intpin 8 -> irq 21
> pcib4: at device 0.0 on pci3
> pci4: on pcib4
> IOAPIC #1 intpin 10 -> irq 22
> amr0: mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
> amr0: Firmware G170, BIOS F316, 64MB RAM
> pci3: at device 1.0 (no driver attached)
> pci3: at device 2.0 (no driver attached)
> orm0: