Date: Wed, 20 Aug 2003 12:17:54 -0600
From: Colin Faber <cfaber@fpsn.net>
To: ohartman@klima.physik.uni-mainz.de
Cc: freebsd-smp@freebsd.org
Subject: (2) 5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro
Message-ID: <3F43BB52.5060503@fpsn.net>
In-Reply-To: <20030813103509.Q49991@mail.physik.uni-mainz.de>
References: <20030813103509.Q49991@mail.physik.uni-mainz.de>
Hi,

I've got nearly the same setup in a Dell 1600SC with a gig of RAM, a
PERC4/Sc (LSI MegaRAID) card and dual 2.4GHz Xeon P4 HT CPUs, and I've
discovered I can lock up FreeBSD 5.1-RELEASE-p2 on command simply by
running something that quickly creates and removes a directory, e.g.:

perl -e 'for(my $i = 0 ; $i < 9999; $i++){ mkdir("abc"); rmdir("abc"); }'

Setting machdep.cpu_idle_hlt = 0 makes no difference.

Kernel:
FreeBSD 5.1-RELEASE-p2 FreeBSD 5.1-RELEASE-p2 #0: Mon Aug 11 21:40:47 MDT 2003 i386

Raid:
amr0: <LSILogic MegaRAID> mem 0xfcd00000-0xfcd0ffff irq 3 at device 2.0 on pci1
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 34556MB (70770688 sectors) RAID 5 (optimal)

I suspect that your problem and mine are related to the amr driver and
may be exposing some other problem within the kernel's fs locking.

I don't think (as others have suggested) that your issue is power
related, or related to the combination of hardware you're using (other
than the fact that you've got a MegaRAID card).

The exact crash message I'm seeing is:

panic: lockmgr: locking against myself
cpuid = 0; lapic.id 00000000
boot() called on cpu#0

syncing disks, buffers remaining...
panic: ffs_copyonwrite: recursive call
cpuid = 0; lapic.id 00000000
boot() called on cpu#0
Uptime: 58s
pfs_vncache_unload(): 7 entries remaining
amr0: flushing cache...done
Terminate ACPI

Hartmann, O. wrote:
> Dear Sirs.
>
> It seems to me a never ending story. We run a box with a TYAN Thunder
> 2500 Dual SMP mainboard, 2GB ECC Tyan certified memory, AMI Enterprise
> 1600 RAID adapter and additional Intel 1000/Pro server type (64 bit)
> GBit LAN NIC. With FreeBSD 4.8 this was stable, but achieving this
> state was really hard! It is a story similar to what happened when
> we changed to FreeBSD 5.1-RELEASE-p2 on this machine.
>
> It seems to be highly dependent on which PCI slot the various cards are
> attached to, so I will report this here also.
>
> Phenomenon:
>
> After the machine has been running for a while, the SMP kernel reboots
> spontaneously. This happens under heavy I/O, when compiling, or in the
> morning when our department gets up and our staff connects to the samba
> server.
>
> Depending on which devices are switched on or off in the BIOS, the kernel
> freezes at the stage when the amr0 RAID is recognized. I can avoid this
> by enabling the built-in NIC (fxp0). I can force it by putting the em0
> NIC into another slot, for instance the one remaining 64-bit/66MHz
> slot (which should be a separate bus).
>
> This 'game' is identical to the one I had with FreeBSD 4.X - 4.8, and I
> found out that putting an additional NIC into PCI slot No. 2 (counted
> from the AGP slot) made things clear, but using both NICs together
> (either the additional fxp0 or the new em0) leaves the system completely
> unstable.
>
> In FreeBSD 5.1-RELEASE-p2, and especially in FreeBSD 5.1-CURRENT, this
> 'gambling' seems to reach its climax. My kernel is built with
> SCHED_4BSD because SCHED_ULE and ADAPTIVE_MUTEXES crash immediately
> in the same way as described (running a while, then dumping core or freezing
> at the stage after the amr0 RAID shows up in the kernel boot messages;
> see the dmesg output below).
>
> I'm not a hardware expert, but all this weird stuff looks to me like
> an IRQ routing problem. I fiddled around with many hand-assigned IRQ configurations,
> but nothing helped. Either the Intel 1000/Pro or the AMI RAID is causing
> problems in the TYAN Thunder 2500 SMP environment.
>
> We also have an SMP machine with similar hardware, based on an ASUS CV4X-D,
> an AMI Elite 1600 RAID controller and the same Intel em0 1GBit NIC. The OS is
> FreeBSD 4.8 and this system has never had any problem!
>
> I feel a little bit helpless at the moment, because I think I have tried every trick
> and something seems to be wrong with the combination of TYAN Thunder 2500 and FreeBSD
> 5.X SMP.
> It is also very curious that a kernel without SMP/IO_APIC freezes after
> booting at the same place (amr0 RAID recognition).
>
> Is there any help out there?
>
> I attach the kernel config file and the dmesg output. Please note: I disabled both
> serial ports, the parallel port, sound and USB to get additional IRQs. But I have to
> enable the built-in NIC to get a bootable, but unstable, FreeBSD 5.1-R box.
>
> ====================================
> DMESG output
> ====================================
>
> Copyright (c) 1992-2003 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
>     root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
> Timecounter "i8254" frequency 1193182 Hz
> Timecounter "TSC" frequency 868644793 Hz
> CPU: Intel Pentium III (868.64-MHz 686-class CPU)
>   Origin = "GenuineIntel" Id = 0x683 Stepping = 3
>   Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
> real memory = 2147483648 (2048 MB)
> avail memory = 2085625856 (1989 MB)
> Programming 16 pins in IOAPIC #0
> IOAPIC #0 intpin 2 -> irq 0
> Programming 16 pins in IOAPIC #1
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
>  cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
>  io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
>  io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
> netsmb_dev: loaded
> Pentium Pro MTRR support enabled
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> pcibios: BIOS version 2.10
> Using $PIR table, 12 entries at 0xc00fdf00
> pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
> pci0: <PCI bus> on pcib0
> IOAPIC #1 intpin 13 -> irq 2
> IOAPIC #1 intpin 12 -> irq 16
> IOAPIC #1 intpin 2 -> irq 17
> IOAPIC #1 intpin 7 -> irq 18
> pcib1: <PCI-PCI bridge> at device 0.1 on pci0
> pci1: <PCI bus> on pcib1
> IOAPIC #1 intpin 1 -> irq 19
> pci1: <display, VGA> at device 0.0 (no driver attached)
> sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
> sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
> sym0: open drain IRQ line driver, using on-chip SRAM
> sym0: using LOAD/STORE-based firmware.
> sym0: handling phase mismatch from SCRIPTS.
> sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
> sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
> sym1: open drain IRQ line driver, using on-chip SRAM
> sym1: using LOAD/STORE-based firmware.
> sym1: handling phase mismatch from SCRIPTS.
> em0: <Intel(R) PRO/1000 Network Connection, Version - 1.5.31> port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
> em0: Speed:1000 Mbps Duplex:Full
> fxp0: <Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet> port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
> fxp0: Ethernet address 00:e0:81:00:f0:d7
> miibus0: <MII bus> on fxp0
> inphy0: <i82555 10/100 media interface> on miibus0
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> isab0: <PCI-ISA bridge> port 0x500-0x50f at device 15.0 on pci0
> isa0: <ISA bus> on isab0
> pci0: <mass storage, ATA> at device 15.1 (no driver attached)
> pcib2: <ServerWorks host to PCI bridge> at pcibus 2 on motherboard
> pci2: <PCI bus> on pcib2
> pcib3: <PCI-PCI bridge> at device 2.0 on pci2
> pci3: <PCI bus> on pcib3
> IOAPIC #1 intpin 11 -> irq 20
> IOAPIC #1 intpin 8 -> irq 21
> pcib4: <PCI-PCI bridge> at device 0.0 on pci3
> pci4: <PCI bus> on pcib4
> IOAPIC #1 intpin 10 -> irq 22
> amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
> amr0: <LSILogic MegaRAID Enterprise 1600> Firmware G170, BIOS F316, 64MB RAM
> pci3: <mass storage, SCSI> at device 1.0 (no driver attached)
> pci3: <mass storage, SCSI> at device 2.0 (no driver attached)
> orm0: <Option ROMs> at iomem 0xca000-0xcdfff,0xc0000-0xc9fff on isa0
> fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: model IntelliMouse, device ID 3
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <8 virtual consoles, flags=0x300>
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
> sio0: type 8250 or not responding
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> ppc0: parallel port not found.
> unknown: <PNP0303> can't assign resources (port)
> psmcpnp0: irq resource info is missing; assuming irq 12
> unknown: <PNP0700> can't assign resources (port)
> ppc1: parallel port not found.
> APIC_IO: Testing 8254 interrupt delivery
> APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
> APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
> Timecounters tick every 1.000 msec
> ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
> DUMMYNET initialized (011031)
> Waiting 5 seconds for SCSI devices to settle
> (noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
> (noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 245014MB (501788672 sectors) RAID 5 (optimal)
>
> ===> freezing here!
>
> sa0 at sym1 bus 0 target 5 lun 0
> sa0: <HP C5713A H910> Removable Sequential Access SCSI-2 device
> sa0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
> ch0 at sym1 bus 0 target 5 lun 1
> ch0: <HP C5713A H910> Removable Changer SCSI-2 device
> ch0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
> ch0: 6 slots, 1 drive, 0 pickers, 0 portals
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/amrd0s1a
> cd0 at sym0 bus 0 target 3 lun 0
> cd0: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device
> cd0: 20.000MB/s transfers (20.000MHz, offset 16)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
>
> ========================
> KERNEL config file
> ========================
>
> machine         i386
> cpu             I686_CPU
> ident           ATMOS
>
> options         SMP  # Symmetric MultiProcessor Kernel
> options         APIC_IO  # Symmetric (APIC) I/O
>
> maxusers        0
>
> hints           "ATMOS.hints"  #Default places to look for devices.
>
>
> #options        COMPAT_FREEBSD4
> options         SCHED_4BSD  #4BSD scheduler
>
> #options        SCHED_ULE
> #options        ADAPTIVE_MUTEXES
>
> #options        PQ_CACHESIZE=256
>
> options         CPU_ENABLE_SSE
>
> options         CLK_USE_TSC_CALIBRATION
> #options        HZ=1000
>
> #makeoptions    CONF_CFLAGS=-fno-builtin
> #options        MAXDSIZ=(1024UL*1024*1024)
> #options        MAXSSIZ=(128UL*1024*1024)
> #options        DFLDSIZ=(1024UL*1024*1024)
>
> options         GEOM_AES
> options         GEOM_APPLE
> options         GEOM_BDE
> options         GEOM_BSD
> options         GEOM_GPT
> options         GEOM_MBR
> options         GEOM_PC98
> options         GEOM_SUNLABEL
> options         GEOM_VOL
>
> options         ROOTDEVNAME=\"ufs:amrd0s1a\"
>
> options         INET  #InterNETworking
> #options        INET6  #IPv6 communications protocols
> options         FFS  #Berkeley Fast Filesystem
> options         SOFTUPDATES  #Enable FFS soft updates support
> options         UFS_ACL  #Support for access control lists
> options         UFS_DIRHASH  #Improve performance on big directories
> options         NFSCLIENT  #Network Filesystem Client
> options         NFSSERVER  #Network Filesystem Server
> options         MSDOSFS  #MSDOS Filesystem
> options         CD9660  #ISO 9660 Filesystem
> options         PROCFS  #Process filesystem (requires PSEUDOFS)
> options         PSEUDOFS  #Pseudo-filesystem framework
> options         COMPAT_43  #Compatible with BSD 4.3 [KEEP THIS!]
> options         SCSI_DELAY=5000  #Delay (in ms) before probing SCSI
>
> options         SYSVSHM  #SYSV-style shared memory
> options         SYSVMSG  #SYSV-style message queues
> options         SYSVSEM  #SYSV-style semaphores
>
> options         NETSMB
> options         NETSMBCRYPTO
> options         LIBMCHAIN
> options         LIBICONV
>
> #options        WATCHDOG
>
> options         NETGRAPH
> #options        NETGRAPH_ASYNC
> #options        NETGRAPH_BPF
> #options        NETGRAPH_BRIDGE
> #options        NETGRAPH_CISCO
> #options        NETGRAPH_ECHO
> #options        NETGRAPH_ETHER
> #options        NETGRAPH_FRAME_RELAY
> #options        NETGRAPH_GIF
> #options        NETGRAPH_GIF_DEMUX
> #options        NETGRAPH_HOLE
> #options        NETGRAPH_IFACE
> #options        NETGRAPH_IP_INPUT
> #options        NETGRAPH_KSOCKET
> #options        NETGRAPH_L2TP
> #options        NETGRAPH_LMI
> #options        NETGRAPH_MPPC_ENCRYPTION
> #options        NETGRAPH_ONE2MANY
> #options        NETGRAPH_PPP
> #options        NETGRAPH_PPPOE
> #options        NETGRAPH_PPTPGRE
> #options        NETGRAPH_RFC1490
> #options        NETGRAPH_SOCKET
> #options        NETGRAPH_SPLIT
> #options        NETGRAPH_TEE
> #options        NETGRAPH_TTY
> #options        NETGRAPH_UI
> #options        NETGRAPH_VJC
>
> options         MROUTING
> options         IPFIREWALL
> options         IPFIREWALL_VERBOSE
> options         IPFIREWALL_FORWARD
> #options        IPFIREWALL_VERBOSE_LIMIT=100
> #options        IPFIREWALL_DEFAULT_TO_ACCEPT
> #options        IPV6FIREWALL
> #options        IPV6FIREWALL_VERBOSE
> #options        IPV6FIREWALL_VERBOSE_LIMIT=100
> #options        IPV6FIREWALL_DEFAULT_TO_ACCEPT
> options         IPDIVERT
> #options        IPFILTER
> #options        IPFILTER_LOG
> #options        IPFILTER_DEFAULT_BLOCK
> options         IPSTEALTH
>
> options         RANDOM_IP_ID
>
> options         ACCEPT_FILTER_DATA
> #options        ACCEPT_FILTER_HTTP
>
> options         TCP_DROP_SYNFIN
> options         DUMMYNET
> #options        BRIDGE
>
> options         QUOTA
>
> options         _KPOSIX_PRIORITY_SCHEDULING
> options         P1003_1B_SEMAPHORES
>
> #options        MAC
> #options        MAC_BIBA
> #options        MAC_BSDEXTENDED
> #options        MAC_DEBUG
> #options        MAC_IFOFF
> #options        MAC_LOMAC
> #options        MAC_MLS
> #options        MAC_NONE
> #options        MAC_PARTITION
> #options        MAC_SEEOTHERUIDS
> #options        MAC_TEST
>
> options         KBD_INSTALL_CDEV  # install a CDEV entry in /dev
>
> device          isa
> #options        AUTO_EOI_1
>
> device          pci
>
> device          agp
>
> # Floppy drives
> device          fdc
>
> # SCSI Controllers
> device          sym  # NCR/Symbios Logic (newer chipsets + those of `ncr')
> #device         ahc
>
> # SCSI peripherals
> device          scbus  # SCSI bus (required)
> device          ch  # SCSI media changers
> device          da  # Direct Access (disks)
> device          sa  # Sequential Access (tape etc)
> device          cd  # CD
> device          pass  # Passthrough device (direct SCSI access)
> device          ses  # SCSI Environmental Services (and SAF-TE)
>
>
> # RAID controllers
> device          amr  # AMI MegaRAID
>
>
> #options        CHANGER_MIN_BUSY_SECONDS=2
> #options        CHANGER_MAX_BUSY_SECONDS=10
>
> #options        SA_IO_TIMEOUT=4
> #options        SA_SPACE_TIMEOUT=60
> #options        SA_REWIND_TIMEOUT=(2*60)
> #options        SA_ERASE_TIMEOUT=(4*60)
> #options        SA_1FM_AT_EOD
>
> #options        SCSI_PT_DEFAULT_TIMEOUT=60
> options         SES_ENABLE_PASSTHROUGH
>
>
> # atkbdc0 controls both the keyboard and the PS/2 mouse
> device          atkbdc  # AT keyboard controller
> device          atkbd  # AT keyboard
> options         ATKBD_DFLT_KEYMAP
> makeoptions     ATKBD_DFLT_KEYMAP=us.iso
>
> device          psm  # PS/2 mouse
>
> device          vga  # VGA video card driver
>
> device          splash  # Splash screen and screen saver support
>
> # syscons is the default console driver, resembling an SCO console
> device          sc
> options         MAXCONS=8
>
> #options        SC_ALT_MOUSE_IMAGE
> options         SC_DFLT_FONT
> makeoptions     SC_DFLT_FONT=cp850
>
> options         SC_DISABLE_DDBKEY
> options         SC_DISABLE_REBOOT
> options         SC_HISTORY_SIZE=512
> #options        SC_MOUSE_CHAR=0x3
> options         SC_PIXEL_MODE
> options         SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
> options         SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
> options         SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
> options         SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
> #options        SC_CUT_SPACES2TABS
> #options        SC_CUT_SEPCHARS=\"x09\"
> #options        SC_TWOBUTTON_MOUSE
> #options        SC_NO_CUTPASTE
> #options        SC_NO_FONT_LOADING
> #options        SC_NO_HISTORY
> #options        SC_NO_SYSMOUSE
> #options        SC_NO_SUSPEND_VTYSWITCH
>
> device          npx
>
> #device         pmtimer
>
> #device         sio  # 8250, 16[45]50 based serial ports
>
> # Parallel port
> #device         ppc
> #device         ppbus  # Parallel port bus (required)
> #device         lpt  # Printer
> #device         plip  # TCP/IP over parallel
> #device         ppi  # Parallel port interface device
> #device         vpo  # Requires scbus and da
>
>
> device          miibus  # MII bus support
> device          em
> #device         fxp  # Intel EtherExpress PRO/100B (82557, 82558)
>
> device          random  # Entropy device
> device          loop  # Network loopback
> device          ether  # Ethernet support
> #device         tun  # Packet tunnel.
> device          pty  # Pseudo-ttys (telnet etc)
> #device         gif  # IPv6 and IPv4 tunneling
> #device         faith  # IPv6-to-IPv4 relaying (translation)
>
> device          bpf  # Berkeley packet filter
>
>
> ------------------
>
>
> Thanks a lot for your help,
>
> Oliver
> --
> Kind regards
> O. Hartmann
>
> ohartman@mail.physik.uni-mainz.de
> ------------------------------------------------------------------
> Systemadministration des Institutes fuer Physik der Atmosphaere (IPA)
> ------------------------------------------------------------------
> Johannes Gutenberg Universitaet Mainz
> Becherweg 21
> 55099 Mainz
>
> Tel: +496131/3924662 (machine room)
> Tel: +496131/3924144 (office)
> FAX: +496131/3923532
> _______________________________________________
> freebsd-smp@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
> To unsubscribe, send any mail to "freebsd-smp-unsubscribe@freebsd.org"
>
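
For anyone who wants to try the directory create/remove loop from the
reply above without Perl, here is a minimal Python sketch of the same
idea. It is not part of the original report: the function name
churn_directory, the scratch directory it works in and the optional
parent argument are assumptions made for illustration. Only the
"abc" directory name and the 9999 iterations come from the one-liner.
On an affected 5.1-RELEASE-p2 SMP box this kind of loop reportedly
locks the machine up, so it is probably best run on a test system only.

```python
import os
import tempfile


def churn_directory(iterations=9999, parent=None):
    """Create and remove the same directory name in a tight loop.

    Mirrors the Perl one-liner's mkdir("abc"); rmdir("abc"); cycle.
    Returns the number of completed create/remove cycles.
    """
    completed = 0
    # Assumption: work inside a throwaway scratch directory instead of the
    # current working directory, so no existing "abc" entry is touched.
    with tempfile.TemporaryDirectory(dir=parent) as scratch:
        target = os.path.join(scratch, "abc")
        for _ in range(iterations):
            os.mkdir(target)   # create the directory ...
            os.rmdir(target)   # ... and remove it again straight away
            completed += 1
    return completed


if __name__ == "__main__":
    print(churn_directory())
```

Passing parent=<some directory> would place the scratch directory on a
particular filesystem, for example one that lives on the amrd0 volume.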