Date: Wed, 13 Aug 2003 10:59:19 +0200 (CEST)
From: "Hartmann, O." <ohartman@klima.physik.uni-mainz.de>
To: freebsd-smp@freebsd.org
Cc: freebsd-stable@freebsd.org
Subject: 5.1-R-p2 crashes on SMP with AMI RAID and Intel 1000/Pro
Message-ID: <20030813103509.Q49991@mail.physik.uni-mainz.de>

Dear Sirs,

this seems to me a never-ending story.

We run a box with a TYAN Thunder 2500 dual-SMP mainboard, 2 GB of Tyan-certified ECC memory, an AMI Enterprise 1600 RAID adapter and an additional Intel 1000/Pro server-type (64-bit) GBit LAN NIC. With FreeBSD 4.8 this was stable, but reaching that state was really hard! The story is similar to what happened when we moved this machine to FreeBSD 5.1-RELEASE-p2. Stability seems to depend heavily on which PCI slot each card is plugged into, so I will report that here as well.

Phenomenon: After the machine has been running for a while, the SMP kernel reboots spontaneously. This happens under heavy I/O, when compiling, or in the morning when our department starts work and our staff connect to the Samba server. Depending on which devices are switched on or off in the BIOS, the kernel freezes at the stage where the amr0 RAID is recognized. I can avoid this freeze by enabling the built-in NIC (fxp0). I can force it by moving the em0 NIC into another slot, for instance the one remaining 64-bit/66 MHz slot (which should be a separate bus).

This 'game' is identical to the one I had with FreeBSD 4.X - 4.8. There I found that putting an additional NIC into PCI slot No. 2 (counted from the AGP slot) cleared things up, but using both NICs together (either the additional fxp0 or the new em0) leaves the system completely unstable. In FreeBSD 5.1-RELEASE-p2 and especially in FreeBSD 5.1-CURRENT this 'gambling' seems to reach its climax.

My kernel is built with SCHED_4BSD because a kernel with SCHED_ULE and ADAPTIVE_MUTEXES crashes immediately in the same way as described (running for a while, then dumping core, or freezing right after the amr0 RAID shows up in the kernel boot messages; see the dmesg output below).

I am not a hardware expert, but all this weird stuff looks to me like an IRQ routing problem. I fiddled around with many hand-assigned IRQ configurations, but nothing helped.
Either the Intel 1000/Pro or the AMI RAID is causing problems in the TYAN Thunder 2500 SMP environment. We also have an SMP machine with similar hardware, based on an ASUS CV4X-D, an AMI Elite 1600 RAID controller and the same Intel em0 1 GBit NIC. Its OS is FreeBSD 4.8 and that system has never had any problem!

I feel a little helpless at the moment, because I think I have tried every trick, and something seems to be wrong with the combination of the TYAN Thunder 2500 and FreeBSD 5.X SMP. It is also very curious that a kernel without SMP/IO_APIC freezes after booting at the same place (amr0 RAID recognition).

Is there any help out there? I attach the kernel config file and the dmesg output. Please note: I disabled both serial ports, the parallel port, sound and USB to free additional IRQs. But I have to enable the built-in NIC to get a bootable, though unstable, FreeBSD 5.1-R box.

====================================
DMESG output
====================================

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD 5.1-RELEASE-p2 #14: Wed Aug 13 09:47:00 CEST 2003
  root@atmos.physik.uni-mainz.de:/usr/obj/usr/src/sys/ATMOS
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0458000.
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 868644793 Hz
CPU: Intel Pentium III (868.64-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x683 Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory = 2147483648 (2048 MB)
avail memory = 2085625856 (1989 MB)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 3, version: 0x000f0011, at 0xfec01000
netsmb_dev: loaded
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
Using $PIR table, 12 entries at 0xc00fdf00
pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
IOAPIC #1 intpin 13 -> irq 2
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 2 -> irq 17
IOAPIC #1 intpin 7 -> irq 18
pcib1: <PCI-PCI bridge> at device 0.1 on pci0
pci1: <PCI bus> on pcib1
IOAPIC #1 intpin 1 -> irq 19
pci1: <display, VGA> at device 0.0 (no driver attached)
sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0
sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.5.31> port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0
em0: Speed:1000 Mbps Duplex:Full
fxp0: <Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet> port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0
fxp0: Ethernet address 00:e0:81:00:f0:d7
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <PCI-ISA bridge> port 0x500-0x50f at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <mass storage, ATA> at device 15.1 (no driver attached)
pcib2: <ServerWorks host to PCI bridge> at pcibus 2 on motherboard
pci2: <PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 2.0 on pci2
pci3: <PCI bus> on pcib3
IOAPIC #1 intpin 11 -> irq 20
IOAPIC #1 intpin 8 -> irq 21
pcib4: <PCI-PCI bridge> at device 0.0 on pci3
pci4: <PCI bus> on pcib4
IOAPIC #1 intpin 10 -> irq 22
amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4
amr0: <LSILogic MegaRAID Enterprise 1600> Firmware G170, BIOS F316, 64MB RAM
pci3: <mass storage, SCSI> at device 1.0 (no driver attached)
pci3: <mass storage, SCSI> at device 2.0 (no driver attached)
orm0: <Option ROMs> at iomem 0xca000-0xcdfff,0xc0000-0xc9fff on isa0
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250 or not responding
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
ppc0: parallel port not found.
unknown: <PNP0303> can't assign resources (port)
psmcpnp0: irq resource info is missing; assuming irq 12
unknown: <PNP0700> can't assign resources (port)
ppc1: parallel port not found.
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
Timecounters tick every 1.000 msec
ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
DUMMYNET initialized (011031)
Waiting 5 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 245014MB (501788672 sectors) RAID 5 (optimal) ===> freezing here!
sa0 at sym1 bus 0 target 5 lun 0
sa0: <HP C5713A H910> Removable Sequential Access SCSI-2 device
sa0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0 at sym1 bus 0 target 5 lun 1
ch0: <HP C5713A H910> Removable Changer SCSI-2 device
ch0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit)
ch0: 6 slots, 1 drive, 0 pickers, 0 portals
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/amrd0s1a
cd0 at sym0 bus 0 target 3 lun 0
cd0: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 16)
cd0: Attempt to query device size failed: NOT READY, Medium not present

========================
KERNEL config file
========================

machine i386
cpu I686_CPU
ident ATMOS

options SMP # Symmetric MultiProcessor Kernel
options APIC_IO # Symmetric (APIC) I/O

maxusers 0

hints "ATMOS.hints" #Default places to look for devices.
#options COMPAT_FREEBSD4
options SCHED_4BSD #4BSD scheduler
#options SCHED_ULE
#options ADAPTIVE_MUTEXES
#options PQ_CACHESIZE=256
options CPU_ENABLE_SSE
options CLK_USE_TSC_CALIBRATION
#options HZ=1000
#makeoptions CONF_CFLAGS=-fno-builtin
#options MAXDSIZ=(1024UL*1024*1024)
#options MAXSSIZ=(128UL*1024*1024)
#options DFLDSIZ=(1024UL*1024*1024)

options GEOM_AES
options GEOM_APPLE
options GEOM_BDE
options GEOM_BSD
options GEOM_GPT
options GEOM_MBR
options GEOM_PC98
options GEOM_SUNLABEL
options GEOM_VOL

options ROOTDEVNAME=\"ufs:amrd0s1a\"

options INET #InterNETworking
#options INET6 #IPv6 communications protocols
options FFS #Berkeley Fast Filesystem
options SOFTUPDATES #Enable FFS soft updates support
options UFS_ACL #Support for access control lists
options UFS_DIRHASH #Improve performance on big directories
options NFSCLIENT #Network Filesystem Client
options NFSSERVER #Network Filesystem Server
options MSDOSFS #MSDOS Filesystem
options CD9660 #ISO 9660 Filesystem
options PROCFS #Process filesystem (requires PSEUDOFS)
options PSEUDOFS #Pseudo-filesystem framework
options COMPAT_43 #Compatible with BSD 4.3 [KEEP THIS!]
options SCSI_DELAY=5000 #Delay (in ms) before probing SCSI
options SYSVSHM #SYSV-style shared memory
options SYSVMSG #SYSV-style message queues
options SYSVSEM #SYSV-style semaphores

options NETSMB
options NETSMBCRYPTO
options LIBMCHAIN
options LIBICONV
#options WATCHDOG

options NETGRAPH
#options NETGRAPH_ASYNC
#options NETGRAPH_BPF
#options NETGRAPH_BRIDGE
#options NETGRAPH_CISCO
#options NETGRAPH_ECHO
#options NETGRAPH_ETHER
#options NETGRAPH_FRAME_RELAY
#options NETGRAPH_GIF
#options NETGRAPH_GIF_DEMUX
#options NETGRAPH_HOLE
#options NETGRAPH_IFACE
#options NETGRAPH_IP_INPUT
#options NETGRAPH_KSOCKET
#options NETGRAPH_L2TP
#options NETGRAPH_LMI
#options NETGRAPH_MPPC_ENCRYPTION
#options NETGRAPH_ONE2MANY
#options NETGRAPH_PPP
#options NETGRAPH_PPPOE
#options NETGRAPH_PPTPGRE
#options NETGRAPH_RFC1490
#options NETGRAPH_SOCKET
#options NETGRAPH_SPLIT
#options NETGRAPH_TEE
#options NETGRAPH_TTY
#options NETGRAPH_UI
#options NETGRAPH_VJC

options MROUTING
options IPFIREWALL
options IPFIREWALL_VERBOSE
options IPFIREWALL_FORWARD
#options IPFIREWALL_VERBOSE_LIMIT=100
#options IPFIREWALL_DEFAULT_TO_ACCEPT
#options IPV6FIREWALL
#options IPV6FIREWALL_VERBOSE
#options IPV6FIREWALL_VERBOSE_LIMIT=100
#options IPV6FIREWALL_DEFAULT_TO_ACCEPT
options IPDIVERT
#options IPFILTER
#options IPFILTER_LOG
#options IPFILTER_DEFAULT_BLOCK
options IPSTEALTH
options RANDOM_IP_ID
options ACCEPT_FILTER_DATA
#options ACCEPT_FILTER_HTTP
options TCP_DROP_SYNFIN
options DUMMYNET
#options BRIDGE

options QUOTA
options _KPOSIX_PRIORITY_SCHEDULING
options P1003_1B_SEMAPHORES

#options MAC
#options MAC_BIBA
#options MAC_BSDEXTENDED
#options MAC_DEBUG
#options MAC_IFOFF
#options MAC_LOMAC
#options MAC_MLS
#options MAC_NONE
#options MAC_PARTITION
#options MAC_SEEOTHERUIDS
#options MAC_TEST

options KBD_INSTALL_CDEV # install a CDEV entry in /dev

device isa
#options AUTO_EOI_1
device pci
device agp

# Floppy drives
device fdc

# SCSI Controllers
device sym # NCR/Symbios Logic (newer chipsets + those of `ncr')
#device ahc

# SCSI peripherals
device scbus # SCSI bus (required)
device ch # SCSI media changers
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)

# RAID controllers
device amr # AMI MegaRAID

#options CHANGER_MIN_BUSY_SECONDS=2
#options CHANGER_MAX_BUSY_SECONDS=10
#options SA_IO_TIMEOUT=4
#options SA_SPACE_TIMEOUT=60
#options SA_REWIND_TIMEOUT=(2*60)
#options SA_ERASE_TIMEOUT=(4*60)
#options SA_1FM_AT_EOD
#options SCSI_PT_DEFAULT_TIMEOUT=60
options SES_ENABLE_PASSTHROUGH

# atkbdc0 controls both the keyboard and the PS/2 mouse
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
options ATKBD_DFLT_KEYMAP
makeoptions ATKBD_DFLT_KEYMAP=us.iso
device psm # PS/2 mouse

device vga # VGA video card driver
device splash # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device sc
options MAXCONS=8
#options SC_ALT_MOUSE_IMAGE
options SC_DFLT_FONT
makeoptions SC_DFLT_FONT=cp850
options SC_DISABLE_DDBKEY
options SC_DISABLE_REBOOT
options SC_HISTORY_SIZE=512
#options SC_MOUSE_CHAR=0x3
options SC_PIXEL_MODE
options SC_NORM_ATTR=(FG_GREEN|BG_BLACK)
options SC_NORM_REV_ATTR=(FG_YELLOW|BG_GREEN)
options SC_KERNEL_CONS_ATTR=(FG_RED|BG_BLACK)
options SC_KERNEL_CONS_REV_ATTR=(FG_BLACK|BG_RED)
#options SC_CUT_SPACES2TABS
#options SC_CUT_SEPCHARS=\"x09\"
#options SC_TWOBUTTON_MOUSE
#options SC_NO_CUTPASTE
#options SC_NO_FONT_LOADING
#options SC_NO_HISTORY
#options SC_NO_SYSMOUSE
#options SC_NO_SUSPEND_VTYSWITCH

device npx
#device pmtimer
#device sio # 8250, 16[45]50 based serial ports

# Parallel port
#device ppc
#device ppbus # Parallel port bus (required)
#device lpt # Printer
#device plip # TCP/IP over parallel
#device ppi # Parallel port interface device
#device vpo # Requires scbus and da

device miibus # MII bus support
device em
#device fxp # Intel EtherExpress PRO/100B (82557, 82558)

device random # Entropy device
device loop # Network loopback
device ether # Ethernet support
#device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
#device gif # IPv6 and IPv4 tunneling
#device faith # IPv6-to-IPv4 relaying (translation)
device bpf # Berkeley packet filter

------------------

Thanks a lot for your help,
Oliver

--
Kind regards,
O. Hartmann
ohartman@mail.physik.uni-mainz.de
------------------------------------------------------------------
System administration, Institut fuer Physik der Atmosphaere (IPA)
------------------------------------------------------------------
Johannes Gutenberg Universitaet Mainz
Becherweg 21
55099 Mainz

Tel: +496131/3924662 (machine room)
Tel: +496131/3924144 (office)
FAX: +496131/3923532
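
One way to check the IRQ-routing suspicion against the boot messages quoted in this message is to list which PCI device attached on which IRQ and look for sharing. The following is only a minimal Python sketch under stated assumptions: the helper names (irq_map, shared_irqs) and the regular expression are hypothetical and not part of any FreeBSD tool, the input is assumed to be dmesg text with one message per line, and only PCI attach lines of the form "... irq N at device X.Y on pciZ" are considered.

```python
import re
from collections import defaultdict

# Hypothetical pattern for PCI attach lines such as
#   "amr0: <LSILogic MegaRAID> mem ... irq 22 at device 0.0 on pci4"
ATTACH_RE = re.compile(
    r"^(?P<dev>[a-z]+\d+): <[^>]*> .*\birq (?P<irq>\d+) at device\b"
)

def irq_map(dmesg_lines):
    """Group device names by the IRQ reported on their PCI attach line."""
    by_irq = defaultdict(list)
    for line in dmesg_lines:
        m = ATTACH_RE.match(line.strip())
        if m:
            by_irq[int(m.group("irq"))].append(m.group("dev"))
    return dict(by_irq)

def shared_irqs(by_irq):
    """Return only the IRQs that more than one device attached to."""
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}

# Attach lines copied from the dmesg output in this message.
SAMPLE = [
    "sym0: <896> port 0xf800-0xf8ff mem 0xfeafe000-0xfeafffff,0xfeafac00-0xfeafafff irq 2 at device 1.0 on pci0",
    "sym1: <896> port 0xf400-0xf4ff mem 0xfeafc000-0xfeafdfff,0xfeafa800-0xfeafabff irq 16 at device 1.1 on pci0",
    "em0: <Intel(R) PRO/1000 Network Connection, Version - 1.5.31> port 0xfcc0-0xfcff mem 0xfeac0000-0xfeadffff irq 17 at device 4.0 on pci0",
    "fxp0: <Intel 82557/8/9 EtherExpress Pro/100(B) Ethernet> port 0xfc40-0xfc7f mem 0xfe900000-0xfe9fffff,0xfeaf9000-0xfeaf9fff irq 18 at device 7.0 on pci0",
    "amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf3ffffff irq 22 at device 0.0 on pci4",
    "IOAPIC #1 intpin 10 -> irq 22",  # not an attach line, ignored
]

if __name__ == "__main__":
    assignments = irq_map(SAMPLE)
    for irq in sorted(assignments):
        print(f"irq {irq}: {', '.join(assignments[irq])}")
    print("shared:", shared_irqs(assignments))
```

For the attach lines quoted above, this lists sym0 on IRQ 2, sym1 on 16, em0 on 17, fxp0 on 18 and amr0 on 22, with no IRQ shared between them. That only describes what the kernel reports in this particular boot; it does not rule out a routing fault in the MP table or on the board itself.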