Date: Tue, 27 Sep 2005 15:12:31 -0400 (EDT)
From: Rob Watt <rob@hudson-trading.com>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: Rob Watt <rob@hudson-trading.com>, freebsd-hackers@FreeBSD.org, mikep@hudson-trading.com, freebsd-amd64@FreeBSD.org, Jason Carroll <jason@hudson-trading.com>
Subject: Re: freebsd-5.4-stable panics
Message-ID: <20050927140535.G50334@daemon.mistermishap.net>
In-Reply-To: <20050925115912.H11229@fledge.watson.org>
References: <da4a53d805092310237d732554@mail.gmail.com> <20050925115912.H11229@fledge.watson.org>
[-- Attachment #1 --]
On Sun, 25 Sep 2005, Robert Watson wrote:

>
> On Fri, 23 Sep 2005, Jason Carroll wrote:
>
> > There seem to be 2 types of crashes we see with pretty different stack
> > traces.  What I'll call a type 1 crash, I believe, is often caused by
> > one of the triggers I mention above.  A type 2 crash appears to happen
> > spontaneously after the machine has been running for a while.
> >
> > I poked around using kgdb in a core file from a type 2 crash, and it
> > appeared the system hung closing sockets (specifically cleaning up
> > multicast state i think) while cleaning up one of our multicast
> > applications (note the trace through sys_exit).  There's no reason this
> > application should have been exiting unless it encountered some kind of
> > error.
>
> Sounds nasty.  It's possible the two panics are related, especially if
> they involve a race in the multicast code, which could result in treading
> on other kernel memory, potentially leading to the thread related panic.
> My leaning would be that they are unrelated, but since we may be able to
> eliminate the multicast one (see below), that would be a good starting
> point.
>
> There are some other known stability nits in 6.x which are being worked
> on, but in general the network stack stability is higher in 6.x than 5.x
> when it comes to multicast due to the work I reference above.  If you run
> into any stability problems relating to the file system, set
> debug.mpsafevfs=0 in loader.conf -- there are a few bug fixes relating to
> running out of disk space or hitting quota limits that are fixed in HEAD,
> but not yet backported to 6.x.

Robert,

Thanks for your quick response and suggestions.

We have now experienced an additional type of crash. Type 3 is from
6.0-BETA5, it did not enter the debugger at all and we could not generate
a core.

Unfortunately the 6-BETA crash was completely different from everything
we've seen so far.
The panic was related to a page fault and 'top' was the active process.
We are trying again to run our tests on 6.0, but if we keep encountering
other bugs, then those other bugs may prevent us from determining if
multicast is the problem.

We also ran our applications in 5-STABLE without reading from or writing
to disk (i.e. we ran the multicast data streams on a remote machine, and
we told our listener/rebroadcaster apps not to write to disk). In this
configuration we were able to run for 4 days without crashing. A few hours
before the crash we had introduced disk activity (bonnie in a constant
loop with 1G test file size). This crash was a type 1, and we were not
able to save a core. The longest we had gone before without a crash was 6
hours, so it is possible that either load or disk activity helps trigger
the bugs we have seen.

files attached:
  kernel-conf.txt (6.0 kernel)
  type3-core.txt (copy of panic output to console)

We will update you with more info from our 6.0 tests when we have it.

We are in a bind right now. All modern hardware (i.e. EM64T/amd64) only
seems to work with versions of FreeBSD that aren't stable when running our
applications. Many vendors do not even sell server hardware that is purely
i386. We never encountered these types of problems on FreeBSD 4.x, and
many of our 120+ i386 class machines that are running 4.x are showing
their age and need to be replaced. Assuming that the problems we are
experiencing are purely related to the OS, we now don't have an OS to run
on the newer hardware we've been buying. We really need to find a way to
patch these problems or find a version of FreeBSD that supports our
platform and is stable.

Obviously we appreciate the hard work that all of you on the FreeBSD team
do, and we are happy to do whatever we can to help squash these bugs.

- Rob Watt

[-- Attachment #2 --]
#
# GENERIC -- Generic kernel configuration file for FreeBSD/amd64
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.421.2.11.2.1 2005/04/09 17:28:37 kensmith Exp $

machine		amd64
cpu		HAMMER
ident		CUSTOM

# To statically compile in device wiring instead of /boot/device.hints
#hints		"GENERIC.hints"		# Default places to look for devices.

makeoptions	DEBUG=-g
options 	KDB
options 	DDB
options 	BREAK_TO_DEBUGGER
options 	INVARIANTS
options 	INVARIANT_SUPPORT
options 	WITNESS
options 	WITNESS_SKIPSPIN

#makeoptions	COPTFLAGS="-O -frename-registers -pipe"

#options 	SCHED_ULE		# ULE scheduler
options 	SCHED_4BSD		# 4BSD scheduler
options 	INET			# InterNETworking
options 	INET6			# IPv6 communications protocols
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	MD_ROOT			# MD is a potential root device
options 	NFSCLIENT		# Network Filesystem Client
options 	NFSSERVER		# Network Filesystem Server
options 	NFS_ROOT		# NFS usable as /, requires NFSCLIENT
options 	NTFS			# NT File System
options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_GPT		# GUID Partition Tables.
options 	COMPAT_43		# Needed by COMPAT_LINUX32
options 	COMPAT_IA32		# Compatible with i386 binaries
options 	COMPAT_FREEBSD4		# Compatible with FreeBSD4
options 	COMPAT_FREEBSD5		# Compatible with FreeBSD5
options 	COMPAT_LINUX32		# Compatible with i386 linux binaries
options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~128k to driver.
options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~215k to driver.
options 	ADAPTIVE_GIANT		# Giant mutex is adaptive.
options 	PREEMPTION		# Enable kernel thread preemption
options 	SMP

# Workarounds for some known-to-be-broken chipsets (nVidia nForce3-Pro150)
device		atpic		# 8259A compatability

# Enabling NO_MIXED_MODE gives a performance improvement on some motherboards
# but does not work with some boards (mostly nVidia chipset based).
#options 	NO_MIXED_MODE	# Don't penalize working chipsets

# Linux 32-bit ABI support
options 	LINPROCFS	# Cannot be a module yet.

# Bus support.  Do not remove isa, even if you have no isa slots
device		acpi
device		isa
device		pci

# Floppy drives
device		fdc

# ATA and ATAPI devices
device		ata
device		atadisk		# ATA disk drives
device		ataraid		# ATA RAID drives
device		atapicd		# ATAPI CDROM drives
device		atapifd		# ATAPI floppy drives
device		atapist		# ATAPI tape drives
options 	ATA_STATIC_ID	# Static device numbering

# SCSI Controllers
device		ahc		# AHA2940 and onboard AIC7xxx devices
device		ahd		# AHA39320/29320 and onboard AIC79xx devices
#device		amd		# AMD 53C974 (Tekram DC-390(T))
#device		isp		# Qlogic family
#device		ispfw		# Firmware for QLogic HBAs- normally a module
#device		mpt		# LSI-Logic MPT-Fusion
#device		ncr		# NCR/Symbios Logic
#device		sym		# NCR/Symbios Logic (newer chipsets + those of `ncr')
#device		trm		# Tekram DC395U/UW/F DC315U adapters
#device		adv		# Advansys SCSI adapters
#device		adw		# Advansys wide SCSI adapters
device		aic		# Adaptec 15[012]x SCSI adapters, AIC-6[23]60.
#device		bt		# Buslogic/Mylex MultiMaster SCSI adapters

# SCSI peripherals
device		scbus		# SCSI bus (required for SCSI)
device		ch		# SCSI media changers
device		da		# Direct Access (disks)
device		sa		# Sequential Access (tape etc)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)
device		ses		# SCSI Environmental Services (and SAF-TE)

# RAID controllers interfaced to the SCSI subsystem
#device		amr		# AMI MegaRAID
#device		arcmsr		# Areca SATA II RAID
#device		ciss		# Compaq Smart RAID 5*
#device		dpt		# DPT Smartcache III, IV - See NOTES for options
#device		iir		# Intel Integrated RAID
#device		ips		# IBM (Adaptec) ServeRAID
#device		mly		# Mylex AcceleRAID/eXtremeRAID
#device		twa		# 3ware 9000 series PATA/SATA RAID

# RAID controllers
device		aac		# Adaptec FSA RAID
device		aacp		# SCSI passthrough for aac (requires CAM)
#device		ida		# Compaq Smart RAID
#device		mlx		# Mylex DAC960 family
#XXX pointer/int warnings
#device		pst		# Promise Supertrak SX6000
#device		twe		# 3ware ATA RAID

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse

device		vga		# VGA video card driver

device		splash		# Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device		sc

# PCCARD (PCMCIA) support
# PCMCIA and cardbus bridge support
#device		cbb		# cardbus (yenta) bridge
#device		pccard		# PC Card (16-bit) bus
#device		cardbus		# CardBus (32-bit) bus

# Serial (COM) ports
device		sio		# 8250, 16[45]50 based serial ports

# Parallel port
device		ppc
device		ppbus		# Parallel port bus (required)
device		lpt		# Printer
#device		plip		# TCP/IP over parallel
device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da

# If you've got a "dumb" serial or parallel PCI card that is
# supported by the puc(4) glue driver, uncomment the following
# line to enable it (connects to the sio and/or ppc drivers):
#device		puc

# PCI Ethernet NICs.
#device		de		# DEC/Intel DC21x4x (``Tulip'')
device		em		# Intel PRO/1000 adapter Gigabit Ethernet Card
#device		ixgb		# Intel PRO/10GbE Ethernet Card
#device		txp		# 3Com 3cR990 (``Typhoon'')
#device		vx		# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus		# MII bus support
#device		bfe		# Broadcom BCM440x 10/100 Ethernet
device		bge		# Broadcom BCM570xx Gigabit Ethernet
#device		dc		# DEC/Intel 21143 and various workalikes
device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
#device		lge		# Level 1 LXT1001 gigabit Ethernet
#device		nge		# NatSemi DP83820 gigabit Ethernet
#device		pcn		# AMD Am79C97x PCI 10/100 (precedence over 'lnc')
#device		re		# RealTek 8139C+/8169/8169S/8110S
#device		rl		# RealTek 8129/8139
#device		sf		# Adaptec AIC-6915 (``Starfire'')
#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
#device		sk		# SysKonnect SK-984x & SK-982x gigabit Ethernet
#device		ste		# Sundance ST201 (D-Link DFE-550TX)
#device		ti		# Alteon Networks Tigon I/II gigabit Ethernet
#device		tl		# Texas Instruments ThunderLAN
#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
#device		vge		# VIA VT612x gigabit Ethernet
#device		vr		# VIA Rhine, Rhine II
#device		wb		# Winbond W89C840F
#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')

# ISA Ethernet NICs.  pccard NICs included.
#device		cs		# Crystal Semiconductor CS89x0 NIC
# 'device ed' requires 'device miibus'
# XXX kvtop brokenness, pointer/int warnings
#device		ed		# NE[12]000, SMC Ultra, 3c503, DS8390 cards
#device		ex		# Intel EtherExpress Pro/10 and Pro/10+
#device		ep		# Etherlink III based cards
#device		fe		# Fujitsu MB8696x based cards
# XXX kvtop brokenness, pointer/int warnings
#device		lnc		# NE2100, NE32-VL Lance Ethernet cards
#device		sn		# SMC's 9000 series of Ethernet chips
#device		xe		# Xircom pccard Ethernet

# Wireless NIC cards
#device		wlan		# 802.11 support
#device		an		# Aironet 4500/4800 802.11 wireless NICs.
#device		awi		# BayStack 660 and others
#device		wi		# WaveLAN/Intersil/Symbol 802.11 wireless NICs.

# Pseudo devices.
device		loop		# Network loopback
device		mem		# Memory and kernel memory devices
device		io		# I/O device
device		random		# Entropy device
device		ether		# Ethernet support
device		sl		# Kernel SLIP
device		ppp		# Kernel PPP
device		tun		# Packet tunnel.
device		pty		# Pseudo-ttys (telnet etc)
device		md		# Memory "disks"
device		gif		# IPv6 and IPv4 tunneling
device		faith		# IPv6-to-IPv4 relaying (translation)

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device		bpf		# Berkeley packet filter

# USB support
device		uhci		# UHCI PCI->USB interface
device		ohci		# OHCI PCI->USB interface
#device		ehci		# EHCI PCI->USB interface (USB 2.0)
device		usb		# USB Bus (required)
#device		udbp		# USB Double Bulk Pipe devices
device		ugen		# Generic
device		uhid		# "Human Interface Devices"
device		ukbd		# Keyboard
device		ulpt		# Printer
device		umass		# Disks/Mass storage - Requires scbus and da
device		ums		# Mouse
#device		urio		# Diamond Rio 500 MP3 player
#device		uscanner	# Scanners
# USB Ethernet, requires mii
#device		aue		# ADMtek USB Ethernet
#device		axe		# ASIX Electronics USB Ethernet
#device		cdce		# Generic USB over Ethernet
#device		cue		# CATC USB Ethernet
#device		kue		# Kawasaki LSI USB Ethernet
#device		rue		# RealTek RTL8150 USB Ethernet

# FireWire support
#device		firewire	# FireWire bus code
#device		sbp		# SCSI over FireWire (Requires scbus and da)
#device		fwe		# Ethernet over FireWire (non-standard!)

options 	IPFIREWALL
options 	IPFIREWALL_VERBOSE

[-- Attachment #3 --]
kernel trap 12 with interrupts disabled

fatal trap 12: page fault while in kernel mode
cpuid=3; apicid=03
fault virtual address   = 03
fault code              = supervisor read, page not present
instruction pointer     = 0x8:ffffffff803b88ca
stack pointer           = 0x10:ffffffffb6639490
frame pointer           = 0x10:ffffffffb66394f0
code segment            = base 0x0; limit 0xfffff, type=0x1b
                        = DPL=0, pres 1, long 1, def32=0, gran 1
processor eflags        = resume, IOPL=0
current process         = 48628 (top)

did not enter DDB or generate core file
