Date: Mon, 14 Nov 2005 08:33:03 +0100 (CET)
From: "Rutger Bevaart" <rutger.bevaart@illian.net>
To: freebsd-smp@freebsd.org
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
Message-ID: <10868.62.58.16.80.1131953583.squirrel@62.58.16.80>

hello list,

Our Dell 1750's and 1850's are still giving me headaches.

Symptom: Our Dell PE1750 will reboot at random intervals (between 3 and 130
days), seemingly unrelated to the load of the system.

Config: Dell PE1750, dual 3.06 Xeon (533FSB/512Kb), 2x512MB memory, Perc RAID
(amr) with 3 drives in RAID5. No add-on cards.

FreeBSD: Same problem on 5.3-RELEASE, 5.3-p3, 5.3-p-something and 5.4-p5.

I disabled HTT in the BIOS but the machine rebooted 3 days later. Argh! Now I
have no clue whatsoever on how to proceed. No kernel tweaks have been made and
no strange software is running. No logs are written to /var/log/messages.

Attached is my dmesg output. Does anybody have any clues on how to proceed? I
like these boxes ;-)

Thanks,
Rutger

--dmesg output
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 5.4-RELEASE-p5 #0: Sun Jul 24 15:57:47 CEST 2005
> root@darwin.illian.net:/usr/obj/usr/src/sys/darwin-smp
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3047.91-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> real memory = 1073573888 (1023 MB)
> avail memory = 1041018880 (992 MB)
> ACPI APIC Table: <DELL PE1750 >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 6
> ioapic0: Changing APIC ID to 8
> ioapic1: Changing APIC ID to 9
> ioapic2: Changing APIC ID to 10
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-15 on motherboard
> ioapic1 <Version 1.1> irqs 16-31 on motherboard
> ioapic2 <Version 1.1> irqs 32-47 on motherboard
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> acpi0: <DELL PE1750> on motherboard
> acpi0: Power Button (fixed)
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pci0: <display, VGA> at device 14.0 (no driver attached)
> atapci0: <ServerWorks CSB5 UDMA100 controller> port 0x8b0-0x8bf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 15.1 on pci0
> ata0: channel #0 on atapci0
> ata1: channel #1 on atapci0
> ohci0: <OHCI (generic) USB controller> mem 0xfe100000-0xfe100fff irq 11 at device 15.2 on pci0
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 4 ports with 4 removable, self powered
> isab0: <PCI-ISA bridge> at device 15.3 on pci0
> isa0: <ISA bus> on isab0
> pcib1: <ACPI Host-PCI bridge> on acpi0
> pci4: <ACPI PCI bus> on pcib1
> amr0: <LSILogic MegaRAID 1.51> mem 0xfcd00000-0xfcd3ffff,0xf0000000-0xf7ffffff irq 18 at device 3.0 on pci4
> amr0: <LSILogic PERC 4/Di> Firmware 412W, BIOS H406, 128MB RAM
> pcib2: <ACPI Host-PCI bridge> on acpi0
> pci3: <ACPI PCI bus> on pcib2
> pcib3: <ACPI Host-PCI bridge> on acpi0
> pci2: <ACPI PCI bus> on pcib3
> bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem 0xfcf20000-0xfcf2ffff,0xfcf30000-0xfcf3ffff irq 16 at device 0.0 on pci2
> miibus0: <MII bus> on bge0
> brgphy0: <BCM5704 10/100/1000baseTX PHY> on miibus0
> brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge0: Ethernet address: 00:11:43:5a:84:9d
> bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2002> mem 0xfcf00000-0xfcf0ffff,0xfcf10000-0xfcf1ffff irq 17 at device 0.1 on pci2
> miibus1: <MII bus> on bge1
> brgphy1: <BCM5704 10/100/1000baseTX PHY> on miibus1
> brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge1: Ethernet address: 00:11:43:5a:84:9e
> pcib4: <ACPI Host-PCI bridge> on acpi0
> pci1: <ACPI PCI bus> on pcib4
> fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> orm0: <ISA Option ROMs> at iomem 0xec000-0xeffff,0xcb800-0xccfff,0xc8000-0xc8fff,0xc0000-0xc7fff on isa0
> pmtimer0 on isa0
> atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> ppc0: parallel port not found.
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 10.000 msec
> acd0: CDROM <SAMSUNG CD-ROM SN-124/N104> at ata1-master PIO4
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 139760MB (286228480 sectors) RAID 5 (optimal)
> ses0 at amr0 bus 0 target 6 lun 0
> ses0: <PE/PV 1x3 SCSI BP 1.1> Fixed Processor SCSI-2 device
> ses0: SAF-TE Compliant Device
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/amrd0s1a
> WARNING: / was not properly dismounted
> WARNING: /opt was not properly dismounted
> /opt: mount pending error: blocks 108 files 1
> WARNING: /tmp was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /usr/local was not properly dismounted
> WARNING: /var was not properly dismounted
> ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled
--end of dmesg output

> Suggest: Disable HT...
> I had the same problem of unexplained server restarts on 5.3 last December
> and strongly suspected a DMA issue or a dual channel memory paging foobar
> when HT was enabled. The benefits of HT over simply running SMP on just
> two physical CPUs are apparently zero in your case as loads seem light and
> therefore spreading threads across more CPUs does not seem to be an
> advantage. In short, turn hyperthreading off. In my case that's what I did
> and the crashes (which also seemed to be about three days apart or so)
> stopped.
>
> Good luck.
>
> best... Mike
>
> Message: 1
> Date: Fri, 15 Jul 2005 15:15:24 +0200 (CEST)
> From: "Rutger Bevaart" <rutger.bevaart at illian.net>
> Subject: FreeBSD unstable on Dell 1750 using SMP?
> To: freebsd-smp at freebsd.org
> Message-ID: <24434.193.172.18.3.1121433324.squirrel at 193.172.18.3>
> Content-Type: text/plain;charset=iso-8859-1
>
>
> hello list,
>
> For the past year we've been running several Dell PowerEdge 1750 servers
> on FreeBSD 4.10, 4.11 and 5.3. All these machines have dual Xeons running
> with HT enabled. This install has proven to be unstable in that the
> machine will reboot between 3 days and 170 days without apparent reason.
> No log is written. Other machines we have with a single CPU (HT enabled)
> do not experience this problem.
>
> As it is present in both 4.x and 5.x and googling the last year has not
> revealed similar experience I'm consulting this list. As all of these
> machines are production machines that have a continuous load (not heavy
> load, but a light average - some peaks) it's not easy to experiment with
> HT setting etc. I dislike driving to the datacenter for locked systems
> with fubarred kernels ;-)
>
> The only error I've ever seen just before a reboot is "bge0: discard frame
> w/o packet header" on the 5.3 machine.
>
> Any clues or help greatly appreciated!
>
> Regards
> Rutger Bevaart
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 15 Jul 2005 21:47:46 -0400
> From: Lucas Holt <luke at foolishgames.com>
> Subject: Re: FreeBSD unstable on Dell 1750 using SMP?
> To: Rutger Bevaart <rutger.bevaart at illian.net>
> Cc: freebsd-smp at freebsd.org
> Message-ID: <3713FA02-FDBB-4B24-A592-F55B7A485C26 at foolishgames.com>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> I can't speak for that config or network card, but I had a similar
> experience with FreeBSD 5.2 and 5.3. I got unusual errors
> occasionally for an rl NIC and the machine randomly rebooted.
> Sometimes nothing was logged. It turned out to be the network card.
> I replaced the NIC with a 3Com 3c905c TX and the problem went away.
>
> It's possible that the Dell NICs are non-standard and the driver isn't
> handling them well. I've noticed problems with Dell NICs and
> standard drivers in their other products (Latitude D800, etc). If
> you really thought it was an SMP issue, I suppose you could compile
> and run a non-SMP kernel as a test.
>
> It's also possible that "4" processor SMP isn't as reliable as 2. I
> have read about scalability issues in the past with large numbers of
> CPUs in FreeBSD.
>
> On Jul 15, 2005, at 9:15 AM, Rutger Bevaart wrote:
>
>>
>> hello list,
>>
>> For the past year we've been running several Dell PowerEdge 1750 servers
>> on FreeBSD 4.10, 4.11 and 5.3. All these machines have dual Xeons running
>> with HT enabled. This install has proven to be unstable in that the
>> machine will reboot between 3 days and 170 days without apparent reason.
>> No log is written. Other machines we have with a single CPU (HT enabled)
>> do not experience this problem.
>>
>> As it is present in both 4.x and 5.x and googling the last year has not
>> revealed similar experience I'm consulting this list. As all of these
>> machines are production machines that have a continuous load (not heavy
>> load, but a light average - some peaks) it's not easy to experiment with
>> HT setting etc. I dislike driving to the datacenter for locked systems
>> with fubarred kernels ;-)
>>
>> The only error I've ever seen just before a reboot is "bge0: discard frame
>> w/o packet header" on the 5.3 machine.
>>
>> Any clues or help greatly appreciated!
>>
>> Regards
>> Rutger Bevaart
>>
>> _______________________________________________
>> freebsd-smp at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-smp
>> To unsubscribe, send any mail to "freebsd-smp-unsubscribe at freebsd.org"
>>
>
>
> Lucas Holt
> Luke at FoolishGames.com
> ________________________________________________________
> FoolishGames.com (Jewel Fan Site)
> JustJournal.com (Free blogging)
> FoolishGames.net (Enemy Territory IoM site)
>
> Think PC.. in 2006 you can own an Apple PCintosh. What's next, Windows
> works?
>
>
>
> ------------------------------
>
> End of freebsd-smp Digest, Vol 101, Issue 4
> *******************************************
>

Rutger Bevaart :: illian.networks