Date: Wed, 28 Feb 2007 12:21:08 -0500
From: alex@schnarff.com
To: freebsd-questions@freebsd.org
Cc: Jean Lagarde <jean.lagarde@gmail.com>
Subject: Stability Issues on 5.4-RELEASE Box
Message-ID: <20070228122108.bhd56o5wn4ss8c4g@mail.schnarff.com>
Hello All,

I've recently fallen into the task of administering a FreeBSD 5.4-RELEASE box that acts as the web server for a small non-profit that I volunteer for. Unfortunately, the system has been having some extremely vexing stability issues over the last month or so, which even my 6+ years of experience as an OpenBSD admin have not helped me track down.

First things first, let me say explicitly that I'm not trying to say "FreeBSD sucks, it's not stable" or anything like that. It's a fine OS, and I'm sure that it's either faulty hardware or a misconfiguration of some sort causing these problems. :-)

That said, here are some of the symptoms the box has been experiencing:

* Occasional random reboots. I've only personally witnessed one, and they don't happen often, but any time a *NIX box just reboots for no apparent reason (there was no indication of a problem in any of the logs, at least that I could see), something really bad is going on.

* Random extreme slowness when logging in via SSH, with the time to get a shell ranging from a second or two all the way up to 80 seconds. The box isn't busy enough that it's just slow due to load (especially since, once you're in, things fly), and it's not just a reverse DNS issue like I've seen on OpenBSD (this occurs even when logging in from locations listed in /etc/hosts that resolve properly out of that file). Until I upgraded to the current version of OpenSSL/OpenSSH, the box would occasionally just become unresponsive altogether over SSH, not allowing logins for 15+ minutes at a time.

* Files that are sometimes not found on startup but are found at other times. Prime example: the Zope CMS system that's been installed failed to find libmysqlclient.so after a planned soft reboot, but found it with no trouble on a subsequent boot a few minutes later, with no config changes in between.

* A warning in /var/log/messages that the root filesystem was full, when it was at 60% capacity (and something like 2% inode capacity); the problem has yet to repeat, though no files have been cleared off of that filesystem.

* Random crashes of the Zope/Plone system that's running the main part of the web site. While I realize that, in and of itself, this means nothing about the stability of the underlying OS, in the context of all of the other things going on (as well as the fact that the Zope list has been unable to help figure out why it's crashing), it seems like it might be further evidence of a larger problem.

Thus far, besides simply scanning log files, constantly watching "top" and "ps", etc., I've not been able to do much with the box. As I said, I upgraded OpenSSL/OpenSSH to current versions, and I installed pf as the firewall (there was none before I arrived...don't even get me started on that). Since the server is in Oregon and I'm in the DC area, this weekend the guy who was the previous admin will be running a Memtest for me and disabling hyperthreading (which there's no performance justification for, and which has caused me stability issues at least on Linux in the past). That's about the extent of what I've been able to do to date, since this is a production box.

What I'd like to know from you guys is:

* Am I justified in suspecting hyperthreading as a potential cause of instability?

* Does 5.4-RELEASE have any known bugs that might cause stability issues like the ones I've described here? More importantly, would an upgrade to 6.2-RELEASE be worthwhile (as is my instinct), in terms of being generally more stable and/or having better hardware support? Would such an upgrade be possible/relatively painless to perform without being physically at a console, as has been the case with OpenBSD over the years?

* Given my dmesg below, do you see any specific problems?

* Do you have any other suggestions for debugging this problem?

Thanks in advance for any help you can provide. :-)

Alex Kirk

dmesg:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE #0: Sun May 8 10:21:06 UTC 2005
    root@harlow.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: <INTEL D945GTP >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz (3200.01-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf43 Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory = 2137509888 (2038 MB)
avail memory = 2086207488 (1989 MB)
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <INTEL D945GTP> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <display, VGA> at device 2.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 28.2 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 28.3 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 28.4 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 28.5 on pci0
pci5: <ACPI PCI bus> on pcib5
uhci0: <UHCI (generic) USB controller> port 0x2080-0x209f irq 23 at device 29.0 on pci0
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x2060-0x207f irq 19 at device 29.1 on pci0
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x2040-0x205f irq 18 at device 29.2 on pci0
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x2020-0x203f irq 16 at device 29.3 on pci0
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
fxp0: <Intel 82550 Pro/100 Ethernet> port 0x1100-0x113f mem 0x88000000-0x8801ffff,0x88021000-0x88021fff irq 21 at device 0.0 on pci6
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:02:b3:d5:4d:3f
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0x1000-0x10ff mem 0x88020000-0x88020fff irq 22 at device 1.0 on pci6
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <GENERIC ATA controller> port 0x20b0-0x20bf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <GENERIC ATA controller> port 0x20a0-0x20af,0x20e8-0x20eb,0x20c0-0x20c7,0x20ec-0x20ef,0x20c8-0x20cf irq 19 at device 31.2 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
fdc0: <floppy drive controller> port 0x3f0,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: <ISA Option ROMs> at iomem 0xcc800-0xccfff,0xcb000-0xcc7ff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
RTC BIOS diagnostic error 80<clock_battery>
Timecounter "TSC" frequency 3200012824 Hz quality 800
Timecounters tick every 10.000 msec
acd0: CDRW <LITE-ON CD-RW SOHR-5239S/2S03> at ata0-slave PIO4
Interrupt storm detected on "irq19: uhci1+"; throttling interrupt source
ad4: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata2-master UDMA33
ad5: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata2-slave UDMA33
ad6: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata3-master UDMA33
ad7: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata3-slave UDMA33
Waiting 15 seconds for SCSI devices to settle
sa0 at ahc0 bus 0 target 6 lun 0
sa0: <SEAGATE DAT 9SP40-000 910B> Removable Sequential Access SCSI-3 device
sa0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
Mounting root from ufs:/dev/ad4s1a
IP Filter: v3.4.35 initialized. Default = pass all, Logging = enabled
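
For anyone who wants to pick the warning-style lines out of a long dmesg like the one above, here is a minimal sketch in Python. It is only an illustration: the function name flag_lines, the SUSPECT_PATTERNS table, and the labels are hypothetical, and the patterns are just the messages visible in this particular dmesg, not a complete or authoritative list of FreeBSD kernel warnings.

```python
import re

# Assumption: these patterns are taken only from messages that appear in the
# dmesg quoted in this post; they are illustrative, not exhaustive.
SUSPECT_PATTERNS = {
    "interrupt storm": re.compile(r"Interrupt storm detected on"),
    "RTC/BIOS battery": re.compile(r"RTC BIOS diagnostic error"),
    "no driver attached": re.compile(r"\(no driver attached\)"),
}

# A few lines copied from the dmesg in this post, used as sample input.
SAMPLE = """\
RTC BIOS diagnostic error 80<clock_battery>
Timecounter "TSC" frequency 3200012824 Hz quality 800
Interrupt storm detected on "irq19: uhci1+"; throttling interrupt source
"""


def flag_lines(dmesg_text):
    """Return (label, line) pairs for each line matching a suspect pattern."""
    hits = []
    for line in dmesg_text.splitlines():
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits


if __name__ == "__main__":
    for label, line in flag_lines(SAMPLE):
        print(f"[{label}] {line}")
```

To run it against a real log, the same function could be fed the text of a saved dmesg file instead of SAMPLE. A match only means the line is worth a closer look; it does not by itself say what is wrong with the box.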