Date: Mon, 23 Feb 1998 12:17:24 -0700 (MST)
From: Steve Grandi <grandi@noao.edu>
To: freebsd-stable@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG
Subject: Re: I need a strategy for making my STABLE installation stable
Message-ID: <Pine.LNX.3.96.980223121402.16854D-100000@mirfak.tuc.noao.edu>
In-Reply-To: <Pine.LNX.3.96.980216115540.9052A-100000@mirfak.tuc.noao.edu>

A progress report....

First, I have sent this to both questions and stable (where I started
this thread a week ago). If this is bad form, please rap my knuckles
gently.

From last week:

> My STABLE system hasn't been very stable: I've been averaging one system
> crash a day for the past week or so. The frequency of crashes is
> increasing with perhaps one crash a week averaged the past 3 months. I
> need some help in devising a strategy to make things stable...
>
> The hardware: PentiumPro-200 (Venus Motherboard), 128 MB of RAM, Adaptec
> 2940 Ultra-Wide SCSI controller, two Seagate ST32155W 2GB disks, a
> Micropolis 3391WS 9GB disk, Plextor SCSI CD-ROM, Intel EtherExpress Pro
> 10/100B Ethernet card.
>
> The System: FreeBSD 2.2.5-STABLE kept up-to-date via CVSUP
>
> What's the system doing: DNS server, Sendmail server, FTP server, Net News
> server.
>
> Ever since I upgraded to 2.2.5-RELEASE in late November, I've seen far too
> many system crashes. About half the time, the crash would be followed by
> a reboot. The other half of the time the system would just hang with no
> response from the console keyboard or active rlogin sessions (but
> sometimes the system would still answer PINGs). Crashes seemed to follow
> heavy disk I/O and/or paging (usually soon after an INN expire with a
> 200MB+ history file).

> So what strategy should I follow to make the system stable and make the Users
> happy again?

I received several very good pieces of advice and tried them out:
unfortunately the crashes continued. I have replaced the motherboard and
P6-200, memory (putting in parity memory with ECC enabled in the
motherboard BIOS), the SCSI disk controller, the Ethernet card and the
video card. All that remains from the original system is the
power-supply, the keyboard and the three disks. I have also turned off
Ultra SCSI speed in the Adaptec card's BIOS.

Two of the crashes in the past week have generated dumps; the rest were
hard hangs. Stack traces of these follow.
I have since realized that I need to generate a "debug" kernel so that
variable names show up in the dumps; next time!

# strings kernel.0 | grep STABLE
2.2.5-STABLE
@(#)FreeBSD 2.2.5-STABLE #0: Tue Feb 17 08:44:04 MST 1998
# gdb -k kernel.0 vmcore.0
Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 20c000
current pcb at 1ef358
panic: page fault
#0 0xf01126f3 in boot ()
(kgdb) where
#0 0xf01126f3 in boot ()
#1 0xf01129b2 in panic ()
#2 0xf01b7526 in trap_fatal ()
#3 0xf01b7014 in trap_pfault ()
#4 0xf01b6cef in trap ()
#5 0xf01b36f1 in pmap_qenter ()
#6 0xf012d15e in allocbuf ()
#7 0xf012caf8 in getblk ()
#8 0xf0196f15 in ffs_sbupdate ()
#9 0xf01969df in ffs_sync ()
#10 0xf01324e7 in sync ()
#11 0xf012d70f in vfs_update ()
#12 0xf010921a in kproc_start ()
#13 0xf01091b8 in main ()

# strings kernel.1 | grep STABLE
2.2.5-STABLE
@(#)FreeBSD 2.2.5-STABLE #0: Tue Feb 17 08:44:04 MST 1998
# gdb -k kernel.1 vmcore.1
Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 20c000
current pcb at 1ef358
panic: vm_fork: u_map allocation failed
#0 0xf01126f3 in boot ()
(kgdb) where
#0 0xf01126f3 in boot ()
#1 0xf01129b2 in panic ()
#2 0xf01b387d in pmap_new_proc ()
#3 0xf01a2017 in vm_fork ()
#4 0xf010d76a in fork1 ()
#5 0xf010d2b0 in fork ()
#6 0xf01b7763 in syscall ()
#7 0x100482d5 in ?? ()
#8 0x1095 in ?? ()

This last panic was a "killer": the 9GB /var/spool/news partition was
toasted beyond recovery (defined as a case where fsck -y runs for half an
hour and looks like it will run forever...). This has happened before,
and the interesting thing is that the system stopped crashing while
/var/spool/news filled up again! Indeed, the system hasn't crashed in the
three days since this last "killer" crash, while the mean-time-to-crash
in the prior week was less than a day.

Based on all of this, I have made two suppositions:

1) I have a software problem and not a hardware problem.
2) Something about a "fully loaded" inn netnews system (7 GB of articles,
   a history file exceeding 120 MB, etc.) tickles a bug in FreeBSD that
   causes the observed hangs and panics.

Hence, I expect that my system will run fine for another few days until
netnews fills up a good fraction of /var/spool/news. My plan is to do the
following:

1) Wait for the next crash and capture a crash dump against the debug
   kernel.

2) After capturing that crash dump, install on the machine an old kernel
   "borrowed" from another machine: FreeBSD 2.2-970422-RELENG. My memory
   tells me that when this pre-2.2.2 system was running on the machine in
   question, the system stayed up for months despite a fully loaded inn.

Can anyone think of other things to try or a useful strategy to follow?

Steve Grandi, National Optical Astronomy Observatories, Tucson, Arizona USA
Internet: grandi@noao.edu    Voice: +1 520 318-8228

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message