Date: Mon, 23 Feb 1998 15:24:01 -0500 (EST)
From: Paul Sandys
To: Steve Grandi
cc: freebsd-stable@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG
Subject: Re: I need a strategy for making my STABLE installation stable

On Mon, 23 Feb 1998, Steve Grandi wrote:

> Date: Mon, 23 Feb 1998 12:17:24 -0700 (MST)
> From: Steve Grandi
> To: freebsd-stable@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG
> Subject: Re: I need a strategy for making my STABLE installation stable
>
> A progress report....
>
> First, I have sent this to both questions and stable (where I started this
> thread a week ago). If this is bad form, please rap my knuckles gently.
>
> From last week:
>
> > My STABLE system hasn't been very stable: I've been averaging one system
> > crash a day for the past week or so. The frequency of crashes is
> > increasing with perhaps one crash a week averaged the past 3 months. I
> > need some help in devising a strategy to make things stable...
> >
> > The hardware: PentiumPro-200 (Venus Motherboard), 128 MB of RAM, Adaptec
> > 2940 Ultra-Wide SCSI controller, two Seagate ST32155W 2GB disks, a
> > Micropolis 3391WS 9GB disk, Plextor SCSI CD-ROM, Intel EtherExpress Pro
> > 10/100B Ethernet card.
> >
> > The System: FreeBSD 2.2.5-STABLE kept up-to-date via CVSUP
> >
> > What's the system doing: DNS server, Sendmail server, FTP server, Net News
> > server.
> >
> > Ever since I upgraded to 2.2.5-RELEASE in late November, I've seen far too
> > many system crashes. About half the time, the crash would be followed by
> > a reboot. The other half of the time the system would just hang with no
> > response from the console keyboard or active rlogin sessions (but
> > sometimes the system would still answer PINGs). Crashes seemed to follow
> > heavy disk I/O and/or paging (usually soon after an INN expire with a
> > 200MB+ history file).
>
> > So what strategy should I follow to make the system stable and make the Users
> > happy again?
>
> I received several very good pieces of advice and tried them out:
> unfortunately the crashes continued. I have replaced the motherboard and
> P6-200, memory (putting in parity memory with ECC enabled in the motherboard
> BIOS), the SCSI disk controller, the Ethernet card and the video card. All
> that remains from the original system is the power-supply, the keyboard and
> the three disks. I have also turned off Ultra SCSI speed in the Adaptec card's
> BIOS.

I suggest you try a different power supply. I also spent 3 months
replacing parts in one FreeBSD server with random crashes, and it ended
up being the SCSI cable, which had worked for 2 years straight before .....

>
> Two of the crashes in the past week have generated dumps; the rest were hard
> hangs. Stack traces of these follow. I have since realized that I need to
> generate a "debug" kernel so that variable names show up in the dumps; next
> time!
>
> # strings kernel.0 | grep STABLE
> 2.2.5-STABLE
> @(#)FreeBSD 2.2.5-STABLE #0: Tue Feb 17 08:44:04 MST 1998
>
> # gdb -k kernel.0 vmcore.0
> Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
> IdlePTD 20c000
> current pcb at 1ef358
> panic: page fault
> #0 0xf01126f3 in boot ()
> (kgdb) where
> #0 0xf01126f3 in boot ()
> #1 0xf01129b2 in panic ()
> #2 0xf01b7526 in trap_fatal ()
> #3 0xf01b7014 in trap_pfault ()
> #4 0xf01b6cef in trap ()
> #5 0xf01b36f1 in pmap_qenter ()
> #6 0xf012d15e in allocbuf ()
> #7 0xf012caf8 in getblk ()
> #8 0xf0196f15 in ffs_sbupdate ()
> #9 0xf01969df in ffs_sync ()
> #10 0xf01324e7 in sync ()
> #11 0xf012d70f in vfs_update ()
> #12 0xf010921a in kproc_start ()
> #13 0xf01091b8 in main ()
>
>
> # strings kernel.1 | grep STABLE
> 2.2.5-STABLE
> @(#)FreeBSD 2.2.5-STABLE #0: Tue Feb 17 08:44:04 MST 1998
>
> # gdb -k kernel.1 vmcore.1
> Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
> IdlePTD 20c000
> current pcb at 1ef358
> panic: vm_fork: u_map allocation failed
> #0 0xf01126f3 in boot ()
> (kgdb) where
> #0 0xf01126f3 in boot ()
> #1 0xf01129b2 in panic ()
> #2 0xf01b387d in pmap_new_proc ()
> #3 0xf01a2017 in vm_fork ()
> #4 0xf010d76a in fork1 ()
> #5 0xf010d2b0 in fork ()
> #6 0xf01b7763 in syscall ()
> #7 0x100482d5 in ?? ()
> #8 0x1095 in ?? ()
>
> This last panic was a "killer:" the 9GB /var/spool/news partition was
> toasted beyond recovery (defined as a case where fsck -y runs for half
> an hour and looks like it will run forever...). This happened before
> and the interesting thing is that the system stopped crashing while
> /var/spool/news filled up again! Indeed, the system hasn't crashed in the
> three days since this last "killer" crash while the mean-time-to-crash
> in the prior week was less than a day.
>
> Based on all of this, I have made two suppositions:
>
> 1) I have a software problem and not a hardware problem.
>
> 2) Something about a "fully loaded" inn netnews system (7 GB of articles, a
> history file exceeding 120 MB, etc.) tickles a bug in FreeBSD that causes the
> observed hangs and panics.
>
> Hence, I expect that my system will run fine for another few days until
> netnews fills up a good fraction of /var/spool/news.
>
> My plan is to do the following:
>
> 1) Wait for the next crash and capture a crash dump against the debug kernel.
>
> 2) After capturing said crash dump; install on the machine an old kernel
> "borrowed" from another machine: FreeBSD 2.2-970422-RELENG. Since my
> memory tells me that when this pre-2.2.2 system was running on the machine
> in question, the system stayed up for months despite a fully loaded inn.
>
> Can anyone think of other things to try or a useful strategy to follow?
>
> Steve Grandi, National Optical Astronomy Observatories, Tucson, Arizona USA
> Internet: grandi@noao.edu Voice: +1 520 318-8228
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
>

P.

<-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_->
< myj@nyct.net Paul Sandys | New York Connect http://www.nyct.net >
< network operations manager | Total Solution provider >
<------------------------------------------------------------------------->
< " BRINGING NEW YORK THE INTERNET SERVICES IT DESERVES " >
<-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_->