Date:      Mon, 23 Feb 1998 15:24:01 -0500 (EST)
From:      Paul Sandys <myj@nyct.net>
To:        Steve Grandi <grandi@noao.edu>
Cc:        freebsd-stable@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG
Subject:   Re: I need a strategy for making my STABLE installation stable
Message-ID:  <Pine.BSF.3.96.980223152013.26057A-100000@bsd3.nyct.net>
In-Reply-To: <Pine.LNX.3.96.980223121402.16854D-100000@mirfak.tuc.noao.edu>

On Mon, 23 Feb 1998, Steve Grandi wrote:

> Date: Mon, 23 Feb 1998 12:17:24 -0700 (MST)
> From: Steve Grandi <grandi@noao.edu>
> To: freebsd-stable@FreeBSD.ORG, freebsd-questions@FreeBSD.ORG
> Subject: Re: I need a strategy for making my STABLE installation stable
> 
> A progress report....
> 
> First, I have sent this to both questions and stable (where I started this
> thread a week ago).  If this is bad form, please rap my knuckles gently.
> 
> From last week:
> 
> > My STABLE system hasn't been very stable: I've been averaging one system
> > crash a day for the past week or so.  The frequency of crashes is
> > increasing, with perhaps one crash a week averaged over the past 3 months.  I
> > need some help in devising a strategy to make things stable...
> > 
> > The hardware:  PentiumPro-200 (Venus Motherboard), 128 MB of RAM, Adaptec
> > 2940 Ultra-Wide SCSI controller, two Seagate ST32155W 2GB disks, a
> > Micropolis 3391WS 9GB disk, Plextor SCSI CD-ROM, Intel EtherExpress Pro
> > 10/100B Ethernet card. 
> > 
> > The System: FreeBSD 2.2.5-STABLE kept up-to-date via CVSUP
> > 
> > What's the system doing: DNS server, Sendmail server, FTP server, Net News
> > server.
> > 
> > Ever since I upgraded to 2.2.5-RELEASE in late November, I've seen far too
> > many system crashes.  About half the time, the crash would be followed by
> > a reboot.  The other half of the time the system would just hang with no
> > response from the console keyboard or active rlogin sessions (but
> > sometimes the system would still answer PINGs).  Crashes seemed to follow
> > heavy disk I/O and/or paging (usually soon after an INN expire with a
> > 200MB+ history file). 
> 
> > So what strategy should I follow to make the system stable and make the Users
> > happy again?
> 
> I received several very good pieces of advice and tried them out:
> unfortunately the crashes continued.  I have replaced the motherboard and
> P6-200, memory (putting in parity memory with ECC enabled in the motherboard
> BIOS), the SCSI disk controller, the Ethernet card and the video card.  All
> that remains from the original system is the power-supply, the keyboard and
> the three disks.  I have also turned off Ultra SCSI speed in the Adaptec card's
> BIOS.

I suggest you try a different power supply. I also spent 3 months
replacing parts in a FreeBSD server with random crashes, and the culprit
turned out to be the SCSI cable, which had worked for 2 years straight
before that.

> 
> Two of the crashes in the past week have generated dumps; the rest were hard
> hangs.  Stack traces of these follow.  I have since realized that I need to
> generate a "debug" kernel so that variable names show up in the dumps; next
> time!
> 
> # strings kernel.0 | grep STABLE
> 2.2.5-STABLE
> @(#)FreeBSD 2.2.5-STABLE #0: Tue Feb 17 08:44:04 MST 1998
> 
> # gdb -k kernel.0 vmcore.0
> Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
> IdlePTD 20c000
> current pcb at 1ef358
> panic: page fault
> #0  0xf01126f3 in boot ()
> (kgdb) where
> #0  0xf01126f3 in boot ()
> #1  0xf01129b2 in panic ()
> #2  0xf01b7526 in trap_fatal ()
> #3  0xf01b7014 in trap_pfault ()
> #4  0xf01b6cef in trap ()
> #5  0xf01b36f1 in pmap_qenter ()
> #6  0xf012d15e in allocbuf ()
> #7  0xf012caf8 in getblk ()
> #8  0xf0196f15 in ffs_sbupdate ()
> #9  0xf01969df in ffs_sync ()
> #10 0xf01324e7 in sync ()
> #11 0xf012d70f in vfs_update ()
> #12 0xf010921a in kproc_start ()
> #13 0xf01091b8 in main ()
> 
> 
> # strings kernel.1 | grep STABLE
> 2.2.5-STABLE
> @(#)FreeBSD 2.2.5-STABLE #0: Tue Feb 17 08:44:04 MST 1998
> 
> # gdb -k kernel.1 vmcore.1
> Copyright 1996 Free Software Foundation, Inc...(no debugging symbols found)...
> IdlePTD 20c000
> current pcb at 1ef358
> panic: vm_fork: u_map allocation failed
> #0  0xf01126f3 in boot ()
> (kgdb) where
> #0  0xf01126f3 in boot ()
> #1  0xf01129b2 in panic ()
> #2  0xf01b387d in pmap_new_proc ()
> #3  0xf01a2017 in vm_fork ()
> #4  0xf010d76a in fork1 ()
> #5  0xf010d2b0 in fork ()
> #6  0xf01b7763 in syscall ()
> #7  0x100482d5 in ?? ()
> #8  0x1095 in ?? ()
> 
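Until you have a debug kernel, one way to compare the two traces side by
side is to pull out just the function names. A minimal Python sketch,
assuming the "where" output has been saved as plain text; the helper name
frame_names is hypothetical, not part of gdb or FreeBSD:

```python
import re

# Matches kgdb frame lines such as "#5  0xf01b36f1 in pmap_qenter ()".
FRAME_RE = re.compile(r"#\d+\s+0x[0-9a-fA-F]+ in (\S+) \(\)")


def frame_names(trace_text):
    """Return the function names in frame order, skipping unresolved '??' frames."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) != "??":
            names.append(match.group(1))
    return names


# The two "where" listings from the message, pasted as plain text.
TRACE_0 = """
#0  0xf01126f3 in boot ()
#1  0xf01129b2 in panic ()
#2  0xf01b7526 in trap_fatal ()
#3  0xf01b7014 in trap_pfault ()
#4  0xf01b6cef in trap ()
#5  0xf01b36f1 in pmap_qenter ()
#6  0xf012d15e in allocbuf ()
#7  0xf012caf8 in getblk ()
#8  0xf0196f15 in ffs_sbupdate ()
#9  0xf01969df in ffs_sync ()
#10 0xf01324e7 in sync ()
#11 0xf012d70f in vfs_update ()
#12 0xf010921a in kproc_start ()
#13 0xf01091b8 in main ()
"""

TRACE_1 = """
#0  0xf01126f3 in boot ()
#1  0xf01129b2 in panic ()
#2  0xf01b387d in pmap_new_proc ()
#3  0xf01a2017 in vm_fork ()
#4  0xf010d76a in fork1 ()
#5  0xf010d2b0 in fork ()
#6  0xf01b7763 in syscall ()
#7  0x100482d5 in ?? ()
#8  0x1095 in ?? ()
"""

names_0 = frame_names(TRACE_0)
names_1 = frame_names(TRACE_1)

# Frames that appear in both traces, in the order of the first trace.
shared = [name for name in names_0 if name in names_1]

# Frames in each trace whose names start with "pmap_".
pmap_0 = [name for name in names_0 if name.startswith("pmap_")]
pmap_1 = [name for name in names_1 if name.startswith("pmap_")]

print("shared frames:", shared)
print("pmap frames:", pmap_0, pmap_1)
```

On these two dumps the only shared frames are boot and panic, and each
trace contains one pmap_ routine (pmap_qenter and pmap_new_proc). The
names alone cannot say whether that points to a common cause.
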
> This last panic was a "killer:" the 9GB /var/spool/news partition was
> toasted beyond recovery (defined as a case where fsck -y runs for half
> an hour and looks like it will run forever...).  This happened before
> and the interesting thing is that the system stopped crashing while
> /var/spool/news filled up again!  Indeed, the system hasn't crashed in the
> three days since this last "killer" crash while the mean-time-to-crash
> in the prior week was less than a day.
> 
> Based on all of this, I have made two suppositions:
> 
> 1) I have a software problem and not a hardware problem.
> 
> 2) Something about a "fully loaded" inn netnews system (7 GB of articles, a
> history file exceeding 120 MB, etc.) tickles a bug in FreeBSD that causes the
> observed hangs and panics.
> 
> Hence, I expect that my system will run fine for another few days until
> netnews fills up a good fraction of /var/spool/news.
> 
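If you want to test supposition 2, it may help to record how full the
spool is at regular intervals, so that after the next crash you can see
the fill level just before it. A rough Python sketch, assuming a Python
interpreter is available on the machine; the function name usage_line and
the idea of appending to a log from cron are my own assumptions, not
anything inn or FreeBSD provides:

```python
import shutil
import time


def usage_line(path, now=None):
    """Return 'YYYY-MM-DD HH:MM:SS <path> <fraction used>' for one filesystem."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    fraction = usage.used / usage.total if usage.total else 0.0
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {path} {fraction:.3f}"


if __name__ == "__main__":
    # "." is a placeholder; on the news server this would be /var/spool/news.
    print(usage_line("."))
```

Run from cron with the output appended to a file on a different
partition, this leaves a timestamped record that survives a toasted
spool and can be lined up against the crash times.
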
> My plan is to do the following:
> 
> 1) Wait for the next crash and capture a crash dump against the debug kernel.
> 
> 2) After capturing said crash dump, install on the machine an old kernel
> "borrowed" from another machine: FreeBSD 2.2-970422-RELENG.  My memory
> tells me that when this pre-2.2.2 system was running on the machine in
> question, the system stayed up for months despite a fully loaded inn.
> 
> Can anyone think of other things to try or a useful strategy to follow?
> 
> Steve Grandi, National Optical Astronomy Observatories, Tucson, Arizona USA
> Internet: grandi@noao.edu  Voice: +1 520 318-8228
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
> 

P.

<-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_->
< myj@nyct.net   Paul Sandys   |   New York Connect   http://www.nyct.net >
< network operations manager   |   Total Solution provider                >
<------------------------------------------------------------------------->
<         " BRINGING NEW YORK THE INTERNET SERVICES IT DESERVES "         >
<-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_->

