From owner-freebsd-current  Sun Nov  8 13:26:24 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id NAA23583
          for freebsd-current-outgoing; Sun, 8 Nov 1998 13:26:24 -0800 (PST)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from fallout.campusview.indiana.edu (fallout.campusview.indiana.edu [149.159.1.1])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id NAA23577
          for <current@FreeBSD.ORG>; Sun, 8 Nov 1998 13:26:23 -0800 (PST)
          (envelope-from jfieber@fallout.campusview.indiana.edu)
Received: from localhost (jfieber@localhost)
	by fallout.campusview.indiana.edu (8.9.1/8.9.1) with ESMTP id QAA18801;
	Sun, 8 Nov 1998 16:25:53 -0500 (EST)
Date: Sun, 8 Nov 1998 16:25:53 -0500 (EST)
From: John Fieber <jfieber@indiana.edu>
To: Eivind Eklund <eivind@yes.no>
cc: current@FreeBSD.ORG
Subject: Re: The infamous dying daemons bug
In-Reply-To: <19981108160934.30826@follo.net>
Message-ID: <Pine.BSF.4.05.9811081553230.482-100000@fallout.campusview.indiana.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, 8 Nov 1998, Eivind Eklund wrote:

> On Sun, Nov 08, 1998 at 09:22:50AM -0500, John Fieber wrote:
> > One question: Is the problem "sticky"?  By that I mean, if it is
> > triggered by a memory shortage, is something in the kernel
> > corrupted that tends to kill/corrupt daemons from that point in
> > time on, or is it just something that affects isolated processes?
> 
> All daemons running at that point seem to get something corrupted.
> If you restart the daemon, it won't happen again until you again run
> out of memory (or whatever it is that triggers the corruption).

I've just been re-examining log files.  What I see is that
problems always follow this message, which never occurs more than
once during any given time the system is up:

   /kernel: swap_pager: suggest more swap space: 125 MB

It is always 125 MB.  I'm still not completely clear on what that
number represents, but anyway...

Here are some highlights from one particular system run in which
inetd and httpd die.  I've omitted redundant "signal 11" lines, since
once a process is corrupted, any connection attempt generates a slew
of them.

Nov  3 16:53:44 fallout /kernel: FreeBSD 3.0-CURRENT #17: Tue Nov  3 16:46:57 EST 1998
Nov  3 17:33:58 fallout /kernel: swap_pager: suggest more swap space: 125 MB
Nov  5 03:09:22 fallout /kernel: pid 15615 (inetd), uid 0: exited on signal 11
...I kill and restart inetd at some point in this interval...
Nov  5 09:42:25 fallout /kernel: pid 16904 (inetd), uid 0: exited on signal 11
...And again...
Nov  5 13:36:34 fallout /kernel: pid 17779 (inetd), uid 0: exited on signal 11
...And again, this time inetd has the "junk pointer" patches from
   PR 8183 applied...
Nov  6 00:52:19 fallout /kernel: pid 19759 (httpd), uid 65534: exited on signal 11
Nov  6 03:14:47 fallout /kernel: pid 20245 (inetd), uid 0: exited on signal 11
...and I reboot in the morning...
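
The ordering described above (no signal 11 exits before the "suggest
more swap space" line, several after it) could be checked mechanically
with something like the following.  This is only a minimal sketch in
Python, not anything that was run on the system in question.  It
assumes that the "FreeBSD ... #N:" kernel banner marks a fresh boot,
and the names split_crashes, BOOT_BANNER, SWAP_WARNING and SIGNAL_EXIT
are made up for illustration.

```python
import re

# Assumption: the kernel version banner is logged once per boot.
BOOT_BANNER = re.compile(r"/kernel: FreeBSD \S+ #\d+")
SWAP_WARNING = re.compile(r"swap_pager: suggest more swap space")
SIGNAL_EXIT = re.compile(
    r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)"
)


def split_crashes(lines):
    """Split signal exits into those seen before and after a swap warning.

    The "warned" flag is reset at every boot banner, so each uptime is
    judged on its own.  Returns (before, after), two lists of
    (process_name, pid, signal) tuples.
    """
    warned = False
    before, after = [], []
    for line in lines:
        if BOOT_BANNER.search(line):
            warned = False
            continue
        if SWAP_WARNING.search(line):
            warned = True
            continue
        match = SIGNAL_EXIT.search(line)
        if match:
            entry = (match.group(2), int(match.group(1)), int(match.group(4)))
            (after if warned else before).append(entry)
    return before, after


# The kernel lines quoted in the excerpt above, used as sample input.
SAMPLE = [
    "Nov  3 16:53:44 fallout /kernel: FreeBSD 3.0-CURRENT #17: Tue Nov  3 16:46:57 EST 1998",
    "Nov  3 17:33:58 fallout /kernel: swap_pager: suggest more swap space: 125 MB",
    "Nov  5 03:09:22 fallout /kernel: pid 15615 (inetd), uid 0: exited on signal 11",
    "Nov  5 09:42:25 fallout /kernel: pid 16904 (inetd), uid 0: exited on signal 11",
    "Nov  5 13:36:34 fallout /kernel: pid 17779 (inetd), uid 0: exited on signal 11",
    "Nov  6 00:52:19 fallout /kernel: pid 19759 (httpd), uid 65534: exited on signal 11",
    "Nov  6 03:14:47 fallout /kernel: pid 20245 (inetd), uid 0: exited on signal 11",
]

before, after = split_crashes(SAMPLE)
print("exits before any swap warning:", before)
print("exits after a swap warning:", after)
```

Log lines that are split across two entries will not match the exit
pattern and are silently skipped, so the counts are a lower bound.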

There are no "swap_pager: out of swap" messages anywhere in the logs,
which go back to just before I switched from 2.2.7 to 3.0-BETA.  If
any memory shortages occur after the first "suggest more swap"
message, they are not being logged.

Since this sample I've bumped swap from 128MB to 256MB and have not
had any problems yet.

Another curiosity: I'm getting some garbled lines in the log files,
with one message landing in the middle of another:

Oct 25 09:11:00 fallout /kernel: pid 29392 (inetd
Oct 25 09:10:49 fallout inetd[180]: /usr/local/libexec/amanda/amandad[28958]: exit status 0xb
Oct 25 09:11:00 fallout /kernel: ), uid 0: exited on signal 11

-john

