From: Terry Lambert
Message-Id: <199811160000.RAA04591@usr05.primenet.com>
Subject: Re: The infamous dying daemons bug
To: bde@zeta.org.au (Bruce Evans)
Date: Mon, 16 Nov 1998 00:00:40 +0000 (GMT)
Cc: archie@whistle.com, phk@critter.freebsd.dk, current@FreeBSD.ORG
In-Reply-To: <199811100634.RAA13398@godzilla.zeta.org.au> from "Bruce Evans" at Nov 10, 98 05:34:54 pm

> >A static inetd sounds like a good experiment.
>
> I couldn't duplicate the dying daemons problem despite trying fairly
> hard, and thought that this might be because I link everything in the
> world static.  I didn't try hard enough to downgrade to a default world.

This data point, Dima's information, and the information from Garrett
about it seeming to affect only swapped processes jibe with my own
previously stated intuition about the problem being related to mmap.

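For anyone not familiar with the mechanism in question, here is a minimal
sketch of what a copy-on-write file mapping is supposed to do when it
works correctly. It is in Python only for brevity; mmap.ACCESS_COPY
requests a private (MAP_PRIVATE-style) mapping. The function name and the
sample data are made up for illustration, and this shows the intended
semantics only, not a reproduction of the bug.

```python
import mmap
import os
import tempfile


def private_write_leaves_file_intact(payload=b"root:*:0:0::/root:/bin/sh\n"):
    """Store through a copy-on-write mapping.

    Returns (bytes seen through the mapping, bytes actually in the file).
    Hypothetical helper, for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # ACCESS_COPY asks for a private mapping: the first store to a page
        # gives this process its own copy, and the file is never written.
        with mmap.mmap(fd, len(payload), access=mmap.ACCESS_COPY) as view:
            view[0:4] = b"XXXX"      # triggers the copy-on-write
            mapped = bytes(view[:])
        with open(path, "rb") as f:
            on_disk = f.read()
        return mapped, on_disk
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    mapped, on_disk = private_write_leaves_file_intact()
    print("mapping sees:", mapped)
    print("file holds:  ", on_disk)
```

The modified bytes are visible only through the mapping; the backing file
keeps its original contents. The privately copied page is the kind of
page the discussion here is about.
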
I think we can be even more specific now, and postulate:

        The problem occurs when an image that is linked shared mmap's
        a library file and modifies a data page in that file, causing
        a copy-on-write, and the copied page is subsequently swapped
        to disk.  There is apparently a reclaim error involving this
        page when the system later attempts to recover pages for its
        own use, and subsequent references to this data page by
        children of the parent process fail.

We already know that there are dragons in the mmap code; I believe
I actually slew the one that showed up under these circumstances:

1)      Set up a cron job to run newsyslog once a minute
2)      Cause swap to thrash
3)      Do syslogging to force the logs to roll as a result of the
        newsyslog

Note:   The thrash is heavy swap load (NOT an out-of-swap condition!).

Note:   The cron program is known to do some evil things; specifically,
        it modifies returned pwent buffers, resulting in copy-on-write,
        even though the pwent stuff is implemented as pages mapped
        from a db file, and POSIX prohibits cron doing this.

resulting in:

        One or more pages from the password file are written to an
        open file, most often the crontab, because of the newsyslog
        runs.

This one was (apparently) killed when the actual object size was used
instead of bogusly rounding the object size to a page boundary.

It seems that there is at least one additional case of a problem
with mmap() here, given the apparent shared library interaction...

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.