Date:      Mon, 16 Nov 1998 00:00:40 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        bde@zeta.org.au (Bruce Evans)
Cc:        archie@whistle.com, phk@critter.freebsd.dk, current@FreeBSD.ORG
Subject:   Re: The infamous dying daemons bug
Message-ID:  <199811160000.RAA04591@usr05.primenet.com>
In-Reply-To: <199811100634.RAA13398@godzilla.zeta.org.au> from "Bruce Evans" at Nov 10, 98 05:34:54 pm

> >A static inetd sounds like a good experiment.
> 
> I couldn't duplicate the dying daemons problem despite trying fairly
> hard, and thought that this might be because I link everything in the
> world static.  I didn't try hard enough to downgrade to a default world.

This data point, Dima's information, and the information from Garrett
about it seeming to affect only swapped processes jibe with my own
previously stated intuition that the problem is related to mmap.

I think we can be even more specific now, and postulate:


	The problem occurs when an image that is linked shared
	mmap's a library file and modifies a data page of that
	file, causing a copy-on-write, and the copied page is
	subsequently swapped to disk.

	There is apparently a reclaim error involving this page
	when the system later attempts to recover pages for its
	own use, and subsequent references to this data page by
	children of the parent process fail.
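
To make the first half of that hypothesis concrete, here is a minimal
userland sketch of the copy-on-write step only.  It assumes a scratch
file under /tmp stands in for the library file, and the function name
is hypothetical.  It does not force the copied page out to swap, so it
illustrates the mechanism rather than reproducing the bug.

```c
#define _XOPEN_SOURCE 700	/* for mkstemp() and pread() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a scratch file MAP_PRIVATE, store to it,
 * and check that the store stayed in the private copy.
 * Returns 1 if the file is unchanged and the mapping shows the store,
 * 0 if not, -1 on error.
 */
int
private_write_stays_private(void)
{
	char path[] = "/tmp/cowdemoXXXXXX";	/* assumed scratch location */
	const char original[] = "library data page";
	char readback[sizeof(original)];
	char *map;
	int fd, result = -1;

	fd = mkstemp(path);
	if (fd == -1)
		return (-1);
	unlink(path);		/* file goes away when fd is closed */

	if (write(fd, original, sizeof(original)) !=
	    (ssize_t)sizeof(original))
		goto out;

	/* Private mapping, as is usual for a shared library's data. */
	map = mmap(NULL, sizeof(original), PROT_READ | PROT_WRITE,
	    MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* The first store faults and the process gets its own copy. */
	map[0] = 'L';

	/* Read the file itself; it should still hold the original bytes. */
	if (pread(fd, readback, sizeof(readback), 0) ==
	    (ssize_t)sizeof(readback))
		result = (memcmp(readback, original, sizeof(original)) == 0 &&
		    map[0] == 'L');

	munmap(map, sizeof(original));
out:
	close(fd);
	return (result);
}
```

The copied page is anonymous memory backed by swap rather than by the
file, which is why the second half of the hypothesis (swap-out and
later reclaim) would only matter for pages that have been written.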

We already know that there are dragons in the mmap code; I believe
I actually slew the one that showed up under these circumstances:


	1)	Set up a cron job to run newsyslog once a minute
	2)	Cause swap to thrash
	3)	Do syslogging to force the logs to roll as a result
		of the newsyslog


	Note:	The thrash is heavy swap load (NOT an out-of-swap
		condition!).

	Note:	The cron program is known to do some evil things;
		specifically, it modifies returned pwent buffers,
		resulting in copy-on-write, even though the pwent
		stuff is implemented as pages mapped from a db
		file, and POSIX prohibits cron doing this.

resulting in:

	4)	One or more pages from the password file are written
		to any open file, most frequently the crontab
		because of the newsyslog runs.

This one was (apparently) killed when the actual object size was
used instead of bogusly rounding the object size to a page boundary.
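
As an illustration of why the rounding matters (this is not the actual
kernel change; the function names and the 4096-byte page size in the
example are assumptions), compare the page-rounded size of an object
with the number of bytes in each page that really belong to it:

```c
#include <stddef.h>

/* Round size up to the next multiple of pagesize (a power of two). */
size_t
round_up_to_page(size_t size, size_t pagesize)
{
	return ((size + pagesize - 1) & ~(pagesize - 1));
}

/*
 * Hypothetical helper: how many bytes of the page starting at
 * page_offset actually belong to an object of object_size bytes.
 * Anything past that count is slack beyond the end of the object.
 */
size_t
valid_bytes_in_page(size_t object_size, size_t page_offset, size_t pagesize)
{
	size_t remaining;

	if (page_offset >= object_size)
		return (0);
	remaining = object_size - page_offset;
	return (remaining < pagesize ? remaining : pagesize);
}
```

For a 5000-byte object with 4096-byte pages, the rounded size is 8192,
but only 904 bytes of the second page belong to the object; treating
the whole 8192 bytes as object data is the kind of bogus rounding
described above.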


It seems that there is at least one additional case of a problem
with mmap() here, given the apparent shared library interaction...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
