Date: Mon, 16 Nov 1998 00:00:40 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: bde@zeta.org.au (Bruce Evans)
Cc: archie@whistle.com, phk@critter.freebsd.dk, current@FreeBSD.ORG
Subject: Re: The infamous dying daemons bug
Message-ID: <199811160000.RAA04591@usr05.primenet.com>
In-Reply-To: <199811100634.RAA13398@godzilla.zeta.org.au> from "Bruce Evans" at Nov 10, 98 05:34:54 pm

> >A static inetd sounds like a good experiment.
>
> I couldn't duplicate the dying daemons problem despite trying fairly
> hard, and thought that this might be because I link everything in the
> world static.  I didn't try hard enough to downgrade to a default world.

This data point, Dima's information, and the information from
Garrett about it seeming to affect only swapped processes jibe
with my own previously stated intuition about the problem being
related to mmap.

I think we can be even more specific now, and postulate:

	The problem occurs when an image that is linked shared
	mmaps a library file and modifies a data page in that
	file, causing a copy-on-write, and the copied page is
	subsequently swapped to disk.

	There is apparently a reclaim error involving this page
	when the system later attempts to recover pages for its
	own use, and subsequent references to this data page by
	children of the parent process fail.

We already know that there are dragons in the mmap code; I believe
I actually slew the one that would appear under these circumstances:

1)	Set up a cron job to run newsyslog once a minute
2)	Cause swap to thrash
3)	Do syslogging to force the logs to roll as a result
	of the newsyslog

Note:	The thrash is heavy swap load (NOT an out-of-swap
	condition!).

Note:	The cron program is known to do some evil things;
	specifically, it modifies returned pwent buffers,
	resulting in copy-on-write, even though the pwent
	stuff is implemented as pages mapped from a db file,
	and POSIX prohibits cron doing this.

resulting in:

4)	One or more pages from the password file are written
	to any open file, most frequently the crontab, because
	of the newsyslog runs.

This one was (apparently) killed when the actual object size was
used instead of bogusly rounding the object size to a page boundary.

It seems that there is at least one additional case of a problem
with mmap() here, given the apparent shared library interaction...

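For anyone who wants to poke at the first half of the postulate from user space, here is a minimal sketch of the sequence "private file mapping, store to a data page (copy-on-write), fork, child references the page". It is an illustration only, not a reproduction of the bug: the temporary file merely stands in for a shared library's data segment, the function name private_mapping_demo is made up for this example, and nothing here forces the copied page out to swap, which is the condition the postulate depends on.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def private_mapping_demo():
    """Return (child_saw_parent_write, backing_file_unchanged)."""
    # Hypothetical stand-in for a library data page: one page of 'A' bytes.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * PAGE)

        # ACCESS_COPY asks for a private, copy-on-write mapping (MAP_PRIVATE).
        m = mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY)

        # The first store to the page makes the kernel hand this process
        # its own copy; the file on disk is not supposed to change.
        m[0:4] = b"COW!"

        pid = os.fork()
        if pid == 0:
            # Child: it inherits the parent's modified private page.
            # Exit status 0 means the page contents are what we expect.
            ok = m[0:4] == b"COW!" and m[4:8] == b"AAAA"
            os._exit(0 if ok else 1)

        _, status = os.waitpid(pid, 0)
        child_ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

        # Re-read the backing file to confirm the private write stayed private.
        os.lseek(fd, 0, os.SEEK_SET)
        file_unchanged = os.read(fd, PAGE) == b"A" * PAGE

        m.close()
        return child_ok, file_unchanged
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(private_mapping_demo())
```

On a correctly behaving system both values should be true. To get anywhere near the failure described above, one would presumably have to run something like this in a loop under heavy swap load, so that the copied page is paged out between the store and the child's reference; that part is an assumption and is not shown.
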
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message