Date: Fri, 28 Aug 1998 11:08:06 +0300 (EEST)
From: Alexander Litvin <archer@lucky.net>
To: Archie Cobbs <archie@whistle.com>
Cc: current@FreeBSD.ORG
Subject: Re: encountered possible VM bug ?
Message-ID: <199808280808.LAA00123@grape.carrier.kiev.ua>
In-Reply-To: <199808272051.NAA27400@bubba.whistle.com>

In article <199808272051.NAA27400@bubba.whistle.com> you wrote:

>> GW> No, this is the ``daemons dying'' bug which nobody has fixed yet.
>> GW> When the system runs out of swap, some random selection of processes
>> GW> which are in swap get corrupted. Usually this results in a daemon
>> GW> which dies whenever it fork()s, but sometimes it is manifested as
>> GW> other sorts of corruption. The message you see from realloc is
>> GW> indicative of a corrupted pointer.
>>
>> Really, I was under impression, that it is the problem just with fork().
>> But now I may confirm that processes get corrupted in different manners.
>> E.g., I have now a specially written dummy daemon running, which I
>> was able to corrupt (intentionally exhausting swap) in such a way that
>> it successfully forks. Then child process sleeps (just to give me
>> chance to attach to it with debugger), allocates memory, accesses it
>> -- and during all that it doesn't get SIGSEGV. But then it dies when
>> trying to syslog(3). It seems that the corruption is in mmaped ld.so
>> or libc.so.3.1.
>>
>> If anybody cares, I may try to give any other details.

AC> At Whistle, we've seen this bug every so often for a long time.
AC> The common elements seem to be:
AC>   1. memory mapping is in use
AC>   2. a fork() is happening or just happened
AC> But #1 and #2 are not necessarily both related to the same process.
AC> This bug has been around for a *long* time, in both 2.x and 3.x.

I saw bash exiting with SIGSEGV. It was not trying to fork some job.
It was swapped out, I just hit <Enter>, and it exited with signal 11.
Cron sometimes seems to just stop forking cron jobs: in those cases it
is not segfaulting, it just doesn't try to fork.

AC> Running out of swap may or may not be related, not sure... I think
AC> we've seen this when swap was not an issue. Perhaps running out of
AC> swap amplifies the problem.

AC> It's really hard to pin down, because the panic seems to come a
AC> while after the initial damage is done. We've seen random processes
AC> crashing every time they try to fork(), kernel panics because of
AC> some process being on two different queues at the same time (eg,
AC> sleep and runnable), and other manifestations.

AC> A common manifestation is that a file being written out contains
AC> some random page of memory from some other file -- we think the other
AC> file is a currently mmap'd file.

In my case it seems that the process has some of its pages zeroed.
At least here's the symptom (I have it still running and segfaulting
-- for investigation ;):

root:~/dummy_daemon:grape:> gdb dummy_daemon 29643
[...]
Attaching to program `/usr/home/archer/dummy_daemon/dummy_daemon', process 29643
Reading symbols from /usr/libexec/ld.so...done.
Reading symbols from /usr/lib/aout/libc.so.3.1...done.
Error accessing memory address 0x0: Bad address.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What exactly does that line mean? When I attach to a dummy_daemon that
is not diseased, it does not appear; instead I see:

0x20057c21 in nanosleep ()

AC> Julian and Terry can supply more details.

AC> -Archie

AC> ___________________________________________________________________________
AC> Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com

---
It's lucky you're going so slowly, because you're going in the wrong direction.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
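
For anyone wanting to try the same experiment: the dummy daemon
described in the quoted text above (fork, sleep long enough to attach a
debugger, allocate memory, access it, then call syslog(3)) could be
sketched roughly as follows. This is not the original test program,
which was not posted. The name probe_fork, its two parameters, and the
log message are hypothetical and chosen only for illustration.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the original daemon.
 * Forks once; the child sleeps, allocates and touches nbytes of heap,
 * then calls syslog(3) and exits.
 * Returns 0 if the child exited cleanly, the terminating signal number
 * if the child was killed by a signal, or -1 on fork/wait failure or
 * any other unclean exit.
 */
int probe_fork(size_t nbytes, unsigned int pause_secs)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: pause so a debugger can be attached to this pid. */
        sleep(pause_secs);

        char *buf = malloc(nbytes);
        if (buf == NULL)
            _exit(2);

        /* Write through a volatile pointer so every byte is really touched. */
        volatile char *p = buf;
        for (size_t i = 0; i < nbytes; i++)
            p[i] = (char)i;

        /* The step at which the corrupted child reportedly died. */
        syslog(LOG_DAEMON | LOG_DEBUG, "probe child %ld alive", (long)getpid());

        free(buf);
        _exit(0);
    }

    /* Parent: collect the child and classify how it ended. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling probe_fork(1 << 20, 30) in a loop from a long-running process,
and logging every nonzero return, would show whether children start
dying once swap has been exhausted. A return value equal to SIGSEGV
would match the "signal 11" exits described above.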