Date:      Thu, 28 Nov 2002 09:55:22 +0100
From:      Daniel Lang <dl@leo.org>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc:        freebsd-hackers@FreeBSD.ORG, chopin@sgh.waw.pl
Subject:   Re: strange coredump in malloc_bytes()/libc in 4.7p2
Message-ID:  <20021128085522.GA64864@atrbg11.informatik.tu-muenchen.de>
In-Reply-To: <99290.1038340689@critter.freebsd.dk>
References:  <20021126131438.GC60278@atrbg11.informatik.tu-muenchen.de> <99290.1038340689@critter.freebsd.dk>


--SLDf9lqlvOQaIe6s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi Poul-Henning,

Poul-Henning Kamp wrote on Tue, Nov 26, 2002 at 08:58:09PM +0100:
[..]
> I think we can more or less conclude that something has trashed your
> memory.
> 
> I'd suggest you try to run your program with ElectricFence or similar.
[..]

I found the problem. It seems to result from a series of
unfortunate events. Although part of the blame lies with
the application (ircd), I believe I have also found a possible
problem in malloc(). I will explain in detail.

First I installed EFence and linked ircd against it,
only to find that ircd with -lefence died under the
debugger with "Program exited with 0377" (or similar). I would
have expected EF to SEGV if a memory barrier was crossed...

Single-stepping through the code inside EFence's allocator, I finally
came across an mmap() call that returned 0xffffffff, which appears to
be MAP_FAILED. For some unknown reason the corresponding error message
was not printed, but this put me on the right track.

Going up the stack frames, it became clear that the program tried
to allocate a ridiculous amount of memory, which finally led to
this error. The amount of memory to be allocated came from a broken
tuning file that is read on startup. *sigh*

So ircd is to blame here for not sanity-checking the tune file.
I will discuss this with the developers separately.
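
As a rough illustration of the kind of sanity check meant here, the
sketch below validates a signed count before it ever reaches malloc().
It is not ircd code: the helper name alloc_checked() and its parameters
are hypothetical, and it only assumes that the tuning value arrives as
a signed integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of ircd): allocate `count` elements of
 * `elem_size` bytes, where `count` comes from an untrusted source such
 * as a tuning file. Returns NULL instead of passing a bogus size on.
 */
static void *alloc_checked(long count, size_t elem_size)
{
    assert(elem_size > 0);              /* caller error, not input error */
    if (count <= 0)
        return NULL;                    /* reject before any unsigned cast */
    if ((unsigned long)count > SIZE_MAX / elem_size)
        return NULL;                    /* count * elem_size would overflow */
    return malloc((size_t)count * elem_size);
}
```

With a check like this, a value such as -149139900 is rejected up front
rather than being reinterpreted as a huge unsigned size.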

But since mmap() returned a failure, I was curious why
malloc() did not cause a similar error when ircd was run without
EFence.

First I checked that the call to malloc() returns 0x0, as it
should. It does, and ircd's code also handles this case;
the process is not aborted, though.
When I continued in the debugger, the process died soon afterwards
with the strange error in isatty(). Again I dug up a libc with
symbols and stepped through malloc() as well.

Here is what I found:

malloc() is called with an argument of -149139900, which,
interpreted as an unsigned value, is 4145827396 bytes.
This value is passed through pageround() and shifted, resulting in

1012165 pages
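
The arithmetic behind these two numbers can be reproduced with a small
sketch. It is not the libc code: it assumes a 32-bit size_t and 4 KB
pages (a page shift of 12, as on i386), and the names as_unsigned_size()
and pages_needed() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KB pages, i.e. a page shift of 12. */
#define SKETCH_PAGESHIFT 12u
#define SKETCH_PAGESIZE  (1u << SKETCH_PAGESHIFT)

/* Reinterpret a signed request the way a 32-bit size_t would see it. */
static uint32_t as_unsigned_size(int32_t request)
{
    return (uint32_t)request;           /* conversion is modulo 2^32 */
}

/* Round up to a page boundary, then convert bytes to pages. */
static uint32_t pages_needed(uint32_t bytes)
{
    /* The rounding step itself must not wrap around. */
    assert(bytes <= UINT32_MAX - (SKETCH_PAGESIZE - 1));
    uint32_t rounded = (bytes + SKETCH_PAGESIZE - 1) & ~(SKETCH_PAGESIZE - 1);
    return rounded >> SKETCH_PAGESHIFT;
}
```

Under these assumptions, as_unsigned_size(-149139900) gives 4145827396,
and pages_needed() of that value gives 1012165.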

This value is passed to map_pages():

[..]
static void *
map_pages(size_t pages)
{
    caddr_t result, tail;

    result = (caddr_t)pageround((u_long)sbrk(0));
    tail = result + (pages << malloc_pageshift);

    if (brk(tail)) {
#ifdef EXTRA_SANITY
        wrterror("(ES): map_pages fails\n");
#endif /* EXTRA_SANITY */
        return 0;
    }
[..]

Passing such a value to map_pages() seems to result in an overflow
in the calculation of "tail":

(gdb) p result
$18 = 0x8f22000 <Error reading address 0x8f22000: Bad address>
(gdb) p tail
$17 = 0xe7000 <Error reading address 0xe7000: Bad address>
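
The two gdb values are consistent with a 32-bit wraparound. The sketch
below models the computation with plain unsigned integers instead of
caddr_t pointers; it assumes 32-bit addresses and a page shift of 12,
and tail_32bit() is a hypothetical name, not a libc function.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGESHIFT 12u   /* assumption: 4 KB pages */

/* Model "result + (pages << malloc_pageshift)" in 32-bit arithmetic. */
static uint32_t tail_32bit(uint32_t result, uint32_t pages)
{
    /* The shift itself must not lose bits; only the addition may wrap. */
    assert(pages <= (UINT32_MAX >> SKETCH_PAGESHIFT));
    return result + (pages << SKETCH_PAGESHIFT);   /* wraps modulo 2^32 */
}
```

With result = 0x8f22000 and 1012165 pages, the shift yields 0xf71c5000,
and the sum 0x1000e7000 is truncated to 0xe7000, the value gdb shows
for tail.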

As I understand it, result is the current 'break', the upper border
of the process's data segment, and tail should be the future
upper border, increased by the requested number of pages.

brk() just sets the new value, but in this overflow case it does
not raise the break; it _lowers_ it to some utterly wrong
value. The call to brk() succeeds!

This seems to me to be _the_ place where the memory corruption actually
happens. A sanity check for the overflow would not hurt; I'll attach
a patch.
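
For comparison, an overflow check can also be done before the addition,
on integers rather than on pointers, so that no wrapped value is ever
formed. This is only a sketch of that variant under the same
assumptions as above (32-bit addresses, page shift of 12); checked_tail()
is a hypothetical helper and not what the attached patch does.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGESHIFT 12u   /* assumption: 4 KB pages */

/*
 * Hypothetical helper: return base + (pages << SKETCH_PAGESHIFT), or 0
 * if that sum would not fit in a 32-bit address. The test happens
 * before the addition, so the result can never have wrapped.
 */
static uint32_t checked_tail(uint32_t base, uint32_t pages)
{
    if (pages > ((UINT32_MAX - base) >> SKETCH_PAGESHIFT))
        return 0;                       /* would overflow: refuse */
    uint32_t tail = base + (pages << SKETCH_PAGESHIFT);
    assert(tail >= base);               /* holds whenever the check passed */
    return tail;
}
```

For base = 0x8f22000 and 1012165 pages this returns 0, mirroring the
"return 0" failure path of map_pages(); a small request such as 16 pages
returns 0x8f32000.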

However, I do not know whether this can be considered a bug.

The check after the call to brk():

if ((last_index+1) >= malloc_ninfo && !extend_pgdir(last_index))

fails, and malloc() returns 0x0 after all.
Well, I'm not sure whether programs are expected to exit immediately
after a malloc() fails, but I think they are not necessarily.

Finally I set malloc_options="X" in the code and, yes,
the program exited with abort() at a much more sensible
location. I did not remember the malloc options until today, alas. :-/

OK, so much from me. Any comments on my discovery and patch are appreciated.

Best regards,
 Daniel
-- 
IRCnet: Mr-Spock                      - All your .sigs are belong to us -
 Daniel Lang * dl@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/

--SLDf9lqlvOQaIe6s
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="malloc.patch"

--- src/lib/libc/stdlib/malloc.c.orig	Thu Nov 28 09:51:09 2002
+++ src/lib/libc/stdlib/malloc.c	Thu Nov 28 09:53:00 2002
@@ -307,6 +307,14 @@
     result = (caddr_t)pageround((u_long)sbrk(0));
     tail = result + (pages << malloc_pageshift);
 
+    /* check for overflow */
+    if (tail < result) {
+#ifdef EXTRA_SANITY
+	wrterror("(ES): overflow in map_pages; failed\n");
+#endif /* EXTRA_SANITY */
+	return 0;
+    }
+
     if (brk(tail)) {
 #ifdef EXTRA_SANITY
 	wrterror("(ES): map_pages fails\n");

--SLDf9lqlvOQaIe6s--
