From: Graham Wheeler
Message-Id: <199709181451.QAA00397@cdsec.com>
Subject: Bug in malloc/free (was: Memory leak in getservbyXXX?)
To: phk@critter.freebsd.dk (Poul-Henning Kamp)
Date: Thu, 18 Sep 1997 16:51:33 +0200 (SAT)
Cc: hackers@freebsd.org, freebsd-bugs@freebsd.org, gram@gram.cdsec.com (Graham Wheeler)
In-Reply-To: <2593.874490931@critter.freebsd.dk> from "Poul-Henning Kamp" at Sep 17, 97 12:08:51 pm

Hi Poul and others

This is a preliminary report, as it is still very early and the results we are seeing may be coincidental.

Even after removing all non-reentrant calls from the SIGCHLD handler in our gateway program, adding asserts after every call to new, manually rechecking every call to memcpy, memset, strcpy, strcat, etc., we continued to experience the problems, namely the process either aborting in the malloc/free routines, or going into what appeared to be an infinite loop, where the program chewed CPU cycles but did nothing useful. In the latter case, sending a SIGABRT resulted in stack backtraces which also always ended in malloc() or free().
The recursive malloc call problem was solved by removing the non-reentrant calls, but it appears that heap corruption still takes place.

This morning, as an act of near desperation, I linked the code with the libmalloc that is in the ports/devel directory. In fact I linked with the debug version libmalloc_d, which has far more comprehensive integrity checking. Since running the resulting version, there have been no crashes or freezes. As I say, it is still quite early to jump to conclusions, but the program has been running under heavy load for about three hours so far, whereas yesterday it would crash or freeze up within 30 minutes of starting.

[For those who haven't been following this thread, this application is the core program in our firewall product, responsible for gatewaying almost all IP packets. The site in question has several thousand users who seem to do little other than surf the web. Thus the amount of memory allocation and freeing that takes place in this application is quite astronomical compared to typical applications. Amongst other allocations, every single packet is held in a dynamically allocated buffer, as is the state information associated with every TCP connection.]

I'll follow up on this in the morning (South African time) - if the process is still running smoothly this would suggest that there may be a problem with the malloc/free code in libc.

regards
graham

--
Dr Graham Wheeler                        E-mail: gram@cdsec.com
Citadel Data Security                    Phone:  +27(21)23-6065/6/7
Internet/Intranet Network Specialists    Mobile: +27(83)-253-9864
Firewalls/Virtual Private Networks       Fax:    +27(21)24-3656
Data Security Products                   WWW:    http://www.cdsec.com/