From owner-freebsd-current@FreeBSD.ORG Mon Jun 7 18:46:03 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B3F6216A4CE
	for ; Mon, 7 Jun 2004 18:46:03 +0000 (GMT)
Received: from parts-unknown.org (dsl093-170-248.sfo4.dsl.speakeasy.net [66.93.170.248])
	by mx1.FreeBSD.org (Postfix) with SMTP id 6B59243D1D
	for ; Mon, 7 Jun 2004 18:46:03 +0000 (GMT)
	(envelope-from benfell@parts-unknown.org)
Received: (qmail 13822 invoked by alias); 7 Jun 2004 18:45:52 -0000
Date: Mon, 7 Jun 2004 11:45:51 -0700
From: "David A. Benfell"
To: current@freebsd.org
Message-ID: <20040607184551.GA13787@parts-unknown.org>
Mail-Followup-To: current@freebsd.org
References: <20040607150956.GA7084@parts-unknown.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
X-stardate: [-29]2233.82
X-moon: The Moon is Waning Gibbous (72% of Full)
User-Agent: Mutt/1.5.6i
Subject: Re: file descripter leak in current with Qmail?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 07 Jun 2004 18:46:03 -0000

On Mon, 07 Jun 2004 13:06:49 -0400, Robert Watson wrote:
>
> All of these limits have existed for quite a while, but typically aren't
> run into since the default limits typically are pretty high for normal
> application use.  If necessary, you can raise the limit by tweaking the
> global maximum using kern.maxfiles (either as a tunable or sysctl), and
> then as needed adjusting the resource limits that qmail runs with.
>

Okay, so as a temporary measure, I've raised kern.maxfiles to 20000.
I'm concerned about doing this; what I'm seeing suggests that system
performance gets really ugly as the number of open files increases,
even when it's still well below the old limit.

> However, I think the more serious element here is the reason why you reach
> the limit: this happens "naturally" under some workloads simply because of
> large numbers of open files and network connections.  However, in some
> workloads, it's a symptom of a system or application bug, such as a
> resource leak.

The part that has me worried is that I'm hitting the limit now, when I
wasn't before.  Unfortunately, I haven't been keeping track of my
upgrades in -CURRENT, so I can't really put a timeframe on when the
problem arose, except that I didn't have the problem before my most
recent upgrade a couple of days ago.

>
> Because the resources were returned when qmail was killed, that largely
> eliminates the possibility of a kernel resource leak (not entirely, but
> largely), as most kernel resource leaks involving file descriptors have
> the symptom that even after the process exits, the resources aren't
> released (i.e., a reference counting bug or race).  This suggests a user
> space issue -- that doesn't eliminate a system bug, as it could be a bug
> in a library that manages descriptors, but it also suggests the
> possibility of an application bug, or at least, a poor application
> interaction with a system bug.  Occasionally, we've seen bugs in the
> threading libraries that result in leaked descriptors, but my recollection
> is that qmail doesn't use threads.  So that suggests either a support
> library (perhaps crypto or the like), or qmail itself.  Or that you just
> hit an extremely high load. :-)
>
> In terms of debugging it: your first task is to identify if there's one
> process that's holding all the fd's, or if it is distributed over many
> processes.  After that, you want to track down what kind of fd is being
> left open, which may help you track down why it's left open...
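
The first debugging step quoted above (finding out whether one process holds
most of the descriptors or whether they are spread over many) could be
sketched roughly as follows.  This is only an illustration, not something
from the thread: it assumes fstat(1)-style text in which each data row
starts with USER, CMD and PID columns, and the function name and the sample
rows are hypothetical.

```python
from collections import Counter


def count_fds_by_process(fstat_output):
    """Count open-file rows per (pid, command) in fstat-style text.

    Assumption: every data row begins with USER, CMD, PID as
    whitespace-separated columns. The header row and any short or
    malformed rows are skipped.
    """
    counts = Counter()
    for line in fstat_output.splitlines():
        fields = line.split()
        # Skip blank/short lines and the header (its PID column is not numeric).
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        cmd, pid = fields[1], int(fields[2])
        counts[(pid, cmd)] += 1
    return counts


# Made-up rows for illustration only; real input would be captured from
# fstat on the affected machine.
SAMPLE = """\
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
qmails   qmail-send   412    0 /          1234 crw-rw-rw-   null  r
qmails   qmail-send   412    1 /          1234 crw-rw-rw-   null  w
qmails   qmail-send   412    5 /var       5678 -rw-------   1024  rw
root     sshd         230    3 /          4321 crw-rw-rw-   null  rw
"""

if __name__ == "__main__":
    # Print the heaviest descriptor holders first.
    for (pid, cmd), n in count_fds_by_process(SAMPLE).most_common():
        print(f"{n:6d}  pid={pid}  {cmd}")
```

On a live system the text could instead be captured with something like
subprocess.run(["fstat"], capture_output=True, text=True).stdout.  A single
process far ahead of the rest would point at that program or a library it
uses, while an even spread would look more like plain load.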

I'm going to have to take this to the qmail list; people there might be
able to track this down.

Thanks!

-- 
David Benfell, LCP
benfell@parts-unknown.org
---
Resume available at http://www.parts-unknown.org/resume.html