From owner-freebsd-current@FreeBSD.ORG Mon Jun 7 17:07:46 2004
Date: Mon, 7 Jun 2004 13:06:49 -0400 (EDT)
From: Robert Watson
To: "David A. Benfell"
cc: current@freebsd.org
In-Reply-To: <20040607150956.GA7084@parts-unknown.org>
Subject: Re: file descripter leak in current with Qmail?

On Mon, 7 Jun 2004, David A. Benfell wrote:

> I'm running current, with qmail and spamassassin.
>
> Having finally caught on to the new kernel build process (ahem), I'm now
> having problems with the qmail UIDs (mostly for qmaild but occasionally
> qmails) exceeding the openfiles limit.
>
> When I used sysctl to interrogate kern.openfiles, it said 1836.  I have
> not altered the default maximum.  When I shut down qmail, it promptly
> dropped to something like 180.
> When I restarted qmail, it went back up
> to something like 194 but system responsiveness dropped through the
> floor.  In basically the time it's taken me to write this much,
> kern.openfiles has climbed to something like 261.
>
> So I guess I have a couple of idiot questions to ask here:
>
> Is the kern.openfiles limit something (relatively) new?  I was running
> current before on this box but hadn't gotten through a build because I
> hadn't caught on to the new kernel build process since before 5.2 was
> released.  Qmail was not a problem before.
>
> Is the correct response to this problem to raise the limit?  If so, I
> presume this would be done in rc.conf; what would be the corresponding
> variable in rc.conf?
>
> In the time I've composed this far, system responsiveness seems to have
> returned to normal and kern.openfiles has dropped to something like 221.
> So I assume the responsiveness issue had to do with qmail trying to
> catch up.
>
> I'm in between quarters in school right now, so I have a little time to
> play with this if needed.

Just to make sure we're clear on terminology, kern.openfiles is the
number of open file descriptors in the system.  Several resource limits
impact the ability to allocate new file descriptors:

- kern.maxfiles, the global maximum number of open file descriptors
  permitted.

- Resource limits, which are per-process, and can be viewed for the
  current process using the "limits" command (or some variation
  depending on shell).

- Real system memory constraints, which can result in allocation
  failures, etc, if exceeded.

All of these limits have existed for quite a while, but people rarely
run into them because the defaults are pretty high for normal
application use.  If necessary, you can raise the limit by tweaking the
global maximum using kern.maxfiles (either as a tunable or sysctl), and
then as needed adjusting the resource limits that qmail runs with.
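
For watching these numbers from a script rather than by hand, here is a
minimal sketch.  It assumes Python 3.7 or later and a sysctl binary on
the PATH; the helper names read_sysctl and usage_fraction are made up
for illustration and are not part of any FreeBSD tool.

```python
import resource
import subprocess


def read_sysctl(name):
    """Return a sysctl value as an int, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name],
            capture_output=True, text=True, check=True,
        )
        return int(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # sysctl missing, unknown name, or non-numeric value.
        return None


def usage_fraction(open_files, max_files):
    """Fraction of the global limit currently in use."""
    return open_files / max_files


if __name__ == "__main__":
    open_files = read_sysctl("kern.openfiles")
    max_files = read_sysctl("kern.maxfiles")
    if open_files is not None and max_files:
        print(f"global: {open_files}/{max_files} "
              f"({usage_fraction(open_files, max_files):.1%} used)")
    else:
        print("kern.openfiles/kern.maxfiles not readable on this system")

    # Per-process descriptor limit of this process; child processes
    # inherit it unless they change it themselves.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"per-process descriptors: soft={soft} hard={hard}")
```

Running it in a loop while qmail is under load would show whether the
global count is approaching kern.maxfiles or whether only the
per-process limit is being hit.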

However, I think the more serious element here is the reason why you
reach the limit.  Under some workloads this happens "naturally", simply
because of large numbers of open files and network connections.  In
others, it's a symptom of a system or application bug, such as a
resource leak.

Because the resources were returned when qmail was killed, that largely
eliminates the possibility of a kernel resource leak (not entirely, but
largely), as most kernel resource leaks involving file descriptors have
the symptom that even after the process exits, the resources aren't
released (i.e., a reference counting bug or race).  This suggests a user
space issue -- that doesn't eliminate a system bug, as it could be a bug
in a library that manages descriptors, but it also suggests the
possibility of an application bug, or at least, a poor application
interaction with a system bug.  Occasionally, we've seen bugs in the
threading libraries that result in leaked descriptors, but my
recollection is that qmail doesn't use threads.  So that suggests either
a support library (perhaps crypto or the like), or qmail itself.  Or
that you just hit an extremely high load. :-)

In terms of debugging it: your first task is to identify if there's one
process that's holding all the fd's, or if it is distributed over many
processes.  After that, you want to track down what kind of fd is being
left open, which may help you track down why it's left open...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research
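
The first debugging step above (one process holding all the fd's, or
many?) could be sketched roughly as follows.  This is an illustration,
not a tested tool: it assumes fstat(1) output with USER, CMD, PID and FD
as the first four columns, that socket and pipe descriptors appear with
a trailing "*", and that command names contain no spaces.  The function
name count_fds_by_process is made up.

```python
import subprocess
from collections import Counter


def count_fds_by_process(fstat_output):
    """Count numeric descriptors per (user, command, pid) in fstat text."""
    counts = Counter()
    for line in fstat_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "USER CMD PID FD ..." header.
        if len(fields) < 4 or fields[0] == "USER":
            continue
        user, cmd, pid = fields[0], fields[1], fields[2]
        # Assumed layout: entries such as "wd", "root" or "text" are not
        # descriptors; sockets and pipes look like "3*".
        fd = fields[3].rstrip("*")
        if not pid.isdigit() or not fd.isdigit():
            continue
        counts[(user, cmd, int(pid))] += 1
    return counts


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["fstat"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # fstat not available here; nothing to report
    # Show the ten processes holding the most descriptors.
    for (user, cmd, pid), n in count_fds_by_process(out).most_common(10):
        print(f"{n:6d}  {user:<10} {cmd:<16} {pid}")
```

If one qmail process dominates the list, the remaining columns of its
fstat lines (file, socket, pipe) say what kind of fd is piling up.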