From: "Kevin A. Pieckiel" <kpieckiel@smartrafficenter.org>
To: Terry Lambert
Cc: freebsd-hackers@freebsd.org
Date: Tue, 22 Apr 2003 12:57:02 -0400
Subject: Re: maxfiles, file table, descriptors, etc...

On Mon, Apr 21, 2003 at 11:04:07AM -0700, Terry Lambert wrote:
> Things which are allocated by the zone allocator at interrupt
> time have a fixed amount of KVA that is set at boot time, before
> the VM system is fully up.  Even if it were not fully up, the
> way it works is by preallocating an address space range to be
> later filled in by physical pages (you cannot call malloc() at
> interrupt time, but you can take a fault and fill in a backing
> page).  So the zone size for sockets (inpcb's, tcpcb's) is fixed
> at boot time, even though it is derived from the "maxfiles".

This--plus the references to zalloci(), zalloc(), and malloc() you
gave--is starting to give me an understanding of this.  At least, I
recognize the differences you're explaining as well as the logic behind
those differences.  This is really starting to get fascinating.

> A problem with the 5.x approach is that this means it's possible
> to get NULL returns from allocation routines, when the system is
> under memory pressure (because a mapping cannot be established),
> when certain of those routines are expected to *never* fail to
> obtain KVA space.

This is a bit unnerving--or so it would seem, though I'm a bit lost on
a couple of points here.  First, you said:

> In 5.x, the zone limits are still fixed to a static boot-time
> settable only value -- the same value -- but the actual zone
> allocations take place later.

Okay, so basically the kernel is told it has a certain amount of memory
guaranteed to be available to it within a certain zone, when in fact
that memory is not (because it's allocated later, after a time when it
may have already been allocated for another purpose).
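Just to check that I'm reading this correctly, here's a toy user-land
model of my understanding.  Nothing below is real kernel code --
toy_zinit() and toy_zalloc() are names I made up, and the sizes are
arbitrary -- it's only meant to capture "limit fixed once at startup,
allocations happen later, NULL once the limit is hit":

#include <stdio.h>
#include <stdlib.h>

/* A "zone" whose capacity is fixed once, at startup. */
struct toy_zone {
	size_t	item_size;	/* size of one object */
	size_t	limit;		/* max objects, decided at "boot" */
	size_t	used;		/* objects handed out so far */
	char	*arena;		/* pre-reserved backing space */
};

static struct toy_zone *
toy_zinit(size_t item_size, size_t limit)
{
	struct toy_zone *z = malloc(sizeof(*z));

	if (z == NULL)
		return (NULL);
	z->item_size = item_size;
	z->limit = limit;	/* stand-in for the boot-time KVA reservation */
	z->used = 0;
	z->arena = malloc(item_size * limit);
	if (z->arena == NULL) {
		free(z);
		return (NULL);
	}
	return (z);
}

static void *
toy_zalloc(struct toy_zone *z)
{
	if (z->used >= z->limit)	/* administrative limit reached */
		return (NULL);
	return (z->arena + z->item_size * z->used++);
}

int
main(void)
{
	/* Pretend the boot-time tuning worked out to a limit of 4. */
	struct toy_zone *z = toy_zinit(256, 4);
	int i;

	if (z == NULL)
		return (1);
	for (i = 0; i < 6; i++)
		printf("allocation %d -> %p\n", i, toy_zalloc(z));
	return (0);
}

What the toy obviously doesn't capture is the wrinkle you describe in
5.x, where the NULL can show up even before the administrative limit is
reached because establishing the mapping itself fails.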
I see how this links to your parenthetical statement:

> This
> is a serious problem, and has yet to be correctly addressed in
> the new allocator code (the problem occurs because the failure to
> obtain a mapping occurs before the zone in question hits its
> administrative limit).

What I fail to see is why this scheme is decidedly "better" than that
of the old memory allocator.  I understand from the vm source that UMA
wants to avoid allocating pools of unused memory for the kernel--
allocating memory on an as-needed basis is a logical thing to do.  But
losing the guarantee that the allocation routines will not fail, without
adjusting the functions that call those routines, seems a bit dumb
(since, as you state, the kernel panics).

I think this might be a trouble spot for me because of another
question....  What is the correct way to address this in the new
allocator code?  I can come up with an option or two on my own, such as
the one to which I've already alluded: memory allocation routines that
once guaranteed success can no longer be used in such a manner, so the
functions that call them must be altered to take this into account.
But that is certainly not trivial!

And finally:

> Basically, everywhere that calls zalloci()
> is at risk of panic'ing under heavy load.

Am I missing a point here?  I can't find any reference to zalloci() in
the kernel source for 5.x (as of a 07 Apr 2003 cvs update on HEAD), and
such circumstances don't apply to 4.x (which, of course, is where I DID
find them after you mentioned them).

> Correct.  The file descriptors are dynamically allocated; or rather,
> they are allocated incrementally, as needed, and since this is not
> at interrupt time, the standard system malloc() can be used.

A quick tangent....  When file descriptors are assigned and given to a
running program, are they guaranteed to start from zero (or three, if
you don't close stdin, stdout, and stderr)?  Or is this a byproduct of
implementation across the realm of Unixes?

> An interesting aside here is that the per process open file table,
> which holds references to file for the process, is actually
> allocated at power-of-2, meaning each time it needs to grow, the
> size is doubled, using realloc(), instead of malloc(), to keep the
> table allocation contiguous.  This means if you use a lot of files,
> it takes exponentially increasing time to open new files, since
> realloc has to double the size, and then copy everything.  For a
> few files, this is OK; for 100,000+ files (or network connections)
> in a single process, this starts to become a real source of overhead.

Now this _IS_ interesting.  I would think circumstances requiring
100,000+ files or network connections, though not uncommon, are
certainly NOT the majority of cases, but anyone in that situation would
still have a bone to pick with this implementation.  For example, a web
server--from which most users expect (demand?) fast response times--
that pauses to expand its file table in the middle of handling a
connection or request would seem doomed to unreasonable response times.
One would think there is a better way.  How much of an issue is this
really?  (After all, I probably wouldn't have inquired about file
limits, etc., in the first place if I weren't intending to implement
something that will require a lot of connections.)  I've put a little
toy sketch of the doubling pattern in a P.S. below, just to make sure
I'm following you.

Excellent info, Terry.  Thanks for sharing it!

Kevin
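P.S.  Here is the sketch I mentioned: a tiny user-land model of the
realloc() doubling as I understand it.  It's my own toy code, not
anything out of the kernel; the initial size of 20 slots and the
100,000-descriptor target are numbers I made up for illustration.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	size_t cap = 20;		/* small initial table */
	size_t nfiles;
	int *table = malloc(cap * sizeof(*table));

	if (table == NULL)
		return (1);
	for (nfiles = 0; nfiles < 100000; nfiles++) {
		if (nfiles == cap) {
			int *bigger;

			cap *= 2;	/* double, keeping the table contiguous */
			bigger = realloc(table, cap * sizeof(*table));
			if (bigger == NULL) {
				free(table);
				return (1);
			}
			table = bigger;
			printf("grew to %zu slots (had to move/copy %zu entries)\n",
			    cap, nfiles);
		}
		table[nfiles] = (int)nfiles;	/* "open" one more descriptor */
	}
	free(table);
	return (0);
}

If that matches what the kernel is doing, each individual growth step
has to copy everything accumulated so far, so the steps get more
expensive even though they happen less often -- which is exactly the
latency spike I'd be worried about in the middle of serving requests.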
pos += screamnext[pos]	/* does this goof up anywhere? */
		-- Larry Wall in util.c from the perl source code

---
This message was signed by GnuPG.  E-Mail
kpieckiel-pgp@smartrafficenter.org to receive my public key.  You may
also get my key from pgpkeys.mit.edu; my ID is 0xF1604E92 and will
expire on 01 January 2004.