From owner-freebsd-hackers@FreeBSD.ORG Tue Apr 22 00:04:12 2003 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 54E8637B401 for ; Tue, 22 Apr 2003 00:04:12 -0700 (PDT) Received: from puffin.mail.pas.earthlink.net (puffin.mail.pas.earthlink.net [207.217.120.139]) by mx1.FreeBSD.org (Postfix) with ESMTP id B3E5243FAF for ; Tue, 22 Apr 2003 00:04:11 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from pool0155.cvx21-bradley.dialup.earthlink.net ([209.179.192.155] helo=mindspring.com) by puffin.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 197rp3-0002iM-00; Tue, 22 Apr 2003 00:04:10 -0700 Message-ID: <3EA4E91D.16AD8565@mindspring.com> Date: Tue, 22 Apr 2003 00:02:53 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bill Wells References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4ba16ac8a01ad94978640be6d87fe35a3a2d4e88014a4647c350badd9bab72f9c350badd9bab72f9c cc: freebsd-hackers@freebsd.org Subject: Re: maxfiles, file table, descriptors, etc... X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Apr 2003 07:04:12 -0000 Bill Wells wrote: > > An interesting aside here is that the per process open file table, > > which holds references to file for the process, is actually > > allocated at power-of-2, meaning each time it needs to grow, the > > size is doubled, using realloc(), instead of malloc(), to keep the > > table allocation contiguous. This means if you use a lot of files, > > it takes exponentially increasing time to open new files, since > > realloc has to double the size, and then copy everything. For a > > few files, this is OK; for 100,000+ files (or network connections) > > in a single process, this starts to become a real source of overhead. > > Let's say you just allocated the 9th element. You started out with > a table of one element, doubled it (copying 1), doubled it to 4 > elements (copying 2 + 1 or 3), thence to 8 (copying 4 + 3 or 7) > and finally to 16 (copying 8 + 7 or 15). > > To allocate then 2**n + 1'th entry, you will have to do 2**(n+1) > copies. Thus you will do (2**(n+1) - 1) / (2**n + 1) copies per > allocated entry. This is the worst case, since additional > allocations up to 2**(n+1) don't incur any additional copies. > > The upper bound is two entries copied per entry allocated. This > makes exponential allocation a *linear time* algorithm. This just > isn't a major time cost unless the chunkiness of the time spent is > an issue. The copy time goes up as 2^N; the rate of copies required goes up as log2(N). Overall, it's true that the time, scaled linearly against N, and averaged, is a linear increase (e.g. multiply the two, and then treat as a continuous function, even though it's not). But I was more interested in instaneous response. Consider that the latency involved in the copy increases the delay for all other pending operations. For example, on an HTTP server, copying the file results in bursty response. Also realize that if you are under a "flash crowd" situation, this is *exactly* the point in time you *don't* want to increase your response latency, as requests will continue to pile up (e.g. requests as a result of being "slashdotted"). The obvious workaround is to dup2() something like /dev/null to fd 200,000, and grow the table once, up front. Still, it's a pain in the butt that the default algorithm takes more time at exactly the point where you don't have time available. And the "grow it once" approach has the drawback that that memory is now committed to a particular task, when it may turn out to be better utilized elsewhere, until load merits it. It's tempting to come up with a way to register fd's into a kqueue for incoming events, without adding a real per-process open file table entry for them (especiall for sockets), and avoid the overhead altogether. Of course, for something like that, struct fileops must die. -- Terry