From owner-freebsd-hackers  Mon Feb 20 09:17:26 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.9/8.6.6)
	id JAA28470 for hackers-outgoing; Mon, 20 Feb 1995 09:17:26 -0800
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by
	freefall.cdrom.com (8.6.9/8.6.6) with SMTP id JAA28464 for ;
	Mon, 20 Feb 1995 09:17:25 -0800
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA03273;
	Mon, 20 Feb 95 10:10:57 MST
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9502201710.AA03273@cs.weber.edu>
Subject: Re: getrlimit()/setrlimit() strangeness
To: wpaul@skynet.ctr.columbia.edu (Wankle Rotary Engine)
Date: Mon, 20 Feb 95 10:10:57 MST
Cc: freebsd-hackers@FreeBSD.org
In-Reply-To: <199502200656.BAA02173@skynet.ctr.columbia.edu> from
	"Wankle Rotary Engine" at Feb 20, 95 01:56:03 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> The other day a user here asked about increasing the per-process limit
> for the maximum number of open file descriptors (they have a server
> process that needs to have many file descriptors open at once for some
> periods of time).  I put together the following test program to
> demonstrate how getrlimit() and setrlimit() could be used for this
> purpose:

[ ... ]

> This attempts to set the number of permitted open file descriptors to
> 1024, which is only possible if the hard limit is equal to or higher
> than that.  I decided to try this program on all the platforms I had
> around to see just how portable it would be.  Turns out that it works
> fine on just about all of them -- except FreeBSD. :(

[ ... ]

> In FreeBSD-current, weird things happen.  I'll use freefall as an
> example since I tested this program there.  (The same behavior shows
> up on my office machine, only my default limits are different because
> my system configuration isn't the same as freefall's.)
>
> On freefall, I defined MAXCONNECTIONS to be 2048 instead of 1024 since
> freefall's hard limit was higher than 1024.
>
> getrlimit() reported that the soft file descriptor limit was 128 (which
> is correct) and that the hard limit was -1 (which is thoroughly bogus).
> The sysctl command showed that the hard limit was 1320.  Attempting to
> set the soft and hard limits to 2048 appeared to succeed, but reading
> back the limits afterwards showed that both limits were maxed out at
> 1320.  This behavior is not what I consider to be correct: the attempt
> to raise the limits above the hard limit should have failed noisily;
> instead it failed silently and the limits were trimmed at the hard
> threshold.  And the hard resource limit is most definitely being
> reported incorrectly.  Why sysctl can see it properly but not
> getrlimit() I have no idea.  Yet.
>
> On my 1.1.5.1 system at home, the results were a little different but
> equally broken: instead of -1, getrlimit() reported the hard limit to
> be something in the neighborhood of MAXINT.  Aside from that, it
> behaved the same as freefall, which is to say it screwed up.
>
> Anybody else notice this?  Better yet, anybody know how to fix it? :)

This is part of the stuff that needs to be fixed for kernel and user
space multithreading, and as a result of kernel multithreading, it also
wants to be fixed for SMP.

Take a look at the way the per-process open file table maps into the
system open file table, and the way the per-process open file table is
allocated for the process.
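For reference, the kind of getrlimit()/setrlimit() test being described
in the quoted text looks roughly like the sketch below.  This is a
minimal sketch, not the trimmed original program; MAXCONNECTIONS, the
output format, and the error handling are illustrative only.

#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

#define MAXCONNECTIONS 1024	/* illustrative target for both limits */

int
main(void)
{
	struct rlimit rl;

	/* Read the current soft and hard descriptor limits. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("getrlimit");
		return (1);
	}
	printf("soft %lld, hard %lld\n",
	    (long long)rl.rlim_cur, (long long)rl.rlim_max);

	/*
	 * Try to raise both limits.  If MAXCONNECTIONS exceeds the hard
	 * limit, this should fail (EINVAL for cur > max, EPERM for
	 * raising the hard limit without privilege), not be silently
	 * trimmed.
	 */
	rl.rlim_cur = rl.rlim_max = MAXCONNECTIONS;
	if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("setrlimit");
		return (1);
	}

	/* Read the limits back to see what the kernel really applied. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("getrlimit");
		return (1);
	}
	printf("soft now %lld, hard now %lld\n",
	    (long long)rl.rlim_cur, (long long)rl.rlim_max);
	return (0);
}

On a system where setrlimit() behaves correctly, the second getrlimit()
reports exactly the requested values, or the setrlimit() call fails
outright; it never silently clamps the request.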
In most UNIX implementations, what happens is that the per-process open
file table is allocated in chunks (usually chunks of 32), and is then
chained as a linked list of chunks.  In SVR4, the kernel realloc is used
to reallocate the structure as necessary to expand it.  It turns out
that this is about 30% more efficient for your typical programs (this
caveat because bash is not a typical program and will screw you on
nearly every platform as it tries to move the real handles it maintains
around to not conflict with pipes and/or assigned descriptors).

The problem is that since BSD uses neither the SunOS approach nor the
SVR4 approach, it is doomed to failure even when the size is increased.
You cannot allow an increase to take place, even if requested.  In
effect, it might even be possible to write off the end of the list and
blow kernel memory, although blowing it to something "useful" instead
of just resulting in "denial of service" is another matter, and I think
is statistically improbable, since the values being blown in are vnode
addresses and are therefore not very predictable.  Even if you could
predict them, I think that getting a usable value is another matter.

If someone goes in to fix this, I'd suggest a hash collapse for the
system open file table so that there are not multiple system open file
table entries pointing to the same vnode.  I'd also suggest a reference
count on the structure itself, and I'd suggest moving the current file
offset into a per-process area; the current location is bogus for
threading.  The current system open file limit idea is also bogus
without the hash collapse, since it refers to the limit on open files
for all processes instead of the limit on unique open files for the
system.

If you really care about threading, atomic seek/read and seek/write
system calls (I believe SVR4 calls these pread/pwrite) should also be
implemented to avoid seek/seek/read/read and other race conditions
resulting from the offset being a shared quantity (shared only between
threads using the same context, if the other suggested changes are
implemented).


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
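For concreteness, the shared-offset race and the atomic alternative
described above can be sketched as follows.  This uses pread() as SVR4
and later POSIX define it; the file name and offset are arbitrary and
only stand in for whatever a threaded server would actually read.

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	char buf[16];
	int fd;

	fd = open("/etc/motd", O_RDONLY);	/* any readable file */
	if (fd == -1)
		return (1);

	/*
	 * Racy idiom with a shared offset: another thread using the
	 * same descriptor can seek between these two calls, so two
	 * threads can interleave as seek/seek/read/read and both end
	 * up reading from the second thread's offset.
	 */
	(void)lseek(fd, (off_t)100, SEEK_SET);
	(void)read(fd, buf, sizeof(buf));

	/*
	 * Atomic form: the offset is an explicit argument, and the
	 * shared file offset is neither consulted nor updated.
	 */
	(void)pread(fd, buf, sizeof(buf), (off_t)100);

	(void)close(fd);
	return (0);
}

Because the offset never passes through the shared file table entry, no
ordering of other threads' seeks can change what pread() returns.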