From owner-svn-src-head@FreeBSD.ORG Fri Apr 26 18:37:13 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E288C6FA; Fri, 26 Apr 2013 18:37:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au [211.29.132.190]) by mx1.freebsd.org (Postfix) with ESMTP id 6AB0B17ED; Fri, 26 Apr 2013 18:37:13 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r3QIaueP013094 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 27 Apr 2013 04:37:02 +1000 Date: Sat, 27 Apr 2013 04:36:56 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jilles Tjoelker Subject: Re: svn commit: r249859 - head/lib/libc/sys In-Reply-To: <20130426155933.GA24412@stack.nl> Message-ID: <20130427030857.L2680@besplex.bde.org> References: <201304242124.r3OLOZW5034818@svn.freebsd.org> <20130425204458.F1034@besplex.bde.org> <20130426155933.GA24412@stack.nl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=Gu4aYTJC c=1 sm=1 a=P_x8KVnH094A:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=uiiGxAOqq18A:10 a=0qMOogwszCEQbqkq7PUA:9 a=CjuIK1q_8ugA:10 a=DoCJ9_kpjpoDdCIq:21 a=oUpQ4aSxQuHIGM3f:21 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Apr 2013 18:37:14 -0000 On Fri, 26 Apr 2013, Jilles Tjoelker wrote: > On Thu, Apr 25, 2013 at 09:56:01PM +1000, Bruce Evans wrote: >> On Wed, 24 Apr 2013, Jilles Tjoelker wrote: > >>> Log: >>> getdtablesize(2): Describe what this function actually does. >>> ... >>> .Sh NAME >>> .Nm getdtablesize >>> -.Nd get descriptor table size >>> +.Nd get file descriptor limit > >> Now its name doesn't match its description, and the reason for this is >> not documented. > > This seems to be the case on most systems that have this function. No reason not to document it. >> This function is almost obsolete. In POSIX, it is spelled {OPEN_MAX} >> or sysconf(__SC_OPEN_MAX). It is sometimes misspelled OPEN_MAX. > > There is a difference between sysconf(_SC_OPEN_MAX) and getdtablesize(): > the latter also takes maxfilesperproc and rctl(8) rules into account. Yes, only sysconf(_SC_OPEN_MAX) (and getrlimit(..., RLIMIT_NOFILE)) are broken. I'm not familiar with rctl. It seems to break this some more: in kern_descrip.c, all the places that use the rlimit clamp it to maxfileperproc, so the effective rlimit is always the clamped value and this is what sysconf() and getrlimit() should return too. Now for rctl, only 1 of these places (namely getdtablesize()) clamps it further to the RACCT_NOFILE limit. Other places include do_dup(). do_dup() checks the RACCT_NOFILE much later. I think this gives the same limit, but at least the errno when the limit RACCT_NOFILE limit is exceeded but the others aren't is wrong in all cases. (The early errno for (new >= maxfd) is (flags & DUP_FCNTL ? EINVAL : EBADF). The later errno for racct_set(... RACCT_NOFILE, new + 1) is always EMFILE. EMFILE is not even a possible errno for dup2(). >> I prepared to remove the broken definition of OPEN_MAX, but never committed >> the final step. /usr/src has very few misuses of OPEN_MAX now, so removing >> the definition wouldn't be too hard. Most uses are in compatibility >> cruft. E.g., the following from >> crypto/openssh/openbsd-compat/bsd-closefrom.c >> which is confused about related things: > >> [snip] > If that code is compiled at all, it is a bug. We have closefrom() and > OpenSSH should use it. It almost certainly uses it. However, a quick grep for getdtablesize() shows many applications using it, and most of them use it to give the top of a close() loop. These should be converted to use something like closefrom(). getdtablesize() is used much more than sysconf() or getrlimit() for this. This is done mostly in old BSD applications. However, the worst uses that I noticed are in the relatively new ppp application. There are about 40 calls to getdtablesize() in /usr/src. 7 of these are in ppp. 1 of the 7 is for the close() loop. 4 of the 7 are for fcntl(... F_SETFD ...) loops. The other 2 are for a home made FD_SETSIZE and a home made FD_ZERO(). These essentially initialize FD_SETSIZE to the variable getdtablesize(). A table of bits of that size is allocated. A table of bits of that size is cleared. Since getdtablesize() is variable, buffer overruns and underruns occur if someone changes getdtablesize() while the process is running (it can be changed by the maxfileperproc sysctl and now by rctl). But the huge table is not always passed to select() (its size doesn't seem to be recorded anywhere, so it can also be overrun or underrun by FD_SET() on it, where the fd for FD_SET is acquired before or after getdtablesize() changes). >> ... in 4.4BSD and FreeBSD, both sysconf(_SC_OPEN_MAX) are just wrappers for >> the resource limit (sysconf() is a libc wrapper and getdtablesize() is >> a syscall wrapper). Actually, in FreeBSD, getdtablesize() is not even the >> rlmint -- it is the min() of the rlimit and the global sysctl integer >> maxfilesperproc. Here the bug is in the rlimit. For the rlimit, >> maxfilesperproc is only used when the rlimit is set and when it is used >> in the kernel. But when the rlimit is returned to userland, via >> getrlimit(), maxfilesperproc is not used, so the rlimit may be wrong if >> maxfileperproc was lowered after setting the rlimit. > > I don't like the idea of rlimits changing of "their own will". Changing > maxfileperproc at run time is going to be a bit nasty in any case but > seems not as bad as changing rlimits of running processes. I don't like this either. maxfileperproc wouldn't exist if my axe was sharper in 1995 when it was committed. rctl is less hackish, but causes similar problems. Anyway, since the effective limit does change, getrlimit() should keep up with the change like getdtablesize() does. Otherwise processes can't tell what the rlimit actually is, or needs to know that getdtablesize() must be used instead of getrlimit() to determine the actual limit. maxfileperproc and the rctl limit are only needed to prevent growth of the table. Once a large table has been allocated, it is not useful to prevent dup2() from using a large fd for which space has already been allocated. But the clamps to maxfileperproc prevent this. Maybe the clamps to the rctl limit don't prevent this, since they are applied later and only cause failure if the allocation fails. But then since they are applied at the same time in getdtablesize(), they prevent applications from knowing the actual limit in another way. I think this is now further complicated by sparse tables. >> Actually, {OPEN_MAX} is guaranteed by POSIX to be at least >> {_POSIX_OPEN_MAX}, and {_POSIX_OPEN_MAX} is precisely 20. But these >> guarantees and similar ones for stdio's FOPEN_MAX have always been >> broken in FreeBSD, since anyone can reduce the rlimit below 20. >> ... > Recent versions of POSIX allow {OPEN_MAX} to be based on the rlimit. In > that case, it may change when the rlimit is changed. I think this is more or less required (and implicit before). But how can anything work if it there is no API like closefrom() and there is no way to determine the maximum open fd. getdtablesize() would be more useful if it actually returned the table size. Then its old use of closing all files up to the table size (less 1) would keep working OK iff the table size is small. Of course, the table must remain large enough to keep holding the maximum open fd. If getdtablesize() did that, then it would match its name again. But it hasn't matched since 386BSD or earlier -- in FreeBSD-1, it returned essentially the same as now: (just the rlimit, so it doesn't cover the maximum open fd if anyone reduces the rlimit). >>> The >>> .Fn getdtablesize >>> -system call returns the size of this table. >>> +system call returns the maximum number of file descriptors >>> +that the current process may open. > >> Actually, the process may open more than this number, after raising its >> (soft) rlimit, if this is possible. > > True, but this requires different function calls than just ones > allocating file descriptors. Not too bad if the rlimit can only be changed by the process itself, but I'd still like this to be documented explicitly. POSIX doesn't have the complications of maxfileperproc or rctl. These are harder to document. About all you can say is that the results returned by getdtablesize(), getrlimit() and sysconf() are volatile so that they become unusable before they can be used :-(. >>> +.Xr closefrom 2 , >>> .Xr dup 2 , >>> -.Xr open 2 , >>> -.Xr select 2 >>> +.Xr getrlimit 2 , >>> +.Xr sysconf 2 >>> .Sh HISTORY >>> The >>> .Fn getdtablesize > > I suppose rctl(8) can be added here. dup(2) needs similar changes (I didn't notice if you changed it recently). It says that `oldd' must be < getdtablesize(), but actually, `oldd' must just be open. It doesn't say that `newd' must be < getdtablesize() (modulo complications for the rctl case), except indirectly in the ERRORS section, it says that EBADF occurs when `newd' exceeds the maximum allowable descriptor number. Here "allowable" has a wrong tense ("allowed" would make more sense), and what this maximum is is not described. POSIX uses better wording for this of course ("... when `fildes' is greater than or equal to {OPEN_MAX}"). This would be better still with "greater than or equal" spelled ">=". Bruce