From owner-freebsd-hackers@FreeBSD.ORG Sun May 19 16:17:39 2013
Date: Sun, 19 May 2013 18:17:38 +0200
From: Jilles Tjoelker
To: Paul LeoNerd Evans
Cc: freebsd-hackers@freebsd.org, Adrian Chadd, Eugen-Andrei Gavriloaie
Subject: Re: Managing userland data pointers in kqueue/kevent
Message-ID: <20130519161737.GA26506@stack.nl>
In-Reply-To: <20130515133458.41f980e9@shy.leonerd.org.uk>
List-Id: Technical Discussions relating to FreeBSD

On Wed, May 15, 2013 at 01:34:58PM +0100, Paul LeoNerd Evans wrote:
> On Wed, 15 May 2013 13:29:59 +0100
> Paul "LeoNerd" Evans wrote:
> > Is that not the exact thing I suggested?
> >
> > The "extension to create register a kevent to catch these events" is
> > that you put the EV_DROPWATCH bit flag in the event at the time you
> > register it.
> >
> > The "returned event [that] could have all the appropriate information
> > for the event being dropped" is that you receive an event with
> > EV_DROPPED set on it. It being a real event includes of course the
> > udata pointer, so you can handle it.
>
> In fact, to requote the original PR I wrote[1] on the subject:
>
> ---
>
> I propose the addition of a new flag applicable to any kevent watch
> structure, documented thusly:
>
>   The flags field can contain the following values:
>   ..
>   EV_DROPWATCH  Requests that the kernel will send an EV_DROPPED event
>                 on this watch when it has finished watching it for any
>                 reason, including EV_DELETE, expiry because of
>                 EV_ONESHOT, or because the filehandle was closed by
>                 close(2).
>
>   EV_DROPPED    This flag is returned by the kernel if it is now about
>                 to drop the watch. After this flag has been received,
>                 no further events will occur on this watch.
>
> This flag then makes it trivial to build a generic wrapper for kqueue
> that can always manage its memory correctly.
>
>  a) at EV_ADD time, simply set flags |= EV_DROPWATCH
>
>  b) after an event has been processed that included the EV_DROPPED
>     flag, free() the pointer given in the udata field.

An important detail is missing: how do you avoid using up all kernel
memory on knotes if someone keeps adding new file descriptors with
EV_ADD | EV_DROPWATCH and closing the file descriptors again without
ever draining the kqueue? This problem did not previously exist for
file descriptor events: the number of such knotes was limited to the
number of open file descriptors.

However, it does already exist for most of the other event types. For
example, pwait -v will return the exit status even if it was suspended
(^Z) while the process terminated and the parent reaped the zombie.
For EVFILT_TIMER, the worst effect is a denial of service of
EVFILT_TIMER on all other processes in the system. EVFILT_USER does not
appear to check anything and appears to allow arbitrary kernel memory
consumption.

EVFILT_TIMER needs to keep its global limit and EVFILT_USER needs
something similar. For the rest, call an event that is no longer
associated with a kernel object (e.g. EVFILT_READ whose file descriptor
is closed, EVFILT_PROC whose process has terminated and been reaped by
the parent or EVFILT_AIO whose I/O request is completed) "unbound". The
number of events that are not unbound is limited by existing limits on
the other kernel objects.

A possible fix is to reject (such as with [ENOMEM]) adding new events
when there are too many unbound events in the queue already. The
application should then allow kevent() to return pending events before
it adds new ones.

If the kernel returns unbound events in preference to other events, a
kevent() call with nevents >= 2 * nchanges cannot result in a net
increase in the number of current and potential unbound events, since it
allows the kernel to return (and forget) as many unbound events as it
may add (nchanges entries are required for EV_ERROR, leaving nchanges
for returning other events).

> It is not required that these two flags have distinct values; since
> one is userland->kernel and the other kernel->userland, they could for
> neatness reuse the same bit field.

I think it would be consistent with other EV_* to use the same name and
value for both.

-- 
Jilles Tjoelker
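
The slot accounting behind the nevents >= 2 * nchanges argument can be
sketched as a small user-space model in C. This is only an illustration
under stated assumptions, not kernel code: struct model_queue,
model_kevent() and unbound_limit are hypothetical names, the per-queue
cap is assumed, and the model takes the worst case in which every
accepted change immediately becomes an unbound event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct model_queue {
    size_t unbound;       /* unbound events waiting to be returned */
    size_t unbound_limit; /* assumed per-queue cap on unbound events */
};

/*
 * Model one kevent() call. Worst case: every accepted change becomes
 * unbound at once (descriptor registered, then closed). Returns the
 * number of changes rejected, which a kernel might report as ENOMEM.
 */
size_t
model_kevent(struct model_queue *q, size_t nchanges, size_t nevents)
{
    size_t rejected = 0;
    size_t i, error_slots, free_slots, drained;

    assert(q != NULL);

    /* Apply the changelist, refusing additions once the cap is hit. */
    for (i = 0; i < nchanges; i++) {
        if (q->unbound >= q->unbound_limit)
            rejected++;
        else
            q->unbound++;
    }

    /* Reserve one output slot per change for a possible EV_ERROR. */
    error_slots = nchanges < nevents ? nchanges : nevents;
    free_slots = nevents - error_slots;

    /* Unbound events are returned (and forgotten) before any others. */
    drained = free_slots < q->unbound ? free_slots : q->unbound;
    q->unbound -= drained;

    return rejected;
}
```

Starting from 10 unbound events and a cap of 64, a call with nchanges = 4
and nevents = 8 leaves the count at 10, while the same call with
nevents = 4 raises it to 14 because no slots remain for draining. With
the queue already at the cap, all 4 changes are rejected and the 4 spare
slots bring the count down to 60.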