Date: Sun, 4 Jul 1999 11:15:13 +0100 (BST)
From: Doug Rabson
To: Jonathan Lemon
Cc: hackers@freebsd.org, grog@freebie.lemis.com, peter@netplex.com.au
Subject: Re: poll() scalability
In-Reply-To: <19990704000042.59954@right.PCS>

On Sun, 4 Jul 1999, Jonathan Lemon wrote:

> This is an earlier posting that I attempted to make, perhaps
> it can provide a starting point for discussion.  While this
> is already implemented, I'm not averse to tossing it all for
> something better.
> --
> Jonathan
>
>
> ----- Forwarded message from owner-freebsd-arch@FreeBSD.ORG -----
>
> Date: Mon, 5 Apr 1999 17:42:02 -0500
> From: Jonathan Lemon
> To: freebsd-arch@freebsd.org
>
> I'd like to open discussion on adding a new interface to FreeBSD,
> specifically, a variant of poll().
>
> The problem is that poll() (and select(), as well) do not scale
> well as the number of open file descriptors increases.  When there
> are a large number of descriptors under consideration, poll() takes
> an inordinate amount of time.  For the purposes of my particular
> application, "large" translates into roughly 40K descriptors.
>
> As having to walk this descriptor list (and pass it between user and
> kernel space) is unpalatable, I would like to have the interface
> simply take a "change" list instead.
> The kernel would keep the state of the descriptors being examined,
> and would, in turn, return a short list of descriptors that actually
> had any activity.
>
> In essence, I want to move the large "struct pollfd" array that I
> have into the kernel, and then instruct the kernel to add/remove
> entries from this array, and only return the array subset which
> has activity.

How does the kernel manage this? Does each process potentially store a
struct pollfd in struct proc? This seems a bit limiting since it forces a
program to have exactly one call to poll. Peter's description of David
Filo's event queue thing seems to solve that problem by introducing a new
kernel object (the event queue).

>
> A possible (actually, my current) implementation looks like this:
>
>     struct fd_change {
>         short fd;
>         short events;
>     };

Limited to 32767 file descriptors. Trivial to change though. Do you remove
a fd from the list by setting events to 0?

>
>     int
>     new_poll(
>         int nchanges,                  // entries in new changelist
>         struct fd_change *changelist,  // changes to be made
>         int n_events,                  // max size of output list
>         struct fd_change *event,       // returned list of events
>         int timeout                    // timeout (same as poll)
>     );
>
> Where the returned value is either an error, or the number of events
> stored in the returned changelist.
>
> Some pseudo-code that would exercise the interface:
>
>     struct fd_change fc[ MAXCHANGE ];
>
>     fc[0].fd = 20;
>     fc[0].events = ADD | READ ;    // add, mark read "interest"
>
>     fc[1].fd = -1;                 // ignore this one
>
>     fc[2].fd = 32;
>     fc[2].events = DELETE ;        // delete previous fd
>
>     fc[3].fd = 46;
>     fc[3].events = WRITE ;         // ask for writable events
>
>     n_changes = new_poll(4, fc, MAXCHANGE, fc, -1);
>
>
> Comments?  Note that I haven't discussed the implementation details;
> the implementation is done, and can probably be altered/improved,
> but I would like to solicit feedback on the feasibility of the interface.
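For reference, the conventional pattern that the quoted proposal wants to avoid can be sketched roughly as follows. It uses only the standard poll() call; the helper names collect_ready and demo_scan are made up for this illustration and appear in neither message. The point is that the whole array crosses into the kernel on every call and the caller then scans every entry, however few are active.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Conventional poll() usage (hypothetical helper, not from the thread).
 * The full array is passed to the kernel on each call, and the caller
 * walks all nfds entries afterwards to find the active ones.
 * Returns the number of descriptors stored in ready[], 0 on timeout,
 * or -1 on error.
 */
int
collect_ready(struct pollfd *fds, nfds_t nfds, int timeout_ms,
    int *ready, int max_ready)
{
	int n, found = 0;

	assert(fds != NULL && ready != NULL);
	n = poll(fds, nfds, timeout_ms);	/* kernel examines every entry */
	if (n <= 0)
		return n;			/* error (-1) or timeout (0) */

	for (nfds_t i = 0; i < nfds && found < max_ready; i++) {
		if (fds[i].revents != 0)	/* linear scan in user space */
			ready[found++] = fds[i].fd;
	}
	return found;
}

/* Usage: poll both ends of a pipe after writing one byte into it. */
int
demo_scan(void)
{
	int p[2], ready[2];
	struct pollfd fds[2];

	if (pipe(p) != 0)
		return -1;
	fds[0].fd = p[0]; fds[0].events = POLLIN; fds[0].revents = 0;
	fds[1].fd = p[1]; fds[1].events = 0;      fds[1].revents = 0;
	if (write(p[1], "x", 1) != 1)
		return -1;
	int n = collect_ready(fds, 2, 0, ready, 2);
	close(p[0]);
	close(p[1]);
	return n;
}
```

With tens of thousands of descriptors, both the copy into the kernel and the scan in collect_ready grow with the size of the array rather than with the number of active descriptors.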

As I said before, I'm uneasy about the kernel tracking the state (the list
of fds) in the process. A separate kernel object would be a much cleaner
solution and would be usable by a program which called poll in many
different ways. With this API, a library would be unable to use the new
interface, since it would not know the new_poll state set up by the main
program and would not be able to change it without potentially breaking
the caller's state.

--
Doug Rabson                             Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.                  Phone: +44 181 442 9037
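
The separate-object idea can be illustrated in user space. The sketch below is neither the event queue design mentioned above nor any real kernel interface: every evq_* name is hypothetical, removing a descriptor by passing events == 0 is an assumption, and the queue is layered on poll() only so that the example runs. It therefore shows who owns the interest list, not any scalability gain. Each queue carries its own state, so a library can create and modify its own queue without knowing about the main program's.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical event queue object: the interest list lives in the handle. */
struct evq {
	struct pollfd *fds;	/* interest list owned by this queue only */
	size_t len, cap;
};

struct evq *
evq_create(void)
{
	return calloc(1, sizeof(struct evq));
}

void
evq_destroy(struct evq *q)
{
	if (q != NULL) {
		free(q->fds);
		free(q);
	}
}

/* Set the interest mask for fd; events == 0 removes it (an assumption). */
int
evq_ctl(struct evq *q, int fd, short events)
{
	size_t i;

	assert(q != NULL);
	for (i = 0; i < q->len; i++)
		if (q->fds[i].fd == fd)
			break;
	if (events == 0) {			/* delete if present */
		if (i < q->len)
			q->fds[i] = q->fds[--q->len];
		return 0;
	}
	if (i == q->len) {			/* new entry: grow if needed */
		if (q->len == q->cap) {
			size_t ncap = q->cap ? q->cap * 2 : 8;
			struct pollfd *n = realloc(q->fds, ncap * sizeof(*n));
			if (n == NULL)
				return -1;
			q->fds = n;
			q->cap = ncap;
		}
		q->fds[q->len].fd = fd;
		q->fds[q->len].revents = 0;
		q->len++;
	}
	q->fds[i].events = events;		/* add or modify */
	return 0;
}

/* Wait on this queue only; copy up to max active entries into out[]. */
int
evq_wait(struct evq *q, struct pollfd *out, int max, int timeout_ms)
{
	int n, found = 0;

	assert(q != NULL);
	n = poll(q->fds, (nfds_t)q->len, timeout_ms);
	if (n <= 0)
		return n;			/* error (-1) or timeout (0) */
	for (size_t i = 0; i < q->len && found < max; i++)
		if (q->fds[i].revents != 0)
			out[found++] = q->fds[i];
	return found;
}

/* Usage: returns 1 if only the library's queue reports activity. */
int
evq_demo(void)
{
	int a[2], b[2], ok;
	struct pollfd out[4];
	struct evq *app = evq_create(), *lib = evq_create();

	/* Early returns skip cleanup; acceptable only in a short demo. */
	if (app == NULL || lib == NULL || pipe(a) != 0 || pipe(b) != 0)
		return -1;
	evq_ctl(app, a[0], POLLIN);		/* main program's interest */
	evq_ctl(lib, b[0], POLLIN);		/* library's interest */
	if (write(b[1], "x", 1) != 1)
		return -1;
	ok = evq_wait(app, out, 4, 0) == 0 &&
	    evq_wait(lib, out, 4, 0) == 1 && out[0].fd == b[0];
	evq_destroy(app);
	evq_destroy(lib);
	close(a[0]); close(a[1]); close(b[0]); close(b[1]);
	return ok;
}
```

With a single per-process list, the library's evq_ctl calls in evq_demo would instead have altered the same state the main program waits on, which is the concern raised above.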