Date:      Sun, 4 Jul 1999 11:15:13 +0100 (BST)
From:      Doug Rabson <dfr@nlsystems.com>
To:        Jonathan Lemon <jlemon@americantv.com>
Cc:        hackers@freebsd.org, grog@freebie.lemis.com, peter@netplex.com.au
Subject:   Re: poll() scalability
Message-ID:  <Pine.BSF.4.10.9907041107490.15087-100000@salmon.nlsystems.com>
In-Reply-To: <19990704000042.59954@right.PCS>

On Sun, 4 Jul 1999, Jonathan Lemon wrote:

> 
> This is an earlier posting that I attempted to make, perhaps
> it can provide a starting point for discussion.  While this 
> is already implemented, I'm not averse to tossing it all for
> something better.
> --
> Jonathan
> 
> 
> ----- Forwarded message from owner-freebsd-arch@FreeBSD.ORG -----
> 
> Date: Mon, 5 Apr 1999 17:42:02 -0500
> From: Jonathan Lemon <jlemon@cs.wisc.edu>
> To: freebsd-arch@freebsd.org
> 
> I'd like to open discussion on adding a new interface to FreeBSD,
> specifically, a variant of poll().
> 
> The problem is that poll() (and select(), as well) do not scale
> well as the number of open file descriptors increases.  When there
> are a large number of descriptors under consideration, poll() takes
> an inordinate amount of time.  For the purposes of my particular 
> application, "large" translates into roughly 40K descriptors.
> 
> As having to walk this descriptor list (and pass it between user and
> kernel space) is unpalatable, I would like to have the interface
> simply take a "change" list instead.  The kernel would keep the 
> state of the descriptors being examined, and would in turn, return
> a short list of descriptors that actually had any activity.
> 
> In essence, I want to move the large "struct pollfd" array that I 
> have into the kernel, and then instruct the kernel to add/remove
> entries from this array, and only return the array subset which
> has activity.
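The cost described in the quoted paragraphs can be put in rough numbers. The sketch below assumes the common 8-byte struct pollfd (int fd, short events, short revents) and a deliberately simple, hypothetical cost model that counts descriptor entries touched per call. The helper names are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Bytes of struct pollfd the caller hands to the kernel on every poll()
 * call; the kernel must also write revents back for each entry. */
static size_t
poll_bytes_per_call(size_t nwatched)
{
	return nwatched * sizeof(struct pollfd);
}

/* Hypothetical cost model: poll() touches every watched descriptor. */
static size_t
poll_entries_touched(size_t nwatched)
{
	return nwatched;
}

/* Hypothetical cost model for the change-list proposal: only the
 * changes passed in and the descriptors reported back are touched. */
static size_t
changelist_entries_touched(size_t nchanges, size_t nready)
{
	return nchanges + nready;
}
```

With 40,000 watched descriptors and an 8-byte pollfd, poll_bytes_per_call(40000) is 320,000 bytes on every call. With, say, 10 changes and 50 ready descriptors, the change-list model touches 60 entries instead of 40,000.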

How does the kernel manage this? Does each process potentially store a
struct pollfd in struct proc? This seems a bit limiting since it forces a
program to have exactly one call to poll.
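One plausible way to keep such per-process state, assuming it lives with the process as the proposal describes, is a packed pollfd array plus an fd-indexed slot table, so that each add or delete costs O(1). This is a user-space model with hypothetical names (struct interest_set, iset_add, iset_del, MODEL_MAXFD); the post does not describe the actual implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>

#define MODEL_MAXFD 1024	/* hypothetical per-process limit */

/* Hypothetical per-process state: a packed pollfd array plus an
 * fd-indexed slot table so each add or delete is O(1). */
struct interest_set {
	struct pollfd fds[MODEL_MAXFD];	/* watched descriptors, packed */
	int slot[MODEL_MAXFD];		/* fd -> index into fds, or -1 */
	int n;				/* number of watched descriptors */
};

static void
iset_init(struct interest_set *s)
{
	s->n = 0;
	for (int i = 0; i < MODEL_MAXFD; i++)
		s->slot[i] = -1;
}

/* Add fd, or update its events if it is already watched. */
static int
iset_add(struct interest_set *s, int fd, short events)
{
	if (fd < 0 || fd >= MODEL_MAXFD)
		return -1;
	if (s->slot[fd] >= 0) {
		s->fds[s->slot[fd]].events = events;
		return 0;
	}
	assert(s->n < MODEL_MAXFD);	/* holds: each fd has at most one slot */
	s->slot[fd] = s->n;
	s->fds[s->n].fd = fd;
	s->fds[s->n].events = events;
	s->fds[s->n].revents = 0;
	s->n++;
	return 0;
}

/* Remove fd by moving the last packed entry into its slot. */
static int
iset_del(struct interest_set *s, int fd)
{
	if (fd < 0 || fd >= MODEL_MAXFD || s->slot[fd] < 0)
		return -1;
	int i = s->slot[fd];
	s->n--;
	if (i != s->n) {
		s->fds[i] = s->fds[s->n];
		s->slot[s->fds[i].fd] = i;
	}
	s->slot[fd] = -1;
	return 0;
}
```

Note that a single table like this is exactly what makes the state per process: there is one set, shared by every caller in the program.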

Peter's description of David Filo's event queue thing seems to solve that
problem by introducing a new kernel object (the event queue).

> 
> A possible (actually, my current) implementation looks like this:
> 
> struct fd_change {
>         short   fd;
>         short   events; 
> }; 

Limited to descriptor values of 32767 or less. Trivial to change though. Do you remove
a fd from the list by setting events to 0?
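A sketch of the trivial change mentioned above, assuming a 16-bit short: widen fd to int and carry add/delete in a separate flags field instead of overloading events == 0. The FDC_* flags, struct fd_change_wide and fits_short_record are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Original record: a short fd holds values only up to SHRT_MAX
 * (32767 with a 16-bit short). */
struct fd_change {
	short fd;
	short events;
};

/* Hypothetical command flags, kept apart from the interest bits. */
#define FDC_ADD    0x1	/* start (or update) watching fd */
#define FDC_DELETE 0x2	/* stop watching fd */

/* Hypothetical widened record: any non-negative int descriptor fits,
 * and removal is explicit rather than signalled by events == 0. */
struct fd_change_wide {
	int   fd;
	short events;	/* interest bits, e.g. POLLIN/POLLOUT */
	short flags;	/* FDC_ADD or FDC_DELETE */
};

/* Returns 1 if fd can be stored in the original short-based record. */
static int
fits_short_record(int fd)
{
	return fd >= 0 && fd <= SHRT_MAX;
}
```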

> 
> int
> new_poll(
> 	int nchanges,			// entries in new changelist
> 	struct fd_change *changelist,	// changes to be made
> 	int n_events,			// max size of output list
> 	struct fd_change *event,	// returned list of events
> 	int timeout			// timeout (same as poll)
> );
> 
> The returned value is either an error or the number of events
> stored in the returned event list.
> 
> Some pseudo-code that would exercise the interface:
> 
> 	struct fd_change fc[ MAXCHANGE ];
> 
> 	fc[0].fd = 20;
> 	fc[0].events = ADD | READ ;	// add, mark read "interest"
> 
> 	fc[1].fd = -1;			// ignore this one
> 
> 	fc[2].fd = 32;
> 	fc[2].events = DELETE ;		// delete previous fd
> 
> 	fc[3].fd = 46;
> 	fc[3].events = WRITE ;		// ask for writable events
> 
> 	n_events = new_poll(4, fc, MAXCHANGE, fc, -1);	// fc reused as output list
> 
> 
> Comments?  Note that I haven't discussed the implementation details;
> the implementation is done, and can probably be altered/improved, 
> but I would like to solicit feedback on the feasibility of the interface.
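To make the proposed semantics concrete, here is a user-space emulation of new_poll() on top of poll(2). It is a sketch under stated assumptions, not the implementation the post refers to: the NP_* values are hypothetical stand-ins for ADD/DELETE/READ/WRITE, any change without NP_DELETE adds or updates the descriptor, the interest set is one process-global table, and NP_MAXFD is arbitrary. Because it still calls poll() on the whole set, it reproduces the interface but not the scalability.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical values for the ADD/DELETE/READ/WRITE names above. */
#define NP_READ   0x01
#define NP_WRITE  0x02
#define NP_ADD    0x10	/* accepted; any non-delete change adds/updates */
#define NP_DELETE 0x20

#define NP_MAXFD  256	/* arbitrary limit for the sketch */

struct fd_change {
	short fd;
	short events;
};

/* Process-global interest set: the state the proposal keeps in the kernel. */
static struct pollfd np_set[NP_MAXFD];
static int np_count;

static int
np_find(int fd)
{
	for (int i = 0; i < np_count; i++)
		if (np_set[i].fd == fd)
			return i;
	return -1;
}

int
new_poll(int nchanges, struct fd_change *changelist,
    int n_events, struct fd_change *event, int timeout)
{
	/* Apply every change before writing any output, because the caller
	 * may pass the same array as changelist and event. */
	for (int i = 0; i < nchanges; i++) {
		int fd = changelist[i].fd;
		if (fd < 0)
			continue;			/* "ignore this one" */
		short ev = changelist[i].events;
		int idx = np_find(fd);
		if (ev & NP_DELETE) {
			if (idx >= 0) {
				np_count--;
				np_set[idx] = np_set[np_count];
			}
			continue;
		}
		if (idx < 0) {				/* add new descriptor */
			if (np_count == NP_MAXFD) {
				errno = ENOMEM;
				return -1;
			}
			idx = np_count++;
			np_set[idx].fd = fd;
		}
		np_set[idx].events = (short)(((ev & NP_READ) ? POLLIN : 0) |
		    ((ev & NP_WRITE) ? POLLOUT : 0));
	}
	assert(np_count <= NP_MAXFD);

	/* Still a scan of the whole set; a kernel version would avoid it. */
	if (poll(np_set, (nfds_t)np_count, timeout) < 0)
		return -1;

	/* Report only descriptors with activity. Hangup, error and invalid
	 * descriptors are reported as READ so a later read() sees EOF or
	 * the error. */
	int out = 0;
	for (int i = 0; i < np_count && out < n_events; i++) {
		short r = np_set[i].revents;
		if (r == 0)
			continue;
		event[out].fd = (short)np_set[i].fd;
		event[out].events = (short)(
		    ((r & (POLLIN | POLLHUP | POLLERR | POLLNVAL)) ? NP_READ : 0) |
		    ((r & POLLOUT) ? NP_WRITE : 0));
		out++;
	}
	return out;
}
```

Reading all changes before writing any output is what makes the pseudocode's reuse of fc as both changelist and event list safe.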

As I said before, I'm uneasy about the kernel tracking the state (list of
fds) in the process. A separate kernel object would be a much cleaner
solution and would be usable by a program which called poll in many
different ways.

With this api, a library would be unable to use the new interface since it
would not know the new_poll state set up by the main program and would not
be able to change it without potentially breaking the caller's state.
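A minimal sketch of the separate-object alternative, assuming a handle-based API: each caller creates its own queue, so a library can register its descriptors without disturbing the main program's set. The evq_* names are hypothetical and are not taken from Peter's or David Filo's design; the sketch is built on poll(2) and shows only the ownership model, not kernel-side scalability.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical event-queue object: each owner has its own interest set. */
struct evqueue {
	struct pollfd *fds;	/* watched descriptors */
	int n, cap;
};

struct evqueue *
evq_create(void)
{
	return calloc(1, sizeof(struct evqueue));
}

void
evq_destroy(struct evqueue *q)
{
	if (q != NULL) {
		free(q->fds);
		free(q);
	}
}

/* Add or update interest in fd on this queue; events == 0 removes it. */
int
evq_change(struct evqueue *q, int fd, short events)
{
	assert(q != NULL);
	for (int i = 0; i < q->n; i++) {
		if (q->fds[i].fd != fd)
			continue;
		if (events == 0) {
			q->n--;
			q->fds[i] = q->fds[q->n];
		} else {
			q->fds[i].events = events;
		}
		return 0;
	}
	if (events == 0)
		return 0;
	if (q->n == q->cap) {
		int ncap = q->cap ? 2 * q->cap : 8;
		struct pollfd *p = realloc(q->fds, ncap * sizeof(*p));
		if (p == NULL)
			return -1;
		q->fds = p;
		q->cap = ncap;
	}
	q->fds[q->n].fd = fd;
	q->fds[q->n].events = events;
	q->fds[q->n].revents = 0;
	q->n++;
	return 0;
}

/* Wait on this queue only; copy ready entries into out[0..max). */
int
evq_wait(struct evqueue *q, struct pollfd *out, int max, int timeout)
{
	int n = poll(q->fds, (nfds_t)q->n, timeout);
	if (n <= 0)
		return n;
	int k = 0;
	for (int i = 0; i < q->n && k < max; i++)
		if (q->fds[i].revents != 0)
			out[k++] = q->fds[i];
	return k;
}
```

A library would call evq_create() once, register its own descriptors with evq_change(), and wait on its own queue with evq_wait(), leaving the main program's queue untouched.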

--
Doug Rabson				Mail:  dfr@nlsystems.com
Nonlinear Systems Ltd.			Phone: +44 181 442 9037
