Date:      Mon, 5 Jul 1999 14:43:52 -0500
From:      Jonathan Lemon <jlemon@americantv.com>
To:        Zach Brown <zab@zabbo.net>
Cc:        Mike Smith <mike@smith.net.au>, hackers@FreeBSD.ORG
Subject:   Re: poll() scalability
Message-ID:  <19990705144352.55649@right.PCS>
In-Reply-To: <Pine.LNX.4.10.9907050010030.5548-100000@hoser>; from Zach Brown on Jul 07, 1999 at 01:10:38AM -0400
References:  <19990704175106.56355@right.PCS> <Pine.LNX.4.10.9907050010030.5548-100000@hoser>

On Jul 07, 1999 at 01:10:38AM -0400, Zach Brown wrote:
> the sigio/siginfo model is a few orders of magnitude cheaper than
> poll/select as you scale the number of fds you're watching.  The reasons
> for this being that select()/poll() have that large chunk of state to
> throw around every syscall, and the real world behaviour of very rarely
> ever returning more than a few active pollfds

Yes; that's effectively what the "delta" method I'm using now gets
rid of; it only passes around state changes instead of the entire state.
I agree that passing around the entire state is quite inefficient; that's
what I would like to get rid of.  We just need to agree on a new method.


> with the sigio/siginfo model you register your interest in the fd at fd
> creation. from then on, when a POLL_ event happens on the fd we notice
> that it has an rt signal queue registered and a siginfo struct is tacked
> onto the end.  these code paths can be nice and light.  the siginfo
> enqueueing can be pointed at multiple queues by registering a process
> group with F_SETOWN, etc.

Yes, but I also need support for temporarily "de-registering" interest
in an fd, as well as selectively choosing read/write/close events.
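
For comparison, plain poll() already offers both features: the events mask
selects which conditions are reported, and an entry whose fd is negative is
ignored. A minimal sketch follows; the helper names (pollfd_suspend,
pollfd_resume, demo_suspend_resume) are made up for illustration, and this
is not the mechanism either poster implemented.

```c
#include <poll.h>
#include <unistd.h>

/* Temporarily drop interest: poll() ignores entries with a negative fd.
 * Mapping fd -> -fd-1 keeps fd 0 recoverable. */
static void pollfd_suspend(struct pollfd *p)
{
    if (p->fd >= 0)
        p->fd = -p->fd - 1;
}

/* Restore interest in a suspended entry. */
static void pollfd_resume(struct pollfd *p)
{
    if (p->fd < 0)
        p->fd = -p->fd - 1;
}

/* Hypothetical demo: returns 1 if a readable pipe is reported, then hidden
 * while suspended, then reported again after resume. */
static int demo_suspend_resume(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Only read readiness is requested; write events are not selected. */
    struct pollfd pfd = { .fd = p[0], .events = POLLIN };

    int before = poll(&pfd, 1, 0);      /* entry active */
    pollfd_suspend(&pfd);
    int during = poll(&pfd, 1, 0);      /* entry ignored */
    pollfd_resume(&pfd);
    int after = poll(&pfd, 1, 0);       /* entry active again */

    close(p[0]);
    close(p[1]);
    return before == 1 && during == 0 && after == 1;
}
```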


> it's important to notice that we don't actually use signal delivery for
> this sigio/siginfo stuff, we mask the signal and use sigwaitinfo() to
> block or pop the next siginfo struct off the queue.  dealing with async
> signals jumping in would be annoying, and to do it right one would
> probably want to simply enqueue the siginfo delivered to the signal
> handler into a nice fifo that the real thread of execution would deal
> with.. instead of doing all this grossness, we just let the kernel
> maintain the siginfo queue.

In this case, it doesn't seem all that different than a poll() type
call, or an event queue that Peter was talking about.  If the signal
is blocked, then sigwaitinfo() effectively becomes a poll(), but with
no timeout parameter.
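
A minimal sketch of the blocked-signal pattern discussed above, assuming
Linux (F_SETSIG is Linux-specific). It uses sigtimedwait(), the POSIX sibling
of sigwaitinfo() that does take a timeout. The helper names (watch_fd,
next_event_fd, demo_rt_signal_event) and the choice of SIGRTMIN are
illustrative assumptions, not code from either poster.

```c
#define _GNU_SOURCE             /* F_SETSIG is a Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Ask the kernel to queue `sig` (with a siginfo) when fd changes state. */
static int watch_fd(int fd, int sig)
{
    if (fcntl(fd, F_SETOWN, getpid()) < 0)
        return -1;
    if (fcntl(fd, F_SETSIG, sig) < 0)
        return -1;
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_ASYNC | O_NONBLOCK);
}

/* Pop the next queued siginfo for `sig`; returns the fd it refers to,
 * or -1 on timeout or error. */
static int next_event_fd(int sig, long timeout_ms)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { timeout_ms / 1000,
                           (timeout_ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigtimedwait(&set, &info, &ts) < 0)
        return -1;
    return info.si_fd;
}

/* Hypothetical demo: returns 1 if a write on one end of a socketpair
 * shows up as a queued event naming the other end. */
static int demo_rt_signal_event(void)
{
    int sv[2];
    int sig = SIGRTMIN;
    sigset_t set;

    /* Block the signal so it is queued instead of delivered to a handler. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    int rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int ok = 0;
    if (watch_fd(sv[0], sig) == 0 && write(sv[1], "x", 1) == 1)
        ok = (next_event_fd(sig, 1000) == sv[0]);

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

The signal stays blocked for the life of the process here; a real server
would also have to handle SIGIO, which Linux sends as a fallback when the
realtime signal queue overflows.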


> it's quite like the 'delta poll' system proposed, but with differently
> inelegant semantics.  I'd say if one were to design an event
> queueing/notification system and add a new api for it, we'd want to do it
> correctly from the get-go and lose the similarity to existing interfaces
> entirely unless it really makes sense to behave like them (which it
> doesn't in the poll() case, imho)

I agree.  One aspect of the design that should be explored is whether
the new call should report "events" or "state".  My current implementation
reports state (think of it as a level-triggered design), while the
siginfo approach appears to be more of an "edge-triggered" design.  I
just looked at Banga's USENIX paper and they have a nice discussion of
this issue.
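
The state/event distinction can be sketched with nothing but poll(). The
first function asks for the current state repeatedly and is told "readable"
every time until the data is drained (level-triggered). The second tracks
the previous answer in user space and counts only not-ready to ready
transitions, which approximates what an edge-triggered queue would deliver.
Both function names are made up for illustration.

```c
#include <poll.h>
#include <unistd.h>

/* Put one byte in a pipe, then poll() `rounds` times without reading it.
 * Returns how many times the fd was reported readable (the state). */
static int count_level_reports(int rounds)
{
    int p[2], hits = 0;

    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = p[0], .events = POLLIN };
        if (poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN))
            hits++;
    }
    close(p[0]);
    close(p[1]);
    return hits;
}

/* Same setup, but count only transitions from not-ready to ready
 * (the event), emulated by remembering the previous poll() answer. */
static int count_edge_reports(int rounds)
{
    int p[2], edges = 0, was_ready = 0;

    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = p[0], .events = POLLIN };
        int ready = poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
        if (ready && !was_ready)
            edges++;
        was_ready = ready;
    }
    close(p[0]);
    close(p[1]);
    return edges;
}
```

With three rounds, the level-style count is 3 and the edge-style count is 1:
a consumer of edge events must drain the fd completely, because it will not
be told again about data that is still sitting there.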

 
> 	setup sigio and such on new fd (dorky, we have to do this in
> 		linux rather than inheriting it from the listening fd.
> 		but it has yet to show up on the profile radar, so, 
> 		whatever :))

Hah.  It showed up on my profiling; the kernel I'm running has routines
so the child fd inherits certain settings from the parent.


> 	read() in the header (usually done in one read, but rarely
> 		will block and require falling back to a POLL_IN on
> 		the new fd)

Well, we _NEVER_ want to block.  Ever.  And in my particular case, 
it is quite common for the header/data to be spread out over several
reads.
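
A minimal sketch of reading a header that arrives in pieces without ever
blocking: the fd is non-blocking, partial data is kept in per-connection
state, and EAGAIN means "come back when the next readiness notification
arrives". The struct and function names, the 4096-byte limit, and the
blank-line terminator are assumptions chosen for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

struct hdr_buf {                /* hypothetical per-connection state */
    char   data[4096];
    size_t len;
};

enum hdr_status { HDR_MORE, HDR_DONE, HDR_ERROR };

static int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return fl < 0 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Has the blank line that ends the header arrived yet? */
static int header_complete(const struct hdr_buf *b)
{
    for (size_t i = 0; i + 4 <= b->len; i++)
        if (memcmp(b->data + i, "\r\n\r\n", 4) == 0)
            return 1;
    return 0;
}

/* Consume whatever is available right now; never blocks. */
static enum hdr_status read_header(int fd, struct hdr_buf *b)
{
    for (;;) {
        if (header_complete(b))
            return HDR_DONE;
        if (b->len == sizeof b->data)
            return HDR_ERROR;           /* header too large */

        ssize_t n = read(fd, b->data + b->len, sizeof b->data - b->len);
        if (n > 0) {
            b->len += (size_t)n;
            continue;
        }
        if (n == 0)
            return HDR_ERROR;           /* peer closed mid-header */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return HDR_MORE;            /* wait for the next notification */
        return HDR_ERROR;
    }
}

/* Hypothetical demo: a header delivered in two writes. Returns 1 if the
 * first call reports HDR_MORE and the second reports HDR_DONE. */
static int demo_split_header(void)
{
    int p[2];
    struct hdr_buf b = { .len = 0 };
    static const char part1[] = "GET / HTTP/1.0\r\n";
    static const char part2[] = "\r\n";

    if (pipe(p) < 0 || set_nonblocking(p[0]) < 0)
        return -1;

    int ok = write(p[1], part1, sizeof part1 - 1) == (ssize_t)(sizeof part1 - 1)
          && read_header(p[0], &b) == HDR_MORE
          && write(p[1], part2, sizeof part2 - 1) == (ssize_t)(sizeof part2 - 1)
          && read_header(p[0], &b) == HDR_DONE;

    close(p[0]);
    close(p[1]);
    return ok;
}
```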


> of course, this could change if you had a situation where you could burn
> through events like nothing else and simply couldn't deal with the
> lock-step..

Correct.  I need this for a web caching proxy.  The above loop won't work
in my particular case.
		

> > Also, I would guess that you would start getting into locking problems,
> > and how to cancel a signal which has already been posted. 
> 
> locking problems?

Locking problems with asynchronous signal delivery; you alluded to this
earlier as well.  Since you're blocking signals, this isn't a problem.


> yes, the possibility of getting stale events in the queue is _annoying_.  
> This is going to be a problem in any system that passes state deltas to
> the process in a queued manner.  hacks could be put in, and perhaps
> should, to remove events in the queue for a fd when it is closed, etc.
>
> take the web server case again.  it is quite possible to close() an fd
> while there is an event queued for it, and then accept() a new fd that now
> has a bogus event coming down the pipe for it. I get around this garbage
> in the cheesy web server by doing deferred close()s on fds based on the
> length of the queue when I stopped being interested in the fd (and as such
> turned off sigio delivery).  It's gross.

Exactly.  Sometimes, we just want to close() an fd, even if there are 
pending events, and then immediately re-open with a new fd.  Deferred
closes are not an option.  What I do at the moment is remove queued 
events/state when the fd is closed.  (actually, my implementation sucks
a bit, as I re-scan the state for this particular case).
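
The "remove queued events on close" idea can be sketched in user space as
a linear purge of a pending-event array. This is only an illustration of the
approach and of why it costs a re-scan; the struct layout, the fixed queue
size, and the function names are assumptions, not the kernel code described
above.

```c
#include <stddef.h>

struct event {                  /* hypothetical queued notification */
    int fd;
    int kind;                   /* e.g. 1 = readable, 2 = writable */
};

struct event_queue {
    struct event ev[64];
    size_t       n;
};

/* Append an event; returns -1 if the queue is full. */
static int queue_push(struct event_queue *q, int fd, int kind)
{
    if (q->n == sizeof q->ev / sizeof q->ev[0])
        return -1;
    q->ev[q->n].fd = fd;
    q->ev[q->n].kind = kind;
    q->n++;
    return 0;
}

/* Called when fd is closed: drop every pending event for it, keeping the
 * remaining events in order. Returns the number of events removed. */
static size_t queue_purge_fd(struct event_queue *q, int fd)
{
    size_t keep = 0;

    for (size_t i = 0; i < q->n; i++)
        if (q->ev[i].fd != fd)
            q->ev[keep++] = q->ev[i];

    size_t removed = q->n - keep;
    q->n = keep;
    return removed;
}

/* Hypothetical demo: two events for fd 5 and one for fd 7 are queued,
 * then fd 5 is closed. Returns 1 if only the fd 7 event survives. */
static int demo_purge_on_close(void)
{
    struct event_queue q = { .n = 0 };

    queue_push(&q, 5, 1);
    queue_push(&q, 7, 1);
    queue_push(&q, 5, 2);

    size_t removed = queue_purge_fd(&q, 5);
    return removed == 2 && q.n == 1 && q.ev[0].fd == 7;
}
```

Once the purge has run, a later accept() that reuses descriptor number 5
cannot be handed an event that belonged to the old connection.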


> but even with these problems, the rt signal queue is quite powerful.  to
> do better would require a fair bit of engineering, and one might quickly
> be bogged down in featuritis.

Well, I'm sure that we have a lot of engineering talent around here.  :-)
--
Jonathan

