Date:        Mon, 5 Jul 1999 14:43:52 -0500
From:        Jonathan Lemon <jlemon@americantv.com>
To:          Zach Brown <zab@zabbo.net>
Cc:          Mike Smith <mike@smith.net.au>, hackers@FreeBSD.ORG
Subject:     Re: poll() scalability
Message-ID:  <19990705144352.55649@right.PCS>
In-Reply-To: <Pine.LNX.4.10.9907050010030.5548-100000@hoser>; from Zach Brown on Jul 07, 1999 at 01:10:38AM -0400
References:  <19990704175106.56355@right.PCS> <Pine.LNX.4.10.9907050010030.5548-100000@hoser>
On Jul 07, 1999 at 01:10:38AM -0400, Zach Brown wrote:
> the sigio/siginfo model is a few orders of magnitude cheaper than
> poll/select as you scale the number of fds you're watching. The reasons
> for this being that select()/poll() have that large chunk of state to
> throw around every syscall, and the real world behaviour of very rarely
> ever returning more than a few active pollfds

Yes; that's effectively what the "delta" method I'm using now gets rid
of: it only passes around state changes instead of the entire state.
I agree that passing around the entire state is quite inefficient;
that's what I would like to eliminate. We just need to agree on a new
method.

> with the sigio/siginfo model you register your interest in the fd at fd
> creation. from then on, when a POLL_ event happens on the fd we notice
> that it has an rt signal queue registered and a siginfo struct is tacked
> onto the end. these code paths can be nice and light. the siginfo
> enqueueing can be pointed at multiple queues by registering a process
> group with F_SETOWN, etc.

Yes, but I also need support for temporarily "de-registering" interest
in an fd, as well as selectively choosing read/write/close events.

> it's important to notice that we don't actually use signal delivery for
> this sigio/siginfo stuff, we mask the signal and use sigwaitinfo() to
> block or pop the next siginfo struct off the queue. dealing with async
> signals jumping in would be annoying, and to do it right one would
> probably want to simply enqueue the siginfo delivered to the signal
> handler into a nice fifo that the real thread of execution would deal
> with.. instead of doing all this grossness, we just let the kernel
> maintain the siginfo queue.

In this case, it doesn't seem all that different from a poll()-type
call, or the event queue that Peter was talking about. If the signal is
blocked, then sigwaitinfo() effectively becomes a poll(), but with no
timeout parameter.

> its quite like the 'delta poll' system proposed, but with differently
> inelegant semantics. I'd say if one were to design an event
> queueing/notification system and add a new api for it, we'd want to do
> it correctly from the get-go and lose the similarity to existing
> interfaces entirely unless it really makes sense to behave like them
> (which it doesn't in the poll() case, imho)

I agree. One aspect of the design that should be explored is whether
the new call should report "events" or "state". My current
implementation reports state (think of it as a level-triggered design),
while the siginfo approach appears to be more of an edge-triggered
design. I just looked at Banga's USENIX paper, which has a nice
discussion of this issue.

> 	setup sigio and such on new fd (dorky, we have to do this in
> 		linux rather than inheriting it from the listening fd.
> 		but it has yet to show up on the profile radar, so,
> 		whatever :))

Hah. It showed up in my profiling; the kernel I'm running has been
modified so that the child fd inherits certain settings from the
parent.

> 	read() in the header (usually done in one read, but rarely
> 		will block and require falling back to a POLL_IN on
> 		the new fd)

Well, we _NEVER_ want to block. Ever. And in my particular case, it is
quite common for the header/data to be spread out over several reads.

> of course, this could change if you had a situation where you could burn
> through events like nothing else and simply couldn't deal with the
> lock-step..

Correct. I need this for a web caching proxy; the above loop won't work
in my particular case.
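For concreteness, here is roughly how I understand the registration and
the sigwaitinfo() loop look on the Linux side. This is only a sketch,
not code from either of our trees: F_SETSIG is Linux-specific, and
IO_SIG, watch_fd() and handle_event() are names I made up.

    /*
     * Sketch of the rt-signal queue model: register interest on the
     * fd, then pop queued siginfo structs with the signal blocked.
     */
    #define _GNU_SOURCE             /* for F_SETSIG */
    #include <fcntl.h>
    #include <signal.h>
    #include <errno.h>
    #include <unistd.h>

    #define IO_SIG  (SIGRTMIN + 1)  /* any queued rt signal will do */

    /* Application-specific dispatch; stubbed out here. */
    static void handle_event(int fd, long band) { (void)fd; (void)band; }

    /* Register interest in fd; events now queue as siginfo structs. */
    static int watch_fd(int fd)
    {
        if (fcntl(fd, F_SETOWN, getpid()) == -1)
            return -1;
        if (fcntl(fd, F_SETSIG, IO_SIG) == -1)   /* Linux-only */
            return -1;
        return fcntl(fd, F_SETFL,
                     fcntl(fd, F_GETFL, 0) | O_ASYNC | O_NONBLOCK);
    }

    static void event_loop(void)
    {
        sigset_t set;
        siginfo_t si;

        /* Block the signal; we never take async delivery.  With the
         * signal blocked, sigwaitinfo() just pops the next queued
         * siginfo -- effectively a poll() with no timeout parameter. */
        sigemptyset(&set);
        sigaddset(&set, IO_SIG);
        sigprocmask(SIG_BLOCK, &set, NULL);

        for (;;) {
            if (sigwaitinfo(&set, &si) == -1) {
                if (errno == EINTR)
                    continue;
                break;
            }
            /* si_fd names the fd, si_band carries the POLL_ bits. */
            handle_event(si.si_fd, si.si_band);
        }
    }

The appeal, as I read it, is that the per-event work stays constant:
the kernel appends one siginfo to a queue, instead of re-scanning the
whole interest set on every syscall the way poll() does.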
> > Also, I would guess that you would start getting into locking problems,
> > and how to cancel a signal which has already been posted.
>
> locking problems?

For asynchronous signal delivery; you alluded to this problem earlier
as well. Since you're blocking signals, this isn't a problem.

> yes, the possibility of getting stale events in the queue is _annoying_.
> This is going to be a problem in any system that passes state deltas to
> the process in a queued manner. hacks could be put in, and perhaps
> should, to remove events in the queue for a fd when it is closed, etc.
>
> take the web server case again. it is quite possible to close() an fd
> while there is an event queued for it, and then accept() a new fd that
> now has a bogus event coming down the pipe for it. I get around this
> garbage in the cheesy web server by doing deferred close()s on fds based
> on the length of the queue when I stopped being interested in the fd
> (and as such turned off sigio delivery). It's gross.

Exactly. Sometimes we just want to close() an fd, even if there are
pending events, and then immediately re-open with a new fd. Deferred
closes are not an option. What I do at the moment is remove queued
events/state when the fd is closed. (Actually, my implementation sucks
a bit, as I re-scan the state for this particular case.)
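Continuing the sketch from above (same headers and IO_SIG), my guess at
the gross userland version of this hack is something like the
following; defer_event() is a made-up placeholder for the user-level
fifo you mention, not shown here.

    #include <time.h>   /* struct timespec */

    /* User-level fifo for events popped on behalf of other fds. */
    void defer_event(int fd, long band);

    /*
     * Close an fd without letting its stale queued events be taken
     * for a later fd with the same number: stop new events, drain
     * everything already queued, toss entries for the dying fd, and
     * defer the rest.
     */
    static void drain_and_close(int fd, const sigset_t *set)
    {
        siginfo_t si;
        struct timespec zero = { 0, 0 };

        /* No new events can be queued for this fd from here on. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) & ~O_ASYNC);

        /* sigtimedwait() with a zero timeout never blocks; it just
         * pops whatever is queued right now.  Events belonging to
         * other fds must be kept, not dropped. */
        while (sigtimedwait(set, &si, &zero) != -1) {
            if (si.si_fd != fd)
                defer_event(si.si_fd, si.si_band);
        }

        close(fd);
    }

Removing the queued events/state in the kernel when the fd is closed
makes all of this bookkeeping disappear, which is why I do it there.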
> but even with these problems, the rt signal queue is quite powerful. to
> do better would require a fair bit of engineering, and one might quickly
> be bogged down in featuritis.

Well, I'm sure that we have a lot of engineering talent around here. :-)
--
Jonathan