From: Zach Brown <zab@zabbo.net>
Date: Mon, 5 Jul 1999 01:10:38 -0400 (EDT)
To: Jonathan Lemon
Cc: Mike Smith, hackers@FreeBSD.ORG
Subject: Re: poll() scalability

On Sun, 4 Jul 1999, Jonathan Lemon wrote:

> I would think that a system that uses callbacks (like POSIX's completion
> signals) would be more expensive than a call like poll() or select().

the sigio/siginfo model is a few orders of magnitude cheaper than
poll/select as you scale the number of fds you're watching.  the reasons
being that select()/poll() have that large chunk of state to throw around
on every syscall, yet in real-world use very rarely return more than a
few active pollfds.

with the sigio/siginfo model you register your interest in the fd at fd
creation.  from then on, when a POLL_ event happens on the fd, we notice
that it has an rt signal queue registered and a siginfo struct is tacked
onto the end of that queue.  these code paths can be nice and light.  the
siginfo enqueueing can be pointed at multiple queues by registering a
process group with F_SETOWN, etc.

(and yes, the siginfo struct has stuff for telling what process just sent
you a signal via kill(), posix timers, normal signal delivery, telling
things about the child that just sent you sigchld, the faulting addr for
segv and friends, in addition to the band (POLL_) info for sigio)

it's important to notice that we don't actually use signal delivery for
this sigio/siginfo stuff; we mask the signal and use sigwaitinfo() to
block on, or pop, the next siginfo struct off the queue.  dealing with
async signals jumping in would be annoying, and to do it right one would
probably want to simply enqueue the siginfo delivered to the signal
handler into a nice fifo that the real thread of execution would deal
with..  instead of doing all this grossness, we just let the kernel
maintain the siginfo queue.  it's quite like the 'delta poll' system
proposed, but with differently inelegant semantics.

I'd say if one were to design an event queueing/notification system and
add a new api for it, we'd want to do it correctly from the get-go and
lose the similarity to existing interfaces entirely, unless it really
makes sense to behave like them (which it doesn't in the poll() case,
imho).

> Also, you really want to return more than one event at a time in
> order to amortize the cost of the system call over several events, this
> doesn't seem possible with callbacks (or upcalls).

yes, that would be a nice behaviour, but I haven't seen it become a real
issue yet.  the sigwaitinfo() syscall is just so much lighter than all
the other things going on in the situation where you actually use this
system.  for example, using this to serve a web page in the super fast
case looks something like this (there's a rough C sketch of the whole
loop further down):

  sigwaitinfo() - aha, POLL_IN on the listening socket..
  accept() the new fd
  set up sigio and such on the new fd (dorky, we have to do this in linux
    rather than inheriting it from the listening fd, but it has yet to
    show up on the profile radar, so, whatever :))
  read() in the header (usually done in one read, but rarely it will
    block and require falling back to a POLL_IN on the new fd)
  parse the header, ideally hash/lookup.
  write() out the precalced header and premapped data.  perhaps a
    writev() if you're a wimp :) :)

so even in the ridiculously light path of a cheating caching webserver,
the overhead of copying the siginfo over is dwarfed by the rest of the
stuff we're doing in response to the event.  of course, this could change
if you had a situation where you could burn through events like nothing
else and simply couldn't deal with the lock-step..

> Also, I would guess that you would start getting into locking problems,
> and how to cancel a signal which has already been posted.

locking problems?

yes, the possibility of getting stale events in the queue is _annoying_.
This is going to be a problem in any system that passes state deltas to
the process in a queued manner.  hacks could be put in, and perhaps
should be, to remove events in the queue for a fd when it is closed, etc.

take the web server case again.  it is quite possible to close() an fd
while there is an event queued for it, and then accept() a new fd that
now has a bogus event coming down the pipe for it.  I get around this
garbage in the cheesy web server by doing deferred close()s on fds, based
on the length of the queue at the point where I stopped being interested
in the fd (and as such turned off sigio delivery).  it's gross.
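to make the hand-waving concrete, here's roughly what the whole dance
looks like in C.  a sketch only, not the actual server code: F_SETSIG and
the siginfo si_fd/si_band fields are linuxisms, serve_request() is a
made-up stand-in for the parse-and-write fast path, and most error
handling has been tossed overboard:

#define _GNU_SOURCE             /* F_SETSIG et al are linux extensions */
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

#define EVQ_SIG (SIGRTMIN + 1)  /* the rt signal our siginfos queue on */

/* register interest: POLL_ events on fd now queue a siginfo for us */
static void watch_fd(int fd)
{
	fcntl(fd, F_SETOWN, getpid());
	fcntl(fd, F_SETSIG, EVQ_SIG);
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC | O_NONBLOCK);
}

/* stand-in for the fast path: read the header, write the canned reply */
static void serve_request(int fd)
{
	char buf[4096];

	if (read(fd, buf, sizeof(buf)) > 0)
		write(fd, "HTTP/1.0 200 OK\r\n\r\nhi\n", 22);
}

static void event_loop(int listen_fd)
{
	sigset_t sigs;
	siginfo_t info;

	/* mask the signal; we never take async delivery, we just pop
	 * siginfo structs off the kernel's queue with sigwaitinfo() */
	sigemptyset(&sigs);
	sigaddset(&sigs, EVQ_SIG);
	sigprocmask(SIG_BLOCK, &sigs, NULL);

	watch_fd(listen_fd);

	for (;;) {
		if (sigwaitinfo(&sigs, &info) < 0)
			continue;

		if (info.si_fd == listen_fd) {
			int fd = accept(listen_fd, NULL, NULL);
			if (fd >= 0)
				watch_fd(fd);	/* the dorky re-registration */
		} else if (info.si_band & (POLLIN | POLLERR | POLLHUP)) {
			serve_request(info.si_fd);
		}
	}
}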
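and the deferred close bolts onto that loop as a little bookkeeping.
again just a sketch (plus <stdlib.h>), and I'm hand-waving how you
estimate the queue depth (a safe overestimate merely delays the real
close()).  the idea is to turn off sigio, keep the fd number pinned so
accept() can't reuse it, and only really close() once enough further
events have drained that nothing stale can still be queued for it:

struct dying_fd {
	int fd;
	long close_at;		/* ev_seq at which it's safe to close */
	struct dying_fd *next;
};

static long ev_seq;		/* bumped once per sigwaitinfo() pop */
static struct dying_fd *dying;

static void lazy_close(int fd, long queue_depth_guess)
{
	struct dying_fd *d = malloc(sizeof(*d));

	/* stop new events, but keep the fd number pinned */
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) & ~O_ASYNC);
	d->fd = fd;
	d->close_at = ev_seq + queue_depth_guess;
	d->next = dying;
	dying = d;
}

/* called once per event pop, before dispatching */
static void reap_dying(void)
{
	struct dying_fd **dp = &dying, *d;

	ev_seq++;
	while ((d = *dp) != NULL) {
		if (ev_seq >= d->close_at) {
			close(d->fd);	/* finally */
			*dp = d->next;
			free(d);
		} else {
			dp = &d->next;
		}
	}
}

the event loop would also check si_fd against the dying list before
dispatching, so a stale event gets swallowed instead of served.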
but even with these problems, the rt signal queue is quite powerful.  to
do better would require a fair bit of engineering, and one might quickly
be bogged down in featuritis.

-- 
zach

- - - - - - 007 373 5963