Date: Fri, 9 Apr 1999 02:03:02 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: dot@dotat.at (Tony Finch)
Cc: smp@FreeBSD.ORG
Subject: Re: concurrent select()s on listen socket broken under SMP
Message-ID: <199904090203.TAA14284@usr04.primenet.com>
In-Reply-To: <E10VKig-0006Y3-00@fanf.noc.demon.net> from "Tony Finch" at Apr 8, 99 08:44:10 pm

> We discovered something odd this evening:
>
> We have a modified thttpd which forks several times between opening
> its listen socket and dropping into the big select() loop. There is a
> difference in behaviour between uniprocessor machines and SMP machines
> when a connection arrives.
>
> On a uniprocessor machine, select() only tells one process that a
> connection is available. On a dual processor machine, two processes
> are told that a connection is available: both processes then go on to
> accept() the connection and one of them succeeds but the other blocks.
> This upsets thttpd greatly because it expects the accept() to be
> instantaneous for the purpose of calculating timeouts.
>
> Because we are on a bastard deadline our current fix is to use a
> uniprocessor kernel, but this is a little bit wasteful. A fix would be
> nice...

IMO, a socket in a state waiting for a connection should not select true to multiple user space processes simultaneously.

This is pretty obviously an SMP-specific problem, since the uniprocessor case works around the problem.

I think that what you would need would be a fix in the select code itself to respect the big giant lock.

Looking at this code, it appears that it is not SMP safe. It calculates things based on state before a sleep, and then uses the cached information after the tsleep in the case of a retry. The poll call appears to have the same problem.

Since only one process can be in the kernel at a time, it seems to me that what's happening is that the process that is in the kernel tsleeps, allowing another process to enter the kernel, and then at interrupt time, the wakeup causes both to run, one on each processor. And then one of them blocks in the accept() after it loses the race.

One fix should be to use wakeup_one(). It looks like this is broken, though, since it allows a "goto restart" if the process it wanted to wake up is asleep.

In terms of using a select on an fd that represents a mux in order to do "hot scheduling", this appears to be a lose anyway; processes selecting on the same ident should be serviced in LIFO order, given an LRU policy for paging (a fix for this would probably make FreeBSD a much faster server for "work to do" services that run multiple identical engine processes and make no state distinction between them -- e.g., Apache).

Another fix might be to obey the lock around the select. This is mildly problematic, since it means that if you have something blocked on it, another process cannot select on the same object until the first is done. This is both FIFO (and therefore bad, though "fair"), and it doesn't take into account kernel reentrancy on the basis of an interrupt on the same object (e.g. where the wakeup is called).

You could probably hack this for network connections, specifically, under the theory that it's not a real interrupt, but a NETISR, that is doing the call, so you are both running in the same protection conflict domain for the lock.

This would basically boil down to adding a field to the select structure, blocking entries on sleep, and not blocking them when they want to wake up. This would leave a "thundering herd" waiting to sleep on the object that just selected true, but it would be one running on a single processor in the kernel.

Probably the easiest way to accomplish this would be to make up a structure, call it an "smp_tsleep_context", pass it to an "smp_tsleep" function, and then dereference the select object ident out of the structure.

Then you:

    put an smp_tsleep_context in the struct you are passing as the
      "ident", NOT as the first entry (or the ident would be the same)
    set the "ident" member to point to the real "ident"
    call smp_tsleep

smp_tsleep:

    grab the SMP lock (implicit)
    sleep on the addr of the sleep context, if it's in use
    if it's not, set it "in use"
    sleep on the ident
    when you wake up, set the context not in use
    wakeup the address of the context
    return as if from a normal tsleep

This means that the wakeups on the context only occur in kernel mode, not interrupt mode, while the wakeups on the ident can occur at interrupt.

This basically means bloating the structure you are selecting on in order to make it SMP safe.

This is obviously a trivial implementation; a high granularity SMP kernel would be able to do this serialization with far less work and far less bloat.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message
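
The smp_tsleep steps listed above can be approximated in user space, which may make the serialization easier to see. What follows is only a sketch under stated assumptions, not kernel code: a pthread mutex stands in for the giant lock, two condition variables stand in for tsleep()/wakeup() on the context address and on the ident, and the events counter, smp_wakeup() and smp_tsleep_context_init() are hypothetical additions needed to make the analogue self-contained.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the proposed smp_tsleep_context. */
struct smp_tsleep_context {
    pthread_mutex_t lock;     /* stands in for the giant kernel lock */
    pthread_cond_t  ctx_cv;   /* "sleep on the addr of the sleep context" */
    pthread_cond_t  ident_cv; /* "sleep on the ident" */
    bool            in_use;   /* true while one sleeper owns the ident */
    int             events;   /* assumption: count of pending wakeups */
};

int smp_tsleep_context_init(struct smp_tsleep_context *c)
{
    c->in_use = false;
    c->events = 0;
    if (pthread_mutex_init(&c->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&c->ctx_cv, NULL) != 0)
        return -1;
    return pthread_cond_init(&c->ident_cv, NULL) != 0 ? -1 : 0;
}

void smp_tsleep(struct smp_tsleep_context *c)
{
    pthread_mutex_lock(&c->lock);           /* grab the SMP lock */
    while (c->in_use)                       /* context busy: queue on it */
        pthread_cond_wait(&c->ctx_cv, &c->lock);
    c->in_use = true;                       /* set it "in use" */
    while (c->events == 0)                  /* sleep on the ident */
        pthread_cond_wait(&c->ident_cv, &c->lock);
    c->events--;                            /* consume exactly one event */
    c->in_use = false;                      /* set the context not in use */
    pthread_cond_broadcast(&c->ctx_cv);     /* wakeup the context address */
    pthread_mutex_unlock(&c->lock);         /* return as from a tsleep */
}

/* Stand-in for the interrupt-time wakeup on the real ident. */
void smp_wakeup(struct smp_tsleep_context *c)
{
    pthread_mutex_lock(&c->lock);
    c->events++;
    pthread_cond_broadcast(&c->ident_cv);   /* at most one sleeper is here */
    pthread_mutex_unlock(&c->lock);
}
```

Because only the thread holding in_use ever waits on ident_cv, a broadcast there releases a single waiter per event; the remaining "herd" stays queued on ctx_cv and is admitted one at a time, which is the serialization the steps describe.
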