Date:      Fri, 9 Apr 1999 02:03:02 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dot@dotat.at (Tony Finch)
Cc:        smp@FreeBSD.ORG
Subject:   Re: concurrent select()s on listen socket broken under SMP
Message-ID:  <199904090203.TAA14284@usr04.primenet.com>
In-Reply-To: <E10VKig-0006Y3-00@fanf.noc.demon.net> from "Tony Finch" at Apr 8, 99 08:44:10 pm

> We discovered something odd this evening:
> 
> We have a modified thttpd which forks several times between opening
> its listen socket and dropping into the big select() loop. There is a
> difference in behaviour between uniprocessor machines and SMP machines
> when a connection arrives.
> 
> On a uniprocessor machine, select() only tells one process that a
> connection is available. On a dual processor machine, two processes
> are told that a connection is available: both processes then go on to
> accept() the connection and one of them succeeds but the other blocks.
> This upsets thttpd greatly because it expects the accept() to be
> instantaneous for the purpose of calculating timeouts.
> 
> Because we are on a bastard deadline our current fix is to use a
> uniprocessor kernel, but this is a little bit wasteful. A fix would be
> nice...

IMO, a socket in a state waiting for a connection should not select
true to multiple user-space processes simultaneously.

This is pretty obviously an SMP-specific problem, since the
uniprocessor case works around it.

I think that what you would need is a fix in the select code
itself so that it respects the big giant lock.

Looking at this code, it appears that it is not SMP-safe.  It
computes things from the state before a sleep, and then uses the
cached information after the tsleep() when it retries.

The poll() call appears to have the same problem.


Since only one process can be in the kernel at a time, it seems
to me that what's happening is this: the process that is in the
kernel calls tsleep(), allowing another process to enter the
kernel, and then at interrupt time the wakeup() causes both to
run, one on each processor.

And then one of them blocks in the accept() after it loses the
race.
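The losing side of that race can be sketched in user space. A common application-level mitigation (not something proposed in this message) is to mark the listen socket non-blocking, so a process that loses the race gets EWOULDBLOCK/EAGAIN from accept() instead of blocking. The helper names open_nonblocking_listener and try_accept are hypothetical, and the loopback address and backlog of 16 are arbitrary choices for the sketch.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a non-blocking listen socket on 127.0.0.1
   with a kernel-chosen port, which is returned through *port_out. */
static int open_nonblocking_listener(unsigned short *port_out)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int flags;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                       /* let the kernel pick a port */
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(s, 16) < 0 ||
        getsockname(s, (struct sockaddr *)&sin, &len) < 0 ||
        (flags = fcntl(s, F_GETFL, 0)) < 0 ||
        fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(s);
        return -1;
    }
    *port_out = ntohs(sin.sin_port);
    return s;
}

/* Hypothetical helper: call after select() reports the listener readable.
   Returns the accepted fd, or -1; *lost_race is set when the connection
   was already taken by another process rather than on a real error. */
static int try_accept(int ls, int *lost_race)
{
    int fd = accept(ls, NULL, NULL);

    *lost_race = (fd < 0 && (errno == EWOULDBLOCK || errno == EAGAIN));
    return fd;
}
```

With one pending connection, the first try_accept() succeeds and a second one returns -1 with *lost_race set instead of sleeping. This only hides the symptom for a select()-driven server like thttpd; it does not address the kernel-side question discussed here.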


One fix would be to use wakeup_one().  It looks like this is
broken, though, since it allows a "goto restart" if the process
it wanted to wake up is asleep.  In terms of using select() on
an fd that represents a mux in order to do "hot scheduling", this
appears to be a lose anyway: processes selecting on the same ident
should be serviced in LIFO order, given an LRU policy for paging.
(A fix for this would probably make FreeBSD a much faster server
for "work to do" services that run multiple identical engine
processes and make no state distinction between them, e.g. Apache.)
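The LIFO-versus-FIFO point can be illustrated with a toy sleep queue. This is a user-space model only, assuming a hypothetical fixed-size array of pids; sq_sleep, sq_wakeup_one_fifo and sq_wakeup_one_lifo are illustrative names, not kernel functions. Waking the most recent sleeper favors the process whose pages are most likely still resident under LRU paging.

```c
#include <stddef.h>

#define MAXSLEEPERS 16

/* Hypothetical toy sleep queue: pids in the order they went to sleep. */
struct sleepq {
    int pid[MAXSLEEPERS];
    size_t n;
};

/* Put a process to sleep on the queue; -1 if the queue is full. */
static int sq_sleep(struct sleepq *q, int pid)
{
    if (q->n == MAXSLEEPERS)
        return -1;
    q->pid[q->n++] = pid;
    return 0;
}

/* FIFO: wake the process that has been asleep longest ("fair", but it
   is also the one most likely to have been paged out). */
static int sq_wakeup_one_fifo(struct sleepq *q)
{
    if (q->n == 0)
        return -1;
    int pid = q->pid[0];
    for (size_t i = 1; i < q->n; i++)
        q->pid[i - 1] = q->pid[i];
    q->n--;
    return pid;
}

/* LIFO: wake the most recent sleeper, the "hot" one. */
static int sq_wakeup_one_lifo(struct sleepq *q)
{
    if (q->n == 0)
        return -1;
    return q->pid[--q->n];
}
```

With processes 101, 102 and 103 asleep in that order, the LIFO wakeup hands the next unit of work to 103, while the FIFO wakeup picks 101.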


Another fix might be to obey the lock around the select.  This
is mildly problematic, since it means that if something is
blocked on the object, another process cannot select on the same
object until the first is done.  This is both FIFO (and therefore
bad, though "fair"), and it doesn't take into account kernel
reentrancy on the basis of an interrupt on the same object (e.g.
where the wakeup is called).

You could probably hack this for network connections specifically,
on the theory that it's a NETISR, not a real interrupt, that is
doing the call, so both are running in the same protection
conflict domain for the lock.

This would basically boil down to adding a field to the select
structure, blocking entries on sleep, and not blocking them when
they want to wake up.

This would still leave a "thundering herd" waiting to sleep on
the object that just selected true, but the herd would be running
on a single processor in the kernel.


Probably the easiest way to accomplish this would be to make up
a structure, call it an "smp_tsleep_context", pass it to an
"smp_tsleep" function, and then dereference the select object
ident out of the structure.

Then you:

	put an smp_tsleep_context in the struct whose address you are
		passing as the "ident", NOT as its first member (or the
		context's address would be the same as the ident)
	set the context's "ident" member to point to the real "ident"
	call smp_tsleep
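The "NOT as the first member" rule follows from C's layout guarantee: a pointer to a structure, suitably converted, points to its first member, so a context placed first would share one address, and therefore one sleep channel, with the object itself. A small sketch with hypothetical structure names (bad_sock, good_sock, so_qlen and same_channel are illustrative only):

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposed smp_tsleep_context. */
struct smp_tsleep_context {
    int in_use;
    void *ident;            /* points to the real ident */
};

/* Wrong: the context is the first member, so &b == &b.ctx and a
   sleep on either address would land on the same channel. */
struct bad_sock {
    struct smp_tsleep_context ctx;
    int so_qlen;
};

/* Right: another member comes first, so the addresses differ. */
struct good_sock {
    int so_qlen;
    struct smp_tsleep_context ctx;
};

/* Two sleep addresses collide when they compare equal as pointers. */
static int same_channel(const void *ident, const void *ctx)
{
    return ident == ctx;
}
```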

smp_tsleep:
	grab the SMP lock (implicit)
	if the context is in use, sleep on the address of the context
		(and check again when woken)
	once it's not in use:
		set it "in use"
		sleep on the ident
		when you wake up, set the context not in use
		wake up the address of the context
		return as if from a normal tsleep
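The smp_tsleep steps above can be modeled in user space as a rough sketch. A pthread mutex stands in for the SMP lock, and a condition variable plus a generation counter stands in for each tsleep()/wakeup() channel. Everything here (chan, chan_sleep, chan_wakeup, giant and the context's field names) is a hypothetical analogue rather than FreeBSD kernel code, and interrupt-time wakeups are not modeled.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the big SMP lock: every function below
   must be called with "giant" held, as tsleep() is under the lock. */
static pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical sleep channel modeled on tsleep()/wakeup(): a wakeup
   releases every thread asleep at that moment, and the generation
   count makes spurious condition-variable wakeups harmless. */
struct chan {
    pthread_cond_t cv;
    unsigned long gen;
};

static void chan_sleep(struct chan *ch)
{
    unsigned long g = ch->gen;

    while (ch->gen == g)
        pthread_cond_wait(&ch->cv, &giant);   /* drops giant while asleep */
}

static void chan_wakeup(struct chan *ch)
{
    ch->gen++;
    pthread_cond_broadcast(&ch->cv);
}

/* Hypothetical context embedded in the object being selected on. */
struct smp_tsleep_context {
    bool in_use;            /* a thread is already asleep on the ident */
    struct chan ctx_chan;   /* plays "the address of the context" */
    struct chan *ident;     /* points to the real ident */
};

static void smp_tsleep(struct smp_tsleep_context *ctx)
{
    while (ctx->in_use)                 /* another thread owns the ident */
        chan_sleep(&ctx->ctx_chan);
    ctx->in_use = true;
    chan_sleep(ctx->ident);             /* sole sleeper on the ident */
    ctx->in_use = false;
    chan_wakeup(&ctx->ctx_chan);        /* let the next waiter claim it */
}
```

If three threads call smp_tsleep() on the same context, one sleeps on the ident and two on the context, so each chan_wakeup() of the ident releases exactly one of them instead of the whole herd.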


This means that the wakeups on the context occur only in kernel
mode, never at interrupt time, while the wakeups on the ident can
still occur at interrupt time.


This basically means bloating the structure you are selecting on
in order to make it SMP safe.

This is obviously a trivial implementation; a finer-grained SMP
kernel would be able to do this serialization with far less work
and far less bloat.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

