Date:      Sat, 10 Apr 1999 00:39:19 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        peter@netplex.com.au (Peter Wemm)
Cc:        tlambert@primenet.com, barney@databus.com, smp@FreeBSD.ORG, dot@dotat.at
Subject:   Re: concurrent select()s on listen socket broken under SMP
Message-ID:  <199904100039.RAA05457@usr01.primenet.com>
In-Reply-To: <199904090652.OAA23047@spinner.netplex.com.au> from "Peter Wemm" at Apr 9, 99 02:52:13 pm

> This is *NOT* a SMP specific problem, it's a generic problem in the socket
> API and implementation.  You can trigger it on a UP kernel too, it just
> happens to be real easy under SMP.
> 
> See rev 1.52 and 1.54 of uipc_socket.c and related files for a description
> of what's really going on.  (the problem there is from a different angle,
> it deals with remote users being able to trigger the race remotely
> as a denial-of-service attack).
> 
> The original poster didn't mention what version of FreeBSD was in use. 
> 3.1-RELEASE has this bug...  It's fixed in 4.0 and 3.1-STABLE (after
> 26-feb-99).

He said he was running a current version of 3.1-STABLE.


> Terry, rest assured, select() and poll() are quite SMP safe.
> 
> The other serious problem is select collisions.  I'd strongly advise using
> a mutex so that only one process is ever in a select() waiting for
> connections on a socket.  If two or more processes select() waiting for the
> listening socket to become readable, when it does - a select collision
> happens.  i.e.: *EVERY SINGLE PROCESS* that was asleep in select gets woken
> up, and runs, and scans its select tables, and 99% go back to sleep.
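
Peter's advice can be sketched for a pre-forked server as follows. This is a minimal sketch, assuming a POSIX system with flock(2); open_accept_lock(), serialized_accept() and open_loopback_listener() are hypothetical helper names, not FreeBSD APIs. Because flock() locks belong to the open file description, each child should call open_accept_lock() itself after fork() rather than share a descriptor inherited from the parent.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/file.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) the lock file shared by all children. */
int open_accept_lock(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Hypothetical helper: only the process holding an exclusive flock() on
 * lock_fd sleeps in select() on the listening socket; the others wait for
 * the lock instead of piling up in select() on the same descriptor. */
int serialized_accept(int lock_fd, int listen_fd)
{
    int conn = -1;
    int saved;

    assert(lock_fd >= 0 && listen_fd >= 0);
    while (flock(lock_fd, LOCK_EX) < 0)
        if (errno != EINTR)
            return -1;

    for (;;) {
        fd_set rfds;

        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        if (select(listen_fd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            break;
        }
        conn = accept(listen_fd, NULL, NULL);
        /* Stop on success or on a non-transient error. */
        if (conn >= 0 || (errno != EINTR && errno != ECONNABORTED &&
                          errno != EAGAIN && errno != EWOULDBLOCK))
            break;
    }

    saved = errno;                 /* keep accept()'s errno across unlock */
    flock(lock_fd, LOCK_UN);
    errno = saved;
    return conn;
}

/* Hypothetical helper for trying this out: listen on 127.0.0.1 with a
 * kernel-chosen port and report the bound address through *addr. */
int open_loopback_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;            /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        listen(fd, 128) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In real code the listening socket is best made non-blocking: a process that does not follow this locking protocol could still take the connection first, and a blocking accept() would then stall while holding the lock.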

Yes.  This is the classic "thundering herd" problem for which wakeup_one()
(or wake_one()) was invented.

The real problem is that for some types of events, you want all
of the processes waiting on the event to go, and for other types of
events, you want only one process to go.

I think for a socket on which a listen has been called, it's pretty
obvious that you can only service a single accept at a time, and
so based on the character of the object selected, wakeup_one() is
the correct call.
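
The cost difference between waking every sleeper and waking one can be shown with a toy counting model. It is not kernel code: simulate_wakeup() and its types are hypothetical, and the model assumes each woken process either takes one queued connection or finds the queue empty and goes back to sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: `sleepers` processes wait on one listening socket that has
 * `pending` queued connections when the wakeup happens. */
enum wake_policy { WAKE_ALL, WAKE_ONE };

struct herd_stats {
    size_t woken;    /* processes made runnable by the event */
    size_t accepted; /* processes that actually got a connection */
    size_t wasted;   /* processes that woke, found nothing, slept again */
};

struct herd_stats simulate_wakeup(size_t sleepers, size_t pending,
                                  enum wake_policy policy)
{
    struct herd_stats st = {0, 0, 0};
    size_t i;

    if (policy == WAKE_ALL)
        st.woken = sleepers;                 /* wakeup(): everyone runs */
    else                                     /* one wake_one per connection */
        st.woken = pending < sleepers ? pending : sleepers;

    for (i = 0; i < st.woken; i++) {
        if (pending > 0) {
            pending--;
            st.accepted++;
        } else {
            st.wasted++;                     /* rescans, goes back to sleep */
        }
    }
    assert(st.accepted + st.wasted == st.woken);
    return st;
}
```

With 200 sleepers and one pending connection, the wake-all policy yields 199 wasted wakeups and the wake-one policy yields none; that is the same arithmetic as the "couple of hundred processes" case in the quoted text.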

I think that select collisions should be outlawed, unless the
descriptor in question explicitly permits them, but I'd be OK
with having to change the current behaviour via fcntl on a
per fd basis.


> If you have a couple of hundred processes asleep in select(), this
> hits like a bomb going off.  You can get a spontaneous load average
> jump from 0.2 to 100+ in an instant, depending on when loadav() runs.

And this is a design error, which should be corrected, whether you
argue that it's a scheduler problem or a wakeup problem.  Dynix and
SVR4.2 both corrected this problem a while ago, using a wake_one()
approach.


> Select collisions are a well understood problem, and there is no cheap,
> easy, trivial fix that scales well.  (Sure, making the selinfo struct have
> 5 pids rather than 1 would work for <= 5 processes waiting, but the
> problem is still there for 6.  You can't allocate memory for it due to
> the nature of the implementation without modifying the driver interface
> so that there is a "d_unselect()" handler or something - selinfos get
> released long after the process has given up and gone on to something
> else.  Changing the driver API _again_ isn't something we want to do
> yet...)
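
The scaling problem with a fixed number of pid slots can be modeled like this. It is a hypothetical sketch, not the FreeBSD selinfo definition: struct multi_selinfo, SEL_SLOTS, msel_record() and msel_wakeup() are invented names. Once the slots are exhausted, the record can only note that a collision occurred, and the wakeup side has to fall back to waking every process sleeping in select().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define SEL_SLOTS 5 /* hypothetical: "5 pids rather than 1" */

/* Hypothetical multi-slot, selinfo-like record. */
struct multi_selinfo {
    pid_t pids[SEL_SLOTS];
    int   npids;
    bool  collision;
};

/* Called when process p goes to sleep in select() on this object. */
void msel_record(struct multi_selinfo *si, pid_t p)
{
    int i;

    for (i = 0; i < si->npids; i++)
        if (si->pids[i] == p)
            return;                 /* already recorded */
    if (si->npids < SEL_SLOTS)
        si->pids[si->npids++] = p;
    else
        si->collision = true;       /* no room: this waiter's identity is lost */
    assert(si->npids <= SEL_SLOTS);
}

/* Returns the number of targeted wakeups the object can issue, or -1 when
 * a collision forces a wakeup of every process sleeping in select(). */
int msel_wakeup(struct multi_selinfo *si)
{
    int n = si->collision ? -1 : si->npids;

    si->npids = 0;
    si->collision = false;
    return n;
}
```

Five waiters are tracked individually; the sixth pushes the record onto the collision path, so enlarging the array only moves the threshold.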

I think what you want is a LIFO queue for people selecting on an fd,
and perhaps a per fd fcntl'able flag that can turn that into a FIFO
policy instead for resources for which there are interactive waits
(clearly, for a server, LIFO is the correct ordering; at Novell, my
suggestion that a LIFO service order be used for the NCP streams mux
resulted in a 35% increase in throughput as a result of reduced
paging load, making the UNIX solution faster than Native NetWare).
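
A per-descriptor wait queue with a LIFO default and an optional FIFO policy could look roughly like this. It is a minimal user-space model: struct waitq, wq_sleep(), wq_wake_one() and the policy enum are hypothetical, and no existing fcntl() command sets such a flag.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_MAX 64 /* hypothetical fixed capacity for the model */

enum wq_order { WQ_LIFO, WQ_FIFO };

/* Hypothetical per-descriptor queue of sleepers, oldest first. */
struct waitq {
    int waiters[WQ_MAX]; /* opaque waiter ids */
    size_t len;
    enum wq_order order; /* per-fd policy; LIFO suits servers */
};

/* Add a sleeper; returns -1 if the model's queue is full. */
int wq_sleep(struct waitq *q, int id)
{
    if (q->len == WQ_MAX)
        return -1;
    q->waiters[q->len++] = id;
    return 0;
}

/* Remove and return the one waiter to wake, or -1 if nobody is waiting. */
int wq_wake_one(struct waitq *q)
{
    int id;
    size_t i;

    assert(q->len <= WQ_MAX);
    if (q->len == 0)
        return -1;
    if (q->order == WQ_LIFO)
        return q->waiters[--q->len];  /* most recent sleeper */
    id = q->waiters[0];               /* FIFO: oldest sleeper */
    for (i = 1; i < q->len; i++)      /* shift the rest down */
        q->waiters[i - 1] = q->waiters[i];
    q->len--;
    return id;
}
```

LIFO hands the work to the most recently active sleeper, whose pages are the most likely to still be resident, which is the paging effect credited above for the NCP result; FIFO trades that locality for fairness toward interactive waiters.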

One possible solution, which would do dick for the SMP implications,
would be to add a "time_in_select", and then have wakeup_one() only
wake the one who called last, using a linear iteration.  This is
basically what it does now (why the hell does it wake up more than
one *just* because the first one is swapped out?!?).
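
The "time_in_select" idea can be sketched as a linear scan over sleepers that picks the latest one. This is a hypothetical model: struct sel_sleeper and wakeup_one_latest() are invented, and the stamp is assumed to be a unique, monotonically increasing counter rather than wall-clock time, so ties cannot occur. As the paragraph says, it does nothing for SMP; the scan would still need whatever lock protects the sleep queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for a process sleeping in select(). */
struct sel_sleeper {
    int           pid;
    unsigned long time_in_select; /* hypothetical sleep stamp, larger = later */
    int           asleep;         /* nonzero while sleeping in select() */
};

/* Wake only the most recent sleeper; returns its index, or -1 if none. */
int wakeup_one_latest(struct sel_sleeper *tbl, size_t n)
{
    int best = -1;
    size_t i;

    assert(tbl != NULL || n == 0);
    for (i = 0; i < n; i++) {            /* linear iteration */
        if (!tbl[i].asleep)
            continue;
        if (best < 0 || tbl[i].time_in_select > tbl[best].time_in_select)
            best = (int)i;
    }
    if (best >= 0)
        tbl[best].asleep = 0;            /* only this one becomes runnable */
    return best;
}
```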


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

