From: Peter Wemm
To: Terry Lambert
Cc: barney@databus.com (Barney Wolff), smp@FreeBSD.ORG, Tony Finch
Subject: Re: concurrent select()s on listen socket broken under SMP
In-reply-to: Your message of "Fri, 09 Apr 1999 02:03:33 GMT."
Date: Fri, 09 Apr 1999 14:52:13 +0800

Terry Lambert wrote:
> > Seems to me you need to set the socket non-blocking, and then handle
> > the EWOULDBLOCK on the fd that loses. I don't think this should be
> > considered a kernel error.
>
> It's SMP specific.

This is *NOT* an SMP-specific problem; it's a generic problem in the socket API and implementation. You can trigger it on a UP kernel too; it just happens to be really easy under SMP.

See revs 1.52 and 1.54 of uipc_socket.c and related files for a description of what's really going on. (The problem there is approached from a different angle: it deals with remote users being able to trigger the race as a denial-of-service attack.)

The original poster didn't mention which version of FreeBSD was in use. 3.1-RELEASE has this bug... It's fixed in 4.0 and in 3.1-STABLE (after 26-Feb-99).

Terry, rest assured, select() and poll() are quite SMP safe.

The other serious problem is select collisions.
I'd strongly advise using a mutex so that only one process is ever in select() waiting for connections on a socket. If two or more processes select() waiting for the listening socket to become readable, a select collision happens when it does, i.e. *EVERY SINGLE PROCESS* that was asleep in select() gets woken up, runs, scans its select tables, and 99% of them go back to sleep. If you have a couple of hundred processes asleep in select(), this hits like a bomb going off. You can get a spontaneous load average jump from 0.2 to 100+ in an instant, depending on when loadav() runs.

Select collisions are a well-understood problem, and there is no cheap, easy, trivial fix that scales well. (Sure, making the selinfo struct hold 5 pids rather than 1 would work for <= 5 waiting processes, but the problem is still there for 6 or more. You can't allocate memory for it either, due to the nature of the implementation, without modifying the driver interface so that there is a d_unselect() handler or something: selinfo structs get released long after the process has given up and gone on to something else. Changing the driver API _again_ isn't something we want to do yet...)

Cheers,
-Peter
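The two workarounds discussed in this thread can be combined in a pre-forked server. You serialize the wait on the listening socket with a lock, as recommended above. You also keep the listening socket non-blocking, as suggested in the quoted text, so that if the pending connection has gone away by the time accept() runs, the call returns EWOULDBLOCK instead of blocking.

The C sketch below is illustrative only. Several details are assumptions, not anything from the original mail:

- the helper names make_listener() and accept_serialized();
- the use of flock() on a shared lock file as the mutex;
- the timeout parameter.

```c
/*
 * Sketch (assumptions, not from the original mail): a pre-forked
 * server serializes select()/accept() on a shared listening socket
 * with an flock() lock, and keeps the listener non-blocking so that
 * accept() never sleeps if the pending connection has vanished.
 * make_listener() and accept_serialized() are hypothetical helpers.
 */
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/file.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Non-blocking TCP listener on 127.0.0.1, port chosen by the kernel. */
int make_listener(void)
{
    struct sockaddr_in sin;
    int fd, flags;

    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, 128) < 0 ||
        (flags = fcntl(fd, F_GETFL)) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hold the lock while waiting, so only one worker is ever asleep in
 * select() on listen_fd and a new connection wakes just that one.
 * Returns the accepted fd, or -1 with errno set.  EWOULDBLOCK/EAGAIN
 * (and, on some systems, ECONNABORTED) mean "nothing to accept, retry":
 * either the wait timed out or the connection went away first.
 */
int accept_serialized(int lock_fd, int listen_fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n, cfd, saved_errno;

    if (flock(lock_fd, LOCK_EX) < 0)
        return -1;

    FD_ZERO(&rfds);
    FD_SET(listen_fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(listen_fd + 1, &rfds, NULL, NULL, &tv);

    if (n > 0) {
        cfd = accept(listen_fd, NULL, NULL); /* cannot block: O_NONBLOCK */
    } else {
        cfd = -1;
        if (n == 0)
            errno = EWOULDBLOCK; /* timeout: report it like a lost race */
    }

    saved_errno = errno; /* flock() below must not clobber it */
    flock(lock_fd, LOCK_UN);
    errno = saved_errno;
    return cfd;
}
```

Each worker should open() the lock file itself after fork(). flock() locks belong to the open file description, so descriptors inherited across fork() share one lock and would not exclude each other.

The lock only guarantees that a single process sleeps in select() on the listening socket. Whether releasing it wakes one or all of the processes blocked in flock() depends on the kernel, so treat flock() here as a stand-in for whatever mutex you prefer.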