To: John-Mark Gurney <jmg@funkthat.com>
Subject: Re: close(2) while accept(2) is blocked
In-reply-to: Your message of "Sat, 30 Mar 2013 09:14:34 PDT."
 <20130330161434.GG76354@funkthat.com>
References: <515475C7.6010404@FreeBSD.org>
 <CANVK_QgnC-pLGwh7Oad87JO_z1WmLeY3kfT9HhdpSzMnpjdNgA@mail.gmail.com>
 <20130329235431.32D7FB82A@mail.bitblocks.com>
 <20130330161434.GG76354@funkthat.com>
Comments: In-reply-to John-Mark Gurney <jmg@funkthat.com>
 message dated "Sat, 30 Mar 2013 09:14:34 -0700."
Date: Sat, 30 Mar 2013 13:22:49 -0700
From: Bakul Shah <bakul@bitblocks.com>
Message-Id: <20130330202249.F218BB82A@mail.bitblocks.com>
Cc: freebsd-net@freebsd.org, Carl Shapiro <carl.shapiro@gmail.com>,
 Andriy Gapon <avg@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>

On Sat, 30 Mar 2013 09:14:34 PDT John-Mark Gurney <jmg@funkthat.com> wrote:
> 
> As someone else pointed out in this thread, if a userland program
> depends upon this behavior, it has a race condition in it...
> 
> Thread 1		Thread 2		Thread 3
> 						enters routine to read
> enters routine to close
> calls close(3)
> 			open() returns 3
> 						does read(3) for orignal fd
> 
> How can the original threaded program ensure that thread 2 doesn't
> create a new fd in between?  So even if you use a lock, this won't
> help, because as far as I know, there is no enter read and unlock
> mutex call yet...

It is worse. Consider:

	fd = open(file,...);
	read(fd, ...);

No guarantee read() gets data from the same opened file!
Another thread could've come along, closed fd and pointed it
to another file. So nothing is safe. Might as well stop using
threads, right?!

We are talking about cooperative threads where you don't have
to assume the worst case.  Here not being notified on a close
event can complicate things. As an example, I have done
something like this in the past: a frontend process validates
TCP connections and then passes the valid ones on to another
process for actual service (via sendmsg() over a Unix domain
socket). All the worker threads in the service process can do
a recvmsg() on the same fd. They process whatever TCP
connection they get. Now what happens when the frontend
process is restarted for some reason?  All the worker threads
need to eventually reconnect to a new Unix domain socket
posted by the new frontend process. You can handle this in
multiple ways, but terminating all the blocking syscalls on
the now-invalid fd is the simplest solution from a user
perspective.
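
For readers unfamiliar with the handoff described above, here is a
minimal sketch of passing a descriptor over a Unix domain socket with
SCM_RIGHTS. The helper names send_fd() and recv_fd() and the fd_ctl
union are made up for illustration; this is not the code from the
setup described, only the general technique, with one descriptor per
message assumed.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Frontend side (hypothetical helper): hand `fd` to the peer of `chan`.
 * Returns 0 on success, -1 on error. */
int send_fd(int chan, int fd)
{
    char byte = 'F';    /* one data byte travels with the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    assert(chan >= 0 && fd >= 0);
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Worker side (hypothetical helper): blocks in recvmsg() on `chan`.
 * Returns the received descriptor, or -1 on error or EOF. */
int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(chan, &msg, 0) <= 0)
        return -1;

    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

Several worker threads can sit in recv_fd() on the same channel; each
message is delivered to exactly one of them. The open question in this
thread is what those blocked calls should do when the channel itself
is closed by another thread.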

> I decided long ago that this is only solvable by proper use of locking
> and ensuring that if you call close (the syscall), that you do not have
> any other thread that may use the fd.  It's the close routine's (not
> syscall) function to make sure it locks out other threads and all other
> are out of the code path that will use the fd before it calls close..

If you lock before close(), you have to lock before every
other syscall on that fd. That complicates userland coding and
slows down things when this can be handled more simply in the
kernel.

Another use case is where N worker threads all accept() on the
same fd. Serializing them with a lock defeats any performance
gain.
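
To make the cost of the lock-everything approach concrete, here is a
rough sketch, assuming a hypothetical struct guarded_fd wrapper (none
of these names come from the thread). Every call on the descriptor has
to go through the mutex, and a read that blocks does so with the mutex
held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: a descriptor plus the mutex that guards it. */
struct guarded_fd {
    pthread_mutex_t mu;
    int fd;             /* -1 once closed */
};

void guarded_init(struct guarded_fd *g, int fd)
{
    assert(g != NULL);
    pthread_mutex_init(&g->mu, NULL);
    g->fd = fd;
}

/* read() under the lock. If read() blocks, it blocks with the mutex
 * held, so every other thread using this descriptor waits behind it. */
ssize_t guarded_read(struct guarded_fd *g, void *buf, size_t len)
{
    ssize_t n;
    int saved;

    pthread_mutex_lock(&g->mu);
    if (g->fd == -1) {
        pthread_mutex_unlock(&g->mu);
        errno = EBADF;
        return -1;
    }
    n = read(g->fd, buf, len);
    saved = errno;              /* keep read()'s errno across the unlock */
    pthread_mutex_unlock(&g->mu);
    errno = saved;
    return n;
}

/* close() under the same lock. It cannot proceed while a reader is
 * blocked inside guarded_read(), so it cannot be used to wake one up. */
int guarded_close(struct guarded_fd *g)
{
    int rc, saved;

    pthread_mutex_lock(&g->mu);
    if (g->fd == -1) {
        pthread_mutex_unlock(&g->mu);
        errno = EBADF;
        return -1;
    }
    rc = close(g->fd);
    saved = errno;
    g->fd = -1;
    pthread_mutex_unlock(&g->mu);
    errno = saved;
    return rc;
}
```

With this scheme the fd number can never be reused underneath a
reader, but only one thread at a time can be inside a syscall on the
descriptor, which is exactly the single threading objected to above.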

> If someone could describe how this new eject a person from read could
> be done in a race safe way, then I'd say go ahead w/ it...  Otherwise
> we're just moving the race around, and letting people think that they
> have solved the problem when they haven't...

In general it just makes sense to notify everyone waiting on
something that the situation has changed and that they would
otherwise wait forever.  The kernel should already have the
necessary information about which threads are sleeping on a
fd. Wake them all up. On being awakened they see that the fd
is no longer valid and all return either with a count of the
data already read or with -1 and EBADF. Doing the equivalent
in userland is complicated.
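
As an illustration of what a userland equivalent tends to look like,
here is a sketch of the common wake-pipe pattern: each worker polls
the real fd together with the read end of a shared pipe, and the
closing thread writes to that pipe before closing. The function names
and the single shared pipe are assumptions for this example, not a
description of what Java or Go actually do.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Block until `fd` is readable or the shared wake pipe has been
 * signalled. Returns 1 if `fd` has an event pending, 0 if woken for
 * shutdown, -1 on a poll() error. (Hypothetical helper.)
 */
int wait_readable_or_wake(int fd, int wake_rd)
{
    struct pollfd p[2];

    assert(fd >= 0 && wake_rd >= 0);
    p[0].fd = fd;
    p[0].events = POLLIN;
    p[0].revents = 0;
    p[1].fd = wake_rd;
    p[1].events = POLLIN;
    p[1].revents = 0;

    for (;;) {
        if (poll(p, 2, -1) == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (p[1].revents != 0)
            return 0;           /* shutdown wins over pending data */
        if (p[0].revents != 0)
            return 1;
    }
}

/*
 * Called by the closing thread before close(). The byte is never
 * drained, so every current and future waiter sees the wake-up.
 */
int wake_all(int wake_wr)
{
    char byte = 'W';

    return write(wake_wr, &byte, 1) == 1 ? 0 : -1;
}
```

Even with this in place, the closing thread still has to wait until
every worker has left the code path that uses the fd before calling
close(), and every blocking call has to be rewritten as poll-then-call.
That bookkeeping is the complication referred to above.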

Carl has pointed out how BSD and Linux have required a
workaround compared to Solaris and OS X (in Java and, IIRC,
the Go runtime). Seems like we have a number of use cases and
this is something worth fixing.