From owner-freebsd-arch@FreeBSD.ORG  Wed Dec 20 16:22:14 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E814216A500;
	Wed, 20 Dec 2006 16:22:13 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0A13343CA7;
	Wed, 20 Dec 2006 16:21:48 +0000 (GMT)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id A3F4246E2C;
	Wed, 20 Dec 2006 10:48:59 -0500 (EST)
Date: Wed, 20 Dec 2006 15:48:59 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Daniel Eischen <deischen@freebsd.org>
In-Reply-To: <Pine.GSO.4.64.0612130918140.13170@sea.ntplx.net>
Message-ID: <20061220153126.G85384@fledge.watson.org>
References: <32874.1165905843@critter.freebsd.dk>
	<Pine.GSO.4.64.0612121543220.8780@sea.ntplx.net>
	<200612132010.49601.davidxu@freebsd.org>
	<Pine.GSO.4.64.0612130918140.13170@sea.ntplx.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: David Xu <davidxu@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Dec 2006 16:22:14 -0000


On Wed, 13 Dec 2006, Daniel Eischen wrote:

> [CC trimmed]
>
> On Wed, 13 Dec 2006, David Xu wrote:
>
>> On Wednesday 13 December 2006 04:49, Daniel Eischen wrote:
>>> 
>>> Well, if threads waiting on IO are interruptible by signals, can't we make 
>>> a new signal that's only used by the kernel and send it to all threads 
>>> waiting on IO for that descriptor? When it gets out to actually set up the 
>>> signal handler, it just resumes like it is returning from an SA_RESTART 
>>> signal handler (which according to another posting would reissue the IO 
>>> command and get EBADF).
>> 
>> Even if you have implemented close() with the interruption, another 
>> thread opening a file can still reuse the file handle immediately; 
>> according to the specification, the lowest free file handle will be 
>> returned.  If SA_RESTART is used and the interrupted thread restarts the 
>> syscall, it will be using the wrong file.  I think even if we have 
>> implemented the feature in the kernel, userland threads still have a 
>> serious race to fix.
>
> If you use a special signal that is only used for this purpose, there is no 
> reason you have to try the IO operation again.  You can just return EBADF.
>
> Anyway, this was just a thought/idea.  I don't mean to argue against any of 
> the other reasons why this isn't a good idea.
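
The descriptor-reuse hazard quoted above can be seen from userland with a
minimal sketch. It assumes a single-threaded caller and uses /dev/null only
as a file that is always present; the function name is made up for this
illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Close a descriptor and immediately open another file.
 * Returns 1 if the new descriptor reused the number that was just closed,
 * 0 if it did not, and -1 if either open() failed.
 */
int
fd_number_is_reused(void)
{
	int first, second, reused;

	first = open("/dev/null", O_RDONLY);
	if (first < 0)
		return (-1);
	close(first);

	/* open() must return the lowest free descriptor number. */
	second = open("/dev/null", O_RDONLY);
	if (second < 0)
		return (-1);
	assert(second >= 0);
	reused = (second == first);
	close(second);
	return (reused);
}
```

A thread that restarts an interrupted read() on the old number after this
point would be reading from whatever the second open() returned.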

Whatever may be implemented to solve this issue will require a fairly serious 
re-working of how we implement file descriptor reference counting in the 
kernel.  Do you propose similar "cancellation" of other system calls blocked 
on the file descriptor, including select(), etc?  Typically these system calls 
interact with the underlying object associated with the file descriptor, not 
the file descriptor itself, and often, they act directly on the object and 
release the file descriptor before performing their operation.  I think before 
we can put any reasonable implementation proposal on the table, we need a 
clear set of requirements:

- What is the scope of cancellation?  Are we cancelling outstanding
   simultaneous I/O operations on the same fd index in the process, use of any
   fd pointing at the same open file entry in the process (i.e., all dup'd
   instances), or the same open file entry across all processes?  I've been
   presuming only use of the same fd index in the same process is relevant, but
   if so, let's make sure we state that.  If not, what do we mean?

- Exactly which potentially blocking operations will be cancelled as a result
   of close() of an "in use" file descriptor?  read()?  write()?  sendfile()?
   connect()?  ioctl()?  select()?  poll()?  close()?  Is the set of possible
   cancellation points equal to the existing set of interruptible sleeps?
   Notice that in our current implementation, objects are often reached using a
   file descriptor, but then separately referenced for the duration of the
   operation, with the file descriptor being released.  This means that we
   currently don't maintain any useful list of threads currently interacting
   with the file descriptor, and only have a limited notion of which threads
   are interacting with the underlying object.

- What semantics are expected regarding the underlying object when an
   operation is cancelled due to simultaneous close() on the same file
   descriptor?  Keep in mind that the underlying object may be referenced by
   other file descriptor indexes pointing at the same open file state (shared
   offset, etc).  For example, if we cancel connect(), is it safe to say that
   what we've done is cancel the wait for connect() to complete, rather than
   the connection operation itself, which may continue and be visible on other
   file descriptor indexes referencing the same object, or to other processes
   also referencing it?

While providing Solaris-like semantics here makes some amount of sense, this 
is a very tricky area, and one where we're still refining performance 
behavior, reference counting behavior, etc.  I don't think there will be any 
easy answers, and we need to think through the semantic and performance 
implications of any change very carefully before starting to implement.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Wed Dec 20 18:28:22 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8482916A407
	for <freebsd-arch@freebsd.org>; Wed, 20 Dec 2006 18:28:22 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D906043C9F
	for <freebsd-arch@freebsd.org>; Wed, 20 Dec 2006 18:28:21 +0000 (GMT)
	(envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	by mail.ntplx.net (8.13.8/8.13.8/NETPLEX) with ESMTP id kBKIIBUN022183; 
	Wed, 20 Dec 2006 13:18:11 -0500 (EST)
Date: Wed, 20 Dec 2006 13:18:11 -0500 (EST)
From: Daniel Eischen <deischen@freebsd.org>
X-X-Sender: eischen@sea.ntplx.net
To: Robert Watson <rwatson@freebsd.org>
In-Reply-To: <20061220153126.G85384@fledge.watson.org>
Message-ID: <Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
References: <32874.1165905843@critter.freebsd.dk>
	<Pine.GSO.4.64.0612121543220.8780@sea.ntplx.net>
	<200612132010.49601.davidxu@freebsd.org>
	<Pine.GSO.4.64.0612130918140.13170@sea.ntplx.net>
	<20061220153126.G85384@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Message whitelisted by DRAC access database, not delayed by
	milter-greylist-3.0 (mail.ntplx.net [204.213.176.10]);
	Wed, 20 Dec 2006 13:18:11 -0500 (EST)
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
Cc: David Xu <davidxu@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Daniel Eischen <deischen@freebsd.org>
X-List-Received-Date: Wed, 20 Dec 2006 18:28:22 -0000

On Wed, 20 Dec 2006, Robert Watson wrote:

>
> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>
>> 
>> Anyway, this was just a thought/idea.  I don't mean to argue against any of 
>> the other reasons why this isn't a good idea.
>
> Whatever may be implemented to solve this issue will require a fairly serious 
> re-working of how we implement file descriptor reference counting in the 
> kernel.  Do you propose similar "cancellation" of other system calls blocked 
> on the file descriptor, including select(), etc?  Typically these system 
> calls interact with the underlying object associated with the file 
> descriptor, not the file descriptor itself, and often, they act directly on 
> the object and release the file descriptor before performing their operation. 
> I think before we can put any reasonable implementation proposal on the 
> table, we need a clear set of requirements:

[ ... ]

> While providing Solaris-like semantics here makes some amount of sense, this 
> is a very tricky area, and one where we're still refining performance 
> behavior, reference counting behavior, etc.  I don't think there will be any 
> easy answers, and we need to think through the semantic and performance 
> implications of any change very carefully before starting to implement.

I don't think the behavior here has to be any different than
what we currently (or desire to) do with regard to (unblocked)
signals interrupting threads waiting on IO.  You can spend
a lot of time thinking about how close() should affect IO
operations on the same file descriptor, but a very simple
approach is to treat them the same as if the operations were
interrupted by a signal.  I'm not suggesting it is implemented
the same way, just that it seems to make a lot of sense to me
that the behavior is consistent between the two.

-- 
DE

From owner-freebsd-arch@FreeBSD.ORG  Thu Dec 21 00:20:14 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from localhost.my.domain (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id 5030E16A403;
	Thu, 21 Dec 2006 00:20:14 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
From: David Xu <davidxu@freebsd.org>
To: freebsd-arch@freebsd.org,
 Daniel Eischen <deischen@freebsd.org>
Date: Thu, 21 Dec 2006 08:20:09 +0800
User-Agent: KMail/1.8.2
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
In-Reply-To: <Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200612210820.09955.davidxu@freebsd.org>
Cc: Robert Watson <rwatson@freebsd.org>
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Thu, 21 Dec 2006 00:20:14 -0000

On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
> On Wed, 20 Dec 2006, Robert Watson wrote:
> > On Wed, 13 Dec 2006, Daniel Eischen wrote:
> >> Anyway, this was just a thought/idea.  I don't mean to argue against any
> >> of the other reasons why this isn't a good idea.
> >
> > Whatever may be implemented to solve this issue will require a fairly
> > serious re-working of how we implement file descriptor reference counting
> > in the kernel.  Do you propose similar "cancellation" of other system
> > calls blocked on the file descriptor, including select(), etc?  Typically
> > these system calls interact with the underlying object associated with
> > the file descriptor, not the file descriptor itself, and often, they act
> > directly on the object and release the file descriptor before performing
> > their operation. I think before we can put any reasonable implementation
> > proposal on the table, we need a clear set of requirements:
>
> [ ... ]
>
> > While providing Solaris-like semantics here makes some amount of sense,
> > this is a very tricky area, and one where we're still refining
> > performance behavior, reference counting behavior, etc.  I don't think
> > there will be any easy answers, and we need to think through the semantic
> > and performance implications of any change very carefully before starting
> > to implement.
>
> I don't think the behavior here has to be any different than
> what we currently (or desire to) do with regard to (unblocked)
> signals interrupting threads waiting on IO.  You can spend
> a lot of time thinking about how close() should affect IO
> operations on the same file descriptor, but a very simple
> approach is to treat them the same as if the operations were
> interrupted by a signal.  I'm not suggesting it is implemented
> the same way, just that it seems to make a lot of sense to me
> that the behavior is consistent between the two.

I think the main concern is whether we would have to record every thread
using an fd. That is, when you call read() on an fd, you add your thread
pointer to the fd's thread list. When someone wants to close the fd, it
has to notify all the threads in the list and set a flag on each one
indicating that the thread was interrupted because the fd was closed.
When the thread returns from the deep code path to the read() syscall, it
checks the flag and returns EBADF to the user if it is set. Either a
reserved signal or TDF_INTERRUPT could be used to interrupt the thread.
But since there are many file operations, I don't know if we are willing
to pay such overhead on every file syscall; extra locking is not
welcome.
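
The bookkeeping described above can be sketched as a small userland model.
Everything here is hypothetical: the model_* names, the fixed-size list and
the MODEL_TDF_FDCLOSED flag are invented for the illustration and are not
FreeBSD kernel interfaces. Locking is omitted, which is exactly the cost
being worried about.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_USERS		8
#define MODEL_TDF_FDCLOSED	0x1	/* hypothetical per-thread flag */

struct model_thread {
	int flags;
};

struct model_fd {
	struct model_thread *users[MODEL_MAX_USERS];
	size_t nusers;
};

/* Syscall entry: record the calling thread in the fd's thread list. */
int
model_fd_enter(struct model_fd *fdp, struct model_thread *td)
{
	if (fdp->nusers == MODEL_MAX_USERS)
		return (EAGAIN);
	fdp->users[fdp->nusers++] = td;
	return (0);
}

/* close(): flag every thread still using the fd, then empty the list. */
void
model_fd_close(struct model_fd *fdp)
{
	size_t i;

	for (i = 0; i < fdp->nusers; i++)
		fdp->users[i]->flags |= MODEL_TDF_FDCLOSED;
	fdp->nusers = 0;
}

/* Syscall return: report EBADF if the fd was closed underneath us. */
int
model_fd_leave(struct model_fd *fdp, struct model_thread *td, int error)
{
	size_t i;

	if (td->flags & MODEL_TDF_FDCLOSED) {
		/* close() already removed us from the list. */
		td->flags &= ~MODEL_TDF_FDCLOSED;
		return (EBADF);
	}
	for (i = 0; i < fdp->nusers; i++) {
		if (fdp->users[i] == td) {
			fdp->users[i] = fdp->users[--fdp->nusers];
			break;
		}
	}
	assert(fdp->nusers <= MODEL_MAX_USERS);
	return (error);
}
```

Even in this toy form, every file syscall pays for one list insertion and
one removal, and a real version would need a lock around each of them.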

Regards,
David Xu

From owner-freebsd-arch@FreeBSD.ORG  Thu Dec 21 13:02:53 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E495516A4FC;
	Thu, 21 Dec 2006 13:02:52 +0000 (UTC)
	(envelope-from prvs=jelischer=5032be78a@ironport.com)
Received: from a50.ironport.com (a50.ironport.com [63.251.108.112])
	by mx1.freebsd.org (Postfix) with ESMTP id C20F713C478;
	Thu, 21 Dec 2006 13:02:08 +0000 (UTC)
	(envelope-from prvs=jelischer=5032be78a@ironport.com)
DomainKey-Signature: s=key512; d=ironport.com; c=nofws; q=dns;
	b=V+OKw0zuVq8ZsZGjGy8vtEEmEVbfZ8JijHzBCQeFqlFgVZOUbBcKfeDfU7OFBmtkxT9x+A+pP2Rtf0a5caa58Q==;
Received: from unknown (HELO [10.251.18.229]) ([10.251.18.229])
	by a50.ironport.com with ESMTP; 20 Dec 2006 17:48:02 -0800
Message-ID: <4589E7D2.9010608@ironport.com>
Date: Wed, 20 Dec 2006 17:48:02 -0800
From: Julian Elischer <jelischer@ironport.com>
User-Agent: Thunderbird 1.5.0.8 (Macintosh/20061025)
MIME-Version: 1.0
To: David Xu <davidxu@freebsd.org>
References: <32874.1165905843@critter.freebsd.dk>	<20061220153126.G85384@fledge.watson.org>	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Daniel Eischen <deischen@freebsd.org>, Robert Watson <rwatson@freebsd.org>,
	freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Thu, 21 Dec 2006 13:02:53 -0000

David Xu wrote:
> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> On Wed, 20 Dec 2006, Robert Watson wrote:
>>> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>>>> Anyway, this was just a thought/idea.  I don't mean to argue against any
>>>> of the other reasons why this isn't a good idea.
>>> Whatever may be implemented to solve this issue will require a fairly
>>> serious re-working of how we implement file descriptor reference counting
>>> in the kernel.  Do you propose similar "cancellation" of other system
>>> calls blocked on the file descriptor, including select(), etc?  Typically
>>> these system calls interact with the underlying object associated with
>>> the file descriptor, not the file descriptor itself, and often, they act
>>> directly on the object and release the file descriptor before performing
>>> their operation. I think before we can put any reasonable implementation
>>> proposal on the table, we need a clear set of requirements:
>> [ ... ]
>>
>>> While providing Solaris-like semantics here makes some amount of sense,
>>> this is a very tricky area, and one where we're still refining
>>> performance behavior, reference counting behavior, etc.  I don't think
>>> there will be any easy answers, and we need to think through the semantic
>>> and performance implications of any change very carefully before starting
>>> to implement.
>> I don't think the behavior here has to be any different than
>> what we currently (or desire to) do with regard to (unblocked)
>> signals interrupting threads waiting on IO.  You can spend
>> a lot of time thinking about how close() should affect IO
>> operations on the same file descriptor, but a very simple
>> approach is to treat them the same as if the operations were
>> interrupted by a signal.  I'm not suggesting it is implemented
>> the same way, just that it seems to make a lot of sense to me
>> that the behavior is consistent between the two.
> 
> I think the main concern is if we will record every thread using a
> fd, that means, when you call read() on a fd, you record your
> thread pointer into the fd's thread list, when one wants to close
> the fd, it has to notify all the threads in the list, set a flag
> for each thread, the flag indicates a thread is interrupted
> because the fd was closed, when the thread returns from deep code
> path to read() syscall, it should check the flag, and return EBADF to
> user if it was set. whatever, a reserved signal or TDF_INTERRUPT may
> interrupt a thread. but since there are many file operations, I don't
> know if we are willing to pay such overheads to every file syscall, 
> extra locking is not welcomed.

I think you are only interested in threads that are sleeping,
so you allow a sleeping thread to save a pointer to the fd (or whatever) 
on which it is sleeping, along with the sleep address.

Threads that are not sleeping are either already returning, or are going 
to sleep, in which case they can check at that time.
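
A rough userland model of that idea follows. All of it is hypothetical:
the sk_* names, the per-thread fields and the fd_open array stand in for
kernel state and are not existing FreeBSD structures. A thread records the
fd only when it goes to sleep, and close() touches only threads that are
sleeping on that fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_NOFD	(-1)

struct sk_thread {
	const void *wchan;	/* sleep address, NULL when not sleeping */
	int wait_fd;		/* fd slept on, SK_NOFD when not sleeping */
	int wake_error;		/* error to report after the wakeup */
};

/* Going to sleep: check the fd first, then record it with the address. */
int
sk_sleep_begin(struct sk_thread *td, const void *wchan, int fd,
    const int *fd_open)
{
	if (!fd_open[fd])
		return (EBADF);	/* closed before we slept */
	td->wchan = wchan;
	td->wait_fd = fd;
	td->wake_error = 0;
	return (0);
}

/* close(): mark the fd closed and wake only the threads sleeping on it. */
size_t
sk_close(struct sk_thread *tds, size_t n, int fd, int *fd_open)
{
	size_t i, woken = 0;

	fd_open[fd] = 0;
	for (i = 0; i < n; i++) {
		if (tds[i].wchan != NULL && tds[i].wait_fd == fd) {
			tds[i].wake_error = EBADF;
			tds[i].wchan = NULL;
			tds[i].wait_fd = SK_NOFD;
			woken++;
		}
	}
	assert(woken <= n);
	return (woken);
}
```

The model still walks every thread on close(), and it assumes one fd per
sleep, which does not hold for select() or poll().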

> 
> Regards,
> David Xu

From owner-freebsd-arch@FreeBSD.ORG  Thu Dec 21 13:45:58 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 984F716A509;
	Thu, 21 Dec 2006 13:45:58 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 4560C13C475;
	Thu, 21 Dec 2006 13:45:58 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id E39DC47112;
	Thu, 21 Dec 2006 05:38:33 -0500 (EST)
Date: Thu, 21 Dec 2006 10:38:33 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: David Xu <davidxu@freebsd.org>
In-Reply-To: <200612210820.09955.davidxu@freebsd.org>
Message-ID: <20061221102909.O83974@fledge.watson.org>
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Daniel Eischen <deischen@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Thu, 21 Dec 2006 13:45:58 -0000


On Thu, 21 Dec 2006, David Xu wrote:

> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> On Wed, 20 Dec 2006, Robert Watson wrote:
>>> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>>>> Anyway, this was just a thought/idea.  I don't mean to argue against any
>>>> of the other reasons why this isn't a good idea.
>>>
>>> Whatever may be implemented to solve this issue will require a fairly 
>>> serious re-working of how we implement file descriptor reference counting 
>>> in the kernel.  Do you propose similar "cancellation" of other system 
>>> calls blocked on the file descriptor, including select(), etc?  Typically 
>>> these system calls interact with the underlying object associated with the 
>>> file descriptor, not the file descriptor itself, and often, they act 
>>> directly on the object and release the file descriptor before performing 
>>> their operation. I think before we can put any reasonable implementation 
>>> proposal on the table, we need a clear set of requirements:
>>
>> [ ... ]
>>
>>> While providing Solaris-like semantics here makes some amount of sense, 
>>> this is a very tricky area, and one where we're still refining performance 
>>> behavior, reference counting behavior, etc.  I don't think there will be 
>>> any easy answers, and we need to think through the semantic and 
>>> performance implications of any change very carefully before starting to 
>>> implement.
>>
>> I don't think the behavior here has to be any different than what we 
>> currently (or desire to) do with regard to (unblocked) signals interrupting 
>> threads waiting on IO.  You can spend a lot of time thinking about how 
>> close() should affect IO operations on the same file descriptor, but a very 
>> simple approach is to treat them the same as if the operations were 
>> interrupted by a signal.  I'm not suggesting it is implemented the same 
>> way, just that it seems to make a lot of sense to me that the behavior is 
>> consistent between the two.
>
> I think the main concern is if we will record every thread using a fd, that 
> means, when you call read() on a fd, you record your thread pointer into the 
> fd's thread list, when one wants to close the fd, it has to notify all the 
> threads in the list, set a flag for each thread, the flag indicates a thread 
> is interrupted because the fd was closed, when the thread returns from deep 
> code path to read() syscall, it should check the flag, and return EBADF to 
> user if it was set. whatever, a reserved signal or TDF_INTERRUPT may 
> interrupt a thread. but since there are many file operations, I don't know 
> if we are willing to pay such overheads to every file syscall, extra locking 
> is not welcomed.

Yes, as well as adding quite a bit of complexity and opening the door for some 
rather odd/unfortunate races.  You can inspect the bulk of the Solaris 
implementation by looking at three spots:

http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=closeandsetf 
http://fxr.watson.org/fxr/ident?v=OPENSOLARIS;i=post_syscall 
http://fxr.watson.org/fxr/search?v=OPENSOLARIS&string=MUSTRETURN

In closeandsetf(), you can see that an additional layer of indirection 
associated with the file descriptor is maintained in order to count consumers 
of a particular fd, not just the open file record, and the set of active fds 
for each thread is maintained.  When a close() is performed and there are 
still other open consumers, the process is suspended and all threads are 
inspected to see if the fd is active for the thread, in which case a thread 
flag indicating a stale fd is set.  I believe that the interrupt here is 
an implicit part of the process suspend/restart, and in post_syscall() the 
EINTR returns are remapped to EBADF.

That extra level of indirection and use tracking will be both complex and a 
performance hit in a critical kernel path.  I'm not opposed to investigating 
implementing something along these lines, but I think we should defer this for 
some time while we sort out more pressing issues in our kernel file 
descriptor/socket/etc code and revisit this in a few months.  We will need to 
carefully evaluate the performance costs, and if they are significant, figure 
out how to avoid this causing a significant hit.  It's worth observing that 
removing one level of reference counting from the socket send/receive paths 
(using the file descriptor reference instead of the socket reference) made a 
5%+ difference in high speed send performance.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Thu Dec 21 15:22:17 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8E92716A412;
	Thu, 21 Dec 2006 15:22:17 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E78913C44B;
	Thu, 21 Dec 2006 15:22:17 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id A4B8B46FC2;
	Thu, 21 Dec 2006 10:22:16 -0500 (EST)
Date: Thu, 21 Dec 2006 15:22:16 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Julian Elischer <jelischer@ironport.com>
In-Reply-To: <4589E7D2.9010608@ironport.com>
Message-ID: <20061221152115.U83974@fledge.watson.org>
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
	<4589E7D2.9010608@ironport.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Daniel Eischen <deischen@freebsd.org>, David Xu <davidxu@freebsd.org>,
	freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Thu, 21 Dec 2006 15:22:17 -0000

On Wed, 20 Dec 2006, Julian Elischer wrote:

>> I think the main concern is if we will record every thread using a fd, that 
>> means, when you call read() on a fd, you record your thread pointer into 
>> the fd's thread list, when one wants to close the fd, it has to notify all 
>> the threads in the list, set a flag for each thread, the flag indicates a 
>> thread is interrupted because the fd was closed, when the thread returns 
>> from deep code path to read() syscall, it should check the flag, and return 
>> EBADF to user if it was set. whatever, a reserved signal or TDF_INTERRUPT 
>> may interrupt a thread. but since there are many file operations, I don't 
>> know if we are willing to pay such overheads to every file syscall, extra 
>> locking is not welcomed.
>
> I think you are only interested in threads that are sleeping, so you allow a 
> sleeping thread to save a pointer to the fd (or whatever) on which it is 
> sleeping, along with the sleep address.
>
> Threads that are not sleeping are either already returning, or are going to 
> sleep, in which case they can check at that time.

Hence my question about select and poll: should they throw an exception state 
when a file descriptor is closed out from under them?  They often sleep on 
hundreds or thousands of file descriptors, and not just one.
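
As a point of comparison, poll() already has a per-descriptor way to report
a descriptor that is not open: POLLNVAL in revents. The sketch below only
covers a descriptor closed before the call, in a single-threaded caller;
what should happen when the close() arrives while poll() is asleep is the
open question. The function name is made up for the example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * poll() a descriptor that has already been closed.
 * Returns the revents reported for it, or -1 on failure.
 */
int
revents_for_closed_fd(void)
{
	struct pollfd pfd;
	int p[2], n;

	if (pipe(p) != 0)
		return (-1);
	close(p[0]);		/* p[0] is now a stale descriptor number */

	pfd.fd = p[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, 0);	/* zero timeout: do not block */
	close(p[1]);
	if (n < 0)
		return (-1);
	assert(n <= 1);
	return (pfd.revents);
}
```

select() has no such per-descriptor channel; given a descriptor that is not
open, the whole call fails with EBADF.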

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-arch@FreeBSD.ORG  Thu Dec 21 17:15:49 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 73F2516A412;
	Thu, 21 Dec 2006 17:15:49 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
	by mx1.freebsd.org (Postfix) with ESMTP id F2AA613C462;
	Thu, 21 Dec 2006 17:15:48 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	by mail.ntplx.net (8.13.8/8.13.8/NETPLEX) with ESMTP id kBLGeV9x020613; 
	Thu, 21 Dec 2006 11:40:31 -0500 (EST)
Date: Thu, 21 Dec 2006 11:40:31 -0500 (EST)
From: Daniel Eischen <deischen@freebsd.org>
X-X-Sender: eischen@sea.ntplx.net
To: Robert Watson <rwatson@freebsd.org>
In-Reply-To: <20061221152115.U83974@fledge.watson.org>
Message-ID: <Pine.GSO.4.64.0612211136160.29290@sea.ntplx.net>
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
	<4589E7D2.9010608@ironport.com>
	<20061221152115.U83974@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Message whitelisted by DRAC access database, not delayed by
	milter-greylist-3.0 (mail.ntplx.net [204.213.176.10]);
	Thu, 21 Dec 2006 11:40:31 -0500 (EST)
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
Cc: Julian Elischer <jelischer@ironport.com>, David Xu <davidxu@freebsd.org>,
	freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Daniel Eischen <deischen@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Dec 2006 17:15:49 -0000

On Thu, 21 Dec 2006, Robert Watson wrote:

> On Wed, 20 Dec 2006, Julian Elischer wrote:
>
>>> I think the main concern is if we will record every thread using a fd, 
>>> that means, when you call read() on a fd, you record your thread pointer 
>>> into the fd's thread list, when one wants to close the fd, it has to 
>>> notify all the threads in the list, set a flag for each thread, the flag 
>>> indicates a thread is interrupted because the fd was closed, when the 
>>> thread returns from deep code path to read() syscall, it should check the 
>>> flag, and return EBADF to user if it was set. whatever, a reserved signal 
>>> or TDF_INTERRUPT may interrupt a thread. but since there are many file 
>>> operations, I don't know if we are willing to pay such overheads to every 
>>> file syscall, extra locking is not welcomed.
>> 
>> I think you are only interested in threads that are sleeping.. so you allow a 
>> sleeping thread to save a pointer to the fd (or whatever) on which it is 
>> sleeping, along with the sleep address.
>> 
>> items that are not sleeping are either already returning, or are going to 
>> sleep, in which case they can check at that time.
>
> Hence my question about select and poll: should they throw an exception state 
> when a file descriptor is closed out from under them?  They often sleep on 
> hundreds or thousands of file descriptors, and not just one.

Yes, I would think so.  Solaris behaves this way also, although
there seems to be a bug in Solaris 8 (version tested) in that
select() returns -1 but errno isn't properly set (it is 0).
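
To make the error under discussion concrete, here is a minimal Python
sketch. It covers only the easy case, where the descriptor is already
closed when select() is entered. It says nothing about a thread that is
already asleep when the close happens, which is the open question in
this thread. The helper name is made up for illustration.

```python
import errno
import os
import select


def select_errno_after_close():
    """Return the errno select() reports for an already-closed descriptor."""
    read_end, write_end = os.pipe()
    os.close(read_end)  # the descriptor number is now invalid
    try:
        # Zero timeout: we only want the validity check, not a sleep.
        select.select([read_end], [], [], 0)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_end)
    return None


if __name__ == "__main__":
    code = select_errno_after_close()
    print(code, errno.errorcode.get(code))
```

On the systems I would expect this to be run on, the reported value is
EBADF, which is the same error proposed here for the sleeping case.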

-- 
DE

From owner-freebsd-arch@FreeBSD.ORG  Fri Dec 22 02:18:15 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8AB3D16A40F;
	Fri, 22 Dec 2006 02:18:15 +0000 (UTC)
	(envelope-from jmg@hydrogen.funkthat.com)
Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168])
	by mx1.freebsd.org (Postfix) with ESMTP id 36F8F13C458;
	Fri, 22 Dec 2006 02:18:15 +0000 (UTC)
	(envelope-from jmg@hydrogen.funkthat.com)
Received: from hydrogen.funkthat.com (ohaeya3em79mpkgl@localhost.funkthat.com
	[127.0.0.1])
	by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id kBM211Yg031508; 
	Thu, 21 Dec 2006 18:01:01 -0800 (PST)
	(envelope-from jmg@hydrogen.funkthat.com)
Received: (from jmg@localhost)
	by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id kBM211Ml031507;
	Thu, 21 Dec 2006 18:01:01 -0800 (PST) (envelope-from jmg)
Date: Thu, 21 Dec 2006 18:01:01 -0800
From: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
To: Robert Watson <rwatson@FreeBSD.org>
Message-ID: <20061222020101.GC4982@funkthat.com>
Mail-Followup-To: Robert Watson <rwatson@FreeBSD.org>,
	Julian Elischer <jelischer@ironport.com>,
	Daniel Eischen <deischen@freebsd.org>,
	David Xu <davidxu@freebsd.org>, freebsd-arch@freebsd.org
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
	<4589E7D2.9010608@ironport.com>
	<20061221152115.U83974@fledge.watson.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20061221152115.U83974@fledge.watson.org>
User-Agent: Mutt/1.4.2.1i
X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386
X-PGP-Fingerprint: B7 EC EF F8 AE ED A7 31  96 7A 22 B3 D8 56 36 F4
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
Cc: Daniel Eischen <deischen@FreeBSD.org>,
	Julian Elischer <jelischer@ironport.com>,
	David Xu <davidxu@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Dec 2006 02:18:15 -0000

Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000:
> >I think you are only interested in threads that are sleeping.. so you allow 
> >a sleeping thread to save a pointer to the fd (or whatever) on which it is 
> >sleeping, along with the sleep address.
> >
> >items that are not sleeping are either already returning, or are going to 
> >sleep, in which case they can check at that time.
> 
> Hence my question about select and poll: should they throw an exception 
> state when a file descriptor is closed out from under them?  They often 
> sleep on hundreds or thousands of file descriptors, and not just one.

IMO, your program is buggy if you close the file descriptor before
everything is out of the kernel wrt the fd...  It means that your close
statement isn't waiting for things to be cleanly shut down, and that
you still have dangling reference counts to the parts of the code that
are in the kernel...

I used to expect something similar w/ a kqueue-based event-driven
web server, and found that I had bugs due to assuming that I could
close it whenever I want...  What happens if you close the fd between
the time select returns and you process it?  What happens if the fd
gets closed, and another thread (or an earlier fd that accepts
connections) reuses that fd?  And then your state machine isn't ready
to get an event since it isn't supposed to get one yet...

The kernel isn't buggy wrt closing a fd when another thread is using
it, it's the program that's buggy...
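
The descriptor-reuse hazard can be reproduced in a few lines. This is
only a sketch in Python, single-threaded so that it is deterministic;
the function and variable names are hypothetical. It relies on the POSIX
rule that dup() hands out the lowest free descriptor number, so a number
that was just closed is the first one to be given out again.

```python
import os


def stale_fd_reads_other_pipe():
    """Show a stale fd number silently referring to a different object."""
    a_read, a_write = os.pipe()
    b_read, b_write = os.pipe()

    stale = a_read            # some other code still remembers this number
    os.close(a_read)          # ...but the descriptor is closed here

    # dup() returns the lowest free number, which is the one just closed.
    new_fd = os.dup(b_read)
    reused = (new_fd == stale)

    # Data written to pipe B is now readable through the stale number.
    os.write(b_write, b"B")
    data = os.read(stale, 1)

    for fd in (new_fd, a_write, b_read, b_write):
        os.close(fd)
    return reused, data


if __name__ == "__main__":
    print(stale_fd_reads_other_pipe())
```

No call fails here, which is the point: the code holding the stale
number gets no error at all, just data from the wrong object.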

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Fri Dec 22 02:43:15 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.org
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id 37E8816A407;
	Fri, 22 Dec 2006 02:43:11 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
Message-ID: <458B4641.5080808@freebsd.org>
Date: Fri, 22 Dec 2006 10:43:13 +0800
From: David Xu <davidxu@freebsd.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.13) Gecko/20061204
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
References: <32874.1165905843@critter.freebsd.dk>	<20061220153126.G85384@fledge.watson.org>	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>	<200612210820.09955.davidxu@freebsd.org>	<4589E7D2.9010608@ironport.com>	<20061221152115.U83974@fledge.watson.org>
	<20061222020101.GC4982@funkthat.com>
In-Reply-To: <20061222020101.GC4982@funkthat.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Daniel Eischen <deischen@FreeBSD.org>,
	Julian Elischer <jelischer@ironport.com>,
	Robert Watson <rwatson@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Dec 2006 02:43:15 -0000

John-Mark Gurney wrote:
> Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000:
> 
>>>I think you are only interested in threads that are sleeping.. so you allow 
>>>a sleeping thread to save a pointer to the fd (or whatever) on which it is 
>>>sleeping, along with the sleep address.
>>>
>>>items that are not sleeping are either already returning, or are going to 
>>>sleep, in which case they can check at that time.
>>
>>Hence my question about select and poll: should they throw an exception 
>>state when a file descriptor is closed out from under them?  They often 
>>sleep on hundreds or thousands of file descriptors, and not just one.
> 
> 
> IMO, your program is buggy if you close the file descriptor before
> everything is out of the kernel wrt the fd...  It means that your close
> statement isn't waiting for things to be cleanly shut down, and that
> you still have dangling reference counts to the parts of the code that
> are in the kernel...
> 
> I used to expect something similar w/ a kqueue-based event-driven
> web server, and found that I had bugs due to assuming that I could
> close it whenever I want...  What happens if you close the fd between
> the time select returns and you process it?  What happens if the fd
> gets closed, and another thread (or an earlier fd that accepts
> connections) reuses that fd?  And then your state machine isn't ready
> to get an event since it isn't supposed to get one yet...
> 
> The kernel isn't buggy wrt closing a fd when another thread is using
> it, it's the program that's buggy...
> 

I agree with you here. As I said before, the kernel may be able to do
things correctly, but user code still has to struggle with race
conditions between multiple threads. So if user code has to work out a
way to avoid those races anyway, why not just use a signal to interrupt
the target thread and do the synchronization between threads? The
requested extra close() feature seems to be a wrongly defined problem.
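
A rough sketch of that ordering (interrupt the sleeper, wait for it to
leave the kernel, and only then close) might look like the following.
Note the substitution: it is written in Python, where signal handlers
only run in the main thread, so a wake-up pipe stands in for the signal.
All names are hypothetical.

```python
import os
import select
import threading


def worker(data_fd, wake_fd, result):
    """Sleep in select() until data arrives or a wake-up is requested."""
    ready, _, _ = select.select([data_fd, wake_fd], [], [])
    if wake_fd in ready:
        result.append("woken")      # asked to stop; do not touch data_fd
    else:
        result.append(os.read(data_fd, 1))


def shutdown_cleanly():
    data_r, data_w = os.pipe()
    wake_r, wake_w = os.pipe()
    result = []
    t = threading.Thread(target=worker, args=(data_r, wake_r, result))
    t.start()

    os.write(wake_w, b"x")          # 1. interrupt the sleeping thread
    t.join()                        # 2. wait until it is out of select()
    for fd in (data_r, data_w, wake_r, wake_w):
        os.close(fd)                # 3. only now close the descriptors
    return result


if __name__ == "__main__":
    print(shutdown_cleanly())
```

Because the close happens after the join, no thread can still be
sleeping on the descriptor, and the number cannot be reused while
someone else still believes it refers to the old object.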

Regards,
David Xu


From owner-freebsd-arch@FreeBSD.ORG  Fri Dec 22 03:36:00 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id B7ECC16A4A7;
	Fri, 22 Dec 2006 03:36:00 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 5EC7413C447;
	Fri, 22 Dec 2006 03:36:00 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	by mail.ntplx.net (8.13.8/8.13.8/NETPLEX) with ESMTP id kBM3Zwao029546; 
	Thu, 21 Dec 2006 22:35:59 -0500 (EST)
Date: Thu, 21 Dec 2006 22:35:58 -0500 (EST)
From: Daniel Eischen <deischen@freebsd.org>
X-X-Sender: eischen@sea.ntplx.net
To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
In-Reply-To: <20061222020101.GC4982@funkthat.com>
Message-ID: <Pine.GSO.4.64.0612212227410.2250@sea.ntplx.net>
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
	<4589E7D2.9010608@ironport.com>
	<20061221152115.U83974@fledge.watson.org>
	<20061222020101.GC4982@funkthat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Message whitelisted by DRAC access database, not delayed by
	milter-greylist-3.0 (mail.ntplx.net [204.213.176.10]);
	Thu, 21 Dec 2006 22:35:59 -0500 (EST)
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
Cc: Julian Elischer <jelischer@ironport.com>,
	Robert Watson <rwatson@freebsd.org>,
	David Xu <davidxu@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Daniel Eischen <deischen@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Dec 2006 03:36:00 -0000

On Thu, 21 Dec 2006, John-Mark Gurney wrote:

> Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000:
>>> I think you are only interested in threads that are sleeping.. so you allow
>>> a sleeping thread to save a pointer to the fd (or whatever) on which it is
>>> sleeping, along with the sleep address.
>>>
>>> items that are not sleeping are either already returning, or are going to
>>> sleep, in which case they can check at that time.
>>
>> Hence my question about select and poll: should they throw an exception
>> state when a file descriptor is closed out from under them?  They often
>> sleep on hundreds or thousands of file descriptors, and not just one.
>
> IMO, your program is buggy if you close the file descriptor before
> everything is out of the kernel wrt the fd...  It means that your close
> statement isn't waiting for things to be cleanly shut down, and that
> you still have dangling reference counts to the parts of the code that
> are in the kernel...
>
> I used to expect something similar w/ a kqueue-based event-driven
> web server, and found that I had bugs due to assuming that I could
> close it whenever I want...  What happens if you close the fd between
> the time select returns and you process it?  What happens if the fd
> gets closed, and another thread (or an earlier fd that accepts
> connections) reuses that fd?  And then your state machine isn't ready
> to get an event since it isn't supposed to get one yet...
>
> The kernel isn't buggy wrt closing a fd when another thread is using
> it, it's the program that's buggy...

I agree also, but hanging without return isn't very detectable.
The best thing to do is to tell the programmer that he is doing
something stupid, and returning with an error is the way that
it is typically done.  Solaris seems to have jumped through
some hoops to achieve this behavior, so I doubt it is without
merit.  OTOH, I'm not going to argue that it is one of the
more important things we should be worried about ;-)

-- 
DE

From owner-freebsd-arch@FreeBSD.ORG  Fri Dec 22 04:07:40 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 5795016A403;
	Fri, 22 Dec 2006 04:07:40 +0000 (UTC)
	(envelope-from jmg@hydrogen.funkthat.com)
Received: from hydrogen.funkthat.com (gate.funkthat.com [69.17.45.168])
	by mx1.freebsd.org (Postfix) with ESMTP id 130AF13C41A;
	Fri, 22 Dec 2006 04:07:40 +0000 (UTC)
	(envelope-from jmg@hydrogen.funkthat.com)
Received: from hydrogen.funkthat.com (kci2bk4mc4426o7z@localhost.funkthat.com
	[127.0.0.1])
	by hydrogen.funkthat.com (8.13.6/8.13.3) with ESMTP id kBM47d0G033271; 
	Thu, 21 Dec 2006 20:07:39 -0800 (PST)
	(envelope-from jmg@hydrogen.funkthat.com)
Received: (from jmg@localhost)
	by hydrogen.funkthat.com (8.13.6/8.13.3/Submit) id kBM47d8w033270;
	Thu, 21 Dec 2006 20:07:39 -0800 (PST) (envelope-from jmg)
Date: Thu, 21 Dec 2006 20:07:38 -0800
From: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
To: Daniel Eischen <deischen@freebsd.org>
Message-ID: <20061222040738.GD4982@funkthat.com>
Mail-Followup-To: Daniel Eischen <deischen@freebsd.org>,
	Julian Elischer <jelischer@ironport.com>,
	Robert Watson <rwatson@freebsd.org>, David Xu <davidxu@freebsd.org>,
	freebsd-arch@freebsd.org
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
	<4589E7D2.9010608@ironport.com>
	<20061221152115.U83974@fledge.watson.org>
	<20061222020101.GC4982@funkthat.com>
	<Pine.GSO.4.64.0612212227410.2250@sea.ntplx.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.GSO.4.64.0612212227410.2250@sea.ntplx.net>
User-Agent: Mutt/1.4.2.1i
X-Operating-System: FreeBSD 5.4-RELEASE-p6 i386
X-PGP-Fingerprint: B7 EC EF F8 AE ED A7 31  96 7A 22 B3 D8 56 36 F4
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
Cc: Julian Elischer <jelischer@ironport.com>,
	Robert Watson <rwatson@freebsd.org>,
	David Xu <davidxu@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Dec 2006 04:07:40 -0000

Daniel Eischen wrote this message on Thu, Dec 21, 2006 at 22:35 -0500:
> On Thu, 21 Dec 2006, John-Mark Gurney wrote:
> 
> >Robert Watson wrote this message on Thu, Dec 21, 2006 at 15:22 +0000:
> >>>I think you are only interested in threads that are sleeping.. so you allow
> >>>a sleeping thread to save a pointer to the fd (or whatever) on which it is
> >>>sleeping, along with the sleep address.
> >>>
> >>>items that are not sleeping are either already returning, or are going to
> >>>sleep, in which case they can check at that time.
> >>
> >>Hence my question about select and poll: should they throw an exception
> >>state when a file descriptor is closed out from under them?  They often
> >>sleep on hundreds or thousands of file descriptors, and not just one.
> >
> >IMO, your program is buggy if you close the file descriptor before
> >everything is out of the kernel wrt the fd...  It means that your close
> >statement isn't waiting for things to be cleanly shut down, and that
> >you still have dangling reference counts to the parts of the code that
> >are in the kernel...
> >
> >I used to expect something similar w/ a kqueue-based event-driven
> >web server, and found that I had bugs due to assuming that I could
> >close it whenever I want...  What happens if you close the fd between
> >the time select returns and you process it?  What happens if the fd
> >gets closed, and another thread (or an earlier fd that accepts
> >connections) reuses that fd?  And then your state machine isn't ready
> >to get an event since it isn't supposed to get one yet...
> >
> >The kernel isn't buggy wrt closing a fd when another thread is using
> >it, it's the program that's buggy...
> 
> I agree also, but hanging without return isn't very detectable.

It's a lot more detectable than working 99% of the time and
failing when things get corrupted due to a race.. :)

> The best thing to do is to tell the programmer that he is doing
> something stupid, and returning with an error is the way that
> it is typically done.  Solaris seems to have jumped through

As long as it's EDOOFUS...  I don't see any other error that would
be appropriate...

> some hoops to achieve this behavior, so I doubt it is without
> merit.  OTOH, I'm not going to argue that it is one of the
> more important things we should be worried about ;-)

As long as it doesn't cost much more to do it...  Hanging is just as
good of an indication as returning an error...  And I'd say it's better
as it forces the buggy software to be fixed as opposed to simply ignoring
the error which is likely what the programmer will do...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@FreeBSD.ORG  Fri Dec 22 04:16:45 2006
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: freebsd-arch@freebsd.org
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id E028A16A403;
	Fri, 22 Dec 2006 04:16:45 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
	by mx1.freebsd.org (Postfix) with ESMTP id 8661E13C442;
	Fri, 22 Dec 2006 04:16:45 +0000 (UTC)
	(envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
	by mail.ntplx.net (8.13.8/8.13.8/NETPLEX) with ESMTP id kBM4GiRX028600; 
	Thu, 21 Dec 2006 23:16:44 -0500 (EST)
Date: Thu, 21 Dec 2006 23:16:44 -0500 (EST)
From: Daniel Eischen <deischen@freebsd.org>
X-X-Sender: eischen@sea.ntplx.net
To: John-Mark Gurney <gurney_j@resnet.uoregon.edu>
In-Reply-To: <20061222040738.GD4982@funkthat.com>
Message-ID: <Pine.GSO.4.64.0612212310080.2302@sea.ntplx.net>
References: <32874.1165905843@critter.freebsd.dk>
	<20061220153126.G85384@fledge.watson.org>
	<Pine.GSO.4.64.0612201308220.23942@sea.ntplx.net>
	<200612210820.09955.davidxu@freebsd.org>
	<4589E7D2.9010608@ironport.com>
	<20061221152115.U83974@fledge.watson.org>
	<20061222020101.GC4982@funkthat.com>
	<Pine.GSO.4.64.0612212227410.2250@sea.ntplx.net>
	<20061222040738.GD4982@funkthat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Message whitelisted by DRAC access database, not delayed by
	milter-greylist-3.0 (mail.ntplx.net [204.213.176.10]);
	Thu, 21 Dec 2006 23:16:44 -0500 (EST)
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
Cc: Julian Elischer <jelischer@ironport.com>,
	Robert Watson <rwatson@freebsd.org>,
	David Xu <davidxu@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: close() of active socket does not work on FreeBSD 6
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Daniel Eischen <deischen@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Dec 2006 04:16:46 -0000

On Thu, 21 Dec 2006, John-Mark Gurney wrote:

> Daniel Eischen wrote this message on Thu, Dec 21, 2006 at 22:35 -0500:
>> On Thu, 21 Dec 2006, John-Mark Gurney wrote:
>>
>>> I used to expect something similar w/ a kqueue-based event-driven
>>> web server, and found that I had bugs due to assuming that I could
>>> close it whenever I want...  What happens if you close the fd between
>>> the time select returns and you process it?  What happens if the fd
>>> gets closed, and another thread (or an earlier fd that accepts
>>> connections) reuses that fd?  And then your state machine isn't ready
>>> to get an event since it isn't supposed to get one yet...
>>>
>>> The kernel isn't buggy wrt closing a fd when another thread is using
>>> it, it's the program that's buggy...
>>
>> I agree also, but hanging without return isn't very detectable.
>
> It's a lot more detectable than working 99% of the time and
> failing when things get corrupted due to a race.. :)

I dunno, I think returning an appropriate error on the actual
call(s) that are problematic is easier to detect than trying
to figure out just what is causing the hang, corruption,
whatever.  Perhaps I mean "debug" instead of "detect".

>> The best thing to do is to tell the programmer that he is doing
>> something stupid, and returning with an error is the way that
>> it is typically done.  Solaris seems to have jumped through
>
> As long as it's EDOOFUS...  I don't see any other error that would
> be appropriate...

EBADF.  That's what Solaris returns, and it makes more sense
to me.

>> some hoops to achieve this behavior, so I doubt it is without
>> merit.  OTOH, I'm not going to argue that it is one of the
>> more important things we should be worried about ;-)
>
> As long as it doesn't cost much more to do it...  Hanging is just as
> good of an indication as returning an error...  And I'd say it's better
> as it forces the buggy software to be fixed as opposed to simply ignoring
> the error which is likely what the programmer will do...

Yes, unfortunately, ignoring the error would probably happen
a lot.

-- 
DE