From owner-freebsd-arch@FreeBSD.ORG  Sun Sep  2 04:17:54 2007
Date: Sun, 2 Sep 2007 14:17:02 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Jilles Tjoelker <jilles@stack.nl>
In-Reply-To: <20070901224025.GA97796@stack.nl>
Message-ID: <20070902131910.H46281@delplex.bde.org>
References: <1188600721.1255.11.camel@shumai.marcuscom.com>
	<20070901112600.GA33832@stack.nl>
	<1188660782.41727.5.camel@shumai.marcuscom.com>
	<20070901224025.GA97796@stack.nl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Joe Marcus Clarke <marcus@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Understanding interrupted system calls

On Sun, 2 Sep 2007, Jilles Tjoelker wrote:

> On Sat, Sep 01, 2007 at 11:33:02AM -0400, Joe Marcus Clarke wrote:
>> However, I'm curious as to my other point in this thread.  Why should
>> one need to re-register the default signal handlers to get a syscall to
>> return EINTR?  Or should ERESTART be caught and turned into EINTR then
>> return to the caller (as in kern_connect())?
>
> It is intended that most blocking system calls are not interrupted by
> signals.  This saves the programmer some checks on EINTR.

Yes, I think this is just the historical default for BSD.  Not
restarting syscalls is very unusual in BSD, and this was implemented
by defaulting ps_sigintr to all unset (?).  Before sigaction(2) existed,
signal(3) was probably signal(2), and most programs used only signal()
and got the default.  siginterrupt(3) was more probably siginterrupt(2),
and the few programs that understand this stuff and want to change the
default had to use it to get EINTR.  Now with sigaction(2), the default
doesn't apply to userland, since setting up a signal catcher requires
using sigaction(2) which sets the ps_sigintr bit to the inverse of the
SA_RESTART bit in the sigaction data.  However, the default might still
affect operation in the kernel for programs that never call sigaction().
I think it shouldn't have any effect except to return to near the
syscall entry point to decide what to do.  Lower levels should see
ERESTART and return to near the syscall entry point.  Similarly if
sigaction() is used to change the default.

> Some system calls, e.g. connect(), read/write/etc that have already
> committed some data and under BSD also select/poll/kqueue do not restart
> and always return EINTR or partial success.  In the kernel code, this
> appears as changing ERESTART to EINTR.

This translation happens at a high level, but the translation of ps_sigintr
happens at the [*]sleep() level.  So ps_sigintr has a significant effect
at a low level, but only (?) to select between ERESTART and EINTR.
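
The two levels can be modelled in userland roughly as follows.  This is a
sketch under stated assumptions, not the kernel code: the struct, the
functions and MODEL_ERESTART are hypothetical stand-ins, and the bit layout
of the model's ps_sigintr (one bit per signal number) is assumed for the
illustration.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>

#define	MODEL_ERESTART	(-1)	/* stand-in for the kernel-internal ERESTART */

/* Hypothetical model: bit (sig - 1) set means "interrupt, do not restart". */
struct model_sigacts {
	uint64_t ps_sigintr;
};

/* Sleep level: a caught signal ended the sleep; choose the error code. */
int
model_sleep_sig_error(const struct model_sigacts *ps, int sig)
{
	uint64_t bit = (uint64_t)1 << (sig - 1);

	return ((ps->ps_sigintr & bit) != 0 ? EINTR : MODEL_ERESTART);
}

/* High level, restartable syscall: pass the sleep-level error through. */
int
model_restartable_syscall(const struct model_sigacts *ps, int sig)
{
	return (model_sleep_sig_error(ps, sig));
}

/*
 * High level, syscall that must not be restarted (connect() is the
 * example given in the thread): translate ERESTART to EINTR.
 */
int
model_norestart_syscall(const struct model_sigacts *ps, int sig)
{
	int error = model_sleep_sig_error(ps, sig);

	if (error == MODEL_ERESTART)
		error = EINTR;
	return (error);
}
```

In this model the sleep level only selects between the two codes, and the
decision to override ERESTART is made once, near the top of the syscall.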

From Jilles' previous reply:

>>> The problem seems to be the following code in
>>> src/sys/dev/syscons/syscons.c, in case VT_WAITACTIVE in scioctl():
>>> 
>>> 	while ((error=tsleep(&scp->smode, PZERO|PCATCH,
>>> 			     "waitvt", 0)) == ERESTART) ;
>>> 
>>> If a signal is caught and system call restart is enabled for that
>>> signal, this makes it spin in a tight loop, waiting in vain for the
>>> signal to go away.  The idea of ERESTART is that the syscall function
>>> returns it and then the signal handler is entered.  If and when the
>>> signal handler returns, it will return to the system call instruction,
>>> restarting it (perhaps this is optimized to avoid the switch to userland
>>> and back).  With EINTR, the signal handler would return to directly
>>> after the system call instruction.
>>> 
>>> The fixed version would then be
>>> 
>>> 	error = tsleep(&scp->smode, PZERO|PCATCH, "waitvt", 0);

I think this is right.  The kernel should never loop on ERESTART like this.
Please fix the remaining style bug in it (missing spaces around binary
operator).
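
Why the loop spins can be shown with a small userland model.  This is a
sketch only: model_thread, model_tsleep() and the two model_waitactive_*()
functions are hypothetical names, and the iteration cap exists just so the
model terminates.  The assumption being modelled is the one described
above: the pending signal is only dealt with after the syscall function
returns, so a sleep with PCATCH keeps failing with ERESTART until then.

```c
#define	MODEL_ERESTART	(-1)	/* stand-in for the kernel-internal ERESTART */

/* Hypothetical per-thread state for the model. */
struct model_thread {
	int sig_pending;	/* a caught signal is waiting to be handled */
	int sleeps;		/* how many times the sleep was attempted */
};

/* Stand-in for tsleep(..., PCATCH, ...): fails at once while a signal is pending. */
static int
model_tsleep(struct model_thread *td)
{
	td->sleeps++;
	return (td->sig_pending ? MODEL_ERESTART : 0);
}

/* Buggy shape: retry on ERESTART without ever leaving the syscall. */
int
model_waitactive_buggy(struct model_thread *td, int cap)
{
	int error;

	while ((error = model_tsleep(td)) == MODEL_ERESTART &&
	    td->sleeps < cap)
		;	/* nothing here can clear sig_pending */
	return (error);
}

/* Fixed shape: sleep once and let the caller see ERESTART. */
int
model_waitactive_fixed(struct model_thread *td)
{
	return (model_tsleep(td));
}
```

With sig_pending set, the buggy version runs until the cap is reached,
while the fixed version sleeps once and returns ERESTART to its caller.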

Another problem here is that tty drivers should rarely or never use
tsleep().  They should use ttysleep() so as to check for the tty being
revoked.  After revoke(), ttysleep() returns ERESTART to make lower
levels return to near the syscall entry point (where the syscall is
normally restarted and fails because the file descriptor has been moved
to deadfs) provided there are no broken lower levels that loop on
ERESTART like the above.  Not using ttysleep() doesn't seem to cause
any problems here (it actually avoids the buggy loop).
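
The revoke check can be sketched as a wrapper around the plain sleep.  The
model below assumes a generation counter in the tty that revoke() bumps;
that scheme, and all of the names (model_tty, model_ttysleep() and the two
sleep stand-ins), are assumptions made for this illustration rather than a
copy of the kernel source.

```c
#define	MODEL_ERESTART	(-1)	/* stand-in for the kernel-internal ERESTART */

/* Hypothetical tty state: t_gen is assumed to change when the tty is revoked. */
struct model_tty {
	int t_gen;
};

typedef int (*model_sleep_fn)(struct model_tty *);

/*
 * Model of a ttysleep()-style wrapper: do the ordinary sleep, then report
 * ERESTART if the tty was revoked while the caller was asleep.
 */
int
model_ttysleep(struct model_tty *tp, model_sleep_fn sleep_fn)
{
	int error, gen;

	gen = tp->t_gen;
	error = sleep_fn(tp);
	if (error != 0)
		return (error);
	return (tp->t_gen == gen ? 0 : MODEL_ERESTART);
}

/* Sleep stand-in: normal wakeup, nothing happened to the tty. */
int
model_sleep_quiet(struct model_tty *tp)
{
	(void)tp;
	return (0);
}

/* Sleep stand-in: the tty was revoked during the sleep. */
int
model_sleep_revoked(struct model_tty *tp)
{
	tp->t_gen++;
	return (0);
}
```

A caller that loops on ERESTART would spin here as well, since the
generation recorded on the next call already reflects the revoke only
after the first failure has been returned.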

Bruce