From owner-freebsd-arch@FreeBSD.ORG  Thu Apr  3 03:21:36 2003
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Message-ID: <3E8C18CC.AF2C6B7F@mindspring.com>
Date: Thu, 03 Apr 2003 03:19:40 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Igor Sysoev <is@rambler-co.ru>
References: <Pine.BSF.4.21.0304031307220.32175-100000@is>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-arch@freebsd.org
Subject: Re: libthr and 1:1 threading.
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>

Igor Sysoev wrote:
> If a process caused a page fault or memory mapping fault at user level
> where do you suppose to return in user space after a fault was just queued ?
> To the same instruction that caused this fault ?

Yes.  And then return as if the fault had failed.

It's not fatal, though, because you only return out of the code if
the fd was marked async; otherwise you go to sleep on the buffer
being filled in by the I/O that the fault initiated.  If you do
return, you don't crash the process; you return with EAGAIN.

The trap handler provides the context for the delayed operation.
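
As a rough illustration of the "return with EAGAIN instead of
sleeping" contract, here is a minimal C sketch.  It uses a pipe as a
stand-in, since (as noted elsewhere in this thread) O_NONBLOCK has no
effect on regular disk files on the systems under discussion.  The
function name read_after_eagain() is hypothetical; only standard POSIX
calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: reads from an empty non-blocking pipe, records
 * whether the read failed with EAGAIN, then makes data available and
 * retries.  Returns the byte count of the retried read, or -1 on a
 * setup failure.
 */
ssize_t read_after_eagain(int *saw_eagain)
{
    int fds[2];
    char byte = 'x';
    char buf[1];
    ssize_t n;
    int flags;

    assert(saw_eagain != NULL);
    *saw_eagain = 0;

    if (pipe(fds) == -1)
        return -1;

    /* Mark the read end non-blocking: the "async fd" case. */
    flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    /* No data yet.  A blocking fd would sleep here; this one returns. */
    n = read(fds[0], buf, sizeof buf);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        *saw_eagain = 1;

    /* Stand-in for the I/O completing; the caller then retries. */
    if (write(fds[1], &byte, 1) != 1)
        goto fail;
    n = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    close(fds[1]);
    return n;

fail:
    close(fds[0]);
    close(fds[1]);
    return -1;
}
```

Between the failed read and the retry, a user-space threads scheduler
would be free to run some other thread rather than stall.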

> With threads you can run another thread in such situation.

Yes.  And save the 20ms per fault that Robert Watson estimated
was the reason for the performance difference between libc_r
("user space threads") and libthr ("1:1 kernel threads").


> BTW what do you mean by 'async fd' in Solaris ?
> O_ASYNC ? I do not see it in Solaris 8.
> O_NONBLOCK ? It does not matter for disk files.

O_NONBLOCK.

Examining the Solaris 8 sources, O_NONBLOCK handling seems to have been
removed from the disk I/O modules; it applies only to socktpi.c, ptm.c,
audio_mc.c, ecpp.c, and envctrl.c.  Apparently it's also missing from
the tty code, which actually goes against what Matt claimed.

This is unfortunate, in that it leaves me without an easily
accessible example, unless you have a USL UNIX source license?

Is there any chance you have legal access to the Solaris 2.2 or
even the 2.4 source code, which immediately followed the integration
project between USL and SunSoft?

Or the USL SVR4.0.2 or SVR4.2 source code?


> aioread() or aio_read() ? They are library calls that implemented
> via additional LWP for regular disk files.

I know this.  This was basically what Julian and Matt had discussed
as a means of implementing AIO in FreeBSD, rather than using system
calls.


> >> Certainly, you can argue that the application should be structured
> >> to make all I/O explicit and asynchronous, but for various reasons,
> >> that's not the case :-).
> > 
> >The mmap'ed file case is obviously not something that can be
> >handled without an explicit contract between user and kernel
> >for notification of the pagein temporary failure (I would use
> >a signal for that, probably, as a gross first approximation,
> >but per-process signal handling is currently not happy...).
> 
> And what do you suppose to do in a signal handler ?
> Using some non-reenterant library functions ?

No.  Call the user thread scheduler as a result of a fault that
is normally not trappable, because it resulted from a memory access
to an mmap()'ed region of the address space rather than from an
explicit system call.  There is no system call context when a trap
like that occurs; there is only a trap context.

A signal would allow you to force a user threads context switch
away from a thread that can't run only because running it would
take a page fault, which would delay all the other runnable threads
that aren't themselves waiting on something that would result in
a page fault.

The signal is just to get back to user space so you can force the
faulting thread to yield and restart the operation by being
rescheduled later, after the fault has been satisfied by the
kernel's I/O subsystem.
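
To make the "get back to user space, then restart the faulting
operation" idea concrete, here is a minimal C sketch under loudly
stated assumptions.  No existing kernel delivers a signal for a
pending page-in, so the sketch simulates one: a PROT_NONE page stands
in for the not-yet-resident mmap()'ed page, SIGSEGV/SIGBUS stands in
for the proposed notification, and the handler's mprotect() call
stands in for the kernel's I/O subsystem satisfying the fault.  The
names on_fault() and touch_faulting_page() are hypothetical, and
mprotect() is not on the POSIX async-signal-safe list, although
calling it from a handler works on common systems.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANON on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;
static char *page;
static size_t page_size;

/* Trap context only: there is no system call to fail or restart. */
static void on_fault(int sig, siginfo_t *info, void *uctx)
{
    char *addr = (char *)info->si_addr;

    (void)sig;
    (void)uctx;
    if (addr < page || addr >= page + page_size)
        _exit(1);               /* a genuine crash, not our fault */
    fault_count++;
    /* A threads library would yield to another thread here.  This
     * sketch just "satisfies" the fault so the access can restart. */
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) == -1)
        _exit(2);
}

/* Returns the number of faults taken by one store, or -1 on error. */
int touch_faulting_page(void)
{
    struct sigaction sa, old_segv, old_bus;
    long ps = sysconf(_SC_PAGESIZE);
    void *p;
    int faults;

    if (ps <= 0)
        return -1;
    page_size = (size_t)ps;
    p = mmap(NULL, page_size, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    page = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    /* Some systems report protection faults as SIGBUS. */
    if (sigaction(SIGSEGV, &sa, &old_segv) == -1 ||
        sigaction(SIGBUS, &sa, &old_bus) == -1) {
        munmap(page, page_size);
        return -1;
    }

    fault_count = 0;
    /* Faults once; after the handler returns, the same instruction
     * is restarted and the store succeeds. */
    *(volatile char *)page = 42;
    assert(*(volatile char *)page == 42);
    faults = fault_count;

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(page, page_size);
    return faults;
}
```

In the scheme described above, the handler would instead mark the
faulting thread as waiting and switch to another runnable thread; the
store would be retried only when that thread is rescheduled.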


-- Terry