From owner-freebsd-current@FreeBSD.ORG  Mon Jun 28 06:54:14 2004
Date: Sun, 27 Jun 2004 23:53:56 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200406280653.i5S6rufW076565@apollo.backplane.com>
To: Robert Watson <rwatson@freebsd.org>
References: <Pine.NEB.3.96L.1040627111717.66958B-100000@fledge.watson.org>
cc: freebsd-current@freebsd.org
cc: Cordula's Web <cpghost@cordula.ws>
cc: alex@hightemplar.com
Subject: Re: HEADSUP:  ibcs2 and svr4 compat headed for history

    Because I want to maintain or add compatibility with other operating
    systems (staying compatible with FreeBSD-4, possibly adding FreeBSD-5
    compatibility, as well as Linux), and of course with other
    architectures that might be used far less, I have for the last year
    been thinking very carefully about the issue of the compatibility
    code we have in the kernel.

    The problem that I see is not so much that the compatibility code 
    exists, but that it exists in the kernel.  I believe that the solution
    is to move it to userland and thus unburden the kernel from having to
    deal with it.  In userland it (A) can be maintained more easily,
    (B) avoids the security issues involved with it being in the kernel,
    and (C) is far more portable.

    I fully intend to undertake this project for DragonFly, especially
    because as we move to a messaged syscall interface we need to maintain
    compatibility with the non-messaged interface, and I want that to be
    a function set that runs in userland.  i.e. for DragonFly when someone
    calls the 'native' read(), it wouldn't actually be a libc function
    but would instead be an intermediate user-level function vector whose
    code space is managed by the kernel, almost like a mmap'd library
    (or exactly like an mmap()'d library, but with a vector table).

    It would be great if we could come up with a joint methodology, because
    once such an abstraction is operational all the compatibility code that
    falls under it, being userland code, would be highly portable to any
    operating system running the abstraction.

    I would recommend that instead of ripping this stuff out of FreeBSD-5
    willy-nilly, we leave it in for now and spend our energies on the
    development of an intermediate compatibility layer, abstraction, and
    API.  

    The actual kernel work required to implement such a layer is not all
    that complex -- really all the kernel has to do is take an INT 0xN
    and throw it back in userland's face (or even just make the INT 0xN vector
    an LDT vector that runs in userland's protection ring and never even
    enters the kernel).

    As for where these functions would reside... well, I was thinking
    that we would reserve a chunk of VM, either just below or just above
    the kernel start, which would contain the intermediate layer.
    The actual address is almost irrelevant because the entry mechanism is,
    of course, the system call entry mechanism being emulated.  It would
    be pure read-only code, with no writable data other than the stack,
    whose purpose is simply to translate system calls into the 'native' form.
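
    To make the translation step concrete, here is a minimal sketch in C
    of a table-driven translator.  Everything in it is an assumption made
    for illustration: the syscall numbers, the native_call structure, and
    translate_syscall() are invented and do not reflect any actual FreeBSD
    or DragonFly interface.  The mapping table is const (read-only) and the
    only writable storage is the caller's own, in keeping with the
    constraint described above.

```c
#include <stddef.h>

/* Hypothetical numbering for the emulated (foreign) ABI. */
enum { FOREIGN_SYS_READ = 3, FOREIGN_SYS_WRITE = 4, FOREIGN_SYS_GETPID = 20 };

/* Hypothetical numbering for the kernel's native interface. */
enum { NATIVE_SYS_GETPID = 100, NATIVE_SYS_READ = 101, NATIVE_SYS_WRITE = 102 };

#define NARGS 3

/* A call expressed in the (invented) native form. */
struct native_call {
    int  num;
    long args[NARGS];
};

/* Read-only mapping from foreign to native syscall numbers. */
static const struct {
    int foreign_num;
    int native_num;
} sysmap[] = {
    { FOREIGN_SYS_READ,   NATIVE_SYS_READ   },
    { FOREIGN_SYS_WRITE,  NATIVE_SYS_WRITE  },
    { FOREIGN_SYS_GETPID, NATIVE_SYS_GETPID },
};

/*
 * Translate a foreign syscall into native form.  Returns 0 on success,
 * or -1 if the foreign number has no native equivalent.  The result is
 * written to caller-provided storage; the layer keeps no state.
 */
int
translate_syscall(int foreign_num, const long args[NARGS],
    struct native_call *out)
{
    size_t i, j;

    for (i = 0; i < sizeof(sysmap) / sizeof(sysmap[0]); i++) {
        if (sysmap[i].foreign_num == foreign_num) {
            out->num = sysmap[i].native_num;
            for (j = 0; j < NARGS; j++)
                out->args[j] = args[j];
            return 0;
        }
    }
    return -1;
}
```

    A real layer would also have to convert argument layouts, structure
    formats, and error codes, which this sketch passes through unchanged.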

    Another aspect of this abstraction is that it would be possible to
    change the kernel's own native entry interface, argument format, and so
    forth, and yet still maintain compatibility with 'older' userland 
    programs by having an intermediate layer that glues a userland program
    targeted at version X of the kernel to version Y of the kernel which
    is actually running.  (This is why DFly needs it).  One would also be 
    able to abstract out optimizations, such as providing non-ring-crossing
    timestamp functions that utilize memory mapped I/O or other things...
    these types of functions would be placed in the proposed intermediate
    (run in user mode) layer.
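
    One way a non-ring-crossing timestamp function could work is sketched
    below in C, assuming the kernel publishes the current time in a page
    that is mapped read-only into the process.  The time_page layout and
    both function names are hypothetical.  The sequence-counter scheme is
    a common technique swapped in for illustration, not something the
    design above specifies, and time_page_publish() merely stands in for
    the kernel side.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of a page shared between kernel and userland. */
struct time_page {
    atomic_uint seq;    /* odd while an update is in progress */
    uint64_t    sec;
    uint64_t    nsec;
};

/* Stand-in for the kernel side: publish a new timestamp. */
void
time_page_publish(struct time_page *p, uint64_t sec, uint64_t nsec)
{
    atomic_fetch_add(&p->seq, 1);   /* counter becomes odd */
    p->sec = sec;
    p->nsec = nsec;
    atomic_fetch_add(&p->seq, 1);   /* counter becomes even again */
}

/*
 * User-mode reader: no system call is made.  The read is retried if
 * the counter was odd or changed while the fields were being copied,
 * since either case means the copy may be inconsistent.
 */
void
time_page_read(struct time_page *p, uint64_t *sec, uint64_t *nsec)
{
    unsigned before, after;

    do {
        before = atomic_load(&p->seq);
        *sec = p->sec;
        *nsec = p->nsec;
        after = atomic_load(&p->seq);
    } while (before != after || (before & 1u));
}
```

    In this sketch the page is an ordinary structure in the caller's
    address space; a production version would need the real mapping and
    careful treatment of the unsynchronized field reads.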

    The intermediate layer would also have a direct access mechanism.
    That is, userland programs which are aware of the layer could query
    to get a vector base and call through a vector array into the layer
    directly.  The intermediate layer would then optimize those calls that
    do not require entry into the kernel and pass the rest on to the kernel.
    The userland program would not know the difference, which is the whole
    point of the exercise.
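
    The direct access mechanism might look roughly like the following C
    sketch.  All names here (layer_query_base(), the VEC_* indices,
    fake_kernel()) are hypothetical, and the "kernel" is simulated by an
    ordinary function that counts how often it is entered.  One vector is
    answered entirely in user mode and the other is passed through.

```c
#include <stddef.h>

/* Every entry in the layer takes the same (invented) signature. */
typedef long (*layer_fn)(long a0, long a1, long a2);

/* Hypothetical vector indices. */
enum { VEC_GETPAGESIZE = 0, VEC_WRITE = 1, VEC_COUNT = 2 };

/* Simulated kernel: counts entries so ring crossings are observable. */
static long kernel_entries;

static long
fake_kernel(int vec, long a0, long a1, long a2)
{
    (void)a0;
    (void)a1;
    kernel_entries++;
    if (vec == VEC_WRITE)
        return a2;      /* pretend every byte was written */
    return -1;
}

/* Answered in the layer itself: never enters the kernel. */
static long
layer_getpagesize(long a0, long a1, long a2)
{
    (void)a0;
    (void)a1;
    (void)a2;
    return 4096;        /* assumed page size, for illustration only */
}

/* Cannot be optimized away: passed on to the kernel. */
static long
layer_write(long fd, long buf, long len)
{
    return fake_kernel(VEC_WRITE, fd, buf, len);
}

/* The read-only vector array that aware programs call through. */
static const layer_fn layer_vectors[VEC_COUNT] = {
    [VEC_GETPAGESIZE] = layer_getpagesize,
    [VEC_WRITE]       = layer_write,
};

/* Query the vector base. */
const layer_fn *
layer_query_base(void)
{
    return layer_vectors;
}

/* Report how many times the simulated kernel has been entered. */
long
layer_kernel_entries(void)
{
    return kernel_entries;
}
```

    A caller would fetch the base once and then invoke, say,
    vec[VEC_WRITE](fd, buf, len) exactly as it invokes
    vec[VEC_GETPAGESIZE](0, 0, 0), with no way to tell from the call site
    which of the two crossed into the kernel.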

    So, as you can see, there is great potential flexibility in such a 
    design.  So much so, in fact, that the ability to move things like
    SysV and IBCS2 out of the kernel becomes a mere side effect of a larger
    purpose.  It would be a huge advance over the crufty syscall methodology
    that all UNIXes today employ.

						    -Matt