Date:      Wed, 08 May 2002 00:00:01 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc:        John Baldwin <jhb@FreeBSD.ORG>, Matthew Dillon <dillon@apollo.backplane.com>, arch@FreeBSD.ORG
Subject:   Re: syscall changes to deal with 32->64 changes. 
Message-ID:  <20020508070001.1BF6F38FD@overcee.wemm.org>
In-Reply-To: <89017.1020785836@critter.freebsd.dk> 

Poul-Henning Kamp wrote:
> 
> Well, it seems like some sort of consensus is building.
> 
> Now for a bit of radical thinking.
> 
> The FreeBSD kernel is already a multi-API kernel, we have freebsd
> syscalls, including various old compat stuff, we have NetBSD compat,
> we have BSDI compat, Linuxolator, IBCS2, and we will probably have
> Solaris compat on the sparc64 platform.

Also, don't forget x86-64 and ia64, which will now have 3, 4 or 5 syscall
vectors to deal with:

1) 32 bit i386 a.out <= 4.x  (yes, a.out is still supported)
2) 32 bit i386 ELF <= 4.x
3) 32 bit i386 ELF >= 5.x
4) 64 bit x86-64 ELF / 64 bit ia64 ELF
5) 32 bit IA64 ELF (you can compile in ILP32 mode)

Personally, that's several too many already. :-(  (And yes, I know that
a.out and ELF currently share the syscall vector part, but not the
executable loader; they have different startup mechanisms and different
kernel entry trap methods.)

We're already having enough trouble copying the MPSAFE tag around in all
those damn syscalls.master files. We don't need more.

> We cannot easily compile for two native APIs if we cannot have two
> different sets of 'sys/*' and 'machine/*' files, since a lot of the
> types we want to change size on are defined in these files.
> 
> I would therefore like to propose that we do something like
> the following:
> 
> 	Repocopy src/sys/sys/* to src/sys/include
> 	Repocopy src/sys/sys/* to src/sys/abi4
> 	Remove src/sys/sys/*
> 
> 	In src/sys/include we remove everything which deals with
> 	syscall parameters, but retain kernel internal data structures.
> 	Stuff like vnodes, and similar lives here.
> 
> 	In src/sys/abi4 we remove everything that we leave behind
> 	in src/sys/include.
> 
> 	(A similar split should be done for src/sys/$arch/include
> 	into a src/sys/$arch/abi4 directory)
> 
> 	In .c land we repocopy the relevant sys/kern/*.c files
> 	into sys/abi4 and remove everything but the syscall
> 	entry points and add explicit conversions to arguments
> 	as needed.
> 	
> 	We now have a clean split between the stuff which defines
> 	what goes on in the kernel and the stuff which defines
> 	the ABI to userland.
> 
> 	Adding a new ABI is now a matter of creating the relevant
> 	directories (src/sys/abi5) and populating them with files which
> 	do it right.
> 
> I see many advantages to this way of doing things:
> 
> We can remove practically all "#ifdef _KERNEL" from the .h files
> we install in /usr/include/sys and we can get a fair bit closer to
> whatever the standard-du-jour dictates that we put there.
> 
> Conversely we can clean the kernel side of things (src/sys/include)
> for things we don't want people to use in the kernel, but which
> standards or compatibility demand we put in <sys/blah.h>.
> 
> We also get a clear kernel / userland split on C types; we may find
> it convenient to operate on a 64 bit foo_t in the kernel but leave
> it 32 bit in userland, and this lets us trivially do so.
> 
> This should put a good bit of infrastructure in to make current and
> future API/ABI implementations simpler and more structured.
> 
> I guess a way to sum this up is that it will put all API/ABI's on
> equal footing.  With this change none of them will be any more
> "native" than any other API/ABI.

Personally, I think this is *way* overkill.

I think there is far more value to be had by divorcing the syscall
interfaces from the code that implements them so we can do away with the
damn stackgap stuff.

E.g.: instead of open() doing the copyin *and* the body of the work,
we should have sys_open (or abi4_open, linux_open, etc.), which does the
pathname copyin and any argument massaging, and then calls open() with the
cleaned-up arguments. open() shouldn't have to do copyin etc.
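A userland model of the proposed open() split might look like the following sketch. Everything in it is hypothetical: model_copyinstr stands in for the kernel's copyinstr(9), model_open for the ABI-independent open() body, model_sys_open for a per-ABI entry point such as sys_open or linux_open, and the fixed fd 3 simply pretends the lookup succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAXPATHLEN 1024	/* stand-in for the kernel's MAXPATHLEN */

/*
 * Hypothetical stand-in for copyinstr(9): copy a NUL-terminated string
 * from "user" memory into a kernel buffer of len bytes, failing with
 * ENAMETOOLONG if the terminating NUL does not fit.
 */
static int
model_copyinstr(const char *uaddr, char *kaddr, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		kaddr[i] = uaddr[i];
		if (uaddr[i] == '\0')
			return (0);
	}
	return (ENAMETOOLONG);
}

/*
 * ABI-independent worker: its arguments are already in kernel space and
 * in native form, so it never calls copyin itself.
 */
static int
model_open(const char *path, int flags, int mode, int *fdp)
{
	(void)flags;
	(void)mode;
	assert(path != NULL && fdp != NULL);
	if (path[0] == '\0')
		return (ENOENT);
	*fdp = 3;	/* pretend the lookup succeeded and fd 3 was allocated */
	return (0);
}

/* Argument block for one ABI's open(); path is a user-space pointer. */
struct model_open_args {
	const char *path;
	int flags;
	int mode;
};

/*
 * Per-ABI entry point: does the pathname copyin and any argument
 * massaging, then calls the worker with clean arguments.
 */
static int
model_sys_open(const struct model_open_args *uap, int *retval)
{
	char path[MODEL_MAXPATHLEN];	/* local copy; no stackgap involved */
	int error;

	error = model_copyinstr(uap->path, path, sizeof(path));
	if (error != 0)
		return (error);
	return (model_open(path, uap->flags, uap->mode, retval));
}
```

A Linux or 32-bit entry point would be another small wrapper that massages its own argument block and then calls the same model_open().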

The ia64 32 bit emulator already does the 32 bit time_t <-> 64 bit time_t
conversions. There are quite a few that need translation, not the least of
which are: struct rusage (all the wait* syscalls), struct statfs,
things like setitimer etc. which take timevals, select (timeval),
gettimeofday (timeval), struct stat, utimes (time_t), adjtime (timeval),
and readv/writev/pread/pwrite etc. and all the syscalls that use iovecs
(there is a 'long' in there if you're thinking about making it all
explicitly sized as well).

In fact, gettimeofday is a classic example; the following is taken from
the sys/ia64/ia32/ia32_misc.c code that we're using now:

int
ia32_gettimeofday(struct thread *td, struct ia32_gettimeofday_args *uap)
{
        int error;
        caddr_t sg;
        struct timeval32 *p32, s32;
        struct timeval *p = NULL, s;

        p32 = SCARG(uap, tp);
        if (p32) {
                sg = stackgap_init();
                p = stackgap_alloc(&sg, sizeof(struct timeval));
                SCARG(uap, tp) = (struct timeval32 *)p;
        }
        error = gettimeofday(td, (struct gettimeofday_args *) uap);
        if (error)
                return (error);
        if (p32) {      
                error = copyin(p, &s, sizeof(s));
                if (error)
                        return (error);
                CP(s, s32, tv_sec);
                CP(s, s32, tv_usec);
                error = copyout(&s32, p32, sizeof(s32));
                if (error)
                        return (error);
        }
        return (error);
}

Do you really want to impose all those copyin/copyouts etc. on common paths
for 4.x binaries? We need to spend more effort on things like having a
separate sys_gettimeofday(td, struct gettimeofday_args *uap) vs.
gettimeofday(td, struct timeval *tv).

You then have:
int
gettimeofday(struct thread *td, struct timeval *tv)
{
	/* ... normal code, but no copyout ... */
}

int
sys_gettimeofday(struct thread *td, struct gettimeofday_args *uap)	/* native 5.x syscall */
{
	int error;
	struct timeval tv;	/* native kernel timeval */

	error = gettimeofday(td, &tv);
	if (error == 0 && uap->tp != NULL)
		error = copyout(&tv, uap->tp, sizeof(tv));
	return (error);
}

int
sys_gettimeofday32(struct thread *td, struct ia32_gettimeofday_args *uap)	/* 32 bit syscall interface */
{
	int error;
	struct timeval tv;
	struct timeval32 tv32;	/* userland 32 bit timeval */

	error = gettimeofday(td, &tv);
	if (error == 0 && uap->tp != NULL) {
		/* only convert once tv is known to be valid */
		convert_tv_to_tv32(&tv, &tv32);
		error = copyout(&tv32, uap->tp, sizeof(tv32));
	}
	return (error);
}
and so on. Far fewer bogus copyin/copyouts through the stackgap. You can use
your local stack for temporary conversions, or even malloc etc. But
doing the conversion in userland memory, simply because we don't cleanly
divorce the syscall ABI implementation from the functionality, just sucks.

Finally, I really think the entire-new-syscall-vector idea is sheer
wasteful overkill. I'd much rather we had COMPAT_FREEBSD4 kernel compile
options using the existing vector, with new syscalls added where we need
to translate. What I saw on SVR4 was much cleaner. They dealt with
different "struct stat" layouts with no trouble at all. You could even
compile to the older interfaces.
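The single-vector approach can be modelled as one dispatch table in which the old syscall number keeps pointing at a translating entry, compiled in only when a compat option is set. All names here (model_sysent, model_freebsd4_stat, the one-field stat structures, the fixed 4096-byte size) are hypothetical illustrations, and memcpy stands in for copyout().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define COMPAT_FREEBSD4	1	/* stand-in for the kernel config option */

/* Hypothetical layouts: the old one keeps a 32-bit size field. */
struct model_stat { int64_t st_size; };
struct model_ostat { int32_t st_size; };

/* ABI-independent body shared by both entries. */
static int
model_kern_stat(const char *path, struct model_stat *sb)
{
	(void)path;
	assert(sb != NULL);
	sb->st_size = 4096;	/* pretend every file is 4096 bytes long */
	return (0);
}

/* Native entry: hands back the current layout. */
static int
model_sys_stat(const char *path, void *ub)
{
	struct model_stat sb;
	int error = model_kern_stat(path, &sb);

	if (error == 0)
		memcpy(ub, &sb, sizeof(sb));	/* stands in for copyout() */
	return (error);
}

#ifdef COMPAT_FREEBSD4
/* Compat entry in the same vector: translates to the old layout. */
static int
model_freebsd4_stat(const char *path, void *ub)
{
	struct model_stat sb;
	struct model_ostat osb;
	int error = model_kern_stat(path, &sb);

	if (error != 0)
		return (error);
	if (sb.st_size > INT32_MAX)
		return (EOVERFLOW);
	osb.st_size = (int32_t)sb.st_size;
	memcpy(ub, &osb, sizeof(osb));	/* stands in for copyout() */
	return (0);
}
#endif

typedef int (*model_sy_call_t)(const char *, void *);

/*
 * One vector for both generations: old binaries keep their old syscall
 * number (slot 0), new binaries use a new one (slot 1).  Without the
 * compat option, slot 0 is left NULL, i.e. "no such syscall".
 */
static const model_sy_call_t model_sysent[] = {
#ifdef COMPAT_FREEBSD4
	[0] = model_freebsd4_stat,
#endif
	[1] = model_sys_stat,
};
```

Dropping COMPAT_FREEBSD4 from the config removes only the translating entry; the native path and the shared body are untouched.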

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

