From owner-freebsd-hackers  Tue Oct 24 17:24:58 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id RAA13301
          for hackers-outgoing; Tue, 24 Oct 1995 17:24:58 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id RAA13271
          for <hackers@freebsd.org>; Tue, 24 Oct 1995 17:24:50 -0700
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id RAA17462; Tue, 24 Oct 1995 17:16:03 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199510250016.RAA17462@phaeton.artisoft.com>
Subject: Re: SYSCALL IDEAS [Was: cvs commit: src/sys/kern sysv_msg.c sysv_sem.c sysv_shm.c]
To: bde@zeta.org.au (Bruce Evans)
Date: Tue, 24 Oct 1995 17:16:03 -0700 (MST)
Cc: bde@zeta.org.au, terry@lambert.org, CVS-commiters@freefall.freebsd.org,
        bde@freefall.freebsd.org, cvs-sys@freefall.freebsd.org,
        hackers@freebsd.org, swallace@ece.uci.edu
In-Reply-To: <199510242157.HAA01465@godzilla.zeta.org.au> from "Bruce Evans" at Oct 25, 95 07:57:12 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 10667     
Sender: owner-hackers@freebsd.org
Precedence: bulk

> >When a system call is made, arguments are pushed on the user stack
> >and then the trap vector is called.  There is a necessity to copy
> >the arguments as if they were on the stack down to a kernel space
> >buffer area because of the address space differences.  The amount
> >that is copied is determined by an integer count of integers: that
> >is, it is some number 'n' * sizeof(int) bytes that get copied.
> 
> Only in some ABI's.  This is probably the best way, but it may
> require messy conversions in the library to put the args on the
> stack with consistent padding.

Pushing on the stack is a messy conversion?  What about dead register
usage from not knowing about the stack?  I think you are going to
have to burn the cycles on an opaque function call in any case.
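Here is a minimal sketch of the argument copy described in the quoted text. It assumes a table entry that records the argument count in ints. The names `sysent_sketch`, `copyin_sketch`, `syscall_sketch`, and the toy handler `add2` are hypothetical. The "user stack" is an ordinary array, and `copyin_sketch` only imitates the fault behaviour of the real copyin() with a bounds check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: how many int-sized slots to copy. */
struct sysent_sketch {
    int sy_narg;                        /* argument count, in ints */
    int (*sy_call)(const int *args);    /* handler sees the kernel copy */
};

/* Stand-in for copyin(): fail with EFAULT if the range runs off the
 * end of the simulated user stack. */
static int copyin_sketch(const int *ustack, size_t ustack_len,
                         size_t off, int *kbuf, size_t n)
{
    if (off > ustack_len || n > ustack_len - off)
        return EFAULT;
    memcpy(kbuf, ustack + off, n * sizeof(int));
    return 0;
}

/* Trap-side dispatch: copy exactly sy_narg * sizeof(int) bytes, then call. */
static int syscall_sketch(const struct sysent_sketch *se,
                          const int *ustack, size_t ustack_len,
                          size_t argoff, int *result)
{
    int kargs[8];                       /* arbitrary bound for the sketch */
    int error;

    if (se->sy_narg < 0 ||
        (size_t)se->sy_narg > sizeof kargs / sizeof kargs[0])
        return EINVAL;
    error = copyin_sketch(ustack, ustack_len, argoff, kargs,
                          (size_t)se->sy_narg);
    if (error)
        return error;
    *result = se->sy_call(kargs);
    return 0;
}

/* Toy two-argument handler used to exercise the dispatch. */
static int add2(const int *a) { return a[0] + a[1]; }
```

The count alone fixes how many bytes cross the user/kernel boundary. The handler never sees the user stack, only the kernel copy.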

> Since we don't control foreign ABI's we shouldn't assume this.  For
> example, in Linux all the args are passed in registers.  In Minix,
> the args are stored in a syscall-dependent struct and a pointer
> to the struct is passed in %ebx.  The struct is not always nicely
> padded (it can have packed char and short fields).

That's fine.  The size of arguments in iBCS2 and BSD is 'int'.  It's
either 'int' or 'long' or '*'.

So we take a hit when processing these non-standard mechanisms; we do
so through the system call table for the ABI, so we will be taking
a function encapsulation hit anyway.
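To illustrate the "hit" taken for a non-standard mechanism, here is a sketch of a per-ABI conversion routine. It unpacks a packed, syscall-specific message into the native array-of-int argument mold. The message layout and the name `convert_foreign_open` are invented for illustration; this is not the actual Minix layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical foreign-ABI message for an open-like call, packed with
 * no padding (illustrative only):
 *   offset 0: uint32_t path address
 *   offset 4: uint16_t flags
 *   offset 6: uint16_t mode
 */
#define FMSG_OPEN_SIZE 8

/* Per-ABI conversion hook: unpack the packed fields into the native
 * int-array mold.  memcpy avoids unaligned loads; the fields are
 * assumed to be in host byte order. */
static void convert_foreign_open(const unsigned char msg[FMSG_OPEN_SIZE],
                                 int native_args[3])
{
    uint32_t path;
    uint16_t flags, mode;

    memcpy(&path, msg + 0, sizeof path);
    memcpy(&flags, msg + 4, sizeof flags);
    memcpy(&mode, msg + 6, sizeof mode);

    native_args[0] = (int)path;     /* user address, passed through */
    native_args[1] = flags;
    native_args[2] = mode;
}
```

The cost is one extra function call and a few loads per foreign system call. Native calls never pay it.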

I think it is unlikely that iBCS2 or Linux development will occur on
BSD in a non-emulated environment.

> >This is dependent on the arguments pushed on the stack being
> >representable as integer values (we are guaranteed this by the
> >fact that we are calling a non-local function that is not inlined).
> 
> There is no such guarantee.  gcc for the i386 happens to use this
> slow parameter passing convention for portability.  (Unless you
> compile with -mregparm.  -mregparm is officially supported in gcc-2.7.0.
> It is apparently necessary for OS/2 or Windows-NT.)

UNIX typically uses this, period.  Stack argument passing to system
calls is the way things are done.

The use of registers is fine, as long as the interface is general
enough.  The OS/2 and NT (and Win95) conventions derive from the
DOS interrupt mechanisms and require assembly function stubs in
any case, even if the routines are then tagged "naked".  Even then,
problems are introduced: the compiler can't understand that a call
will trash EAX (and maybe EDX) unless you hack an "xor eax,eax"
into the thing to make it know that the called function trashes the
register and that it can't keep a local variable in it across the call.

> >[copyin() of the args]
> >Now each of the arguments are themselves, potentially, pointers to
> >additional information in call/subcode specific user space structures
> >that must additionally be copied in (or out to).
> 
> Not quite.  The args may be padded.  In NetBSD for the alpha, the args
> are apparently padded to 8 bytes and the SCARG macro is mainly to
> extract the relevant subfield which is usually 4 bytes.  There may be
> complications for endianness.

They are padded to the default bus transfer size for the machine,
which is supposed to be 'int'.

I'd argue that 'int' was the wrong size, not that there was extraneous
padding.

There is a limit on the address space on the alpha anyway; it's not
the full 64 bits.


I'd really dearly love to know how there could be an endianness issue,
considering system calls are only ever going to run as compiled code
on one endianness of machine.
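Here is one way to see the complication Bruce means. Suppose every argument slot is 8 bytes but the argument itself is a 4-byte int. The byte offset of the int inside the slot then differs between little- and big-endian machines. The kernel only ever runs on one byte order, but machine-independent code that extracts the subfield has to pick the right offset. The helper names below are hypothetical. This is a sketch of the idea behind a SCARG-style accessor, not NetBSD's actual macro.

```c
#include <stdint.h>
#include <string.h>

/* Runtime byte-order probe (a real kernel would decide this at
 * compile time). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Byte offset of a 32-bit argument inside a 64-bit argument slot. */
static size_t int_offset_in_slot(void)
{
    return host_is_little_endian() ? 0
                                   : sizeof(uint64_t) - sizeof(int32_t);
}

/* SCARG-like extraction of the int subfield from a padded slot. */
static int32_t extract_int_arg(const uint64_t *slot)
{
    int32_t v;

    memcpy(&v, (const unsigned char *)slot + int_offset_in_slot(), sizeof v);
    return v;
}
```

If the accessor read offset 0 unconditionally, it would return the high-order half of the slot on a big-endian machine, which is 0 for any small value.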


> >While BSD could very well benefit from a single verification, avoiding
> >the mapping issues in copyin/copyout, giving one check instead of two
> 
> It could benefit most from passing args in registers, as in Linux, so
> that no copyin() is required.

1)	BSDI binary compatibility
2)	FreeBSD/NetBSD binary backward compatibility
3)	Register conversion from a Linux emulation mismatch.
4)	Register collision verification in the function so that
	the register is not used directly for local storage/scratch.
5)	Async reentrancy/multithreading reentrancy.

It's a can of worms that, frankly, buys so little as to be useless.

> >What does it do?  What use is the change?
> 
> It avoids scattering unportable casts and ugly macros to perform them
> throughout the "machine-independent" code.  Now we have only unportable
> casts.  4.4lite2 has slightly less unportable casts and ugly macros.
> NetBSD has much less unportable casts and ugly macros.

The structure casts, I presume?

The answer is to compile with the default packing matched to the
machine's natural data size (in the alpha case, 64 bits).  For devices
that need it, use packing #pragmas to tighten structures up on a
case-by-case basis, in the header file that defines each structure.

Aligned element accesses are faster anyway.
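Here is a sketch of that packing approach. The compiler's natural alignment stays the default, and only the structure that a device or wire format requires is tightened. `#pragma pack(push, 1)` / `#pragma pack(pop)` is a compiler extension (GCC, Clang, and MSVC accept it), not standard C. The structure names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Default (natural) alignment: the compiler pads so 'count' is aligned. */
struct natural_rec {
    uint8_t  tag;
    uint32_t count;
};

/* Packed only because a (hypothetical) device layout demands it. */
#pragma pack(push, 1)
struct device_rec {
    uint8_t  tag;
    uint32_t count;
};
#pragma pack(pop)

/* Direct member access: the compiler knows the member may be unaligned. */
static uint32_t device_count(const struct device_rec *r)
{
    return r->count;
}
```

Direct member access such as `r->count` is safe on a packed structure. Taking `&r->count` and dereferencing it through an ordinary `uint32_t *` is not safe on strict-alignment machines.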

> >It's very arguable that the compiler would generate incorrectly window
> >optimized code for inlined system calls at present.  Specifically,
> >it would fail to see the need to push the arguments.
> 
> Earth to Terry :-).  We're talking about inlining syscall handlers, not
> syscalls.

Sorry -- you're the one that brought up ABI, which is kernel code, not
user space code.  The only way you can affect the ABI code is if you
call the inlined versions and match the user and kernel usage.

Actually, the only way you can guarantee interface usability is to
use the inlines to make the calls -- otherwise, your plan to pass in
registers fails when you try to call the BSD system call from the
ABI system call code.  8-(.

> >This is incorrect.  The argument count specified in the systent[] table
> >should result in the correct copyin size.
> 
> There may be no correct size.  A size of 3 ints wouldn't work for
> open("foo", 0) if the caller has perversely passed 2 args on the stack
> at the top of the address space.  Where are the ABI specs that disallow
> this?

There are none.  However, you are wrong; it would work, you'd just
get a garbage value (the stack grows in the right direction for the
third argument to be optional).  Since in that case the garbage
value is unreferenced (or the call generated a prototype warning
...not 8-)), it will work.

Getting a bogus value from the stack vs. getting a bogus value from
a register is a tossup in any case.
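The open("foo", 0) case can be simulated to show both positions. A fixed three-int copy succeeds and picks up a harmless garbage mode when more stack lies beyond the two pushed arguments. It fails, like copyin(), when the two arguments sit at the very end of the mapped stack. The function names are hypothetical, and `SK_O_CREAT` is a local stand-in, not the real <fcntl.h> constant.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SK_O_CREAT 0x0200   /* illustrative flag value for this sketch */

/* Copy n ints from slot 'off' of a simulated user stack that ends at
 * ustack_len; fail like copyin() if the range runs off the end. */
static int copy_args(const int *ustack, size_t ustack_len, size_t off,
                     int *kargs, size_t n)
{
    if (off > ustack_len || n > ustack_len - off)
        return EFAULT;
    memcpy(kargs, ustack + off, n * sizeof(int));
    return 0;
}

/* Kernel-side open handler sketch: the third slot (mode) is read only
 * when SK_O_CREAT is set, so garbage there is otherwise harmless.
 * Reports the mode it used, or -1 if none. */
static int open_handler(const int *kargs, int *mode_used)
{
    *mode_used = (kargs[1] & SK_O_CREAT) ? kargs[2] : -1;
    return 0;
}

/* Fixed-count dispatch: always copy 3 ints for open, whatever the
 * caller actually pushed. */
static int sys_open_sketch(const int *ustack, size_t ustack_len,
                           size_t off, int *mode_used)
{
    int kargs[3];
    int error = copy_args(ustack, ustack_len, off, kargs, 3);

    if (error)
        return error;
    return open_handler(kargs, mode_used);
}
```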

> Also, a single number doesn't tell you where the args are.  In general
> you need an offset and a size for each arg on the user stack (let's not
> worry about endianness conversions :-) and a mapping of user registers
> to args.

I'd really like to see your bi-endian binary machine that didn't do the
switch in the trap code in the kernel (PPC).

Throwing that out, the offset is "one stack entry per argument".

The biggest bugger here is preinitialization of the high word of a
quad for portability -- but of course, the resulting code fails to
operate after certain file lengths are hit instead of immediately,
so all that's been done is that the problem has been relocated,
not solved.

Even assuming the problem is "fixed" by doing this, the only
compatibility guaranteed is with applications compiled using
the same techniques -- that is, applications that don't exist
because this is a change in call behaviour as well.

I think moving to register parameter passing is too architecture
specific.


[ ... multiplexed system calls (like, oh, say open with O_CREAT as
      the multiplexing flag) ... ]

> >The amount of code is a single computed goto.  One might as well
> 
> Not if there are nontrivial conversions.

Then it's a non-trivial case, and splitting it out isn't going to make
it more trivial.
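Here is a sketch of the single-dispatch multiplexer, in the general style of the SysV IPC entry points such as msgsys(). The first argument selects a subcode, and a bounds-checked function table stands in for the computed goto. Handler names and return values are made up for illustration.

```c
#include <errno.h>

/* Hypothetical subcode handlers (bodies are placeholders). */
static int sk_msgget(const int *a) { return 100 + a[0]; }
static int sk_msgctl(const int *a) { return 200 + a[0]; }
static int sk_msgsnd(const int *a) { return 300 + a[0]; }

typedef int (*subcall_t)(const int *args);

static const subcall_t msgcalls[] = { sk_msgget, sk_msgctl, sk_msgsnd };

/* The multiplexer: args[0] selects the subcode and the remaining args
 * are handed through unchanged.  The table lookup plays the role of
 * the single computed goto. */
static int msgsys_sketch(const int *args, int *result)
{
    unsigned which = (unsigned)args[0];

    if (which >= sizeof msgcalls / sizeof msgcalls[0])
        return EINVAL;
    *result = msgcalls[which](args + 1);
    return 0;
}
```

Any nontrivial conversion still lives inside the individual handler, so splitting the entry point would not remove it.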

[ ... and now to the meat ... ]

> >What portability problems do you see in the system call multiplex
> >interfaces, and under what circumstances can you cause incorrect code
> >to be generated?
> 
> A reasonable parameter passing convention should put the first few
> args (a fixed number) in registers but stop at the first `...' arg or
> the one before (so a variable number of args may be in registers).
> Where are you going to translate this?  Portability problems would
> result from delaying the translation.  Incorrect code would be generated,
> as usual, due to bugs.

I have to point out that it would then be impossible to make system
calls without prototype references.  If this is a "fix" for the quad
word passing problem (which is a "non-use of system call prototype"
problem), then, isn't this just making things worse?
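Here is a sketch of the prototype problem for quad arguments. It assumes the 4.4BSD-style lseek layout on a 32-bit stack: fd, a pad word, the 64-bit offset, then whence. The offset is stored low word first, which matches little-endian i386 and is an assumption here. A caller with no prototype pushes a plain int offset, and every later slot is misread. `decode_lseek` and `struct lseek_view` are hypothetical.

```c
#include <stdint.h>

/* What the kernel believes the caller passed. */
struct lseek_view {
    int fd;
    int64_t offset;
    int whence;
};

/* Decode five 32-bit stack slots: fd, pad, offset low, offset high,
 * whence. */
static struct lseek_view decode_lseek(const uint32_t slot[5])
{
    struct lseek_view v;

    v.fd = (int)slot[0];
    /* slot[1] is alignment padding and is ignored */
    v.offset = (int64_t)(((uint64_t)slot[3] << 32) | slot[2]);
    v.whence = (int)slot[4];
    return v;
}
```

A prototyped caller's (3, 10, SEEK_SET) arrives as { 3, 0, 10, 0, 0 }. An unprototyped caller pushes only { 3, 10, 0 }, so the 10 lands in the pad word and leftover stack contents become the offset's high word and whence.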

> 
> >> They are a problem because they give more special cases to write code for.
> 
> >As opposed to generating code?  I don't see less total code in the long
> >run, and applying a cookie cutter and forcing all the calls to fit the
> >mold is not an optimal approach to solving the problem.
> 
> Er, isn't the array of args a mold?  I want to make args in the kernel
> fit the same molds as args in user space:
> 
> 	int open(path, flags, ...)
> 
> is a completely different mold from
> 
> 	int open(path, flags, mode) 
> 
> For the former, the compiler should use a special, slow parameter passing
> convention and pop the args in the caller.  For the latter, the compiler
> should pass at least the first one or two args in registers and pop the
> args in the callee.

Callee pop only works when using the same stack.  When you get to the
kernel, the kernel thread (or just process) will be using its own stack,
so that argument won't wash.  My complaint on callee pop as a more
fruitful pursuit was based on kernel-kernel calls, not user-kernel calls.
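The two "molds" in the quoted text can be contrasted in user-space C. In the variadic form, the callee may fetch the mode with `va_arg` only when the flags say it was passed. In the fixed form, every caller must supply it. The `sk_` names and the `SK_O_CREAT` value are hypothetical stand-ins, not the real libc open() or <fcntl.h> constant.

```c
#include <stdarg.h>

#define SK_O_CREAT 0x0200   /* illustrative flag value for this sketch */

/* "int open(path, flags, ...)" mold: fetch mode only when SK_O_CREAT
 * says the caller supplied it.  Returns the mode, or -1 if none. */
static int sk_open_variadic(const char *path, int flags, ...)
{
    int mode = -1;

    (void)path;
    if (flags & SK_O_CREAT) {
        va_list ap;

        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }
    return mode;
}

/* "int open(path, flags, mode)" mold: every caller passes mode, the
 * kind of fixed signature for which, per the quoted text, a compiler
 * could use registers and callee-pop. */
static int sk_open_fixed(const char *path, int flags, int mode)
{
    (void)path;
    return (flags & SK_O_CREAT) ? mode : -1;
}
```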

The only problem right now is that the quad argument takes more than
one stack position to do its dirty deed.  And that's more a result of
the thing being an invalid C type than anything else.  If we weren't
in violation of POSIX/ANSI on quad_t, then it wouldn't be a problem.

Probably the correct solution is to make separate "quad-knowledgeable"
functions for the things that take quad arguments.  This is actually
described in __syscall(2) -- which *guarantees* padding.

Right now the screwable functions are truncate, ftruncate, seek, lseek,
and mmap -- and mmap() is bogus because of the kernel address space
restrictions currently on "vmio".  The others are in violation of one
or more standards because "quad" isn't an allowable type.  Might as
well violate them further by using inline references to __syscall(2)
instead of syscall(2) to get to them, so that (1) they are undefined
without proper header inclusion, and (2) the padding is guaranteed
(as the __syscall(2) man page states).  That would at least fix
the screwups without adding to them.
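Here is a sketch of the header-only wrapper idea. It assumes a hypothetical raw trap function `sk_raw_syscall`, mocked here so the slot layout can be checked in user space, and a made-up call number. Because the wrapper is a `static inline` function in a header, code that does not include the header cannot call it by accident. The compiler also always builds the pad word and both halves of the offset. This only imitates the padding guarantee attributed to __syscall(2) above; it is not its implementation.

```c
#include <stdint.h>

#define SK_SYS_LSEEK 199   /* hypothetical call number for the sketch */

/* Mock trap: decodes the padded slot layout the way a kernel stub
 * would and returns the offset it saw (INT64_MIN on a bad request). */
static int64_t sk_raw_syscall(int code, const uint32_t *slot, int nslot)
{
    if (code != SK_SYS_LSEEK || nslot != 5)
        return INT64_MIN;
    return (int64_t)(((uint64_t)slot[3] << 32) | slot[2]);
}

/* Typed wrapper: always emits fd, pad, offset low, offset high, whence
 * (low word first, an assumption matching i386). */
static inline int64_t sk_lseek(int fd, int64_t offset, int whence)
{
    uint64_t u = (uint64_t)offset;
    uint32_t slot[5] = {
        (uint32_t)fd,
        0,                              /* alignment pad */
        (uint32_t)(u & 0xffffffffu),
        (uint32_t)(u >> 32),
        (uint32_t)whence,
    };

    return sk_raw_syscall(SK_SYS_LSEEK, slot, 5);
}
```

Unlike the preinitialized-high-word approach, a caller cannot get this layout wrong without also lacking the declaration, which the compiler will report.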


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.