Date:      Tue, 24 Oct 1995 12:06:38 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        bde@zeta.org.au (Bruce Evans)
Cc:        swallace@ece.uci.edu, terry@lambert.org, CVS-commiters@freefall.freebsd.org, bde@freefall.freebsd.org, cvs-sys@freefall.freebsd.org, hackers@freebsd.org
Subject:   Re: SYSCALL IDEAS [Was: cvs commit: src/sys/kern sysv_msg.c sysv_sem.c sysv_shm.c]
Message-ID:  <199510241906.MAA13880@phaeton.artisoft.com>
In-Reply-To: <199510240739.RAA03491@godzilla.zeta.org.au> from "Bruce Evans" at Oct 24, 95 05:39:03 pm

> >> That way we don't have SCARG all over the place, and this would
> >> prepare us for your static function idea.
> 
> >Seeing this on -hackers, I'd like to have someone back up and explain
> >"the static function idea",
> 
> static inline int
> read(fd, buf, nbytes, retval)
> 	int fd;
> 	void *buf;
> 	size_t nbytes;
> 	ssize_t *retval;
> {
> 	/*
> 	 * Same as for the current read(), except we don't have to grope
> 	 * unportably in *uap for the args or store the retval in an
> 	 * incompatible type.
> 	 */
> }
> 
> /* Machine generated - do not edit. */
> int
> freebsd_syscall_entry_read(p, args, retval)
> 	struct proc *p;
> 	void *args;
> 	int *retval;
> {
> 	int fd;
> 	void *buf;
> 	size_t nbytes;
> 	ssize_t nread;
> 	int res;
> 
> 	fd = machine_dependent_code_to_convert_fd_for_read(args);
> 	buf = machine_dependent_code_to_convert_buf_for_read(args);
> 	nbytes = machine_dependent_code_to_convert_nbytes_for_read(args);
> 	res = read(fd, buf, nbytes, &nread);
> 	machine_dependent_code_to_convert_retval_for_read(args, nread);
> 	return (res);
> }
> 
> >since it seems likely that an alternate
> >approach to the idea might be more fruitful than rewriting the system
> >call interface such that we have to hack tables all over hell for
> >no real gain.
> 
> This seems unlikely :-).
> 
> The main disadvantage of the above is that read() would actually have
> to be non-static inline so that it can be called from emulators.  E.g.,
> foo_ossyscall_entry_read() might have different machine-generated
> machine-dependent code and it's much better for it to convert from that
> to (int fd, void *buf, size_t nbytes, ssize_t *retval) and call read()
> than it is to convert to the machine-dependent freebsd args array and
> call freebsd_syscall_entry_read().  Thus read() would be duplicated at
> least twice (once inline for fast FreeBSD syscalls, once extern for
> slower foo_os syscalls).

OK.  I hate leaving all of the above in for context, but I have to.

The disadvantage you cite is false.  Internal kernel calls to read
do not follow the same rules on inlining.  The inlining is to
avoid the function call overhead to get to the trap vector.  This
is, in fact, something that can be done in any case, without a change
to the system call interface.

The problem seems to be one of understanding trap vector argument
decodes.

When a system call is made, arguments are pushed on the user stack
and then the trap vector is called.  The arguments must then be copied,
as they sit on the stack, down to a kernel space buffer area because of
the address space differences.  The amount that is copied is determined
by an integer count of integers: that is, some number 'n' * sizeof(int)
bytes get copied.

This is dependent on the arguments pushed on the stack being
representable as integer values (we are guaranteed this by the
fact that we are calling a non-local function that is not inlined).

In any case, the copyin from user to system space in the trap code
must occur, since the kernel can not address the process data otherwise.

The area where the copyin takes place is a per process work area,
and it's a pointer to this that is passed as the argument vector
pointer, which assumes packing such that the structure dereference
of this data will result in the arguments as referenced from the
argument structure.
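
A minimal user-space sketch of the copy and the structure dereference
described above.  Everything in it is an assumption made for illustration:
word_t stands in for the int-sized stack slot (on the i386 it would simply
be int), memcpy stands in for copyin, and struct read_args, fetch_args()
and decode_read_args() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One stack slot; on the i386 this would simply be int (assumption). */
typedef intptr_t word_t;

#define MAXARGS 8

/* Hypothetical argument structure for read(), one member per slot. */
struct read_args {
	word_t fd;
	word_t buf;
	word_t nbytes;
};

/* Stand-in for the per process work area the trap code copies into. */
static word_t scratch[MAXARGS];

/*
 * Stand-in for copyin(): copy 'narg' stack slots into the work area.
 * Returns 0 on success, -1 if the count is out of range.
 */
static int
fetch_args(const word_t *user_sp, size_t narg)
{
	if (narg > MAXARGS)
		return (-1);
	memcpy(scratch, user_sp, narg * sizeof(word_t));
	return (0);
}

/* Read the work area back through the argument structure layout. */
static void
decode_read_args(struct read_args *out)
{
	memcpy(out, scratch, sizeof(*out));
}
```

The decode only works because the structure packing matches the slot
layout exactly: three slots in, three members out, no padding between.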

So far, we do not have a use for the change.


Now each of the arguments is itself, potentially, a pointer to
additional information in call/subcode specific user space structures
that must additionally be copied in (or out to).


The potential use for the hack is to get the copyin all out of the
way at once with a single verification check, a la Linux system calls.

The problem with this idea is that the address space in the user
program is then assumed (potentially incorrectly) to be contiguous.


While BSD could very well benefit from a single verification, avoiding
the mapping issues in copyin/copyout, giving one check instead of two
for a reused area, the actual benefits of this are fairly small.  The
biggest benefit would be in the copyinstr area, and there, the
algorithm would potentially fail because of indeterminate string length.


When we add in the concept of multithreading, it's possible to alter
the process address space promiscuously during a long delay operation,
such that an area that was viable for a copyout is no longer a viable
target.

So the benefits are lost.


Further, the use of null termination instead of byte count on strings
makes the copyinstr() operation extremely onerous.  There are only a
few locations where the facility is used, and even there, the overhead
introduced is largely an issue of bad call interface design.  There
are two types of strings that get copied in: (1) paths and (2) structure
arguments (like lkm module names, as opposed to path names, or NFS server
names for NFS mount).  In actuality, the structure arguments should be
packed into fixed length structure elements and the copyinstr discarded.
The paths can stay, but a separation of the file system name space from
the act of string copying (Linux does this too, though I had to fix
a bug in an error path) is a necessary part of the required cleanup
for this to happen.
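
A rough sketch of the difference, assuming a hypothetical struct mount_args
with the name packed into a fixed length field.  copy_string_in() only
stands in for copyinstr and copy_struct_in() for a plain copyin; neither
is the kernel routine, and HOSTNAME_MAX is a made-up size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOSTNAME_MAX 32	/* hypothetical fixed field size */

/* Hypothetical mount argument with the name in a fixed length field. */
struct mount_args {
	char hostname[HOSTNAME_MAX];
	int flags;
};

/*
 * Stand-in for copyinstr(): the length is not known up front, so every
 * byte is examined until the NUL or the limit is reached.  Returns 0
 * and stores the length (including the NUL) in *done, or -1 if no
 * terminator was found within 'max' bytes.
 */
static int
copy_string_in(const char *src, char *dst, size_t max, size_t *done)
{
	size_t i;

	for (i = 0; i < max; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			*done = i + 1;
			return (0);
		}
	}
	return (-1);
}

/* Fixed length alternative: one block copy of a size known in advance. */
static void
copy_struct_in(const struct mount_args *src, struct mount_args *dst)
{
	memcpy(dst, src, sizeof(*dst));
}
```

The string version cannot be verified once up front because the size is
only known after the scan; the structure version can.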


Still, we do not benefit from the change.


What does it do?  What use is the change?

In the end, it allows us to alter our argument packing.  But the same
effect of the optimizer that does that, and destabilizes the structure
and stack packing from the historical (and processor efficient) integer
norm, also has the effect of (potentially) using register argument
passing at certain optimization levels.  This is, of course, crap.  It
won't work across the trap vector interface for all the registers gcc
likes to use.


If we are trying for this small an incremental improvement, I suggest
spending coding time on getting callee pop working within the kernel
code.  It would be a much higher gain for less effort.


> All this assumes that the args are in a struct.  In general, at the
> beginning of a syscall, some are in registers and some are in user memory.
> foo_os_syscall() currently has to copy them to args struct.  It might
> be better to delay the copying to the machine-generated
> foo_os_syscall_entry_xxx() functions to reduce the amount of copying
> to the wrong places and to support messy ABI's.

The copying is, in all cases, to the stack, by way of "push".  The use
of registers for calling in a system call interface is bogus.  Removing
the function call overhead by inlining the traps is one thing, but trying
to get rid of stack argument passing to system calls is quite another.

This would not be a problem were it not for the prototypes that allowed
the compiler to bogusly "optimize" the calls to include register argument
passing for system call "functions".

The correct way to deal with this problem is to disable such optimizations,
such that each system call becomes a push of the call number and a trap,
and can itself be inlined.

Probably this requires a #pragma on the compiler's behalf to force the
argument push.

It's very arguable that the compiler would generate incorrectly window
optimized code for inlined system calls at present.  Specifically,
it would fail to see the need to push the arguments.

Even with a workaround that forced the push in the inlined code, the
arguments would be incorrectly treated prior to the push, potentially
resulting in pessimal code.

> >> >   which passes the args correctly (as void *).  Then we need to handle
> >> >   varargs functions and struct padding better.  I think all the details
> >> >   can be hidden in machine-generated functions so that the args structs
> >> >   and verbose macros to reference them don't have to appear in the core
> >> >   sources.
> 
> >I have macros that divorce K&R and ANSI vararg behaviour from the code
> >itself (I use them for various things myself).  Is this what we are
> >trying to accomplish?
> 
> No.  Varargs syscalls such as open(), fcntl() and shmsys() mess up the
> ABI.  The args for them have to be copied from different places depending
> on codes in the early args.  syscall() currently assumes that the args
> are on the stack in user space and copies more args than are necessary
> and sometimes more than are officially supplied.  Varargs for syscall
> xxx should be handled by fetching them from the appropriate place in
> foo_os_syscall_entry_xxx() and passing them in the normal varargs way
> to xxx().

This is incorrect.  The argument count specified in the sysent[] table
should result in the correct copyin size.

Note that if the size was not validly known in any case, then the amount
of stack copied would be mismatched, and some system calls would certainly
fail.
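
As a sketch of what is meant by the table giving the copyin size: the
entry layout below is a simplified assumption (an argument count plus a
handler), and the two stub handlers and copyin_size() are made up for the
example.

```c
#include <assert.h>
#include <stddef.h>

typedef int sy_call_t(const int *args, int *retval);

/* Simplified table entry: argument count plus handler (assumption). */
struct sysent {
	int sy_narg;
	sy_call_t *sy_call;
};

/* Hypothetical handler taking no arguments. */
static int
sys_noargs_stub(const int *args, int *retval)
{
	(void)args;
	*retval = 42;	/* made-up value for the sketch */
	return (0);
}

/* Hypothetical handler taking three integer arguments. */
static int
sys_add3_stub(const int *args, int *retval)
{
	*retval = args[0] + args[1] + args[2];
	return (0);
}

static const struct sysent table[] = {
	{ 0, sys_noargs_stub },
	{ 3, sys_add3_stub },
};

#define NSYSENT	(sizeof(table) / sizeof(table[0]))

/*
 * Compute how many bytes the trap code would copy in for call 'code'.
 * Returns 0 and stores the size in *size, or -1 for an unknown call.
 */
static int
copyin_size(size_t code, size_t *size)
{
	if (code >= NSYSENT)
		return (-1);
	*size = (size_t)table[code].sy_narg * sizeof(int);
	return (0);
}
```

If sy_narg were wrong for an entry, the handler would read slots that
were never copied, which is the failure described above.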

> 
> >> >   semsys() and shmsys() syscall interfaces are BAD because they
> >> >   multiplex several syscalls that have different types of args.
> >> >   There was no reason to duplicate this sysv braindamage but now
> >> >   we're stuck with it.  NetBSD has reimplemented the syscalls properly
> >> >   as separate syscalls #220-231.
> >>
> >> I agree.  This is yucky!
> > 
> >No, this is good -- system calls are a scarce resource and should be
> >consumed conservatively.  What's the problem you have with anonymous
> >argument vectors using subfunction codes?
> 
> No, this (not syscalls #220-231) is yucky.  Multiplexing syscalls
> takes more code to handle even incorrectly like it does now.

The amount of code is a single computed goto.  One might as well
argue that locking was "yucky" using this yardstick.  But it fails
the "yucky" result of varying argument size (it uses a single fixed
size structure that does not change).

The yardstick is flawed.

Depending on the logical grouping of calls, you can easily argue that
a compare's worth of overhead is less expensive than a function call's
worth, in function preamble/postamble, if nothing else (ignoring the
3-6 cycles depending on how the call is made).
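
A sketch of the kind of demultiplexing in question, with a switch playing
the part of the computed goto.  The subcodes, struct mux_args and
mux_syscall() are all hypothetical; the point is only that one fixed size
argument block serves every subfunction.

```c
#include <assert.h>

/* Hypothetical subcodes for one multiplexed entry point. */
enum { SUB_GET = 0, SUB_SET = 1, SUB_CLEAR = 2 };

/* One fixed size argument block shared by every subcode. */
struct mux_args {
	int which;	/* subfunction code */
	int a1;
	int a2;
	int a3;
};

static int value;	/* state the hypothetical subcalls operate on */

/* Dispatch on the subcode; returns 0 on success, -1 if it is unknown. */
static int
mux_syscall(const struct mux_args *uap, int *retval)
{
	switch (uap->which) {
	case SUB_GET:
		*retval = value;
		return (0);
	case SUB_SET:
		value = uap->a1;
		*retval = 0;
		return (0);
	case SUB_CLEAR:
		value = 0;
		*retval = 0;
		return (0);
	default:
		return (-1);
	}
}
```

The copy size is the same for every subcode, so the trap code needs no
knowledge of which subfunction is being requested.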

> Non-multiplexed syscalls take one entry in sysent[] and a function
> to handle them; multiplexed ones require fewer entries in sysent[]
> and an extra function to demultiplex them.  The demultiplexing
> function can be simple and unportable like the current ones or
> complicated and portable or machine-generated.  In general it has
> to have code like freebsd_syscall_entry_read() for each of the
> variants.

What portability problems do you see in the system call multiplex
interfaces, and under what circumstances can you cause incorrect code
to be generated?

> >> We need a better way to handle these syscall subcodes (as SYSV calls 'em).
> >
> >I guess I don't really understand why these are a problem, unless you
> >are trying to do something silly, like prototype them.
> 
> They are a problem because they give more special cases to write code for.

As opposed to generating code?  I don't see less total code in the long
run, and applying a cookie cutter and forcing all the calls to fit the
mold is not an optimal approach to solving the problem.


The biggest problem I see in the system call interface is that the use
of a per process call argument stack scratch area makes it per process
non-reentrant, or at the very least prevents context change until
the arguments have been decoded.  This introduces a large latency
in async operations through the normal vector, or sync-to-async
conversion through an alternate vector.

A better resolution to this particular problem requires per entrancy
call argument stack scratch areas, and I see the proposed changes as
antithetical to that approach.
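
To illustrate the difference, a small sketch under assumed names:
struct proc_sketch holds the single shared scratch area, struct call_ctx
is a hypothetical per entrancy area, and memcpy again stands in for
copyin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXARGS 8

/* Hypothetical process structure with one shared scratch area. */
struct proc_sketch {
	int shared_args[MAXARGS];
};

/* Hypothetical per entrancy context: each call owns its argument copy. */
struct call_ctx {
	int args[MAXARGS];
};

/* Copy arguments into the per process area; a second entry overwrites. */
static int
enter_shared(struct proc_sketch *p, const int *uargs, size_t narg)
{
	if (narg > MAXARGS)
		return (-1);
	memcpy(p->shared_args, uargs, narg * sizeof(int));
	return (0);
}

/* Copy arguments into an area owned by this entry only. */
static int
enter_private(struct call_ctx *c, const int *uargs, size_t narg)
{
	if (narg > MAXARGS)
		return (-1);
	memcpy(c->args, uargs, narg * sizeof(int));
	return (0);
}
```

With the shared area, a second entry before the first has decoded its
arguments destroys them; with one area per entry, both survive.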


This is a sacrifice of long term goals for no perceivable short term gain.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.