From: "Peter Wemm" <peter@wemm.org>
To: "Kostik Belousov"
Cc: Ed Schouten, FreeBSD Arch
Date: Sun, 9 Nov 2008 14:36:18 -0800
Subject: Re: pipe(2) calling convention: why?
In-Reply-To: <20081109203848.GP18100@deviant.kiev.zoral.com.ua>
References: <20081109192746.GO1165@hoeg.nl> <20081109203848.GP18100@deviant.kiev.zoral.com.ua>

On Sun, Nov 9, 2008 at 12:38 PM, Kostik Belousov wrote:
> On Sun, Nov 09, 2008 at 08:27:46PM +0100, Ed Schouten wrote:
>> Hello all,
>>
>> After having a discussion on IRC with some friends of mine about system
>> call conventions, we couldn't exactly determine why pipe(2)'s calling
>> convention has to be different from the rest. Unlike most system calls,
>> pipe(2) has two return values.
>> Instead of just copying out an array of two elements, it uses two
>> registers to store the file descriptor numbers.
>>
>> It seems a lot of BSD-style system calls used to work that way, but
>> pipe(2) seems to be the only system call on FreeBSD that uses this
>> today. Some system calls only seem to set td_retval[1] to zero, which
>> makes little sense to me. Maybe those assignments can be removed.
>>
>> In my opinion there are a couple of disadvantages of having multiple
>> return values:
>>
>> - As documented in syscall(2), there is no way to obtain the second
>>   return value if you use this function.
>>
>> - Each of those system calls needs to have its own implementation
>>   written in assembly for each architecture we support. Why can hundreds
>>   of system calls be handled in a generic fashion, while interfaces like
>>   pipe(2) can't?
>>
>> As a small experiment I've written a patch to allocate a new system call
>> (506) which uses a generic calling convention to implement pipe(2). It
>> seems Linux also uses this method, so I've removed linux_pipe() from the
>> Linuxolator as well, which seems to work.
>>
>> I could commit this if people think it makes sense. Any comments?
>>
>
> The convention of returning pipe descriptors in the registers goes
> back at least to the Sixth Edition. Check the Lions' book for the
> reference. Amusingly, Solaris uses the same calling convention for
> pipe(2).
>
> I do not see what we gain by the change. Now, we have one syscall and
> some arch-dependent wrappers in the libc. After the patch, we get rid
> of the wrappers, but grow two syscalls.
>
> The only reason for doing this I can imagine is to allow syscall(2) to
> work for SYS_pipe from C code. Since we have not heard complaints about
> this for ~15 years, we can live with it.
>

The other side effect of the change is to remove one asm instruction
in the syscall handler and replace it by potentially hundreds of
instructions to do the copyout.
Plus we gain another syscall, lose backwards compatibility with
kernel.old again, and so on. I really don't see an overall benefit.

What I do see some use for is to do the kern_pipe() split (like in the
patch), which simplifies the linux abi wrappers (and other ABI wrappers,
not just linux!). Just have our syscall return in retval[0] and [1]
like before, and we still get the benefit of simplifying a bunch of
wrappers.

The patch is incomplete anyway: it leaks fds if the copyout fails.
There is a comment about this in the patch.

Other historical notes..

Ancient unix systems used to implement syscalls by having userland do
a call (jsr) to a shared page. The trap handler would verify the entry
point, and if it was approved, it would then give privilege and
continue. The problem was that this severely limited the number of
syscalls, because we were talking tiny address spaces. Given that
syscall numbers were at a premium, it made sense to pack as much
functionality into syscalls as possible. e.g. the getpid syscall could
return both pid and ppid, saving a kernel syscall entry point, and so
on.

This is also one of the reasons for SIGSYS. Calling an illegal kernel
entry point in a process that had run wild could be easily converted
into a signal. Wild processes could easily hit the kernel entry points.
Again, this doesn't really apply these days. It is somewhat archaic by
today's standards - linux doesn't even bother with SIGSYS - it has bad
syscalls just return ENOSYS.

fork() currently uses both retval[0] and [1], in spite of it appearing
not to. See cpu_fork() for the other half. We use both return values
for 64 bit returns, e.g. lseek(). Some places that set it to 0 are
silly. I really don't see td_retval[0] and td_retval[1] ever going away
entirely, at least not while we share the syscall vector between 32 and
64 bit systems.
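
To make the 64 bit return case concrete, here is a minimal sketch in C.
It is not kernel code. The names retvals, split_off and join_off are
hypothetical, and it assumes the i386 arrangement where the low 32 bits
travel in the first return slot (%eax) and the high 32 bits in the
second (%edx).

```c
#include <stdint.h>

/* Hypothetical stand-in for the two return slots (td_retval[0] and [1]). */
struct retvals {
	uint32_t r0;	/* assumed: low 32 bits, returned in %eax on i386 */
	uint32_t r1;	/* assumed: high 32 bits, returned in %edx on i386 */
};

/* Split a 64 bit offset across the two 32 bit return slots. */
static struct retvals
split_off(int64_t off)
{
	struct retvals rv;

	rv.r0 = (uint32_t)((uint64_t)off & 0xffffffffu);
	rv.r1 = (uint32_t)((uint64_t)off >> 32);
	return (rv);
}

/* Reassemble the 64 bit value, as a userland stub would have to. */
static int64_t
join_off(struct retvals rv)
{
	return ((int64_t)(((uint64_t)rv.r1 << 32) | rv.r0));
}
```

A 32 bit caller that only looks at the first slot sees a truncated
offset, which is why the second slot cannot simply be dropped for calls
like lseek().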

I don't think breaking kernel.old compatibility, replacing the current
syscall for pipe() with a slower one, and having to have both anyway is
much of a win. Splitting pipe() and kern_pipe() would help ABI
wrappers. I don't see value in adding a new way for pipe(2) to fail.
Right now, pipe(2) causes a segfault if you pass a bad address. The new
wrapper causes it to return EFAULT instead, and NOT crash the app. The
failure mode has changed.

As an aside.. I'm very very very painfully aware of the dual return
from syscalls. I've been fighting with this in valgrind for quite some
time now. We have some very interesting semantics on i386.

* syscalls preserve all registers except for %eax and %eflags. Even
  scratch registers.
* .. except for %edx sometimes, for 64 bit returns, or dual-returns.
  Otherwise %edx is preserved.
* libc depends on this in a couple of hand-written asm stubs, e.g.
  brk()/sbrk(). Nothing else cares about this.
* some libc syscall wrappers trash the scratch registers though.
* in spite of syscalls not using C calling conventions, the kernel
  assumes you've done a C-style call to libc. It assumes the C return
  address was pushed onto the stack before the args.

In retrospect I wish it never had started out this way. But it did, it
still is, and I feel the costs of changing it are not worth it for such
little gain.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell