From: Terry Lambert
Message-Id: <200011302246.PAA23413@usr05.primenet.com>
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem
To: abial@webgiro.com (Andrzej Bialecki)
Date: Thu, 30 Nov 2000 22:46:55 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), bright@wintelcom.net (Alfred Perlstein), arch@FreeBSD.ORG
In-Reply-To: from "Andrzej Bialecki" at Nov 30, 2000 10:48:43 AM

> But don't we have the same issue with other parts of kernel structures
> that we don't want to make visible to userland, not just the
> mutexes.
>
> I had some discussion with Robert Watson a few days ago about the need to
> hide the layout of struct proc (and the changes it undergoes) from
> userland, which would allow to stabilize kernel interface to user
> utilities, like libkvm and friends (which probably should use
> specialized sysctl anyway). This goal would be quite difficult to achieve
> with just macros (and ugly at that..), so we thought about fixing all
> places where these structs are accessible to use special version of "user
> space struct proc" (== struct xproc? :-).
>
> This way no user space code will have to be changed (more than today,
> i.e. recompile libkvm et al., as usual), we could hide the complexities
> that we don't want to be visible outside the kernel, and we gain the
> stability in kernel/user interface (i.e. no more recompiles of userland
> needed if you update the kernel with changed struct proc size).

If you want to get technical, data interfaces are bad engineering, a bad idea all around, and something which should be immediately deprecated. XML is surrounded by similar problems.

Really, there should be _NO_ reading of /dev/kmem, under any circumstances. Likewise, there should never be a case where a kernel structure is copied out to user space directly: all data to be externalized should be abstracted before it is externalized.

So the canonically correct thing to do would be to surround most of the kernel-dependent headers with "#ifdef _KERNEL", and not externalize _ANY_ structure declarations whatsoever.

There are two major, and many minor, problems with this approach, which boil down to cases where, today, a data interface is the only available way to solve the problem. The first is latent interfaces, and the second is bimodal interfaces.

A latent interface occurs when data is communicated with a latency that is unavoidable and cannot easily be worked around in code. The number one latent interface is the file system, with the latencies being present in newfs, tunefs, fsck, and other utilities. Since these utilities operate on data which is not visible to the kernel (for good reason!) at the time of the operation, the only options are a latent interface or rolling the functionality into the kernel itself. This could be done, but it's prohibitively expensive without discardable code segments, which, while supported by ELF, are not supported by FreeBSD.
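To make "abstracted before it is externalized" concrete, here is a minimal C sketch of the kernel-private/external split that a `struct ucred` -> `struct xucred` style change implies. Every name in it (`kcred`, `xcred`, `xcred_export`, `XCRED_VERSION`) is hypothetical and is not FreeBSD's actual definition, and a C11 `atomic_flag` spin loop stands in for a kernel mutex. The point is that the lock, the reference count, and the list linkage never leave the kernel; user space sees only a fixed, versioned layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Kernel-private credential (hypothetical). Its layout may change between
 * releases; only code compiled with _KERNEL would ever see it. */
struct kcred {
    atomic_flag lock;          /* stand-in for a kernel mutex */
    int refcnt;                /* reference count, never exported */
    uint32_t uid;
    uint32_t ngroups;
    uint32_t groups[16];
    struct kcred *next;        /* kernel linkage, meaningless outside it */
};

/* Stable, versioned external form: the only thing user space sees. */
#define XCRED_VERSION 1u
#define XCRED_NGROUPS 16u

struct xcred {
    uint32_t version;          /* lets user space detect a layout change */
    uint32_t uid;
    uint32_t ngroups;
    uint32_t groups[XCRED_NGROUPS];
};

/* Copy only the externalized fields, under the lock, into the fixed layout. */
void xcred_export(struct kcred *kc, struct xcred *xc)
{
    assert(kc != NULL && xc != NULL);
    while (atomic_flag_test_and_set(&kc->lock))
        ;                      /* spin: sketch only */
    memset(xc, 0, sizeof(*xc));
    xc->version = XCRED_VERSION;
    xc->uid = kc->uid;
    xc->ngroups = kc->ngroups < XCRED_NGROUPS ? kc->ngroups : XCRED_NGROUPS;
    memcpy(xc->groups, kc->groups, xc->ngroups * sizeof(xc->groups[0]));
    atomic_flag_clear(&kc->lock);
}
```

A caller that checks `version` can refuse, rather than misinterpret, a layout it does not understand, and the kernel remains free to reorder or grow `struct kcred` without forcing a userland rebuild.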
Even if FreeBSD supported discardable segments, you would still need to deal with discrete kernel object files, since licensing issues cannot be resolved in a static link. In other words, it's possible to deal with this (Windows supports PE (Portable Executable) objects, its analogue of ELF, with segment attributes including "initialization", "discardable", "pageable", etc.), but FreeBSD does not have the necessary technical sophistication at the present time.

A bimodal interface is an interface intended to operate both interactively and against latent data, potentially with huge latencies that cannot be overcome with segment attribution, etc. An example of an interface like this is the one used by the "ps" command to obtain information from the current system image (the granddaddy of all of these is a kernel debugger). Since the "ps" command must be able to run against the existing system, and must also be able to run against a crash dump of a system, perhaps sent via parcel post or carrier pigeon, the interfaces it uses cannot be separated from the data against which they are implemented. Worst case, "_KERNEL" could be defined in scope, and the utilities could remain in user space.

The second case here is the more interesting one, and the more applicable to the ucred structure under discussion.

Actually, the "ps" command has limited utility against a crash dump. This is because it is linked against a libkvm, and itself has intimate knowledge of a kernel structure (a historically volatile one, proc, which in fact has shown no signs of stabilizing). The libkvm provides symbolic references to the base addresses of data in the kmem image, which can then be followed as linked lists in order to obtain information. The information is then interpreted by the "ps" program itself, based on its knowledge of the structure contents.

I think in the limit, this interface will have to die.
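The dependency on compiled-in layout is easy to demonstrate. The sketch below is hypothetical: it treats byte offsets into a memory image as kernel addresses (0 ends the list) and uses two invented record layouts, not the real `struct proc` or the libkvm API. A reader built for `proc_v1` walks the list the way described above, following `p_next` and interpreting each record by its own knowledge of the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record layout compiled into the hypothetical "ps" binary. */
struct proc_v1 {
    uint32_t p_next;           /* offset of next record, 0 = end of list */
    uint32_t p_pid;
};

/* Layout a newer (hypothetical) kernel might write: a field was added. */
struct proc_v2 {
    uint32_t p_next;
    uint32_t p_flag;
    uint32_t p_pid;
};

/* Walk a kmem-style image using only compiled-in layout knowledge.
 * Returns the number of pids copied into out[]. */
size_t ps_walk(const unsigned char *image, size_t image_len,
               uint32_t head, uint32_t *out, size_t max)
{
    size_t n = 0;
    uint32_t addr = head;

    assert(image != NULL && out != NULL);
    while (addr != 0 && n < max) {
        struct proc_v1 p;
        if (addr > image_len || image_len - addr < sizeof(p))
            break;             /* link points outside the image */
        memcpy(&p, image + addr, sizeof(p));
        out[n++] = p.p_pid;
        addr = p.p_next;
    }
    return n;
}
```

Run against an image written with `proc_v2`, the same reader still finds two records but reports pids 4 and 0 (the first record's flag word and the second's) with no error: a stale data interface fails silently rather than loudly.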
Consider the case of a "ps" command in user space, with the proc struct list protected by a mutex against multiple CPUs and/or kernel preemption: the user space program will neither honor, nor itself assert, the protection mutex. This means that it may be running on one processor while another is manipulating the structure linkages. The best-case failure mode is that the user space process sees the list appear to terminate prematurely. Worst case, the user space process faults while reading kmem, or sees a circular reference and fails to terminate, spending all its time traversing the cycle.

Another problem that will commonly arise is that the proc struct known to the "ps" program, or the information known to libkvm, will change. When you go to apply this information to an older image, the newer tools will not work. It's a royal pain, but it is possible to resynchronize this information in the common interactive case by insisting that builds be grouped. For the latent data case, this will not work.

In fact, most people who follow -current have, at one time or another, found themselves booted on a "kernel.old" because the new "kernel" was too unstable to use, even to correct the stability problem as a bootstrap for replacing itself. When this happens after a rebuild of libkvm and "ps" (and other utilities, such as "mount"), it is not as easy to revert the rest of the system as it was to revert the kernel.

One way to deal with this problem would be to attach segments implementing libkvm to the running kernel. Programs could map these in and use them as they would any shared library, to get kvm information.
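The worst-case failure described above, a reader chasing a circular reference forever, can at least be bounded on the reader's side. This sketch assumes a hypothetical index-based snapshot (`next[i]` is the successor of node `i`, `-1` ends the list), uses Floyd's tortoise-and-hare cycle detection, and range-checks every link before following it. It only bounds the damage: a list torn into a shorter but well-formed one (the best-case failure) looks exactly like a genuinely short list, so the reader still cannot get a consistent answer without the lock.

```c
#include <assert.h>

/* Returns the number of nodes reachable from head, or -1 if the snapshot
 * contains a cycle or a link pointing outside the table (a torn read). */
int checked_list_len(const int *next, int n, int head)
{
    int slow = head, fast = head, len = 0;

    assert(n >= 0);
    for (;;) {
        /* Advance fast two links, validating each before following it. */
        for (int step = 0; step < 2; step++) {
            if (fast == -1)
                return len + step;     /* clean end of list */
            if (fast < 0 || fast >= n)
                return -1;             /* link outside the snapshot */
            fast = next[fast];
        }
        len += 2;
        /* slow trails fast, so every node it visits was already checked. */
        slow = next[slow];
        if (slow == fast)
            return -1;                 /* cycle: a naive walker never stops */
    }
}
```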
Bundling libkvm with the kernel in this way is attractive, since it means that you could map your libkvm from the crash dump image instead of from the running kernel or an old kernel (if symbols can't be obtained from the dump image, only from the kernel of which it is a dump, that kernel has to travel with the dump; I dislike this, as it means pushing around synchronized file sets, but it's at least a workable kludge). In this scenario, the libkvm/kernel synchronization problem has been resolved.

This still leaves us with the "ps" program knowing about the proc structure (and the "mount" program knowing about the mount parameter structure, etc.). This intimate knowledge can only be worked around by abstraction. This might consist of providing a set of descriptors for data elements, and externalizing these as "ps" formatting argument strings, etc. These descriptors could be bundled into the kernel-supplied shared objects described above, which user programs would map. This would provide a generic API to a protocol, defined by the interpretation of the descriptors at compile time and at run time of the program using them. Not as abstract as SMTP, but a lot better than an application-centric API for doing the same thing, and infinitely better than a data interface.

This still doesn't resolve the SMP problem. This could be handled by externalizing access to the locks to user space. This would, IMO, be a terrible mistake.

A second approach would be to define an access point that could act as an API when used interactively, and as a data interface when used latently. This is actually rather easy, when you realize that latent use will be against a static snapshot, and will not have to worry about locking. The locking can be hidden behind the API, and the API can straddle the user/kernel boundary.

For "ps", the most logical API is a procfs. The procfs can act as a descriptor tree automatically, since file systems are themselves hierarchical in nature.
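Hiding the locking behind the API can be sketched as a single entry point that takes the lock only when the source is a live system, skips it for a static snapshot, and copies out an abstract record in either case. `proc_source`, `xproc_entry`, and `proc_list` are hypothetical names, the process table is a flat array rather than a linked list, and an `atomic_flag` spin loop stands in for the kernel's proc-list mutex; in the design described above, this entry point would be a procfs rather than a C function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Abstract, externalized view of one process (hypothetical). */
struct xproc_entry {
    int pid;
    int ppid;
};

/* A source of process data: a live system (needs locking) or a static
 * crash-dump snapshot (nothing can change underneath, so no lock). */
struct proc_source {
    bool live;
    atomic_flag *lock;                 /* stand-in for the proc-list mutex */
    const struct xproc_entry *table;
    size_t count;
};

/* The one entry point callers use; the locking never escapes it. */
size_t proc_list(const struct proc_source *src,
                 struct xproc_entry *out, size_t max)
{
    assert(src != NULL && out != NULL);
    if (src->live) {
        while (atomic_flag_test_and_set(src->lock))
            ;                          /* spin: sketch only */
    }
    size_t n = src->count < max ? src->count : max;
    for (size_t i = 0; i < n; i++)
        out[i] = src->table[i];        /* copy out while the data is stable */
    if (src->live)
        atomic_flag_clear(src->lock);
    return n;
}
```

The caller's code is identical for a running system and for a dump, which is what lets one interface serve both interactive and latent use without user space ever touching the lock.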
Similarly, the procfs in-core implementation is such that the structures representing it can be traversed as data in a static image. (Ideally, however, one would want to "fake" an FS interface, so as to keep the "shared library" segments of the kernel small, even though the kernel never loads them into its own address space; this "faking" could be done by abstracting file I/O using libkvm descriptors, and by providing control over syspace vs. userspace copying when trying to do a "uiomove" to externalize FS data.)

In any case, the SMP problem means that the data interfaces must die, at least insofar as they apply to active systems rather than crash dumps. If they die, then there is no kernel structure externalization to worry about (with the side benefit of not needing to recompile "ps" and the rest of the tools that use kmem or externalized kernel structures each time those structures change).

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.