From owner-freebsd-arch  Thu Nov 30 14:47:11 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP id C472A37B400
	for <arch@FreeBSD.ORG>; Thu, 30 Nov 2000 14:47:05 -0800 (PST)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id PAA26953;
	Thu, 30 Nov 2000 15:43:50 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp05.primenet.com, id smtpdAAA2NaOM0; Thu Nov 30 15:43:43 2000
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id PAA23413;
	Thu, 30 Nov 2000 15:46:56 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200011302246.PAA23413@usr05.primenet.com>
Subject: Re: HEADSUP user struct ucred -> xucred (Was: Re: serious problem
To: abial@webgiro.com (Andrzej Bialecki)
Date: Thu, 30 Nov 2000 22:46:55 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert),
	bright@wintelcom.net (Alfred Perlstein), arch@FreeBSD.ORG
In-Reply-To: <Pine.BSF.4.20.0011301037190.51755-100000@mx.webgiro.com> from "Andrzej Bialecki" at Nov 30, 2000 10:48:43 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> But don't we have the same issue with other parts of kernel structures
> that we don't want to make visible to userland, not just the
> mutexes.
> 
> I had some discussion with Robert Watson a few days ago about the need to
> hide the layout of struct proc (and the changes it undergoes) from
> userland, which would allow to stabilize kernel interface to user
> utilities, like libkvm and friends (which probably should use
> specialized sysctl anyway). This goal would be quite difficult to achieve
> with just macros (and ugly at that..), so we thought about fixing all
> places where these structs are accessible to use special version of "user
> space struct proc" (== struct xproc? :-).
> 
> This way no user space code will have to be changed (more than today,
> i.e. recompile libkvm et al., as usual), we could hide the complexities
> that we don't want to be visible outside the kernel, and we gain the
> stability in kernel/user interface (i.e. no more recompiles of userland
> needed if you update the kernel with changed struct proc size).

If you want to get technical, data interfaces are bad engineering,
a bad idea all around, and something which should be immediately
deprecated.  XML is surrounded by similar problems.


Really, there should be _NO_ reading of /dev/kmem, under any
circumstances.  Likewise, there should never be a case where
a kernel structure is copied out to user space directly: all
data to be externalized should be abstracted before it is
externalized.

So the canonically correct thing to do would be to surround
most of the kernel dependent headers with "#ifdef _KERNEL",
and not externalize _ANY_ structure declarations, whatsoever.
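
A minimal sketch of that arrangement, for illustration only.  Every
name in it (cred_demo, xcred_demo, cred_export_demo, DEMO_KERNEL) is
made up; DEMO_KERNEL merely stands in for _KERNEL so the fragment
builds on its own, and none of this is the actual ucred/xucred code.
The private layout sits behind the guard, and only an abstracted,
versioned view is ever handed out:

```c
#include <assert.h>
#include <string.h>

#define XCRED_DEMO_VERSION 0u
#define XCRED_DEMO_NGROUPS 16u

/* Exported view: the only layout user space is allowed to see. */
struct xcred_demo {
    unsigned int xc_version;    /* lets a consumer reject an unknown layout */
    unsigned int xc_uid;
    unsigned int xc_ngroups;
    unsigned int xc_groups[XCRED_DEMO_NGROUPS];
};

/* Assumption: stand-in for _KERNEL so that this file builds by itself. */
#define DEMO_KERNEL 1

#ifdef DEMO_KERNEL
/* Kernel-private layout: free to change without touching user space. */
struct cred_demo {
    unsigned int cr_ref;        /* reference count, meaningless outside */
    void *cr_mtx;               /* placeholder for an in-kernel mutex */
    unsigned int cr_uid;
    unsigned int cr_ngroups;
    unsigned int cr_groups[XCRED_DEMO_NGROUPS];
};

/* Abstract the private structure into the exported one before copyout. */
static void
cred_export_demo(const struct cred_demo *cr, struct xcred_demo *xc)
{
    assert(cr->cr_ngroups <= XCRED_DEMO_NGROUPS);
    memset(xc, 0, sizeof(*xc));
    xc->xc_version = XCRED_DEMO_VERSION;
    xc->xc_uid = cr->cr_uid;
    xc->xc_ngroups = cr->cr_ngroups;
    memcpy(xc->xc_groups, cr->cr_groups,
        cr->cr_ngroups * sizeof(cr->cr_groups[0]));
}
#endif /* DEMO_KERNEL */
```

Adding or reordering fields in cred_demo then requires touching only
cred_export_demo, not anything compiled against xcred_demo.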

There are two major, and many minor, problems with this approach,
which boil down to data interfaces with no other available method
to solve the problem (today).  The first is latent interfaces,
and the second is bimodal interfaces.

A latent interface occurs when data is communicated with a
latency, and the latency is unavoidable, and can not be easily
worked around in code.  The number one latent interface is the
file system, with the latencies being present in newfs, tunefs,
fsck, and other utilities.  Since these utilities operate on
data which is not visible to the kernel (for good reason!) at
the time of the operation, the only option is a latent interface,
or rolling the functionality into the kernel itself.  This could
be done, but it's prohibitively expensive without discardable
code segments, which, while supported by ELF, are not supported
by FreeBSD.  Even were these supported by FreeBSD, you would
still need to deal with discrete kernel object files, since the
issue of license can not be resolved in a static linkage.  In
other words, it's possible to deal with this (Windows supports
PE [Portable Executable] objects, its counterpart to ELF, with
segment attributes, including "initialization", "discardable",
"pageable", etc.),
but FreeBSD does not have the necessary technical sophistication
at the present time.

A bimodal interface is an interface intended to operate both
interactively, and against latent data, potentially with huge
latencies which can not be overcome with segment attribution,
etc.  An example of an interface like this is the interface
used by the "ps" command in order to obtain information from
the current system image (the granddaddy of all of these is a
kernel debugger).  Since the "ps" command must be able to run
against the existing system, and it must be able to run against
a crashdump of a system, perhaps sent via parcel post or carrier
pigeon, the interfaces it uses can not be separated from the
data against which they are implemented.

Worst case, "_KERNEL" could be defined in scope, and the
utilities could remain in user space.


The second case here is the most interesting, and the most
applicable to the ucred structure under discussion.

Actually, the "ps" command has limited utility against a crash
dump.  This is because it is linked against a libkvm, and has
itself intimate knowledge of a kernel structure (a historically
volatile one -- proc -- which has shown no signs of stabilizing,
in fact).  The libkvm provides symbolic references to the base
addresses of data in the kmem image, which can then be followed as
linked lists in order to obtain information.  The information is then
interpreted by the "ps" program itself, based on its knowledge
of the structure contents.

I think in the limit, this interface will have to die.  Consider
the case of a "ps" command in user space, with the proc struct
list protected by mutex from multiple CPUs and/or kernel
preemption: the user space program will neither honor, nor will
it itself assert, the protection mutex.  This means that it may
be running on one processor, while another is manipulating the
structure linkages.  Best case failure mode is the user space
process sees the list appear to terminate prematurely.  Worst
case, the user space process causes a fault while reading kmem,
or sees a circular reference, and fails to terminate properly,
spending all its time traversing the circular reference.
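
The circular-reference case can be illustrated in user space with a
small sketch.  The node type and function names are hypothetical, and
the check (Floyd's two-pointer cycle detection) is something I am
swapping in for illustration; it is not anything "ps" or libkvm does.
It only helps against a list that is corrupt but holding still, and
does nothing for a list being relinked while it is read:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on the process list. */
struct pnode_demo {
    int pid;
    struct pnode_demo *next;
};

/*
 * Walk the list without any lock.  Returns the number of entries, or -1
 * if the links loop back on themselves.  The fast pointer moves two
 * links per step and the slow pointer one; they meet only if there is a
 * cycle.
 */
static long
walk_checked_demo(const struct pnode_demo *head)
{
    const struct pnode_demo *slow = head;
    const struct pnode_demo *fast = head;
    long n = 0;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return (-1);
    }
    for (slow = head; slow != NULL; slow = slow->next)
        n++;
    return (n);
}

/* Usage: a sane list, then the same list after a bad relink. */
static void
walk_usage_demo(void)
{
    struct pnode_demo c = { 3, NULL };
    struct pnode_demo b = { 2, &c };
    struct pnode_demo a = { 1, &b };

    assert(walk_checked_demo(&a) == 3);
    c.next = &a;        /* what a reader might see mid-update */
    assert(walk_checked_demo(&a) == -1);
}
```

A naive "while (p != NULL) p = p->next" loop would spin forever on the
second list, which is the worst-case behaviour described above.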

Another problem that will commonly arise is that the proc struct
known to the "ps" program, or the information known to the libkvm,
will change.  When you go to apply this information to an older
image, the newer tools will not operate.  It's a royal pain, but
it is possible to resynchronize this information in the common
interactive case, by insisting that builds be grouped.  For the
latent data case, this will not work.  In fact, most people who
follow -current have, at one time or another, found themselves
booted on a "kernel.old" because the new "kernel" was too unstable
to use, even to correct the stability problem as a bootstrap for
replacing itself.  When this happens subsequent to a rebuild of
libkvm and "ps" (and other utilities, such as "mount"), it is not
as easy to revert the rest of the system as it was to revert the
kernel.

One way to deal with this problem would be to attach segments
to the running kernel, which implement libkvm.  Programs could
map these in and use them as they would use any shared library
to get kvm information.  This is attractive, since it means that
you could map your libkvm from the crashdump image, instead of
the running kernel, or an old kernel (if symbols could not be
obtained from the dump image, only from the kernel of which the
image is a dump; I dislike this, as it means pushing around
synchronized file sets, but it's at least a workable kludge).
In this scenario, the libkvm/kernel synchronization problem has
been resolved.

This still leaves us with the "ps" program knowing about the
proc structure (and the "mount" program knowing about the mount
parameter structure, etc.).  This intimate knowledge can only
be worked around by abstraction.  This might consist of providing
a set of descriptors for data elements, and externalizing this as
"ps" formatting argument strings, etc.  These descriptors could
be bundled in with what was previously described as the shared
objects that could be bundled with the kernel, and mapped by
user programs.  This would provide a generic API to a protocol,
defined by the descriptors' interpretation at compile time and at
runtime of the program using the descriptors.  Not as abstract as
SMTP, but a lot better than an application centric API for doing
the same thing, and infinitely better than a data interface.
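
One possible shape for such descriptors, as a rough sketch under my
own assumptions: the record layout, the descriptor fields and the
names proc_demo, fld_desc_demo and fld_format_demo are all invented
here.  The table would ship with the kernel; the consumer looks a
field up by keyword and never needs the structure declaration itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record; only the table below knows its layout. */
struct proc_demo {
    int p_pid;
    int p_uid;
    char p_comm[17];
};

enum fld_type_demo { FLD_INT, FLD_STR };

/* One descriptor per externally visible data element. */
struct fld_desc_demo {
    const char *name;           /* keyword a "ps"-like tool would accept */
    enum fld_type_demo type;
    size_t offset;              /* where the element lives in the record */
    size_t size;
};

static const struct fld_desc_demo proc_desc_demo[] = {
    { "pid", FLD_INT, offsetof(struct proc_demo, p_pid), sizeof(int) },
    { "uid", FLD_INT, offsetof(struct proc_demo, p_uid), sizeof(int) },
    { "comm", FLD_STR, offsetof(struct proc_demo, p_comm),
        sizeof(((struct proc_demo *)0)->p_comm) },
};

/*
 * Format the element named by key from the raw record into buf.
 * Returns 0 on success, -1 if no descriptor has that name.
 */
static int
fld_format_demo(const void *rec, const char *key, char *buf, size_t buflen)
{
    size_t i;

    assert(buflen > 0);
    for (i = 0; i < sizeof(proc_desc_demo) / sizeof(proc_desc_demo[0]); i++) {
        const struct fld_desc_demo *d = &proc_desc_demo[i];
        const char *p = (const char *)rec + d->offset;

        if (strcmp(d->name, key) != 0)
            continue;
        if (d->type == FLD_INT) {
            int v;

            memcpy(&v, p, sizeof(v));
            snprintf(buf, buflen, "%d", v);
        } else {
            snprintf(buf, buflen, "%.*s", (int)d->size, p);
        }
        return (0);
    }
    return (-1);
}
```

If the record grows or its fields move, only the table changes; a
consumer built against fld_format_demo keeps working, and an unknown
keyword fails cleanly instead of reading garbage.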

This still doesn't resolve the SMP problem.  This could be
handled by externalizing access to the locks to user space.
This would, IMO, be a terrible mistake.

A second approach would be to define an access point that could
act as an API when used interactively, and as a data interface
when used latently.  This is actually rather easy, when you
realize that latent use will be against a static snapshot, and
not have to worry about locking.  The locking can be hidden
behind the API, and the API can straddle a user/kernel boundary.
For "ps", the most logical API is a procfs.  The procfs can
act as a descriptor tree automatically, since FSs are themselves
hierarchical in nature.  Similarly, the in-core implementation
is such that the structure representing it can be traversed as
data, in a static image (ideally, however, one would want to
"fake" an FS interface, so as to keep the "shared library"
segments of the kernel small, even though they are never loaded
by the kernel into the kernel address space; this "faking" could
be done by abstracting file I/O using libkvm descriptors, and by
providing control over sysspace vs. userspace copying when trying
to do a "uiomove" to externalize FS data).
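
The "one access point, two modes" idea can be sketched much more
modestly than a procfs.  What follows is a plain function call under
my own assumptions, with a pthread mutex standing in for whatever
lock the kernel would really use, and with ptable_demo, xproc_demo
and ptable_read_demo as invented names.  The caller never sees the
lock; the access point takes it for a live table and skips it for a
static snapshot:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical exported per-process record. */
struct xproc_demo {
    int xp_pid;
    int xp_uid;
};

/* Hypothetical table: either the live system or a frozen image of it. */
struct ptable_demo {
    pthread_mutex_t lock;       /* only meaningful while the table is live */
    int live;                   /* nonzero: running system; zero: snapshot */
    size_t n;
    struct xproc_demo procs[8];
};

/*
 * Single access point.  Copies up to max records into out and returns
 * the number copied.  Locking is an internal detail: it happens for a
 * live table and is skipped for a snapshot, which cannot change.
 */
static size_t
ptable_read_demo(struct ptable_demo *t, struct xproc_demo *out, size_t max)
{
    size_t i, n;

    assert(out != NULL || max == 0);
    if (t->live)
        pthread_mutex_lock(&t->lock);
    n = t->n < max ? t->n : max;
    for (i = 0; i < n; i++)
        out[i] = t->procs[i];
    if (t->live)
        pthread_mutex_unlock(&t->lock);
    return (n);
}
```

The same consumer code then runs against either kind of table, which
is the property wanted for "ps" against a running system and against
a crash dump.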

In any case, the SMP problem means that the data interfaces must
die, at least insofar as they apply to active systems, rather
than crash dumps.

If they die, then there is no kernel structure externalization
to worry about (with the side benefit of not needing to recompile
"ps" and the rest of the tools which use kmem or externalized
kernel structures, each time those structures are changed).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

