Date:      Thu, 16 Dec 1999 00:10:25 -0500 (EST)
From:      Robert Watson <robert@cyrus.watson.org>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: Request for objections: extended attribute and ACL interfaces
Message-ID:  <Pine.BSF.3.96.991216000820.24846B-100000@fledge.watson.org>
In-Reply-To: <199912160023.RAA26871@usr09.primenet.com>

Terry, 

First let me start by saying that I greatly appreciate your taking the
time to send your comments--I value your feedback.  Please see my comments
below. 

> I personally have no objection to these interfaces.  They seem to
> cover the problem space that you say that they cover, and they are
> at worst, harmless.

The only functionality I'm not sure how to handle is identifying which
attributes on a file duplicate other state--where an attribute backs
another file-system-visible object (such as ACLs, MAC labels, etc.), that
object's own interfaces should be used for backup instead.  It
isn't clear to me that attributes should be backed up independent of
semantics available to interpret them, as they will most often be used for
file system services where restoring them and associating them with the
same file may not make sense, nor even be possible given the rights and
functionality associated with what they back.  I am comfortable not
defining that behavior just now, and waiting to see what applications
require it.  I can also conceive of write-only attributes, etc. 

> > The UFS EA supporting code would not be committed, as it is
> > experimental and not well tested.  Neither will the UFS ACLs over
> > EAs support.  Both of these will be made available soon, however,
> > and will rely on having vnops/vfsops/syscalls assigned and
> > available.  :-)
>
> I object to these patches ever being committed.  They are not truly
> UFS specific, and should be placed in a stacking layer so that they
> can be applied to any FS via normal stacking semantics.

Whether or not these changes are committed is a matter, of course, for
general debate and I welcome any constructive discussion of both the
general architecture of the changes, and my specific implementation.  One
of the goals of adding the interfaces first was to provide a development
framework for those needing to rely on the functionality, and to make them
available so that consumers of these interfaces could determine if the
interfaces and semantics met their needs. 

First, I'd like to address the issue of whether or not extended attributes
and ACLs should be made available as a stacked layer service, or whether
they should be in the base file system.  As you read in my detailed
documentation of these services, there are a number of existing
implementations that we can reference in looking for advantages and
disadvantages to various approaches.  The previous implementations that I
have in mind are the following: 

- Trusted IRIX XFS: extended attributes for supporting extended security
labels, including ACLs, MAC labels, capabilities. 

- IBM HPFS: general purpose named, extended attributes, used to store
program information, access control attributes, icons, etc. 

- Transarc AFS: per-directory ACLs

- CMU Coda: per-directory ACLs

- Solaris UFS: per-file ACLs

- Linux ACL extensions: per-file ACLs

- Microsoft NTFS: named file forks

- UDF: named file forks

- Apple HFS+: named file forks

- Jeff Weidner's EA + ACL layers

One of the first things that becomes clear, as you allude to below, is
that in most cases, access control information for file systems is
considered an integral part of the file system meta-data.  The only
exception in the list above, other than those that don't store access
control information at all, is HPFS, which uses extended attributes for
this purpose.  ACLs in most of these have different properties:  they
refer to different principal namespaces (UFS: local uids and gids; AFS:
centrally administered viceids mapped to Kerberos principals, ...) 
Similarly they have different semantics: AFS applies ACLs to directories
only, and not files.  The set of permissions is far more detailed than
UFS, and does not rely on an "owner".  On the other hand, Solaris and
Linux ACLs are mapped one per file, and two in the case of directories,
and attempt to closely follow the existing POSIX permission set, and are
strictly a superset of the base POSIX uid/gid + permissions concept.  As I
describe in my documents, and implement, the only common feature seems to
be the syntax of ACLs, which is generally consistent in assigning a mask
of rights to a numerically identified principal. 
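
To make the "common syntax" point concrete, here is a minimal sketch in
Python of an ACL as nothing more than a list of (numeric principal, rights
mask) pairs.  The names AclEntry and rights_for, and the bit values, are
hypothetical and not taken from any of the implementations listed above;
which namespace the principal id belongs to, and how the mask is used in an
access decision, is deliberately left to the file system.

```python
from dataclasses import dataclass

# Hypothetical right bits; each real file system defines its own set.
READ, WRITE, EXECUTE = 0x4, 0x2, 0x1


@dataclass(frozen=True)
class AclEntry:
    principal: int  # numeric id; its namespace (uid, gid, viceid, ...) is fs-specific
    rights: int     # bitmask of rights granted to that principal


def rights_for(acl, principal):
    """Return the OR of all masks listed for `principal` (0 if it is absent).

    This is a plain lookup over the shared syntax, not an access decision.
    """
    mask = 0
    for entry in acl:
        if entry.principal == principal:
            mask |= entry.rights
    return mask


example_acl = [AclEntry(1001, READ | WRITE), AclEntry(1002, READ)]
```

Only the shape of the data is shared here; a UFS-style and an AFS-style
file system could both carry such a list while evaluating it differently.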

Given that the goal of VFS is to represent that which is common between
file systems, it makes sense to expose the syntax, and not the semantics,
of ACLs.  Access control choices are made in the file system itself, or as
you point out, in a layer.  Layering allows common semantics to be applied
in a specific area of functionality, over what are potentially file
systems with quite different semantics: for example, you can imagine
composing a caching layer over a combination of NFS and Coda; you could
similarly imagine composing namespace extensions and a base file system to
expand or restrict naming capabilities.  You suggest layering access
control extensions over UFS, which is indeed possible (Jeff Weidner made a
first pass at this).  However, I would argue that this may not be
desirable; in fact, you appear to agree that extended access controls
might be useful in the base UFS as you suggest using spare fields.  It's
useful to note that if you're willing to go entirely to a new access
control mechanism, you can reclaim at least 8 bytes, as you no longer need
the file owner and group.  You can also grab at the permission bits, but
that's probably not worth it. 

As such, my opinion is that ACLs are not something that should be
introduced via a layer--I find especially concerning the limitations of
layering in terms of persistent state consistency management between
layers.  For example, in the event of a system failure (power loss, if you
will), not only must you keep both layers individually consistent (say, two
UFS file systems, one the real files, one backing the ACLs), but you must
also keep the two consistent with one another.  In my mind, it is not
acceptable to permit divergence of access control data and other file
system meta-data.  Of course, this is already a problem under UFS, but
only becomes more so without transactional support on a per-file system
basis, and then transactional consistency mechanisms between layers
(nested transactions, perhaps?).  At the very least, the layers must be
resynchronized. 

I'm also concerned with the assignment of vnode backing objects between
layers: is there a consistent and generalized way to map objects in one
layer to objects in the layer below?  When no persistent state is
involved, it's fairly straightforward to simply apply appropriate changes
to the layers in parallel, and dynamically construct node objects in the
top layer to map to active nodes in the bottom layer, via the name
lookups, etc.  But when you have persistent state cross-session in both
layers, it becomes more difficult.  This is the same problem that was
discussed on -hackers recently: Coda and AFS do not provide "inode number"
semantics for the value exposed via vop_getattr in the vattr field: how to
identify two instances of the same object.  It does not make sense, for
example, to tag the top layer object (an ACL, say) with the dev_t and
inode_t of the bottom layer object, as those are fs-specific values with
semantics that may differ from the extremely strong POSIX semantics, which
require that, over the lifetime of an object, its inode_t and dev_t, in
combination, uniquely identify the object. 

In the context of a distributed file system this makes little sense, and
is hard for systems such as Coda and AFS (96 bit file ids, or fids) to
emulate without excessive cost.  They use a hash to hack around this
requirement, but in operating systems that uniquely identify objects in
kernel by unique dev_t and inode_t (Linux, for example) you see all hell
break loose when a collision occurs.  As BSD uses vnode pointers to verify
object uniqueness in kernel (leaving aside aliasing, another issue), this
is not a problem.  Some userland applications, such as tar, attempt to use
the unique inode_t to check for the same file, but this is known not to
work in AFS, etc, and should not work.  A samefile() call has been
suggested, comparing two fd's, but is not implemented.  But just as the
userland application should not determine based on these effectively
non-unique numbers the mapping between fs objects, neither should a
stacked layer.  I assume this issue has been addressed, possibly by assuming
parallel naming structures (i.e., apply all name operations to the top
layer as to the bottom?), but I do not know how this was done.  I'd imagine
this would make it easy to create layers that just maintain general fs
data in the root of the file system, but hard to create layers that attach
object-specific data, such as attributes, to objects in lower layers. 
Especially in the context of a service such as hard links, or when an
ACLfs is layered over a namespace built of different file system services
(say, AFS and UFS), or a transient service, such as procfs.  Some of this
can be solved by upcalls and better synchronization, but in many cases
such synchronization is not possible or feasible. 
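
The collision problem with hashed identifiers can be sketched in a few
lines of Python.  This assumes a 96-bit fid made of three 32-bit words and
uses a simple XOR fold; both the word layout and the hash are illustrative
assumptions, not the hash that Coda or AFS actually uses.

```python
def fold_fid(volume, vnode, uniquifier):
    """Fold three 32-bit fid words into one 32-bit number by XOR.

    Illustrative only: any mapping from 96 bits to 32 bits must map
    some distinct fids to the same value.
    """
    return (volume ^ vnode ^ uniquifier) & 0xFFFFFFFF


# Two different (hypothetical) fids whose folded values coincide.
fid_a = (0x00000001, 0x00000002, 0x00000000)
fid_b = (0x00000002, 0x00000001, 0x00000000)
```

Anything that keys per-object state on the folded value, whether a
userland tool or a stacked layer, would treat fid_a and fid_b as the same
object.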

Leaving aside these concerns about the feasibility of stacking as a
mechanism for mapping attributes onto objects, I recognize that
layering/stacking is extremely useful functionality, and understand that
significant work is underway to remedy locking problems present in the
current implementation, and look forward to this service being available. 

> I assume that you are using reserved fields in order to do this?
> If so, this may justify a generic interface to get at reserved
> inode data areas from an overlying layer.  However, I would
> prefer that some other method be used, and that the existing
> reserved areas be partially used to move the nanosecond time
> on the modification date off, and fix the Y2038 problem the
> way that the authors of FFS intended.  If there are fields left
> over after that, then they can be used for specific applications
> (like ACLs).

Actually, this is not the technique I am using.  I followed the lead of
the quota file system code, storing attributes in backed vnodes, indexed
by inode on a per-fs basis.  This implementation method is subject to
debate, but seems to work well as it stands.  As with quotas, I provide a
vfs call, extattrctl, which allows an appropriately privileged user
process to pass a vnode reference (via a file path, probably in the same
fs as the attributes, but not necessarily) down to UFS for a source of
backing data.  This requires one backing file per attribute per fs. 
Again, similar to the quota implementation supporting different named
quotas, a chain of vnodes is maintained, each with a name identifying
which attribute it corresponds to.  When an attribute is retrieved or
submitted, the inode number (acceptable to use since we're within the UFS
layer where we know this information does uniquely identify the file) is
mapped to a location in the file where attribute data may be found.  This
file is usually a sparse file, as attributes are not always applied to all
files; in fact, given that POSIX.1e ACLs are a superset of the permission
behavior, it is expected that many files will not have extended ACLs.  The
attribute file stores various other pieces of meta-data about the
attributes, including whether they are defined for a particular inode (as
with environment variables, it's possible to have an attribute be
undefined, or defined but with no contents, and they should be
distinguishable). 
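
The inode-indexed backing file described above can be modelled roughly as
follows.  This is a toy sketch in Python, not the UFS code: the header
layout, the FLAG_DEFINED bit, the MAX_SIZE value, and the set_attr/get_attr
names are all assumptions made for illustration, and an in-memory
io.BytesIO stands in for the (normally sparse) backing file.

```python
import io
import struct

HEADER = struct.Struct("<II")  # (flags, length); hypothetical on-disk layout
FLAG_DEFINED = 0x1             # set when the attribute is defined for the inode
MAX_SIZE = 64                  # fixed per-attribute maximum, chosen at setup time
SLOT = HEADER.size + MAX_SIZE  # bytes reserved per inode in the backing file


def set_attr(backing, inode, data):
    """Store `data` as this attribute's value for `inode`."""
    if len(data) > MAX_SIZE:
        raise ValueError("attribute exceeds configured maximum size")
    backing.seek(inode * SLOT)
    backing.write(HEADER.pack(FLAG_DEFINED, len(data)) + data)


def get_attr(backing, inode):
    """Return the attribute bytes, or None if undefined for `inode`."""
    backing.seek(inode * SLOT)
    raw = backing.read(SLOT)
    if len(raw) < HEADER.size:
        return None  # past the end of the file: never written
    flags, length = HEADER.unpack_from(raw)
    if not flags & FLAG_DEFINED:
        return None  # a hole reads back as zeros, so the flag is clear
    return raw[HEADER.size:HEADER.size + length]
```

Because the "defined" flag lives in the per-inode header, a defined but
empty attribute (b"") and an undefined one (None) remain distinguishable,
and inodes that were never written cost nothing beyond a hole in the file.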

These implementation choices impose a number of semantic differences from
other extended attribute implementations.  For example, unlike HPFS, there
is a fixed maximum size per-attribute, but this maximum is independent of
other attribute storage on a particular file.  In HPFS, the total size of
the attributes is not permitted to exceed 64k.  Also, attributes must be
initialized and configured by an appropriately privileged entity, most
likely root in the default BSD security environment, and at that
initialization time access rights, maximum sizes, etc. are set.  Arbitrary
attributes may not be defined in this environment.  These differences in
semantics suggest, as with other file system behavior, that the
generalized properties be exposed in the VFS via vop_ calls, but that
individual file systems can behave differently.  In this manner,
attributes are like all other file system behaviors: optimized for a
particular environment, and the needs of the designer. 

If layering worked correctly, and supported an adequate mechanism for
mapping ACLs onto objects consistently between layers, I would be willing
to consider reimplementing ACLs as a layer.  Similarly, I suppose the same
would be possible for attributes.  But I object to the idea that extended
attributes and ACLs are not specific to UFS: any individual implementation
will be optimized and designed to solve particular needs, in a particular
way, and based on the filestore properties.  In Linux, for example, it was
observed that ACLs are often the same across a number of files: as a
result, they allocate ACL blocks in the file store, and refcount them,
permitting files with the same ACLs to point to the same ACL blocks.  They
do this probabilistically, of course, as they cannot keep all ACLs in
memory, but it's a useful, file-system-specific optimization.  In Solaris,
different choices were made: they chose to use "shadow inodes" and allow
ACLs of (effectively) unlimited length. 
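
The shared, reference-counted ACL block idea can be sketched as below.
This Python toy is not the Linux implementation: the AclStore class and
its methods are hypothetical, and unlike the probabilistic matching
described above it simply keeps a complete in-memory index of every ACL.

```python
class AclStore:
    """Toy store that shares one block among files with identical ACLs."""

    def __init__(self):
        self._block_of = {}  # ACL (as a tuple) -> block number
        self._blocks = {}    # block number -> [acl tuple, refcount]
        self._next = 0

    def acquire(self, acl):
        """Return a block holding `acl`, reusing an existing one if possible."""
        key = tuple(acl)
        block = self._block_of.get(key)
        if block is None:
            block = self._next
            self._next += 1
            self._block_of[key] = block
            self._blocks[block] = [key, 0]
        self._blocks[block][1] += 1
        return block

    def release(self, block):
        """Drop one reference; free the block when none remain."""
        entry = self._blocks[block]
        entry[1] -= 1
        if entry[1] == 0:
            del self._block_of[entry[0]]
            del self._blocks[block]

    def refcount(self, block):
        """Return the current reference count (0 if the block is free)."""
        return self._blocks[block][1] if block in self._blocks else 0
```

Two files that acquire the same ACL get the same block number, so the
storage cost is paid once however many files share it.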

In HPFS, because there is available space in the Fnode, extended
attributes are initially stored there, and then overflow into a block
run when additional space is required.  The designers of HPFS anticipated
that the base attributes would need fast access, but that additional
attributes were desirable given their intent to move towards an
object-oriented file system environment.  This is also perfectly
acceptable as a local file system optimization that is HPFS-specific.  In
NTFS, a generic named file fork service is provided, where the forks are
complete address spaces with the full flat-file semantic.  This could be
viewed as an extended attribute, although as I describe in my document my
feeling is this is not appropriate to do :-).  The Apple file system
supports only specific forks.  These are legitimate local design choices:
while they could have been provided as file system layers, it made sense
not to.  My personal feeling on the file fork issue is that we already have a
mechanism to provide multiple complete address spaces under a path, each
with a different name.  We even allow this to be done recursively.  It's
called a directory tree, and the address spaces are known as files. :-) 

As such, I disagree with the assertion that it is inappropriate to add new
services to existing file stores: to fail to do so would be to fail to pay
attention to changing requirements.  While layering is a useful
extensibility mechanism, it has failed to deliver, and may continue to
fail to deliver.  By keeping ACL and Extended Attribute data within a
single file system, we allow that file system to provide consistency
guarantees not provided in the existing layering model.  You can imagine
transactional and logging file systems maintaining versioning information
on the attributes themselves, as in fact I believe XFS can do.  While you
could theoretically extend layering interfaces to provide nested
transactional semantics, there's the important question of whether this is
really feasible, and given the added complexity, entirely desirable. 
Given a sufficiently powerful layering mechanism, many things are possible. 
If there were any easier way to provide ACL, MAC, and Capability support
under UFS, the chances are I would have done it, but I didn't seem to find
one. 

Whether or not the UFS extended attribute code ends up being committed is
a decision I don't think anyone is ready to make yet, given that I'm
probably the only person who has seen them.  :-)  Either way, the services
are required in a number of environments in the immediate future.  In the
short and long term, having the interfaces for both of these services
defined will facilitate a lot of further development, including improved
support for HPFS, and the ability to introduce features associated with
the trusted operating system extensions in POSIX.1e. 

Robert Watson
Research Scientist, TIS Labs at Network Associates







