FreeBSD Mail Archives

Date:      Fri, 17 Dec 1999 02:04:17 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        robert+freebsd@cyrus.watson.org
Cc:        tlambert@primenet.com, freebsd-fs@FreeBSD.ORG
Subject:   Re: Request for objections: extended attribute and ACL interfaces
Message-ID:  <199912170204.TAA08405@usr07.primenet.com>
In-Reply-To: <Pine.BSF.3.96.991216000820.24846B-100000@fledge.watson.org> from "Robert Watson" at Dec 16, 99 00:10:25 am

> Terry, 
> 
> First let me start by saying that I greatly appreciate your taking the
> time to send your comments--I value your feedback.  Please see my comments
> below. 
> 
> > I personally have no objection to these interfaces.  They seem to
> > cover the problem space that you say that they cover, and they are
> > at worst, harmless.
> 
> The only functionality I'm not sure how to handle is identifying the
> attributes available on the file for the presence of duplication--where
> they back another file system visible object (such as ACLs, MAC labels,
> etc) then those interfaces should be used to back up the attributes.  It
> isn't clear to me that attributes should be backed up independent of
> semantics available to interpret them, as they will most often be used for
> file system services where restoring them and associating them with the
> same file may not make sense, nor even be possible given the rights and
> functionality associated with what they back.  I am comfortable not
> defining that behavior just now, and waiting to see what applications
> require it.  I can also conceive of write-only attributes, etc. 

The reason I said "harmless" is rather deeper.

The Heidemann framework supports the ability to add VOP's to an
existing system.  FreeBSD doesn't currently support this, but it
could be fixed to do so, simply by making *vfs_op_descs[] a
pointer reference, and reallocating it as needed in order to
grow the descriptor list using loaded descriptors.

For this to work safely, a secondary step of reallocating the
instance structures larger for existing VFS instances would do
the trick, such that the new VOPs didn't dereference off the end
of the instance structures, should you call them on an existing
VFS.
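Below is a minimal user-space sketch of the two steps just described. It uses simplified stand-ins for the kernel structures: `struct vop_desc`, `vop_desc_register()`, and `vop_vector_grow()` are hypothetical names, not the actual FreeBSD definitions. The descriptor list is a reallocatable pointer. Loaded descriptors are appended, so existing offsets stay valid. Each existing instance's vector is then grown, with a "not supported" default in the new slots.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a VOP descriptor and a per-VFS operation
 * vector; the names and layout are illustrative, not FreeBSD's. */
typedef int (*vop_fn)(void *args);

struct vop_desc {
    const char *name;    /* e.g. "vop_getextattr" */
    int         offset;  /* slot this operation occupies in every vector */
};

/* The global descriptor list is a pointer, not a fixed-size array,
 * so that descriptors from a loaded module can be appended. */
static struct vop_desc **vfs_op_descs;
static int               vfs_op_ndescs;

/* Default for slots a VFS instance does not implement. */
static int vop_eopnotsupp(void *args)
{
    (void)args;
    return EOPNOTSUPP;
}

/* Append a newly loaded descriptor; existing offsets never change.
 * Returns the new descriptor's offset, or -1 on allocation failure. */
static int vop_desc_register(struct vop_desc *d)
{
    struct vop_desc **n = realloc(vfs_op_descs,
        (size_t)(vfs_op_ndescs + 1) * sizeof(*n));
    if (n == NULL)
        return -1;
    vfs_op_descs = n;
    d->offset = vfs_op_ndescs;
    vfs_op_descs[vfs_op_ndescs++] = d;
    return d->offset;
}

/* The second step: grow an existing instance's vector so that calling
 * a new VOP on an already-mounted VFS lands on a valid slot (the
 * "not supported" default) instead of running off the end. */
static vop_fn *vop_vector_grow(vop_fn *vec, int oldlen, int newlen)
{
    vop_fn *n = realloc(vec, (size_t)newlen * sizeof(*n));
    if (n == NULL)
        return NULL;
    for (int i = oldlen; i < newlen; i++)
        n[i] = vop_eopnotsupp;
    return n;
}
```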

To get rid of the overhead of the pointer dereference, you could
simplify the descriptor calls.  The easiest way to do this would
be to sort the descriptors at instance time, to ensure that all
descriptors in all VFS instances were in a particular order, and
then use an integer index to dereference them, instead of having
to dereference them through both *vfs_op_descs[] and the reverse
lookup mechanism.  This should also result in a speed improvement,
anyway, compared to what is currently done.
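The integer-index dispatch might look like the following sketch. It assumes every VFS instance supplies the same set of operation names, with any missing ones filled by a default. Sorting each instance's table into one canonical order (alphabetical here, purely for illustration) then makes slot i mean the same operation everywhere. `struct vop_entry`, `vop_canonicalize()`, and `vop_call()` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*vop_fn)(void *args);

/* Hypothetical (operation name, implementation) pair supplied by a VFS. */
struct vop_entry {
    const char *name;
    vop_fn      fn;
};

static int vop_entry_cmp(const void *a, const void *b)
{
    const struct vop_entry *x = a;
    const struct vop_entry *y = b;
    return strcmp(x->name, y->name);
}

/* Done once per VFS instance: put the table into one canonical order
 * (alphabetical here) so slot i is the same operation in every instance.
 * Assumes all instances list the same operation names. */
static void vop_canonicalize(struct vop_entry *ents, size_t n)
{
    qsort(ents, n, sizeof(*ents), vop_entry_cmp);
}

/* Per-call dispatch is then a single array index: no walk through a
 * global descriptor table and no reverse lookup. */
static int vop_call(const struct vop_entry *vec, int index, void *args)
{
    return vec[index].fn(args);
}
```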

System calls can also be loaded.

At this point, there doesn't seem to be any reason to statically
put this stuff in the kernel, but at least it's harmless.


> > > The UFS EA supporting code would not be committed, as it is
> > > experimental and not well tested.  Neither will the UFS ACLs over
> > > EAs support.  Both of these will be made available soon, however,
> > > and will rely on having vnops/vfsops/syscalls assigned and
> > > available.  :-)
> >
> > I object to these patches ever being committed.  They are not truly
> > UFS specific, and should be placed in a stacking layer so that they
> > can be applied to any FS via normal stacking semantics.
> 
> Whether or not these changes are committed is a matter, of course, for
> general debate and I welcome any constructive discussion of both the
> general architecture of the changes, and my specific implementation.  One
> of the goals of adding the interfaces first was to provide a development
> framework for those needing to rely on the functionality, and to make them
> available so that consumers of these interfaces could determine if the
> interfaces and semantics met their needs. 

Clearly, it is good to have a reference implementation; however,
I believe that it's possible to have the entire framework be
dynamically loadable.

My objection to committing UFS-specific ACL patches is not merely
that they potentially consume fields that are necessary to solve
the Y2038 issue (and were intended for that purpose), by displacing
the fields that we would use to allow (IMO, unnecessary: think monoclock)
nanosecond resolution on the one field that could be argued to need
it, mtime.  It is also that they must render the fields that they use
unavailable to future researchers.

I think it's also significant that, with these changes mandatorily
in UFS, the supporting code, which I've demonstrated above could be
rendered dynamically loadable, must instead become part of the base
kernel definition.


> First, I'd like to address the issue of whether or not extended attributes
> and ACLs should be made available as a stacked layer service, or whether
> they should be in the base file system.  As you read in my detailed
> documentation of these services, there are a number of existing
> implementations that we can reference in looking for advantages and
> disadvantages to various approaches.  The previous implementations that I
> have in mind are the following: 
> 
> - Trusted IRIX XFS: extended attributes for supporting extended security
> labels, including ACLs, MAC labels, capabilities. 
> - IBM HPFS: general purpose named, extended attributes, used to store
> program information, access control attributes, icons, etc. 
> - Transarc AFS: per-directory ACLs
> - CMU Coda: per-directory ACLs
> - Solaris UFS: per-file ACLs
> - Linux ACL extensions: per-file ACLs
> - Microsoft NTFS: named file forks
> - UDF: named file forks
> - Apple HFS+: named file forks
> - Jeff Weidner's EA + ACL layers

I would add VXFS (Veritas), FILES-11 (DEC VMS), NXFS (Novell -- mine,
from the NetWare for UNIX product), NWFS (Novell -- NetWare [mine]),
and the object file system in the "Choices" OS from the University of
Illinois, which adds ACLs through object inheritance.  I don't have a
comprehensive list handy, and unfortunately, won't be able to look
for one until after the first of the year.


> One of the first things that becomes clear, as you allude to below, is
> that in most cases, access control information for file systems is
> considered an integral part of the file system meta-data.  The only
> exception in the list above, other than those that don't store access
> control information at all, is HPFS, which uses extended attributes for
> this purpose.  ACLs in most of these have different properties:  they
> refer to different principal namespaces (UFS: local uids and gids; AFS:
> centrally administered viceids mapped to Kerberos principals, ...) 
> Similarly they have different semantics: AFS applies ACLs to directories
> only, and not files.  The set of permissions is far more detailed than
> UFS, and does not rely on an "owner".  On the other hand, Solaris and
> Linux ACLs are mapped one per file, and two in the case of directories,
> and attempt to closely follow the existing POSIX permission set, and are
> strictly a superset of the base POSIX uid/gid + permissions concept.  As I
> describe in my documents, and implement, the only common feature seems to
> be the syntax of ACLs, which is generally consistent in assigning a mask
> of rights to a numerically identified principal. 

I'd also suggest that it might be best to place ACLs in an EA,
and merely support EAs in the storage layer, reserving ACL
management and operation for a semantic layer.
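One way to picture that split is the sketch below. The storage layer stores only opaque named byte strings. The ACL layer above it serializes (tag, principal id, rights mask) entries into a single attribute named "acl". Everything here is a hypothetical illustration, not an existing interface: `ea_set()`, `ea_get()`, the single fixed-size slot, and the entry layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Storage layer: opaque named byte strings, no knowledge of ACLs.
 * A single fixed-size slot stands in for a real per-file EA store. */
#define EA_MAX 256
struct ea_slot {
    char          name[32];
    size_t        len;
    unsigned char data[EA_MAX];
};

static int ea_set(struct ea_slot *s, const char *name, const void *buf, size_t len)
{
    if (len > EA_MAX || strlen(name) >= sizeof(s->name))
        return -1;
    strcpy(s->name, name);
    memcpy(s->data, buf, len);
    s->len = len;
    return 0;
}

static long ea_get(const struct ea_slot *s, const char *name, void *buf, size_t len)
{
    if (strcmp(s->name, name) != 0 || len < s->len)
        return -1;              /* no such attribute, or buffer too small */
    memcpy(buf, s->data, s->len);
    return (long)s->len;
}

/* Semantic layer: each entry assigns a rights mask to a numerically
 * identified principal; this layout is a hypothetical choice. */
struct acl_entry { uint32_t tag, id, perm; };

static int acl_store(struct ea_slot *s, const struct acl_entry *e, size_t n)
{
    return ea_set(s, "acl", e, n * sizeof(*e));
}

/* Returns the number of entries read, or -1. */
static long acl_load(const struct ea_slot *s, struct acl_entry *e, size_t max)
{
    long got = ea_get(s, "acl", e, max * sizeof(*e));
    return got < 0 ? -1 : got / (long)sizeof(*e);
}
```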


> Given that the goal of VFS is to represent that which is common between
> file systems, it makes sense to expose the syntax, and not the semantics,
> of ACLs.  Access control choices are made in the file system itself, or as
> you point out, in a layer.  Layering allows common semantics to be applied
> in a specific area of functionality, over what are potentially file
> systems with quite different semantics: for example, you can imagine
> composing a caching layer over a combination of NFS and Coda; you could
> similarly imagine composing namespace extensions and a base file system to
> expand or restrict naming capabilities.  You suggest layering access
> control extensions over UFS, which is indeed possible (Jeff Weidner made a
> first pass at this).  However, I would argue that this may not be
> desirable; in fact, you appear to agree that extended access controls
> might be useful in the base UFS as you suggest using spare fields.  It's
> useful to note that if you're willing to go entirely to a new access
> control mechanism, you can reclaim at least 8 bytes, as you no longer need
> the file owner and group.  You can also grab at the permission bits, but
> that's probably not worth it. 

It may be worthwhile to do this, if you displace existing controls
into the ACLs.

As far as using spare fields, my suggestion is to have a "VOP_SPARE"
or similar mechanism for getting the size of spare fields, and then
manipulating them if they are sufficiently large.  This would allow
you to use your ACLs on any VFS type that had spare fields in the
inode, not just UFS.
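A sketch of the suggested "VOP_SPARE" idea follows. It assumes a VFS exports three operations: report the number of spare inode bytes, read them, and write them. `struct spare_ops`, the toy inode, and the `acl_ref_*` helpers are invented for illustration. The point is that the ACL client code never names UFS, and simply declines when a VFS has too little room.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "VOP_SPARE"-style interface: none of these names exist
 * in FreeBSD; they only illustrate the idea. */
struct spare_ops {
    size_t (*size)(void *node);
    void   (*read)(void *node, void *buf, size_t len);
    void   (*write)(void *node, const void *buf, size_t len);
};

/* A toy inode with 8 spare bytes, standing in for any VFS's node. */
struct toy_inode { unsigned char spare[8]; };

static size_t toy_size(void *n)
{
    return sizeof ((struct toy_inode *)n)->spare;
}

static void toy_read(void *n, void *buf, size_t len)
{
    memcpy(buf, ((struct toy_inode *)n)->spare, len);
}

static void toy_write(void *n, const void *buf, size_t len)
{
    memcpy(((struct toy_inode *)n)->spare, buf, len);
}

static const struct spare_ops toy_spare_ops = { toy_size, toy_read, toy_write };

/* VFS-independent client: keep a small ACL reference (for example an
 * index into an ACL file) in the spare bytes, if there is room. */
static int acl_ref_set(const struct spare_ops *ops, void *node, unsigned int ref)
{
    if (ops->size(node) < sizeof(ref))
        return -1;                  /* not enough spare space on this VFS */
    ops->write(node, &ref, sizeof(ref));
    return 0;
}

static int acl_ref_get(const struct spare_ops *ops, void *node, unsigned int *ref)
{
    if (ops->size(node) < sizeof(*ref))
        return -1;
    ops->read(node, ref, sizeof(*ref));
    return 0;
}
```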

One issue on layering: FFS is actually composed of FFS and UFS
VOP descriptors.

It would be very easy to define a new local media VFS type that
substituted modified versions of only those VOP descriptors which
had to change in order to support ACLs, leaving FFS
more or less intact.
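The descriptor-replacement approach can be sketched as copying the FFS operation vector and overriding only the entries that ACL support must change. The operation indices, the `ffs_*` stubs, and `aclfs_access()` below are hypothetical placeholders. The access stub simply denies everything; real code would evaluate the ACL at that point.

```c
#include <errno.h>

typedef int (*vop_fn)(void *args);

/* Hypothetical operation indices; a real kernel uses descriptor offsets. */
enum { VOP_LOOKUP, VOP_ACCESS, VOP_SETATTR, VOP_READ, VOP_NOPS };

/* Placeholder FFS implementations. */
static int ffs_lookup(void *a)  { (void)a; return 0; }
static int ffs_access(void *a)  { (void)a; return 0; }
static int ffs_setattr(void *a) { (void)a; return 0; }
static int ffs_read(void *a)    { (void)a; return 0; }

static const vop_fn ffs_vector[VOP_NOPS] = {
    [VOP_LOOKUP]  = ffs_lookup,
    [VOP_ACCESS]  = ffs_access,
    [VOP_SETATTR] = ffs_setattr,
    [VOP_READ]    = ffs_read,
};

/* ACL-aware access check; a stub that denies everything. */
static int aclfs_access(void *a)
{
    (void)a;
    return EACCES;
}

/* Build the new VFS's vector: start from FFS, replace only what ACLs
 * touch, and leave FFS itself unmodified. */
static void aclfs_build(vop_fn out[VOP_NOPS])
{
    for (int i = 0; i < VOP_NOPS; i++)
        out[i] = ffs_vector[i];
    out[VOP_ACCESS] = aclfs_access;
}
```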

One issue that seems especially pertinent to me is that, unless
you have modelled the FFS dependency graph, and then written the
necessary metadata resolvers (like Kirk had to, in order to do
the work he did), then it seems to me that putting the code in
the base system will break soft updates.



> As such, my opinion is that ACLs are not something that should be
> introduced via a layer--I find especially concerning the limitations of
> layering in terms of persistent state consistency management between
> layers.  For example, in the event of a system failure (power loss, if you
> will), not only must you keep both layers individually consistent (say, two
> UFS file systems, one the real files, one backing the ACLs), but you must
> also keep the two consistent with one another.  In my mind, it is not
> acceptable to permit divergence of access control data and other file
> system meta-data.  Of course, this is already a problem under UFS, but
> only becomes more so without transactional support on a per-file system
> basis, and then transactional consistency mechanisms between layers
> (nested transactions, perhaps?)  At the very least, the layers must be
> resynchronized. 

This is one of the reasons that I argued that soft updates should
have been written with the dependency graph model intact, and
then registered inter-node resolvers (using exactly the same data
structures that are currently used, BTW) at the time that the
VFS was instanced.

This would allow externalization of the head and tail ends of the
graph, which would allow you to do soft updates between layers,
or even export a transactioning interface (BEGIN, COMMIT/ABORT)
to user space.  I will personally work on this once the soft
updates copyright changes to allow me to do a derivative work
without copyright infringement or intellectual property worries;
I discussed the issue at length with both Ganger and Patt after
their paper was first published.
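A rough sketch of the graph-with-registered-resolvers structure follows. It assumes each pending metadata item records what must reach stable storage before it. Each item also carries a resolver supplied by whichever layer owns it, set up when the VFS is instanced. `struct dep_item`, `dep_flush()`, and the logging resolver are hypothetical; real soft updates dependency structures are considerably richer.

```c
#include <stddef.h>

struct dep_item;
typedef int (*resolver_fn)(struct dep_item *item);

/* One pending metadata item.  The graph code is generic; what it means
 * to "resolve" (write out) an item is supplied by the owning layer. */
struct dep_item {
    const char      *what;     /* e.g. "inode", "acl record" */
    resolver_fn      resolve;  /* registered by the owning layer */
    struct dep_item *before;   /* must be resolved first, or NULL */
    int              done;
};

/* Resolve an item only after everything it depends on is resolved. */
static int dep_flush(struct dep_item *it)
{
    if (it->done)
        return 0;
    if (it->before != NULL) {
        int error = dep_flush(it->before);
        if (error != 0)
            return error;
    }
    int error = it->resolve(it);
    if (error == 0)
        it->done = 1;
    return error;
}

/* Example resolver: records the order in which items were written. */
static const char *flush_log[8];
static int         flush_count;

static int log_resolver(struct dep_item *it)
{
    if (flush_count < 8)
        flush_log[flush_count++] = it->what;
    return 0;
}
```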

Consistency of a layer storing metadata in an underlying layer
is rather trivial, even without this, as the solution is well
known to all: all metadata writes must be ordered, and committed
to stable storage before the next metadata write can proceed.
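Below is a user-space analogue of that rule. It uses POSIX `pwrite()` and `fsync()` in place of in-kernel buffer writes: each update is forced to stable storage before the next is issued. `struct meta_write` and `ordered_commit()` are illustrative names only. The per-write `fsync()` is where the performance hit of this approach comes from.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* One metadata update: bytes to write at an offset in a backing file. */
struct meta_write {
    int         fd;
    off_t       off;
    const void *buf;
    size_t      len;
};

/* Issue updates strictly in order; each must be on stable storage
 * (fsync() returned) before the next one is started. */
static int ordered_commit(const struct meta_write *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pwrite(w[i].fd, w[i].buf, w[i].len, w[i].off) != (ssize_t)w[i].len)
            return -1;
        if (fsync(w[i].fd) != 0)
            return -1;
    }
    return 0;
}
```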

I think that you could do ACLs easily in a folding layer that
simply called VOP_SYNC() on the ACLs vnode.  This takes a bit
of a performance hit, but would be worthwhile.  Given a hook
into the soft updates metadata dependency resolution clock, there
is no hit at all, relative to the non-stacking implementation.
I admit, you could design the data structures wrong such that
you would not have deterministic recovery.  I think this is
easily fixed by simply adding a timestamp, and not removing the
previous value (by zeroing the timestamp) until the inode
metadata is flushed.  Recovery could be rolled forward or
backward depending on the comparison of the newest time
stamp in the ACL file to the time on the inode to which the ACL
applies.
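The timestamp scheme can be sketched as a record holding the current and previous values, each stamped, with recovery comparing the newest stamp to the inode's time. This sketch makes two assumptions. First, the inode time is updated with each ACL change. Second, at most one update per record is outstanding before the inode is flushed. `struct acl_record` and the functions below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical on-disk ACL record: the current value plus the previous
 * one, each stamped.  A zero stamp means "slot retired". */
struct acl_slot   { uint32_t perm; int64_t stamp; };
struct acl_record { struct acl_slot cur, prev; };

/* Update: keep the old value until the inode is known to be on disk.
 * Assumes at most one un-flushed update per record. */
static void acl_update(struct acl_record *r, uint32_t perm, int64_t now)
{
    r->prev = r->cur;
    r->cur.perm = perm;
    r->cur.stamp = now;
}

/* Called once the inode carrying the matching time has been flushed. */
static void acl_retire_prev(struct acl_record *r)
{
    r->prev.stamp = 0;
}

/* Crash recovery: if the inode time is at least as new as the newest
 * ACL stamp, the change committed, so roll forward; otherwise roll
 * back to the previous value.  Returns the permission mask in effect. */
static uint32_t acl_recover(struct acl_record *r, int64_t inode_time)
{
    if (r->cur.stamp > inode_time && r->prev.stamp != 0)
        r->cur = r->prev;           /* roll back */
    r->prev.stamp = 0;              /* either way, one value remains */
    return r->cur.perm;
}
```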


> I'm also concerned with the assignment of vnode backing objects between
> layers: is there a consistent and generalized way to map objects in one
> layer to objects in the layer below?  When no persistent state is
> involved, it's fairly straight forward to simply apply appropriate changes
> to the layers in parallel, and dynamically construct node objects in the
> top layer to map to active nodes in the bottom layer, via the name
> lookups, etc.  But when you have persistent state cross-session in both
> layers, it becomes more difficult.  This is the same problem that was
> discussed on -hackers recently: Coda and AFS do not provide "inode number"
> semantics for the value exposed via vop_getattr in the vattr field: how to
> identify two instances of the same object.  It does not make sense, for
> example, to tag the top layer object (an ACL, say) with the dev_t and
> inode_t of the bottom layer object, as those are fs-specific values with
> semantics that may differ from the extremely strong POSIX semantics, which
> require that over the life time of an object, that specific inode_t and
> dev_t, in combination, uniquely identify the object. 

The short answer is that there _is_ an object alias problem.

The longer answer is that the fix for this problem is obvious, and
that there is a workaround that uses read/write in the underlying
FS to provide VOP_{GET|PUT}PAGES() functionality, to make stacking
work in the current kernel, without the vmobject_t aliases problem
that you would otherwise suffer.
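Below is a user-space analogue of the workaround, assuming a 4096-byte page. The upper layer fills its own page with an ordinary `pread()` from the lower file, zero-filling past end of file. It pushes pages back with `pwrite()`, so no VM object is shared between layers. `getpages_via_read()` and `putpages_via_write()` are hypothetical names, not the kernel's VOP_GETPAGES()/VOP_PUTPAGES() signatures.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 4096   /* assumed page size for the sketch */

/* Page-in: copy data from the lower file into the upper layer's own
 * page with a plain read, rather than aliasing the lower VM object. */
static int getpages_via_read(int lower_fd, off_t pgoff, unsigned char page[PAGE_SZ])
{
    ssize_t n = pread(lower_fd, page, PAGE_SZ, pgoff * PAGE_SZ);
    if (n < 0)
        return -1;
    memset(page + n, 0, PAGE_SZ - (size_t)n);   /* zero-fill past EOF */
    return 0;
}

/* Page-out: an ordinary write of the valid bytes to the lower file. */
static int putpages_via_write(int lower_fd, off_t pgoff,
                              const unsigned char page[PAGE_SZ], size_t valid)
{
    return pwrite(lower_fd, page, valid, pgoff * PAGE_SZ) == (ssize_t)valid ? 0 : -1;
}
```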

At worst, you could use the workaround, for which source code
already exists and can easily be grabbed and inserted.


> In the context of a distributed file system this makes little sense, and
> is hard for systems such as Coda and AFS (96 bit file ids, or fids) to
> emulate without excessive cost.

This argument is really non-applicable, because of the above, since
stacking works.

If the argument is association of data with metadata, namespace
folding is probably the correct approach (files are directories,
with file data in the file "data" in the directory named for the
file, and ACLs in the file "ACL" in the directory named for the
file).  This speaks to your object attachment argument.
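At its simplest, the fold is a name mapping. The sketch below assumes the only streams are "data" and "ACL", as in the example above; `fold_path()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping for a folded namespace: the user-visible file
 * "name" is really a directory holding "name/data" and "name/ACL".
 * Returns 0 on success, -1 for an unknown stream or a short buffer. */
static int fold_path(const char *name, const char *stream, char *out, size_t outlen)
{
    if (strcmp(stream, "data") != 0 && strcmp(stream, "ACL") != 0)
        return -1;
    int n = snprintf(out, outlen, "%s/%s", name, stream);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Because the result is an ordinary path in the underlying VFS, a tool such as tar sees the ACL as just another file.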

In addition, you could expose the underlying VFS, and then use a
standard UNIX backup tool, such as tar, to back it up, _without_
needing to teach the tool about ACLs.  I think this represents
a significant advantage over an inode-embedded approach, which
requires you to get all new (non-interoperable) tools.


> Leaving aside these concerns about the feasibility of stacking as a
> mechanism for mapping attributes onto objects, I recognize that
> layering/stacking is extremely useful functionality, and understand that
> significant work is underway to remedy locking problems present in the
> current implementation, and look forward to this service being available. 

The locking issues stem mostly from the VFS layers consuming
kernel services, and the inability to handle this well in union
mounts or other many-to-one VFS stacks.  They don't really apply
to one-to-one VFS stacks, except NFS clients, and that's because
it must lock locally, and then proxy remotely and then coalesce
locally only when the proxy is successful.  This is tantamount
to an internal union of 2, and is a special case.
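The NFS client sequence just described can be sketched with its steps as function pointers: tentative local lock, remote proxy, and local coalesce only on success. `struct nfs_lock_ops`, `nfs_advlock_sketch()`, and the toy implementations are hypothetical. They only show the ordering and the back-out path.

```c
#include <errno.h>

/* Hypothetical byte-range lock request. */
struct lockreq { long start, len; };

typedef int (*lock_op)(void *ctx, const struct lockreq *lr);

struct nfs_lock_ops {
    lock_op local_acquire;   /* tentative local lock */
    lock_op local_release;   /* undo the tentative lock */
    lock_op remote_acquire;  /* proxy the request to the server */
    lock_op local_coalesce;  /* merge with existing local ranges */
};

static int nfs_advlock_sketch(const struct nfs_lock_ops *ops, void *ctx,
                              const struct lockreq *lr)
{
    int error = ops->local_acquire(ctx, lr);
    if (error != 0)
        return error;
    error = ops->remote_acquire(ctx, lr);
    if (error != 0) {
        ops->local_release(ctx, lr);      /* server refused: back out */
        return error;
    }
    return ops->local_coalesce(ctx, lr);  /* only now merge locally */
}

/* Toy implementations: count local state; the "server" refuses
 * requests with a negative length. */
struct toy_ctx { int held, merged; };

static int toy_acquire(void *c, const struct lockreq *lr)
{
    (void)lr;
    ((struct toy_ctx *)c)->held++;
    return 0;
}

static int toy_release(void *c, const struct lockreq *lr)
{
    (void)lr;
    ((struct toy_ctx *)c)->held--;
    return 0;
}

static int toy_remote(void *c, const struct lockreq *lr)
{
    (void)c;
    return lr->len < 0 ? EAGAIN : 0;
}

static int toy_merge(void *c, const struct lockreq *lr)
{
    (void)lr;
    ((struct toy_ctx *)c)->merged++;
    return 0;
}

static const struct nfs_lock_ops toy_ops = {
    toy_acquire, toy_release, toy_remote, toy_merge
};
```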


[ ... (incorrect) assumed approach ... ]

> Actually, this is not the technique I am using.  I followed the lead of
> the quota file system code, storing attributes in backed vnodes, indexed
> by inode on a per-fs basis.  This implementation method is subject to
> debate, but seems to work well as it stands.  As with Quotas, I provide
> a vfs call, extattrctl, which allows an appropriately privileged user
> process to pass a vnode reference (via a file path, probably in the same
> fs as the attributes, but not necessarily) down to UFS for a source of
> backing data.  This requires one backing file per attribute per fs. 

I think this approach is ideally suited for stacking, much more so
than the one I assumed, which was the same one used in the ACLs in
the comp.sources.unix archives!


> Again, similar to the quota implementation supporting different named
> quotas, a chain of vnodes is maintained, each with a name identifying
> which attribute it corresponds to.  When an attribute is retrieved or
> submitted, the inode number (acceptable to use since we're within the UFS
> layer where we know this information does uniquely identify the file) is
> mapped to a location in the file where attribute data may be found.

How do you deal with inherited ACLs when a subdirectory tree is
renamed from one directory to another?  Do you traverse them?

The problem with direct inheritance is that the on disk structure
for maintaining hard links would have to change, if you were to
have a functional "get parent inode" interface of some kind.


> If layering worked correctly, and supported an adequate mechanism for
> mapping ACLs onto objects consistently between layers, I would be willing
> to consider reimplementing ACLs as a layer.

With the exception of the object alias problem, it does.

Are you willing to use the workaround, at least until such time
that the FreeBSD VFS code is fixed?  I think that it would be
greatly beneficial to have a working stacking layer that actually
does something useful!

> Similarly, I suppose those
> would be possible for attributes.  But I object to the idea that extended
> attributes and ACLs are not specific to UFS: any individual implementation
> will be optimized and designed to solve particular needs, in a particular
> way, and based on the filestore properties.

I foresee that there will have to be common semantic guarantees,
across all file systems.  For this to work, I think the code has
to be the same code.

In particular, I can see that installing "privileged images"
would be an ideal candidate for this.

Consider the case of programs which run today as root in order
to obtain reserved ports.  Now consider an ACL that allows an
unprivileged application to obtain reserved ports.

How many security holes are caused by suid root programs that
are suid merely to obtain reserved ports?
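The sketch below shows what such an ACL could grant. It assumes a per-image privilege set read from the executable's ACL; `struct image_privs` and `may_bind()` are hypothetical, and 1024 is the traditional reserved-port boundary. The check permits a reserved-port bind for uid 0, or for an image explicitly granted that single right, with no setuid bit involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define RESERVED_PORT_LIMIT 1024   /* traditional BSD boundary */

/* Hypothetical privileges an ACL on the program image could grant. */
struct image_privs {
    bool bind_reserved;
};

static bool may_bind(uid_t euid, const struct image_privs *img, unsigned int port)
{
    if (port >= RESERVED_PORT_LIMIT)
        return true;              /* not a reserved port */
    if (euid == 0)
        return true;              /* the old rule: root only */
    return img != NULL && img->bind_reserved;   /* granted by the ACL */
}
```

The unprivileged daemon gains exactly one right, so a bug in it no longer yields root.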

This argues strongly for ACLs in the base system, but they must
be ACLs that apply to all VFS types, equally.  Especially ext2fs
mounted under FreeBSD to obtain a /compat/linux.


> In Solaris,
> different choices were made: they choose to use "shadow inodes"  and allow
> ACLs of (effectively) unlimited length. 

This is the approach I used in NWFS; Sun had access and license
to use that source code, actually.  I don't know if they did or
didn't, or if you are really describing VXFS.


> In
> NTFS, a generic named file fork service is provided, where the forks are
> complete address spaces with the full flat-file semantic.  This could be
> viewed as an extended attribute, although as I describe in my document my
> feeling is this is not appropriate to do :-).

This is a namespace fold, of exactly the type I described above.
There are at least two other ways to fold a namespace.

The useful part of this is that you can use the POSIX namespace
escape to directly access these files.  This would mean that you
would not need to add additional system calls or VOPs, so long as
your user space tools did the right thing (a two stage commit
using copy+add and two renames to modify/add/delete ACLs).  This
is a good thing.  It's also deterministically recoverable, with
roll-forward capability, following a crash in the middle.
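The copy-and-two-renames update described above might look like this in a user-space tool. It assumes the ACL is reachable as an ordinary file (for example through a namespace fold), and it uses hypothetical "<acl>.new" and "<acl>.old" names. The new copy is fsync()ed before the first rename. That is what makes roll-forward safe when the ACL file is found missing after a crash.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Build the hypothetical "<acl>.new" and "<acl>.old" names. */
static int acl_paths(const char *acl, char *tmp, char *old, size_t len)
{
    int a = snprintf(tmp, len, "%s.new", acl);
    int b = snprintf(old, len, "%s.old", acl);
    return (a < 0 || b < 0 || (size_t)a >= len || (size_t)b >= len) ? -1 : 0;
}

static int acl_replace(const char *acl, const char *newtext)
{
    char tmp[512], old[512];
    if (acl_paths(acl, tmp, old, sizeof tmp) != 0)
        return -1;

    /* Step 1: complete new copy, forced to stable storage. */
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fputs(newtext, f) == EOF || fflush(f) == EOF || fsync(fileno(f)) != 0) {
        fclose(f);
        return -1;
    }
    if (fclose(f) == EOF)
        return -1;

    /* Steps 2 and 3: the two renames. */
    if (rename(acl, old) != 0 || rename(tmp, acl) != 0)
        return -1;

    /* Step 4: drop the previous version. */
    return remove(old);
}

/* Crash recovery for the protocol above. */
static int acl_replace_recover(const char *acl)
{
    char tmp[512], old[512];
    if (acl_paths(acl, tmp, old, sizeof tmp) != 0)
        return -1;
    if (access(acl, F_OK) != 0 && access(tmp, F_OK) == 0) {
        if (rename(tmp, acl) != 0)    /* crashed between the renames: */
            return -1;                /* roll forward */
    }
    remove(tmp);    /* possibly incomplete copy: roll back */
    remove(old);    /* finished update that was not cleaned up */
    return 0;
}
```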

The primary problem with this approach is that the FreeBSD namei()
uses mutual recursion instead of true recursion for component
name resolution, in order to avoid allocating additional namei bufs
for symbolic link resolution.  As a result, it's not really possible
to inherit the POSIX namespace escape downward more than one level;
you either have to prefix each component, or you have to do everything
up top.

This is actually trivial to fix, however, even if no one has
been willing to commit a fix so far.  This would allow you to
access the files normally:

	./foo/fee
	foo/fee
	/foo/fee

and then directly access the ACLs for those files:

	//ACL/./foo/fee
	//ACL/foo/fee
	//ACL//foo/fee

Using POSIX namespace escapes.
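Recognizing the escape is the easy part. The sketch below only splits a leading `//NAME/` off a path; it does not address the namei() recursion problem described above. POSIX leaves the interpretation of a pathname beginning with exactly two slashes implementation-defined. `split_ns_escape()` is a hypothetical helper.

```c
#include <string.h>

/* Split "//NAME/rest" into the escape name and the remaining path.
 * "//ACL/foo/fee" gives ns "ACL" and rest "foo/fee"; "//ACL//foo/fee"
 * gives rest "/foo/fee".  Returns 1 if an escape was found, 0 for an
 * ordinary path (rest is the whole path), -1 if ns is too small. */
static int split_ns_escape(const char *path, char *ns, size_t nslen, const char **rest)
{
    if (strncmp(path, "//", 2) != 0 || path[2] == '/' || path[2] == '\0') {
        *rest = path;
        return 0;
    }
    const char *name = path + 2;
    const char *slash = strchr(name, '/');
    size_t n = slash != NULL ? (size_t)(slash - name) : strlen(name);
    if (n + 1 > nslen)
        return -1;
    memcpy(ns, name, n);
    ns[n] = '\0';
    *rest = slash != NULL ? slash + 1 : name + n;
    return 1;
}
```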


> My personal feeling on the file fork issue is that we already have a
> mechanism to provide multiple complete address spaces under a path, each
> with a different name.  We even allow this to be done recursively.  It's
> called a directory tree, and the address spaces are known as files. :-) 

There is value in folding.  Administratively, it lets you apply a
policy control to access by removing the semantically controlled
address spaces from the user visible name space.

Admittedly, you can work around this, but it's rather butt-ugly
compared to a fold.  8-).


> As such, I disagree with the assertion that it is inappropriate to add new
> services to existing file stores: to fail to do so would be to fail to pay
> attention to changing requirements.  While layering is a useful
> extensibility mechanism, it has failed to deliver, and may continue to
> fail to deliver.

I think that this is not so true; layering has failed to deliver
in BSD, but recently, this has been addressed, if not totally
satisfactorily, at least functionally.


> By keeping ACL and Extended Attribute data within a
> single file system, we allow that file system to provide consistency
> guarantees not provided in the existing layering model.  You can imagine
> transactional and logging file systems maintaining versioning information
> on the attributes themselves, as in fact I believe XFS can do.  While you
> could theoretically extend layering interfaces to provide nested
> transactional semantics, there's the important question of whether this is
> really feasible, and given the added complexity, entirely desirable. 

It's feasible, even without explicit transaction semantics.

I think that it's inarguably desirable, since so many of us FS
geeks desire it.  8-).


> Given a sufficiently powerful layering mechanism, many things are possible.
> If there were any easier way to provide ACL, MAC, and Capability support
> under UFS, the chances are I would have done it, but I didn't seem to find
> one. 

I'm sure that, given this forum, the author of the glue code that
works around the stacking problem in FreeBSD will step forward;
if not, ping me, and I can grovel through my mail and forward his
posting to you.


> Whether or not the UFS extended attribute code ends up being committed is
> a decision I don't think anyone is ready to make yet, given that I'm
> probably the only person who has seen them.  :-)  Either way, the services
> are required in a number of environments in the immediate future.  In the
> short and long term, having both the interfaces defined for these services
> will facilitate a lot of further development, including improved support
> for HPFS, and the ability to introduce features associated with the
> trusted extensions operating system extensions in POSIX.1e. 

I think that it's also inarguable that EA/ACL facilities _must_
be provided in the near term future; my only objection was in
tying them to a specific VFS.  I fully support committing a layer
that can do the work.


In any case, I did not intend to denigrate your good work, only
to ensure that it can be applied universally instead of in a tiny
corner, and that it didn't preclude future research along other
lines (as it might have, had it permanently stolen the spare fields
like the nanosecond timestamp change did).

Keep up the good work!


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199912170204.TAA08405>
