Date: Thu, 16 Dec 1999 00:10:25 -0500 (EST)
From: Robert Watson <robert@cyrus.watson.org>
To: Terry Lambert <tlambert@primenet.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Request for objections: extended attribute and ACL interfaces
Message-ID: <Pine.BSF.3.96.991216000820.24846B-100000@fledge.watson.org>
In-Reply-To: <199912160023.RAA26871@usr09.primenet.com>
Terry,

First let me start by saying that I greatly appreciate your taking the time to send your comments--I value your feedback. Please see my comments below.

> I personally have no objection to these interfaces. They seem to
> cover the problem space that you say that they cover, and they are
> at worst, harmless.

The only functionality I'm not sure how to handle is identifying the attributes available on the file for the presence of duplication--where they back another file system visible object (such as ACLs, MAC labels, etc) then those interfaces should be used to back up the attributes. It isn't clear to me that attributes should be backed up independent of semantics available to interpret them, as they will most often be used for file system services where restoring them and associating them with the same file may not make sense, nor even be possible given the rights and functionality associated with what they back. I am comfortable not defining that behavior just now, and waiting to see what applications require it. I can also conceive of write-only attributes, etc.

> > The UFS EA supporting code would not be committed, as it is
> > experimental and not well tested. Neither will the UFS ACLs over
> > EAs support. Both of these will be made available soon, however,
> > and will rely on having vnops/vfsops/syscalls assigned and
> > available. :-)
>
> I object to these patches ever being committed. They are not truly
> UFS specific, and should be placed in a stacking layer so that they
> can be applied to any FS via normal stacking semantics.

Whether or not these changes are committed is a matter, of course, for general debate and I welcome any constructive discussion of both the general architecture of the changes, and my specific implementation.
One of the goals of adding the interfaces first was to provide a development framework for those needing to rely on the functionality, and to make them available so that consumers of these interfaces could determine if the interfaces and semantics met their needs.

First, I'd like to address the issue of whether or not extended attributes and ACLs should be made available as a stacked layer service, or whether they should be in the base file system. As you read in my detailed documentation of these services, there are a number of existing implementations that we can reference in looking for advantages and disadvantages to various approaches. The previous implementations that I have in mind are the following:

- Trusted IRIX XFS: extended attributes for supporting extended security labels, including ACLs, MAC labels, capabilities.
- IBM HPFS: general purpose named, extended attributes, used to store program information, access control attributes, icons, etc.
- Transarc AFS: per-directory ACLs
- CMU Coda: per-directory ACLs
- Solaris UFS: per-file ACLs
- Linux ACL extensions: per-file ACLs
- Microsoft NTFS: named file forks
- UDF: named file forks
- Apple HFS+: named file forks
- Jeff Weidner's EA + ACL layers

One of the first things that becomes clear, as you allude to below, is that in most cases, access control information for file systems is considered an integral part of the file system meta-data. The only exception in the list above, other than those that don't store access control information at all, is HPFS, which uses extended attributes for this purpose. ACLs in most of these have different properties: they refer to different principal namespaces (UFS: local uids and gids; AFS: centrally administered viceids mapped to Kerberos principals, ...). Similarly they have different semantics: AFS applies ACLs to directories only, and not files. The set of permissions is far more detailed than UFS, and does not rely on an "owner".
On the other hand, Solaris and Linux ACLs are mapped one per file, and two in the case of directories, and attempt to closely follow the existing POSIX permission set, and are strictly a superset of the base POSIX uid/gid + permissions concept.

As I describe in my documents, and implement, the only common feature seems to be the syntax of ACLs, which is generally consistent in assigning a mask of rights to a numerically identified principal. Given that the goal of VFS is to represent that which is common between file systems, it makes sense to expose the syntax, and not the semantics, of ACLs. Access control choices are made in the file system itself, or as you point out, in a layer.

Layering allows common semantics to be applied in a specific area of functionality, over what are potentially file systems with quite different semantics: for example, you can imagine composing a caching layer over a combination of NFS and Coda; you could similarly imagine composing namespace extensions and a base file system to expand or restrict naming capabilities.

You suggest layering access control extensions over UFS, which is indeed possible (Jeff Weidner made a first pass at this). However, I would argue that this may not be desirable; in fact, you appear to agree that extended access controls might be useful in the base UFS as you suggest using spare fields. It's useful to note that if you're willing to go entirely to a new access control mechanism, you can reclaim at least 8 bytes, as you no longer need the file owner and group. You can also grab at the permission bits, but that's probably not worth it.

As such, my opinion is that ACLs are not something that should be introduced via a layer--I find especially concerning the limitations of layering in terms of persistent state consistency management between layers.
For example, in the event of a system failure (power loss, if you will), not only must you keep both layers individually consistent (say, two UFS file systems, one the real files, one backing the ACLs), but you must also keep the two consistent with one another. In my mind, it is not acceptable to permit divergence of access control data and other file system meta-data. Of course, this is already a problem under UFS, but only becomes more so without transactional support on a per-file system basis, and then transactional consistency mechanisms between layers (nested transactions, perhaps?). At the very least, the layers must be resynchronized.

I'm also concerned with the assignment of vnode backing objects between layers: is there a consistent and generalized way to map objects in one layer to objects in the layer below? When no persistent state is involved, it's fairly straightforward to simply apply appropriate changes to the layers in parallel, and dynamically construct node objects in the top layer to map to active nodes in the bottom layer, via the name lookups, etc. But when you have persistent state cross-session in both layers, it becomes more difficult.

This is the same problem that was discussed on -hackers recently: Coda and AFS do not provide "inode number" semantics for the value exposed via vop_getattr in the vattr field, so the question is how to identify two instances of the same object. It does not make sense, for example, to tag the top layer object (an ACL, say) with the dev_t and inode_t of the bottom layer object, as those are fs-specific values with semantics that may differ from the extremely strong POSIX semantics, which require that over the lifetime of an object, that specific inode_t and dev_t, in combination, uniquely identify the object. In the context of a distributed file system this makes little sense, and is hard for systems such as Coda and AFS (96 bit file ids, or fids) to emulate without excessive cost.
They use a hash to hack around this requirement, but in operating systems that uniquely identify objects in kernel by unique dev_t and inode_t (Linux, for example) you see all hell break loose when a collision occurs. As BSD uses vnode pointers to verify object uniqueness in kernel (leaving aside aliasing, another issue), this is not a problem. Some userland applications, such as tar, attempt to use the unique inode_t to check for the same file, but this is known not to work in AFS, etc, and should not work. A samefile() call has been suggested, comparing two fd's, but is not implemented. But just as the userland application should not determine based on these effectively non-unique numbers the mapping between fs objects, neither should a stacked layer.

I assume this issue has been addressed, possibly by assuming parallel naming structures (i.e., apply all name operations to the top layer as to the bottom?), but do not know how this was done. I'd imagine this would make it easy to create layers that just maintain general fs data in the root of the file system, but hard to create layers that attach object-specific data, such as attributes, to objects in lower layers. This is especially so in the context of a service such as hard links, or when an ACLfs is layered over a namespace built of different file system services (say, AFS and UFS), or a transient service, such as procfs. Some of this can be solved by upcalls and better synchronization, but in many cases such synchronization is not possible or feasible.

Leaving aside these concerns about the feasibility of stacking as a mechanism for mapping attributes onto objects, I recognize that layering/stacking is extremely useful functionality, and understand that significant work is underway to remedy locking problems present in the current implementation, and look forward to this service being available.

> I assume that you are using reserved fields in order to do this?
> If so, this may justify a generic interface to get at reserved
> inode data areas from an overlying layer. However, I would
> prefer that some other method be used, and that the existing
> reserved areas be partially used to move the nanosecond time
> on the modification date off, and fix the Y2038 problem the
> way that the authors of FFS intended. If there are fields left
> over after that, then they can be used for specific applications
> (like ACLs).

Actually, this is not the technique I am using. I followed the lead of the quota file system code, storing attributes in backed vnodes, indexed by inode on a per-fs basis. This implementation method is subject to debate, but seems to work well as it stands. As with quotas, I provide a VFS call, extattrctl, which allows an appropriately privileged user process to pass a vnode reference (via a file path, probably in the same fs as the attributes, but not necessarily) down to UFS for a source of backing data. This requires one backing file per attribute per fs. Again, similar to the quota implementation supporting different named quotas, a chain of vnodes is maintained, each with a name identifying which attribute it corresponds to.

When an attribute is retrieved or submitted, the inode number (acceptable to use since we're within the UFS layer where we know this information does uniquely identify the file) is mapped to a location in the file where attribute data may be found. This file is usually a sparse file, as attributes are not always applied to all files; in fact, given that POSIX.1e ACLs are a superset of the permission behavior, it is expected that many files will not have extended ACLs. The attribute file stores various other pieces of meta-data about the attributes, including whether they are defined for a particular inode (as with environment variables, it's possible to have an attribute be undefined, or defined but with no contents, and they should be distinguishable).
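
The inode-to-offset mapping described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the layout used by the actual patches: the structure names, field widths, the EA_DEFINED flag and the helper functions are all hypothetical. It assumes a fixed-size slot per inode, so that slots never written remain holes in the sparse file and read back as zeros, which here means "undefined".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header at the start of one attribute's backing file. */
struct ea_file_header {
    uint32_t magic;     /* identifies the file as an attribute store */
    uint32_t max_size;  /* fixed maximum bytes of data per inode */
};

/* Hypothetical per-inode header preceding each slot's data area. */
struct ea_entry_header {
    uint32_t flags;     /* EA_DEFINED once the attribute has been set */
    uint32_t length;    /* bytes of valid data; 0 is legal when defined */
};

#define EA_DEFINED 0x1u

enum ea_state { EA_UNDEFINED, EA_EMPTY, EA_HAS_DATA };

/* Byte offset in the backing file of the slot belonging to inode `ino`. */
static uint64_t
ea_slot_offset(const struct ea_file_header *h, uint32_t ino)
{
    uint64_t slot;

    assert(h->max_size > 0);
    slot = sizeof(struct ea_entry_header) + (uint64_t)h->max_size;
    return sizeof(struct ea_file_header) + (uint64_t)ino * slot;
}

/* Distinguish "never set" from "set, but with no contents". */
static enum ea_state
ea_classify(const struct ea_entry_header *e)
{
    if ((e->flags & EA_DEFINED) == 0)
        return EA_UNDEFINED;    /* includes all-zero holes */
    return e->length == 0 ? EA_EMPTY : EA_HAS_DATA;
}
```

With this kind of layout a lookup is a single seek and read at ea_slot_offset(), at the price of a fixed per-attribute maximum size.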
These implementation choices impose a number of semantic differences from other extended attribute implementations. For example, unlike HPFS, there is a fixed maximum size per-attribute, but this maximum is independent of other attribute storage on a particular file. In HPFS, the total size of the attributes is not permitted to exceed 64k. Also, attributes must be initialized and configured by an appropriately privileged entity, most likely root in the default BSD security environment, and at that initialization time access rights, maximum sizes, etc. are set. Arbitrary attributes may not be defined in this environment.

These differences in semantics suggest, as with other file system behavior, that the generalized properties be exposed in the VFS via vop_ calls, but that individual file systems can behave differently. In this manner, attributes are like all other file system behaviors: optimized for a particular environment, and the needs of the designer.

If layering worked correctly, and supported an adequate mechanism for mapping ACLs onto objects consistently between layers, I would be willing to consider reimplementing ACLs as a layer. Similarly, I suppose that would be possible for attributes. But I object to the idea that extended attributes and ACLs are not specific to UFS: any individual implementation will be optimized and designed to solve particular needs, in a particular way, and based on the filestore properties.

In Linux, for example, it was observed that ACLs are often the same across a number of files: as a result, they allocate ACL blocks in the file store, and refcount them, permitting files with the same ACLs to point to the same ACL blocks. They do this probabilistically, of course, as they cannot keep all ACLs in memory, but it's a useful, file-system-specific optimization. In Solaris, different choices were made: they chose to use "shadow inodes" and allow ACLs of (effectively) unlimited length.
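
As a rough illustration of the Linux-style sharing mentioned above, the following toy sketch keeps reference-counted ACL blocks and uses a small fixed-size cache to find an existing identical block. It is not the Linux code and it works on in-memory blocks rather than blocks in the file store; all names, sizes and the choice of hash are assumptions made for the example. Because the cache is bounded, a matching block can be evicted and a duplicate allocated later, which is the "probabilistic" part.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ACL_MAX_ENTRIES 8   /* hypothetical limit */
#define ACL_CACHE_SLOTS 16  /* hypothetical cache size */

struct acl_entry { uint32_t id; uint32_t perms; };

struct acl_block {
    uint32_t refcount;      /* number of files pointing at this block */
    uint32_t count;         /* entries in use */
    struct acl_entry entries[ACL_MAX_ENTRIES];
};

/* Direct-mapped cache of recently used blocks; a slot may be overwritten. */
static struct acl_block *acl_cache[ACL_CACHE_SLOTS];

/* FNV-1a over the raw entry bytes (any reasonable hash would do). */
static uint32_t
acl_hash(const struct acl_entry *e, uint32_t n)
{
    const unsigned char *p = (const unsigned char *)e;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n * sizeof(*e); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return a block holding this ACL, sharing a cached one when possible. */
static struct acl_block *
acl_block_get(const struct acl_entry *e, uint32_t n)
{
    uint32_t slot;
    struct acl_block *b;

    assert(n >= 1 && n <= ACL_MAX_ENTRIES);
    slot = acl_hash(e, n) % ACL_CACHE_SLOTS;
    b = acl_cache[slot];
    if (b != NULL && b->count == n &&
        memcmp(b->entries, e, n * sizeof(*e)) == 0) {
        b->refcount++;          /* cache hit: share the block */
        return b;
    }
    b = calloc(1, sizeof(*b));  /* miss: allocate a private block */
    if (b == NULL)
        return NULL;
    b->refcount = 1;
    b->count = n;
    memcpy(b->entries, e, n * sizeof(*e));
    acl_cache[slot] = b;        /* any evicted block lives on via its refcount */
    return b;
}

/* Drop one reference; free the block when the last file lets go. */
static void
acl_block_put(struct acl_block *b)
{
    uint32_t slot;

    assert(b->refcount > 0);
    if (--b->refcount > 0)
        return;
    slot = acl_hash(b->entries, b->count) % ACL_CACHE_SLOTS;
    if (acl_cache[slot] == b)
        acl_cache[slot] = NULL;
    free(b);
}
```

Two files given the same ACL in quick succession end up pointing at one block with a reference count of two; a real implementation would also need copy-on-write when one of them changes its ACL, which is omitted here.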
In HPFS, because there is available space in the Fnode, the designers initially store extended attributes there, and then overflow into a block run when additional space is required. The designers of HPFS anticipated that the base attributes would need fast access, but that additional attributes were desirable given their intent to move towards an object-oriented file system environment. This is also perfectly acceptable as a local file system optimization that is HPFS-specific.

In NTFS, a generic named file fork service is provided, where the forks are complete address spaces with the full flat-file semantic. This could be viewed as an extended attribute, although as I describe in my document my feeling is this is not appropriate to do :-). The Apple file system supports only specific forks. These are legitimate local design choices: while they could have been provided as file system layers, it made sense not to.

My personal feeling on the file fork issue is that we already have a mechanism to provide multiple complete address spaces under a path, each with a different name. We even allow this to be done recursively. It's called a directory tree, and the address spaces are known as files. :-)

As such, I disagree with the assertion that it is inappropriate to add new services to existing file stores: to fail to do so would be to fail to pay attention to changing requirements. While layering is a useful extensibility mechanism, it has failed to deliver, and may continue to fail to deliver. By keeping ACL and Extended Attribute data within a single file system, we allow that file system to provide consistency guarantees not provided in the existing layering model. You can imagine transactional and logging file systems maintaining versioning information on the attributes themselves, as in fact I believe XFS can do.
While you could theoretically extend layering interfaces to provide nested transactional semantics, there's the important question of whether this is really feasible, and given the added complexity, entirely desirable. Given a sufficiently powerful layering mechanism, many things are possible. If there were any easier way to provide ACL, MAC, and Capability support under UFS, the chances are I would have done it, but I didn't seem to find one.

Whether or not the UFS extended attribute code ends up being committed is a decision I don't think anyone is ready to make yet, given that I'm probably the only person who has seen it. :-) Either way, the services are required in a number of environments in the immediate future. In the short and long term, having both interfaces defined for these services will facilitate a lot of further development, including improved support for HPFS, and the ability to introduce features associated with the trusted operating system extensions in POSIX.1e.

  Robert Watson
  Research Scientist, TIS Labs at Network Associates

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message