From: Robert Watson
To: arch@FreeBSD.org
Cc: trustedbsd-discuss@TrustedBSD.org
Date: Wed, 13 Sep 2006 15:29:14 +0100 (BST)
Subject: New in-kernel privilege API: priv(9)
Message-ID: <20060913150912.J1823@fledge.watson.org>

Dear all,

Over the past few weeks, I've been working on a replacement for the suser(9) API, which is used to check whether a thread or credential has the privilege to override discretionary access control or to perform system configuration operations in the kernel. Currently, these checks use one of two kernel APIs:

    suser(thread)

or:

    suser_cred(cred, flags)

The former is the more common invocation, but the latter is also frequently used, largely because jail(4) requires limits on superuser privilege, so privileges allowed in jail are explicitly marked via the flags field.
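To make the existing semantics concrete, here is a minimal userland model of a suser_cred()-style check. It is a sketch, not the kernel code: struct legacy_cred, its cr_jailed field, and legacy_suser_cred() are hypothetical stand-ins, SUSER_ALLOWJAIL is the flag named in this mail, SUSER_RUID is an assumed name for the flag that forces use of the real uid, and both flag values are illustrative. It shows the property the rest of this mail argues against: whether jailed root may proceed is decided by each caller, and the check itself has no idea which privilege is being exercised.

```c
/*
 * Userland model of a suser_cred()-style check, for illustration only.
 * struct legacy_cred and legacy_suser_cred() are hypothetical; the flag
 * values are illustrative, and SUSER_RUID is an assumed name.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SUSER_ALLOWJAIL 0x1   /* caller permits use inside a jail */
#define SUSER_RUID      0x2   /* check the real uid, not the effective uid */

struct legacy_cred {
	uid_t cr_uid;      /* effective uid */
	uid_t cr_ruid;     /* real uid */
	bool  cr_jailed;   /* credential belongs to a jailed process */
};

/* Returns 0 if superuser privilege applies, EPERM otherwise. */
static int
legacy_suser_cred(const struct legacy_cred *cred, int flags)
{
	uid_t uid;

	assert(cred != NULL);
	uid = (flags & SUSER_RUID) ? cred->cr_ruid : cred->cr_uid;
	if (uid != 0)
		return (EPERM);
	/* The call site, not this function, decides whether jail is OK. */
	if (cred->cr_jailed && (flags & SUSER_ALLOWJAIL) == 0)
		return (EPERM);
	return (0);
}
```

In this model, a call site considered safe in jail writes legacy_suser_cred(cred, SUSER_ALLOWJAIL); nothing in the call records what that check point is actually protecting.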
There are also circumstances in which only a credential is available, perhaps cached from another context, and a very small number of instances (two) where a second flag, forcing use of the real uid (ruid) instead of the effective uid (euid), is used.

The above API has served FreeBSD well for many years. However, it suffers from a number of architectural and functional inadequacies. The goal of my work has been to address one particular shortcoming: a lack of granularity. In particular, finer granularity in the API would allow us to:

- Make it easier to explore finer-grained granting of privilege via policy, such as assigning specific useful privileges: the ability to bind a port, configure a SLIP interface, adjust the time, be exempt from audit requirements, attach to a jail, override certain file permissions, set quotas, configure IP addresses, etc. These are cleanly separable (not to mention usefully assignable) privileges.

- Make it easier to explore finer-grained denial of privilege. For example, jail is in large part based on marking individual privilege check points as "allowed in jail" or "not allowed in jail". In some ways this is advantageous: the implementer of each suser check gets to decide whether it is allowed in jail, and that information is available in the context of the check. However, this has several important disadvantages. Not least is that the implementation of jail is highly distributed rather than centralized, which makes it difficult to audit. Another disadvantage is that configuration options that vary the behavior of jail are also distributed throughout the kernel rather than centralized, as they must vary whether the SUSER_ALLOWJAIL flag is passed into suser. It would be nice to be able to quickly and easily answer the question "what privileges are granted in jail?", and to easily vary the list, neither of which is currently possible.
- Make it easier to identify, categorize, and audit the use of privilege throughout the kernel by actually having a list of the privileges and what they correspond to, as well as making it easier to identify all the places a specific privilege is used. This facilitates auditing of kernel privilege use and easy comparison of the use of identical privileges in different subsystems. For example, while doing this work, I identified inconsistencies in the application of superuser privilege across different file systems, privileges that were sometimes allowed in jail but sometimes not, etc. Two hundred anonymous suser checks are hard to analyze; 160 named privilege checks are much easier to analyze.

- Make it easier to modify the audit mechanism to capture a log of exactly which privileges are exercised during operation, a requirement for higher-assurance evaluation.

What does this all mean in practice? It means replacing suser(9) and suser_cred(9) with calls that express the specific privilege being checked for. I took the most straightforward implementation possible: I reviewed all privilege checks in the kernel, identified identical privileges, and categorized all privileges by subsystem. I then assigned a unique numeric constant to each unique privilege, and added a privilege identifier argument to the two new functions, priv_check(9) and priv_check_cred(9). Here are a few sample snippets from the privilege list in src/sys/sys/priv.h:

    ...
    PRIV_ACCT,              /* Manage process accounting. */
    PRIV_MAXFILES,          /* Exceed system open files limit. */
    PRIV_MAXPROC,           /* Exceed system processes limit. */
    PRIV_KTRACE,            /* Set/accept KTRFAC_ROOT on ktrace. */
    PRIV_SETDUMPER,         /* Configure dump device (XXX: needs work). */
    PRIV_NFSD,              /* Can become NFS daemon. */
    PRIV_REBOOT,            /* Can reboot system. */
    PRIV_SWAPON,            /* Can swapon(). */
    PRIV_SWAPOFF,           /* Can swapoff(). */
    ...
    PRIV_PMC_MANAGE,        /* Can administer PMC. */
    PRIV_PMC_SYSTEM,        /* Can allocate a system-wide PMC. */
    PRIV_SCHED_DIFFCRED,    /* Exempt scheduling other users. */
    PRIV_SCHED_SETPRIORITY, /* Can set lower nice value for proc. */
    PRIV_SCHED_RTPRIO,      /* Can set real time scheduling. */
    PRIV_SCHED_SETPOLICY,   /* Can set scheduler policy. */
    PRIV_SCHED_SET,         /* Can set thread scheduler. */
    PRIV_SCHED_SETPARAM,    /* Can set thread scheduler params. */
    ...
    PRIV_UFS_SETQUOTA,      /* setquota(). */
    PRIV_UFS_SETUSE,        /* setuse(). */
    PRIV_UFS_EXCEEDQUOTA,   /* Exempt from quota restrictions. */
    PRIV_VFS_READ,          /* Override vnode DAC read perm. */
    PRIV_VFS_WRITE,         /* Override vnode DAC write perm. */
    PRIV_VFS_ADMIN,         /* Override vnode DAC admin perm. */
    PRIV_VFS_EXEC,          /* Override vnode DAC exec perm. */
    PRIV_VFS_LOOKUP,        /* Override vnode DAC lookup perm. */
    PRIV_VFS_BLOCKRESERVE,  /* Can use free block reserve. */
    ...

As you can see, they break down into a set of system management privileges, relating to configuring kernel services, and a set of specific privileges associated with (and sorted by) major kernel subsystems. None of this implies a change in underlying policy, just that a bit more contextual information is passed into the privilege check. This has some important functional benefits:

- It makes it possible to migrate the "allowed in jail" decision from the calling context to the privilege management code. This will allow us to gradually eliminate the passing of flags to the privilege check code in almost all circumstances. In my patch, I have added a new function to kern_jail.c, prison_priv_check(), which essentially contains a switch statement listing the privileges allowed in jail and denying the rest. Handling of configurable privileges, such as raw socket access, can now occur in one place, which opens the door to easier per-jail configuration of privilege. After these changes, the implementation of jail is much more centralized in kern_jail.c.
- It makes it possible for the MAC Framework to restrict access to privilege, a feature required for the SEBSD policy module, which implements the FLASK/Type Enforcement policy environment found in SELinux. Policy modules can register interest in privilege checks and then deny access to specific privileges as they see fit.

- It makes it possible for the MAC Framework to allow policies to grant privilege. Policy modules can register interest in privilege checks and then grant access to specific privileges as they see fit.

To demonstrate MAC Framework integration with the privilege system, I have implemented a sample policy module, mac_privs, which allows rule-based granting of privileges to specific uids. Using a command line tool, appropriately privileged processes can modify the rule list, granting named privileges to unprivileged users. This is not a particularly mature example of a privilege-granting policy, as ideally privilege is something that is available but not always exercised, similar to a setuid root binary that switches its effective uid to root only when it specifically needs privilege. However, it's quite useful in practice, and demonstrates how configurable policies can interact with kernel privilege decisions.

In the past, I've done similar work on two occasions: once in implementing POSIX.1e privileges for FreeBSD as part of the TrustedBSD Project (not merged), and once as part of the SEBSD implementation. This work is functionally similar, but there are several important ways in which this design differs from the POSIX.1e approach (also used in Linux):

- The identification of privileges is quite fine-grained. The Linux-extended POSIX.1e privilege set contains high-level privileges like "Network privilege", which encapsulates a broad range of different network privilege checks. I have identified over 50 different specific network privileges, each separately named.
  It would be easy to map these into the POSIX.1e privilege set, which is presumably what the SEBSD policy will need to do in order to produce the coarser set expected by the SELinux code.

- The approach is intended to allow the granting as well as the denying of privilege. This is an important design choice, with both costs and benefits. One important benefit is that it sidesteps a historical problem: it has proven difficult to take rights away from the root user without introducing security vulnerabilities, because applications written to use root privilege expect all privileges to be in place. Granting specific privileges implies a rather different approach to constructing applications and policies, and may well be safer.

- Because of the fine-grained naming of privileges, it's possible to encapsulate jail in a way that was not previously possible: the POSIX.1e privilege set was simply too coarse to capture the requirements of jail.

- Privileges under this model are not treated as maskable values. In practice, there are very few situations in which it is useful to check multiple privileges at once, and permitting it encourages authors of new privilege checks to combine privileges in a way that hides from the privilege mechanism which privilege was actually needed. This also makes it much easier and more efficient to add new privileges as required, as doing so doesn't require expanding a bit string representing the privileges. Most POSIX.1e implementations limit the total number of privileges to 32 or 64 so that they fit easily in a bitmask.

- By assigning a new privilege for every privilege with significantly different semantics, the question of "when to add a new privilege" is answered: unless there is an obvious match, you add one. With the POSIX.1e + Linux set, it is necessary to try to figure out how to fit a new check into one of many poorly matching privileges.
  The result was that almost all privileges not clearly matching one of the POSIX.1e set ended up in the catch-all CAP_SYS_ADMIN.

The status of this work is that a pretty functional prototype can be found in Perforce:

    //depot/projects/trustedbsd/priv/...

A snapshot patch from the branch, excluding mac_privs, can be found here:

    http://www.watson.org/~robert/freebsd/20060913-trustedbsd-priv.diff

In that tree, you'll particularly want to look at:

    sys/kern/kern_jail.c          Revised jail privilege behavior
    sys/kern/kern_priv.c          Privilege check implementation
    sys/security/mac/mac_priv.c   MAC extensions for privileges
    sys/security/mac_privs/*      Sample MAC policy granting privileges
    sys/sys/priv.h                Privilege list, API
    share/man/man9/priv.9         Draft man page
    usr.sbin/mac_privs/*          Management tool for sample MAC policy

It is my intent, following review, discussion, cleanup, etc., to commit the priv(9) work, sans mac_privs, to the 7.x tree in the next couple of weeks. The mac_privs policy is a sample policy that will continue to be maintained as part of the TrustedBSD Project, but will not be merged into the base tree at this point.

Some remaining TODO items are:

- Review various XXX comments I added as part of this work.
- Complete modification of System V IPC code to properly check privileges.
- Update the mac_none.c sample policy to include privilege stubs.
- Possibly move securelevel support to kern_priv.c, since it largely relates to privilege.
- Teach the audit subsystem to collect privilege information during a system call, and add it to audit records using privilege tokens (already present in Solaris).
- Complete man page updates, including finalizing priv.9 and trimming down suser.9.
- Create further privilege-related regression tests.
- Finalize the decision on using an enum or an int to identify privileges. Using an enum requires more namespace pollution, and requires hard-coded values anyway in order to avoid ABI issues. Possibly using #defines would be simpler.
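The priv_check_cred() and prison_priv_check() design described above can be sketched as a similar userland model. Everything here is hypothetical: the model_* names, the two hook pointers standing in for MAC Framework entry points, the particular privileges treated as allowed in jail, and the order in which the checks run are assumptions made for illustration, not a description of the code in the patch. The PRIV_* names come from the sample list, but their numeric values are invented; they are written as explicit values, in line with the TODO note that hard-coded values are needed anyway to avoid ABI issues.

```c
/*
 * Userland model of a priv_check_cred()-style check with a centralized
 * jail policy and optional MAC-style hooks.  All model_* names, the hook
 * pointers, and the jail allow-list are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Names from the sample list; numeric values are made up for this sketch. */
enum {
	PRIV_ACCT              = 1,   /* Manage process accounting. */
	PRIV_REBOOT            = 2,   /* Can reboot system. */
	PRIV_SCHED_SETPRIORITY = 3,   /* Can set lower nice value for proc. */
	PRIV_VFS_READ          = 4,   /* Override vnode DAC read perm. */
	PRIV_VFS_WRITE         = 5,   /* Override vnode DAC write perm. */
};

struct model_cred {
	uid_t cr_uid;      /* effective uid */
	bool  cr_jailed;   /* credential belongs to a jailed process */
};

/* Hook type standing in for a MAC policy entry point: 0 means "yes". */
typedef int (*model_priv_hook_t)(const struct model_cred *cred, int priv);

static model_priv_hook_t model_mac_check;   /* may veto a privilege */
static model_priv_hook_t model_mac_grant;   /* may grant a privilege */

/* One switch statement lists everything jailed credentials may use. */
static int
model_prison_priv_check(const struct model_cred *cred, int priv)
{
	if (!cred->cr_jailed)
		return (0);
	switch (priv) {
	case PRIV_SCHED_SETPRIORITY:
	case PRIV_VFS_READ:
	case PRIV_VFS_WRITE:
		return (0);
	default:
		return (EPERM);
	}
}

/* Returns 0 if the named privilege is granted, an errno value otherwise. */
static int
model_priv_check_cred(const struct model_cred *cred, int priv)
{
	int error;

	assert(cred != NULL);
	/* A policy may deny any privilege outright. */
	if (model_mac_check != NULL) {
		error = model_mac_check(cred, priv);
		if (error != 0)
			return (error);
	}
	/* Jail restrictions are applied here, not at each call site. */
	error = model_prison_priv_check(cred, priv);
	if (error != 0)
		return (error);
	/* Traditional superuser privilege. */
	if (cred->cr_uid == 0)
		return (0);
	/* A policy may grant a named privilege to an unprivileged user. */
	if (model_mac_grant != NULL && model_mac_grant(cred, priv) == 0)
		return (0);
	return (EPERM);
}

/* Hypothetical mac_privs-style rule: uid 1001 may manage accounting. */
static int
example_grant(const struct model_cred *cred, int priv)
{
	return ((cred->cr_uid == 1001 && priv == PRIV_ACCT) ? 0 : EPERM);
}
```

With example_grant installed as model_mac_grant, uid 1001 can manage accounting without being root, while a jailed root still cannot reboot, because in this model the jail decision is made before any grant is considered. Keeping the jail switch in one function is what makes "what privileges are granted in jail?" answerable by reading a single case list.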
I'd like to gratefully acknowledge the sponsorship of nCircle Network Security, Inc. in performing this work.

Robert N M Watson
Computer Laboratory
University of Cambridge