Date: Fri, 25 Jul 2003 09:48:46 -0700 (PDT) From: Robert Watson <rwatson@FreeBSD.org> To: Perforce Change Reviews <perforce@freebsd.org> Subject: PERFORCE change 34992 for review Message-ID: <200307251648.h6PGmkRl043340@repoman.freebsd.org>
http://perforce.freebsd.org/chv.cgi?CH=34992 Change 34992 by rwatson@rwatson_paprika on 2003/07/25 09:47:47 First 45 pages of the secarch document; has been following me around on my notebook for a while, and it would be a good idea to get it in P4 so that when I drop my notebook, it's not lost. The kernel section is doing quite well (VFS and networking need work); userspace needs more fleshing out generally, especially relating to PAM, NSS, and crypto services. Affected files ... .. //depot/projects/trustedbsd/doc/en_US.ISO8859-1/books/developers-handbook/secarch/chapter.sgml#2 edit Differences ... ==== //depot/projects/trustedbsd/doc/en_US.ISO8859-1/books/developers-handbook/secarch/chapter.sgml#2 (text+ko) ==== @@ -1,0 +1,2808 @@ +<!-- + Copyright (c) 2002, 2003 Networks Associates Technology, Inc. + All rights reserved. + + This software was developed for the FreeBSD Project by Network + Associates Laboratories, the Security Research Division of Network + Associates, Inc. under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), + as part of the DARPA CHATS research program. + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + SUCH DAMAGE. + + $FreeBSD$ +--> + +<chapter id="secarch"> + <chapterinfo> + <authorgroup> + <author> + <firstname>Robert</firstname> + <surname>Watson</surname> + + <affiliation> + <orgname>TrustedBSD Project, Network Associates + Laboratories</orgname> + <address><email>rwatson@FreeBSD.org</email></address> + </affiliation> + </author> + </authorgroup> + </chapterinfo> + + <title>FreeBSD Security Architecture</title> + + <sect1 id="secarch-copyright"> + <title>FreeBSD Security Architecture Copyright</title> + + <para>This software was developed for the FreeBSD Project by Network + Associates Laboratories, the Security Research Division of Network + Associates, Inc. 
under DARPA/SPAWAR contract N66001-01-C-8035 + ("CBOSS"), as part of the DARPA CHATS research program.</para> + + <para>Redistribution and use in source (SGML DocBook) and + 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) + with or without modification, are permitted provided that the + following conditions are met:</para> + + <orderedlist> + <listitem> + <para>Redistributions of source code (SGML DocBook) must + retain the above copyright notice, this list of conditions + and the following disclaimer as the first lines of this file + unmodified.</para> + </listitem> + + <listitem> + <para>Redistributions in compiled form (transformed to other + DTDs, converted to PDF, PostScript, RTF and other formats) + must reproduce the above copyright notice, this list of + conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution.</para> + </listitem> + </orderedlist> + + <important> + <para>THIS DOCUMENTATION IS PROVIDED BY THE NETWORKS ASSOCIATES + TECHNOLOGY, INC "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, + INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF + MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + DISCLAIMED. IN NO EVENT SHALL NETWORKS ASSOCIATES TECHNOLOGY, + INC BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS + OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN + IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</para> + </important> + </sect1> + + <sect1 id="secarch-synopsis"> + <title>Synopsis</title> + + <para>The FreeBSD operating system contains a variety of security + elements intended to support secure and reliable system operation. 
+ These elements include:</para> + + <itemizedlist> + <listitem><para>Segmented address space to protect kernel + operation from accidental or malicious user process + interference.</para></listitem> + <listitem><para>Inter-process memory protections to limit the + impact of buggy or malicious user applications on other + running applications.</para></listitem> + <listitem><para>Association of user credentials with processes, + including user identifiers, group list membership, jail virtual + system, and mandatory access control label, supporting a + multi-user environment.</para></listitem> + <listitem><para>Privilege model based on a privileged root user + (uid 0)</para></listitem> + <listitem><para>Inter-process controls to prevent improper + interference between processes belonging to different users, + as well as to protect processes that undergo privilege level + changes.</para></listitem> + <listitem><para>Discretionary file system protections based on + user/group ownership, file permission mask, and file flags. + Optionally, extended discretionary access control list support + on the UFS and UFS2 file systems. + Special file modes to support uid and gid transition on + execution.</para></listitem> + <listitem><para>Mapping of network credentials received via + NFS Remote Procedure Calls (RPCs) to local credentials. + Administrative limits, by network address, to NFS services. 
+ </para></listitem> + <listitem><para>Discretionary protections on System V IPC + primitives (shared memory, message queues, and semaphores) + based on ownership and permissions.</para></listitem> + <listitem><para>Extensible kernel and user access control + through a pluggable MAC Framework, permitting kernel modules + to bind additional security label data to processes and + system objects, and to enforce discretionary or mandatory + policies, including Biba and LOMAC integrity, MLS + confidentiality, and other augmented system security + policies.</para></listitem> + <listitem><para>Pluggable Authentication Module (PAM) support + permitting administrators to require appropriate authentication + in a multi-user environment. + Modules support traditional passwords, one-time passwords, + distributed passwords and authentication services such as + KerberosIV and Kerberos5, and support a variety of hardware + authentication tokens. + In addition, modules authorize login access to the system, + provide for accounting, implement password changing, and + enforce password change policies such as password length + requirements.</para></listitem> + <listitem><para>Name Service Switch (NSS) support permitting + a variety of local and distributed directory services to + provide account and authentication data.</para></listitem> + <listitem><para>A variety of remote access and management + tools with cryptographic protection of network traffic. + </para></listitem> + </itemizedlist> + + <note><para>This revision of the FreeBSD Security Architecture + describes the architecture as found in FreeBSD 5.1, and may + not accurately describe other versions of the FreeBSD + operating system.</para></note> + </sect1> + + <sect1 id="secarch-approach"> + <title>Approach</title> + + <para>As a general-purpose, multi-user operating system, FreeBSD + includes a number of security elements intended to form the + foundation for a secure application environment. 
+ This includes basic system integrity, confidentiality, and + availability services. + This approach is intended to resist attack in a variety of forms, + and against a variety of attack methodologies. + In the next section, basic security concepts and assumptions are + discussed, including the goals of integrity, confidentiality, and + availability as addressed by FreeBSD.</para> + + <para>In general, FreeBSD adopts the same stance as most other + operating systems based on the UNIX model: the kernel is isolated + from user processes, which represent a variety of programs in + execution in isolated address spaces. + Processes each carry a process credential, managed by the kernel, + describing user and group information for the process, which will + be used to authorize access to other kernel objects. + Based on the credential and various object properties, several + mandatory and discretionary protection models control the + interactions between processes, and access by the processes to + various system resources (including storage, network + communications, etc.).</para> + + <para>As installed, FreeBSD supports easy communication and + collaboration between users, while providing the primitives to + prevent inappropriate release or modification of data owned by + users. + Some users are granted special system administration privileges by + virtue of being members of specific groups (such as the "operator" + and "wheel" groups); in addition, a special administrative account, + "root", is used to manage the system. 
+ However, the security primitives and configuration are frequently + adapted to support much stronger or fine-grained security + deployment requirements, including containment of mutually + untrusting processes.</para> + + <para>FreeBSD also contains a number of extensions permitting + greater flexibility and control, including a system + partitioning model widely used by ISPs (jail), support + for Mandatory Access Control, and extensible access control + policies through the MAC Framework. + These mechanisms permit administrators to control the flow of + information in systems in a variety of ways, including using the + MLS mandatory sensitivity policy, and the Biba integrity policy. + These capabilities are similar to those found in many commercial + trusted operating systems, and permit FreeBSD to be used + in environments less reminiscent of the time-sharing + systems from which the UNIX access control requirements are + derived. + Making use of these primitives permits the administrator to reduce + their level of trust in user accounts on the system, limiting the + consequences of compromise of individual user accounts or + services.</para> + + <para>In recognition of the importance of networks and network + infrastructure, FreeBSD provides a variety of remote login + services, as well as advanced cryptographic protocols used to + protect the integrity of these services.</para> + </sect1> + + <sect1 id="secarch-concepts"> + <title>Security Architecture Concepts</title> + + <para>FreeBSD is a multi-tasking multi-user operating system, serving + in a variety of environments with a variety of security requirements. + Common deployment environments include single-user or multi-user + workstations, large-scale ISP environments, web or file server + clusters, and high-end embedded network appliances including + network-attached storage, routers, and firewalls. 
+ The FreeBSD operating system combines many of the strongest elements + of traditional UNIX security, modern cryptographic services, and + trusted operating system elements to support the requirements of + these environments through flexibility and adaptability.</para> + + <variablelist> + <varlistentry> + <term>Authorization</term> + <listitem> + <para>Authorization refers to the process by which access control + decisions are made--typically, authorization may be performed + on the basis of an authenticated user identity, presentation + of a cryptographic token, an inherited or acquired capability, + explicit access control lists, or a variety of other policy + driven considerations. + Authorization checks occur in both the kernel and userspace + components of FreeBSD.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Authentication</term> + <listitem> + <para>Authentication refers to the process by which a system + (in this case, operating system) determines and confirms + the identity of a user (or another system) it interacts + with. + Frequently in the context of FreeBSD, this refers to early + stages in the login process, in which a user presents a + username and password; however, it may also refer to + inter-host authentication using host SSH keys, IKE key + negotiation for IPsec, and a variety of other elements. + Authentication typically relies on testing the knowledge of + a third party in relation to secrets that only an + appropriate third party could know: for example, testing a + shared secret (such as a password), one time passwords, or + through use of a Public Key Infrastructure. 
+ Typically, system authorization occurs in the context of + an authenticated user identity; however, authorization + decisions may be made prior to user authentication, or in + the case of network activity, without access to any + authenticated identity.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Availability</term> + <listitem> + <para>Availability refers to the design requirement that + services offered by a system be, in as much as is possible, + uninterrupted despite unexpected or undesired circumstances. + In the context of FreeBSD, this concept frequently drives + the requirement for resource accounting, resource limits to + prevent unfair exhaustion of system resources, scheduler + behavior, access controls, and authentication. + Availability is expressed with regard to a subject: frequently, + to maintain availability for one user, it is necessary to + reduce or deny access to services for another user. + Availability is frequently considered in the context of a + malicious user attempting to deny service to other users. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Integrity</term> + <listitem> + <para>Integrity refers to the protection of system operation + and stored data from undesired modification by unauthorized + parties; the integrity of the operating system is required + to ensure proper operation. + The integrity of user data is then protected by the operating + system by means of authentication and authorization. + Integrity guarantees are often important to the notion of + system availability, as interference with system integrity + frequently has an impact on operation. 
+ Cryptographic tools may also be used to measure the integrity + of the system.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Confidentiality</term> + <listitem> + <para>Confidentiality refers to the protection of system and + stored data from undesired leakage to unauthorized + parties: confidentiality of system authentication data + is required to ensure successful authentication, protection + of entropy, etc. + Confidentiality of system and user data is protected by the + operating system by means of authentication and + authorization. + Cryptographic tools may also be used to maintain the secrecy + of data.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term>Cryptography</term> + <listitem> + <para>Cryptography refers to the use of mathematical + algorithms and techniques used to provide guarantees for + the protection of data communications and processes. + Typically, cryptographic techniques are used in three + forms in FreeBSD: the protection of authentication data, + protection of data on storage arrays, and for the purposes + of secure network communications; cryptographic services + are also provided for the benefit of applications. + Guarantees of cryptographic algorithms and protocols often + include integrity, confidentiality, non-repudiation, and + freshness.</para> + </listitem> + </varlistentry> + </variablelist> + + </sect1> + + <sect1 id="secarch-kernel"> + <title>Kernel Security Model</title> + + <para></para> + + <sect2 id="secarch-kernel-addressspace"> + <title>Kernel Address Space Separation</title> + + <para>FreeBSD, as is the case with most UNIX-derived operating + systems, executes the kernel in a supervisor hardware mode + that prevents direct process access to kernel memory and + hardware resources by user processes, providing integrity and + confidentiality protections for kernel data structures and code. 
+ On all current hardware architectures, this is accomplished by + reserving a segment of the system address space for read/write + access only by appropriately authorized task descriptors; access + to privileged instructions, such as those used to configure + page tables, flush TLBs, and configure new tasks, is limited + to code executing in kernel mode in the common case. + On the i386 platform, this is implemented using rings -- the + kernel operates in ring 0, and user processes operate in ring 3. + Processes are forced to communicate with the kernel through a + variety of more explicit traps, including exceptions generated by + arithmetic traps in the instruction stream, exceptional memory + accesses such as page faults, or system calls via call gates. + Kernel code interacting with user processes is written carefully + so as to support only the desired interactions + between the kernel and user processes.</para> + + <para>Within the kernel, direct manipulation of user memory contents + is generally avoided, and instead abstracted through a series of + copy routines that enforce appropriate protections, preventing + (among other things) the kernel from dereferencing user-provided + pointers to kernel address space. + In general, direct access to system hardware devices is prohibited + to user processes--they must make use of controlled kernel service + APIs to access disk storage, I/O busses, etc. + This rule is circumvented under special circumstances, such as the + creation of user process device drivers (most frequently, for the + X11 window system). 
+ To bypass this protection, privilege is required, or must be + delegated.</para> + </sect2> + + <sect2 id="secarch-kernel-bypass"> + <title>Direct Hardware and Kernel Memory Access</title> + + <para>Some bypass mechanisms are provided to permit privileged user + processes to monitor the kernel or interact with the hardware + through controlled abstractions, or in exceptional cases, + to interact with the hardware in an unabstracted form.</para> + + <para>Most hardware devices are exposed to user applications + via the device file system (devfs), which presents these devices + as file-like objects. + Protection properties (such as access control lists) on the + pseudo-files are combined with direct authorization in the device + drivers to control access. + Two devices, <filename>/dev/mem</filename> and + <filename>/dev/kmem</filename>, permit unrestricted access to system + memory and kernel memory, and are generally controlled so that only + highly privileged processes may use them. + The kmem interface was used extensively in earlier versions of + FreeBSD and other UNIX-derived systems as a means to monitor kernel + activities; in more recent versions of FreeBSD, the &man.sysctl.3; + API is used to monitor and manipulate a structured kernel MIB, + providing a more controlled interface with defined semantics. + The kmem interface may also be used for debugging purposes.</para> + + <para>On many hardware platforms, it is possible for the kernel to + authorize user processes to perform I/O directly; on the i386 + platform, opening the <filename>/dev/io</filename> device enables + direct I/O access. + Other platforms provide similar functionality. + Many platforms also offer hardware-specific services via the + &man.sysarch.2; system call; some of the functions provided by the + system call are process-local, but others may provide privileged + services. 
+ For example, the i386 sysarch() call implements an + <literal>I386_SET_IOPERM</literal> operation that also enables + direct hardware I/O. + Careful maintenance of the protections of these special devices + and interfaces is vital to the proper protection of the kernel + and user processes via address space protections.</para> + </sect2> + + <sect2 id="secarch-kernel-modules"> + <title>Kernel Extension via Kernel Modules</title> + + <para>FreeBSD permits the boot-time and run-time extension of the + operating system kernel through loadable kernel modules. + This facility is used to load device drivers supporting new + hardware devices on-demand, add support for new file systems, as + well as binary emulation layers and other services. + Run-time extension of the kernel involves loading of the module + from a file into the kernel address space, followed by dynamic + linking of that module into the execution environment, and + eventual execution of any events and services in the module. + As such, the system calls to cause loading or unloading of kernel + modules are carefully controlled operations, as loading new code + into the kernel could be used by attackers to bypass other + protections. 
+ Modules may also be loaded as part of the boot process; file + system protections are used to protect the integrity of any files + involved in the boot process.</para> + + <para>Protection of the boot sequence is vital to the secure + operation of the system; this requires protection of any hardware + devices involved in the boot process, as well as any files (such + as the boot loader, kernel, modules, and configuration files) + from inappropriate access.</para> + </sect2> + </sect1> + + <sect1 id="secarch-processes"> + <title>Process Protections</title> + + <para>Processes represent the high-level abstraction of a program + "in execution"; each process consists of a virtual memory address + space (including a mapping of executable code from a file), signal + delivery information, process credentials, one or more threads in + execution, a pool of resource limits, and an array of file + descriptors holding references to a variety of I/O and system + objects. + Processes generally run "in isolation", communicating with other + processes only through explicit and intentional communication + channels (such as files, IPC primitives, etc.). Shared memory is + permitted, but must be explicitly configured by the process.</para> + + <para>Programs, as executed in processes, are generally derived + from an executable image (file) that has been mapped into the + process address space. + While the kernel does not provide explicit support for it, most + applications on FreeBSD make use of dynamically linked libraries, + implemented via the memory mapping of library files into the + process address space. + In practice, most programs executing on FreeBSD systems + are composed from a variety of run-time linked libraries, and + frequently pluggable modules loaded as shared objects.</para> + + <para>The ability to directly modify process memory or other + operation parameters represents the ability to control the + execution of the process, and hence manipulate its operation. 
+ By providing memory and other protections, the FreeBSD kernel + limits inappropriate interference between processes, preventing + accidental or intentional leakage of data, damage to data or + operational integrity, and leakage of system privilege. + System debugging interfaces break down these barriers, and must + be carefully controlled.</para> + + <sect2 id="secarch-process-credentials"> + <title>Process Credentials</title> + + <para>FreeBSD assigns each process a credential, which holds + a variety of information relating to the privileges available + to the process. + The process credential contains several elements, including real, + saved, and effective user IDs, group IDs, resource limit + information for the user, a reference to a system jail, and an + extensible MAC label. + This data will be used to compute access control results + associated with most security-sensitive operations.</para> + + <para>Consistency and performance are provided in the + multi-threaded kernel by also assigning a credential to each + thread. + The thread credential is compared to the primary process + credential whenever the thread enters the kernel, and if it + differs from the process credential, it is updated to reflect + the latest snapshot of the process credential. + This ensures a consistent credential for the duration of the + system call, and most access control checks for thread + operations are performed against the thread credential (a + thread-local variable) as it does not require explicit locking. + During update operations, the process lock is held across a + check and update of the process credential to ensure consistency + and prevent races.</para> + + <para>The semantics of the various credential fields are + defined both by historical application requirements, and + in the POSIX specifications. + In general, the effective uid, effective gid, and additional + groups will be used to implement access control checks. 
+ Processes may use the saved and real uid and gid to preserve + other credential elements for conditional use during execution: + for example, when a setuid application is executed, the saved + uid and gid are updated to the values of the effective uid + and gid prior to the execution. + This permits setuid applications to swap between the original + and file-originated ids, permitting the privileges to be + blended in a controlled manner. + Real and saved uids and gids will also be used in controlling + inter-process access control, and will be used under some + circumstances to control resource limits.</para> + + <para>Credentials are also cached in a variety of other + kernel data structures, generally at the point at which + initial access to the object occurs. + This caching permits "time of open" UNIX security semantics to + be implemented for several objects, including file descriptors + and mountpoints. + These credential references are then used to authorize + asynchronous write-behind, such as found in NFS.</para> + + <para>As credentials are frequently referenced throughout the + system, but rarely modified, credentials are stored as + copy-on-write. + This permits new read-only references to credential structures + to be created with minimal memory overhead. + When a credential must be modified, a new copy of the credential + is created, modified, and then the process reference is updated + to point to the new credential.</para> + + <para>A variety of hazards are associated with dynamic changes + in process credentials, as processes may be the object of + operations, not just the subject. + When a process runs with upgraded or downgraded privileges, risks + may exist. + For example, even if a process reduces its level of privilege, it + may have cached access to objects or memory that is not revoked + with the explicit credential change (such as keying material in + library-managed or free'd memory). 
+ When a process receives upgraded privileges, such as on execution + of a setuid binary, the system must revoke access to debug + the process by other processes that may already have had + debugging sessions open.</para> + + <para>These protections are introduced in three ways: first, + disallowing operations that may upgrade process credentials + if access to the process cannot be revoked. + Second, storage of a "credential change flag", named P_SUGID for + historical reasons, which will be used to modify the + inter-process access control policy by indicating a change has + happened in the process life time. + Third, the explicit revocation of existing access to the process. + Additional information about inter-process access control may be + found later in this chapter.</para> + + <para>In most situations, the user credential structure is + sufficient to encapsulate all the necessary subject information + required for an access control decision. + However, under some circumstances, additional process information + may also be used in the decision to exempt closely related + processes from certain protections--for example, participation in + the same session is sufficient to authorize delivery of the + "continue" signal between processes, regardless of credentials. + </para> + </sect2> + + <sect2 id="secarch-privilege-model"> + <title>Root Privilege Model</title> + + <para>The uid 0, assigned to the root user, is given special + privilege to bypass system protections, including most + discretionary and mandatory protections on the local + system. + This privilege is referred to as the "superuser", and is used + for system processes, during the boot and shutdown processes, + and for management purposes. 
+ Because of this concentration of privilege, required to + perform a number of system activities, system services + running with root privilege are popular targets for attack, + as gaining access to uid 0 grants access to most other + privileges in the system.</para> + + <para>FreeBSD ships with the securelevel protection mechanism, + first distributed with 4.4BSD. + Securelevels limit the scope of root privilege based on a + monotonically increasing current securelevel. + As the securelevel increases, various privileges are removed, + including the privilege to directly access disk devices, to + change file system protection flags, and to modify the + firewall configuration. + This model does not provide comprehensive protection against + the compromise of root privilege, but if properly configured, + can be used to improve the safety of the recovery + process.</para> + + <para>The privileges of uid 0 are also bounded when used in + combination with the jail() security extension, described + later in this chapter.</para> + + <para>The TrustedBSD MAC Framework is also capable of + limiting certain root privileges, such as the capability to + read files based on system labels. + The MAC Framework and policies are described later in this + chapter.</para> + </sect2> + + <sect2 id="secarch-resource-limits"> + <title>Process Resource Limits</title> + + <para>FreeBSD is fundamentally designed around a resource-sharing + model, in which the operating system controls access to a + set of real hardware resources and mediates access to ensure + consistent and appropriate use. + As UNIX-derived systems are frequently deployed in environments + in which users or processes contend for resources, a variety of + approaches are taken to preventing inappropriate exclusion of + other users or processes. 
+ This includes scheduler behavior to provide for "fair" + distribution of CPU resources between independent processes + based on priorities, a file system quota mechanism to bound + maximum consumption of resources by user or group, and a set + of process resource limits bounding access to a variety of + resources at the granularity of uids globally, and process + hierarchies. + In multiuser environments, such as ISP shell servers, resource + limits are vitally important to successful long-term operation: + users have a number of unfortunate habits, including the + creation of programs that (intentionally or otherwise) attempt + to consume all available system resources.</para> + + <sect3 id="secarch-scheduler"> + <title>System Scheduler Priorities</title> + + <para>FreeBSD 5.1 ships with two system schedulers, one of + which must be selected at kernel compile-time. + The 4.4BSD scheduler implements a time-sharing, floating + priority scheduler based on user-assigned process priorities, + with additional support for real-time and idle scheduling. + The ULE scheduler implements a similar scheduling policy, but + contains optimizations for threading, non-symmetric + CPU topologies, and scheduler structural optimizations for + MP environments.</para> + + <para>The UNIX priority scheme assigns fixed priorities to + kernel and user processes; lower priority values indicate + a higher precedence. + Kernel processes generally take priorities based on + compile-time configuration. + User processes inherit their priority from their parent + process, and priorities may be updated using the + setpriority() system call, which operates on a process, + process group, or all processes owned by a specified user. + In general, privilege is required to lower the priority (raise + the precedence) of a process. 
+ Processes may perform operations to modify the scheduling + properties of another process; policies associated with + these operations are described in the Inter-Process + Authorization section. + Kernel locking primitives make use of priority propagation to + prevent priority inversions on contended kernel + resources.</para> + + <para>In addition to the UNIX process priority ranges, processes + may also operate with realtime or idle priority. + Real time processes preempt processes of lower priorities, and + when contending against equal priority processes, are executed + round-robin. + Idle priority processes operate only when no other processes + are able to execute. + Because both realtime and idle priority processes can result in + priority inversions, privilege is required to modify the + realtime or idle priorities of a process.</para> + </sect3> + + <sect3 id="secarch-globalmeasurement"> + <title>Per-Uid Global Resource Measurement</title> + + <para>FreeBSD permits limits on the number of processes and + amount of socket buffer space, two particularly sensitive + system resources. + Measurements are taken globally on a per-uid basis. + The FreeBSD kernel maintains a global list of both consumed + and maximum resources per-uid in reference-counted uidinfo + structures. + References to the uidinfo structure for the effective and real + uids are cached in the user credential structure, and are + updated when the uid of a process changes.</para> + + <para>Process counts are maintained based on the number of + processes owned by a particular real uid. + Updates to the per-uid process count are performed when the + first process is created, whenever a process forks or exits, + and whenever a process changes its real uid. 
+ Resource limits on process counts are checked only on process + fork; uid change operations will not fail by virtue of a + resource limit.</para> + + <para>Socket buffer sizes are maintained based on the sum of + the high watermark sizes of sockets allocated by a + particular effective uid. + Updates to the per-uid socket buffer count are performed when + a socket is allocated (via the socket() system call or as part + of a new incoming connection), or when data is sent or + received on the socket that may expand the high watermark. + Resource limits are checked only when new sockets are + created.</para> + </sect3> + + <sect3 id="secarch-plimits"> + <title>Per-Process Limits</title> + + <para>Some resource limits, such as the number of processes and + maximum socket buffer per uid, are measured globally across + the system. + Other resources, such as VM space consumption or stack size, + are measured locally to the process. + In both cases, the limits imposed are process-local in that + resource limits are a per-process property. + Each process has a reference to a per-process limit structure, + which consists of a set of limits associated with different + resources. + Each limit contains two elements: a soft (current) limit, and + a hard (maximum) limit, which represents the greatest value to + which the current limit may be raised without privilege. + Process limits are inherited on fork(); internally, the + limits are stored copy-on-write.
+ Resource limits are tested on the allocation of the resource, + such as on the allocation of new memory, creation of a socket, + or forking of a process.</para> + + <para>Resources measured and controlled by the plimit structure + include CPU time, maximum file size, maximum address space + "data" size, maximum address space "stack" size, maximum + size of core file, maximum resident set size, maximum + memory pages that may be locked into physical memory, the + maximum number of processes for the real uid, maximum number + of open files, maximum size consumed by socket buffers for + the effective uid, and maximum virtual memory size (including + file mappings).</para> + </sect3> + </sect2> + + <sect2 id="secarch-interprocess-authorization"> + <title>Inter-Process Authorization</title> + + <para>Processes interact explicitly through a variety of + communication channels, including the file system and IPC + services. + They may also directly interact through a series of inter-process + services. + These include signalling (which may act as IPC, modify + scheduling, or signal termination), a variety of monitoring + mechanisms such as those used to implement &man.ps.1;, scheduler + services to modify a process priority or schedule model, and a + set of debugging interfaces permitting a process to closely + monitor and modify the behavior of another process. 
+ Inter-process operations permit the flow of information and + control between processes: signals may directly control the + operation and behavior of a process; visibility of process + command lines shares information about what the process is doing; + scheduling services may prevent a process from running or cause it + to operate improperly; debugging often permits direct control of + the process and access to any resources accessible to the target + process.</para> + + <para>Protections associated with these services are important to + prevent serious security vulnerabilities: in most cases, the + protection model requires that, to modify the behavior of another + process via signalling, scheduling, or debugging, the subject + (initiating) process must hold privileges identical to, or a + superset of, those of the object (target) process. + Additional protections may be assigned to a process if the process + has modified or downgraded its privileges, as it may still have + local references to information or resources beyond those + normally available to its new privileges; these protections are + typically used to prevent attaching a debugger to a process that + has run as root, and is now executing as another uid during the + login process, or to protect setuid or setgid programs in + execution. + Vulnerabilities exploitable without these protections might + include access to password files or keying material still held + by the process.</para> + + <para>An important limitation to the safety of inter-process + operations is derived from the UNIX "process id" (pid) model. + Each process is assigned a numeric identifier, unique for the + lifetime of the process. + Operations targeted at processes generally specify the + target process by means of its pid; however, pids may (will) be + reused following the death of a process. + While FreeBSD can be operated in a mode in which pids are + randomly allocated, eventual reuse of a pid is guaranteed + with any reasonable uptime.
+ Consequently, signals may be delivered to unintended processes as + a result of races in pid reuse. + While inter-process access control prevents the malicious + delivery of signals to processes based on differing + credentials, it cannot prevent the accidental delivery of a + signal to an unintended process by an authorized process. + </para> + + <para>MAC Framework policy modules are permitted to augment + inter-process protections, and many do so.</para> + </sect2> + </sect1> + + <sect1 id="secarch-fileobjects"> + <title>File Descriptors, File Systems, and Storage Security</title> + + <para></para> + + <sect2 id="secarch-fds"> + <title>File Descriptors</title> + + <para>Each FreeBSD process has an associated array of references + to active I/O objects, known as file descriptors, which + refer to active object sessions. + For most processes, the file descriptor array will not be + shared; however, it is possible to create processes that share + file descriptors using the rfork() call--this behavior is + required to emulate Linux threading. + Each object session, described by a <structname>struct + file</structname> in the kernel, has an associated underlying + object, operation vector, cached credential from the time of + creation, access mode, and current offset.</para> + + <para>Object sessions are initially referred to by one file + descriptor, but references may be duplicated to additional file + descriptors, as well as inherited across fork() operations, and + passed to other processes using UNIX Domain Socket ancillary right + transfer. + In FreeBSD 5.1, objects referenced by file descriptors are: IPC + pipes, IPC sockets, vnodes (files, directories, device nodes, + POSIX fifos, etc), and kqueues (kernel event notification queues). + References to object sessions remain until the descriptor is + explicitly closed via the close() or rfork() system calls, or + implicitly closed on process exec() or exit().
+ File descriptor arrays may not be modified except by processes + that reference them--however, the active object sessions may be + modified, as may the underlying objects. + File descriptor properties, such as offset and active access + flags, may be explicitly modified using system calls such as + lseek() or fcntl(), or implicitly as a result of operations making + use of the file descriptor, such as read() or write().</para> + + <para>In most cases, accesses made using a file descriptor are + authorized using the cached file credential from the creation + of the file descriptor: for example, NFS read operations + initiated as a result of a read on the file descriptor will + be authorized using the credential that opened the file, + which may be different from the credential used to initiate + the read operation. + However, some accesses are authorized using the active + credential: typically, this includes meta-data and + administrative operations such as fchmod() on files, or ioctl() + on sockets. + Mandatory protections enforced by the MAC Framework may depend + on either the active or file credential. + For local file systems, protections are typically only enforced + with the open() operation, with the exception of + securelevel-related flags such as system immutable.</para> + </sect2> + + <sect2 id="secarch-filesystem"> + <title>File System Protection Model</title> + + <sect3 id="secarch-fsnamespace"> + <title>File System Namespace Protections</title> + + <para>The FreeBSD virtual file system (VFS) consists of + a namespace constructed out of individual file system + mounts, and a set of objects (including directories and + files) that make up the namespace.
+ Processes look up objects relative to either the process + root directory, or process current working directory; in + general, objects cannot be accessed without passing + through the file system namespace; objects that cannot + be named by a process are, in most cases, inaccessible to + the process, although references to unnameable objects + outside a process's namespace may be passed via IPC.</para> + + <para>As a result of these properties, three common protections + are available to protect objects in the file system + namespace: chroot() or mountpoint covering may be used + to prevent the process from constructing a name to the + object; namespace protections, such as access control + lists on directories, can prevent a process from traversing + the namespace to an object; finally, protections on the + object itself can prevent inappropriate access to an + object--for example, file permissions may permit read of + an object, but not write. + Objects may appear more than once in the same namespace by + virtue of hard links and synthetic mountpoints: as a result, + caution must be applied when relying on namespace-based + protections to limit access to an object.</para> + + <para>Modifications to the namespace may be performed by + adding or removing file system mounts, attaching, overlaying, + or detaching parts of the namespace, or by modifying elements + in the namespace by performing operations on objects in the + namespace. + Mount and unmount operations require privilege in FreeBSD + by default; however, the system policy may be configured to + permit user mounts under limited circumstances. + Access to the mount primitives is generally limited because + the ability to directly access the underlying storage + mechanism implies the ability to manipulate the file system + namespace and protections, bypassing OS limits on those + operations.
+ When user mounts are permitted, the underlying device must + be readable (and optionally writable) by the user performing + the mount operation; in addition, users may only mount + new file systems on top of objects (typically, directories) + that they own. + Modifications to elements in a file system namespace are + typically authorized in a file system-specific manner, based + on the mandatory and discretionary protections offered by + the file system.</para> + + <para>The MAC Framework permits file system operations to + be controlled across all file systems, with protections + implemented above the per-filesystem layer. + File systems may support multi-label operation, assigning + labels to each object in the file system in a file + system-specific manner, or single-label operation in which all + objects in the file system share the same label, requiring + no special support by the file system for MAC.</para> + </sect3> + + <sect3 id="secarch-fsobjects"> + <title>File System Objects and Operations</title> + + <para>The FreeBSD VFS defines several classes of objects, and + operations that apply to one or more of those objects. + The following operations may be supported on a virtual file + node:</para> + + <variablelist> + <varlistentry> + <term>create()</term> + <listitem> + <para>Create a new file system object; parent directory, >>> TRUNCATED FOR MAIL (1000 lines) <<<