From owner-svn-src-stable-7@FreeBSD.ORG Sat Feb 7 13:19:09 2009 Return-Path: Delivered-To: svn-src-stable-7@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D7CC106564A; Sat, 7 Feb 2009 13:19:09 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 570DA8FC08; Sat, 7 Feb 2009 13:19:09 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id n17DJ9s9092750; Sat, 7 Feb 2009 13:19:09 GMT (envelope-from bz@svn.freebsd.org) Received: (from bz@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id n17DJ8Wd092736; Sat, 7 Feb 2009 13:19:08 GMT (envelope-from bz@svn.freebsd.org) Message-Id: <200902071319.n17DJ8Wd092736@svn.freebsd.org> From: "Bjoern A. Zeeb" Date: Sat, 7 Feb 2009 13:19:08 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-7@freebsd.org X-SVN-Group: stable-7 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r188281 - in stable/7: . etc etc/defaults etc/periodic/weekly lib/libc lib/libc/string lib/libc/sys lib/libkvm share/man/man4 sys sys/compat/freebsd32 sys/contrib/pf sys/dev/ath/ath_hal... X-BeenThere: svn-src-stable-7@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for only the 7-stable src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 Feb 2009 13:19:10 -0000 Author: bz Date: Sat Feb 7 13:19:08 2009 New Revision: 188281 URL: http://svn.freebsd.org/changeset/base/188281 Log: MFC: r185435: This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. r185441: Unbreak the no-networks (no INET/6) build. r185899: Correctly check the number of prison states to not access anything outside the prison_states array. When checking if there is a name configured for the prison, check the first character to not be '\0' instead of checking if the char array is present, which it always is. Note, that this is different for the *jailname in the syscall. Found with: Coverity Prevent(tm) CID: 4156, 4155 r186085: Make sure that the direct jls invocations prints something reasonable close to and in the same format as it had always. r186606: Make sure that unused j->ip[46] are cleared. r186834: Document the special loopback address behaviour of jails. PR: kern/103464 r186841: Put the devfs ruleset next to devfs enable, add a comment about the suggested ruleset[1]. While here use an IP from the 'test-net' prefix for docs. PR: kern/130102 r187059: Add a short section talking about jails and file systems; mention the mountand jail-aware file systems as well as quota. PR: kern/68192 r187092: Sort .Xr. r187365: s,unmount 8,umount 8, it is unmount(2) which I did not mean. r187669: Update the description of the '-h' option wrt to primary addresses per address family and add a reference to the ip-addresses option. r187670: New sentence starts on a new line. Modified: stable/7/UPDATING stable/7/etc/ (props changed) stable/7/etc/defaults/rc.conf stable/7/etc/periodic/weekly/ (props changed) stable/7/lib/libc/ (props changed) stable/7/lib/libc/string/ffsll.c (props changed) stable/7/lib/libc/string/flsll.c (props changed) stable/7/lib/libc/sys/cpuset_getaffinity.2 stable/7/lib/libc/sys/jail.2 stable/7/lib/libkvm/ (props changed) stable/7/lib/libkvm/kvm_proc.c stable/7/share/man/man4/ (props changed) stable/7/share/man/man4/ddb.4 stable/7/share/man/man4/igb.4 (props changed) stable/7/sys/ (props changed) stable/7/sys/compat/freebsd32/freebsd32.h stable/7/sys/compat/freebsd32/freebsd32_misc.c stable/7/sys/compat/freebsd32/syscalls.master stable/7/sys/contrib/pf/ (props changed) stable/7/sys/dev/ath/ath_hal/ (props changed) stable/7/sys/dev/cxgb/ (props changed) stable/7/sys/kern/kern_cpuset.c stable/7/sys/kern/kern_exit.c stable/7/sys/kern/kern_fork.c stable/7/sys/kern/kern_jail.c stable/7/sys/kern/uipc_socket.c stable/7/sys/net/if.c stable/7/sys/net/rtsock.c stable/7/sys/netinet/in_pcb.c stable/7/sys/netinet/raw_ip.c stable/7/sys/netinet/sctp_pcb.c stable/7/sys/netinet/sctp_usrreq.c stable/7/sys/netinet/tcp_usrreq.c stable/7/sys/netinet/udp_usrreq.c stable/7/sys/netinet6/in6_pcb.c stable/7/sys/netinet6/in6_src.c stable/7/sys/netinet6/raw_ip6.c stable/7/sys/netinet6/udp6_usrreq.c stable/7/sys/security/mac_bsdextended/mac_bsdextended.c stable/7/sys/sys/cpuset.h stable/7/sys/sys/jail.h stable/7/sys/sys/param.h stable/7/usr.bin/cpuset/ (props changed) stable/7/usr.bin/cpuset/cpuset.1 stable/7/usr.bin/cpuset/cpuset.c stable/7/usr.sbin/jail/ (props changed) stable/7/usr.sbin/jail/Makefile stable/7/usr.sbin/jail/jail.8 stable/7/usr.sbin/jail/jail.c stable/7/usr.sbin/jexec/ (props changed) stable/7/usr.sbin/jexec/Makefile stable/7/usr.sbin/jexec/jexec.8 stable/7/usr.sbin/jexec/jexec.c stable/7/usr.sbin/jls/ (props changed) stable/7/usr.sbin/jls/Makefile stable/7/usr.sbin/jls/jls.8 stable/7/usr.sbin/jls/jls.c Modified: stable/7/UPDATING ============================================================================== --- stable/7/UPDATING Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/UPDATING Sat Feb 7 13:19:08 2009 (r188281) @@ -8,6 +8,12 @@ Items affecting the ports and packages s /usr/ports/UPDATING. Please read that file before running portupgrade. +20090207: + Multi-IPv4/v6/no-IP jail support was merged to STABLE. + You need to rebuild jls(8) and to use the new features + jail(8), jexec(8) and cpuset(1) with a new kernel. + __FreeBSD_version was bumped to 701103. + 20090119: NTFS has been removed from GENERIC kernel on amd64 to match GENERIC on i386. Should not cause any issues since mount_ntfs(8) Modified: stable/7/etc/defaults/rc.conf ============================================================================== --- stable/7/etc/defaults/rc.conf Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/etc/defaults/rc.conf Sat Feb 7 13:19:08 2009 (r188281) @@ -620,7 +620,7 @@ jail_sysvipc_allow="NO" # Allow SystemV # #jail_example_rootdir="/usr/jail/default" # Jail's root directory #jail_example_hostname="default.domain.com" # Jail's hostname -#jail_example_ip="192.168.0.10" # Jail's IP number +#jail_example_ip="192.0.2.10" # Jail's IP number #jail_example_interface="" # Interface to create the IP alias on #jail_example_fib="0" # routing table for setfib(1) #jail_example_exec_start="/bin/sh /etc/rc" # command to execute in jail for starting @@ -629,10 +629,11 @@ jail_sysvipc_allow="NO" # Allow SystemV # specified using a trailing number #jail_example_exec_stop="/bin/sh /etc/rc.shutdown" # command to execute in jail for stopping #jail_example_devfs_enable="NO" # mount devfs in the jail +#jail_example_devfs_ruleset="ruleset_name" # devfs ruleset to apply to jail - + # usually you want "devfsrules_jail". #jail_example_fdescfs_enable="NO" # mount fdescfs in the jail #jail_example_procfs_enable="NO" # mount procfs in jail #jail_example_mount_enable="NO" # mount/umount jail's fs -#jail_example_devfs_ruleset="ruleset_name" # devfs ruleset to apply to jail #jail_example_fstab="" # fstab(5) for mount/umount #jail_example_flags="-l -U root" # flags for jail(8) Modified: stable/7/lib/libc/sys/cpuset_getaffinity.2 ============================================================================== --- stable/7/lib/libc/sys/cpuset_getaffinity.2 Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/lib/libc/sys/cpuset_getaffinity.2 Sat Feb 7 13:19:08 2009 (r188281) @@ -25,7 +25,7 @@ .\" .\" $FreeBSD$ .\" -.Dd March 29, 2008 +.Dd November 29, 2008 .Dt CPUSET 2 .Os .Sh NAME @@ -46,7 +46,7 @@ and .Fn cpuset_setaffinity allow the manipulation of sets of CPUs available to processes, threads, -interrupts and other resources. +interrupts, jails and other resources. These functions may manipulate sets of CPUs that contain many processes or per-object anonymous masks that effect only a single object. .Pp Modified: stable/7/lib/libc/sys/jail.2 ============================================================================== --- stable/7/lib/libc/sys/jail.2 Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/lib/libc/sys/jail.2 Sat Feb 7 13:19:08 2009 (r188281) @@ -8,7 +8,7 @@ .\" .\" $FreeBSD$ .\" -.Dd April 8, 2003 +.Dd January 6, 2009 .Dt JAIL 2 .Os .Sh NAME @@ -32,15 +32,20 @@ The argument is a pointer to a structure .Bd -literal -offset indent struct jail { u_int32_t version; - char *path; - char *hostname; - u_int32_t ip_number; + char *path; + char *hostname; + char *jailname; + unsigned int ip4s; + unsigned int ip6s; + struct in_addr *ip4; + struct in6_addr *ip6; }; .Ed .Pp .Dq Li version defines the version of the API in use. -It should be set to zero at this time. +.Dv JAIL_API_VERSION +is defined for the current version. .Pp The .Dq Li path @@ -54,8 +59,24 @@ This can be changed from the inside of the prison. .Pp The -.Dq Li ip_number -can be set to the IP number assigned to the prison. +.Dq Li jailname +pointer is an optional name that can be assigned to the jail +for example for managment purposes. +.Pp +The +.Dq Li ip4s +and +.Dq Li ip6s +give the numbers of IPv4 and IPv6 addresses that will be passed +via their respective pointers. +.Pp +The +.Dq Li ip4 +and +.Dq Li ip6 +pointers can be set to an arrays of IPv4 and IPv6 addresses to be assigned to +the prison, or NULL if none. +IPv4 addresses must be in network byte order. .Pp The .Fn jail_attach @@ -97,6 +118,12 @@ or, if present, the per-jail .Pp All IP activity will be forced to happen to/from the IP number specified, which should be an alias on one of the network interfaces. +All connections to/from the loopback address +.Pf ( Li 127.0.0.1 +for IPv4, +.Li ::1 +for IPv6) will be changed to be to/from the primary address +of the jail for the given address family. .Pp It is possible to identify a process as jailed by examining .Dq Li /proc//status : Modified: stable/7/lib/libkvm/kvm_proc.c ============================================================================== --- stable/7/lib/libkvm/kvm_proc.c Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/lib/libkvm/kvm_proc.c Sat Feb 7 13:19:08 2009 (r188281) @@ -54,10 +54,11 @@ __FBSDID("$FreeBSD$"); #include #include #include -#define _WANT_PRISON /* make jail.h give us 'struct prison' */ -#include +#include #include #include +#define _WANT_PRISON /* make jail.h give us 'struct prison' */ +#include #include #include #include Modified: stable/7/share/man/man4/ddb.4 ============================================================================== --- stable/7/share/man/man4/ddb.4 Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/share/man/man4/ddb.4 Sat Feb 7 13:19:08 2009 (r188281) @@ -60,7 +60,7 @@ .\" .\" $FreeBSD$ .\" -.Dd December 26, 2007 +.Dd November 29, 2008 .Dt DDB 4 .Os .Sh NAME @@ -539,6 +539,16 @@ If the .Ar addr is given, displays details about the given GEOM object (class, geom, provider or consumer). +.\" +.Pp +.It Ic show Cm jails +Show the list of +.Xr jail 8 +instances. +In addition to what +.Xr jls 8 +shows, also list kernel internal details. +.\" .Pp .It Ic show Cm map Ns Oo Li / Ns Cm f Oc Ar addr Prints the VM map at Modified: stable/7/sys/compat/freebsd32/freebsd32.h ============================================================================== --- stable/7/sys/compat/freebsd32/freebsd32.h Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/compat/freebsd32/freebsd32.h Sat Feb 7 13:19:08 2009 (r188281) @@ -153,6 +153,24 @@ struct stat32 { unsigned int :(8 / 2) * (16 - (int)sizeof(struct timespec32)); }; +struct jail32_v0 { + u_int32_t version; + uint32_t path; + uint32_t hostname; + u_int32_t ip_number; +}; + +struct jail32 { + uint32_t version; + uint32_t path; + uint32_t hostname; + uint32_t jailname; + uint32_t ip4s; + uint32_t ip6s; + uint32_t ip4; + uint32_t ip6; +}; + struct sigaction32 { u_int32_t sa_u; int sa_flags; Modified: stable/7/sys/compat/freebsd32/freebsd32_misc.c ============================================================================== --- stable/7/sys/compat/freebsd32/freebsd32_misc.c Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/compat/freebsd32/freebsd32_misc.c Sat Feb 7 13:19:08 2009 (r188281) @@ -36,6 +36,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -1983,6 +1984,66 @@ done2: } int +freebsd32_jail(struct thread *td, struct freebsd32_jail_args *uap) +{ + uint32_t version; + int error; + struct jail j; + + error = copyin(uap->jail, &version, sizeof(uint32_t)); + if (error) + return (error); + switch (version) { + case 0: + { + /* FreeBSD single IPv4 jails. */ + struct jail32_v0 j32_v0; + + bzero(&j, sizeof(struct jail)); + error = copyin(uap->jail, &j32_v0, sizeof(struct jail32_v0)); + if (error) + return (error); + CP(j32_v0, j, version); + PTRIN_CP(j32_v0, j, path); + PTRIN_CP(j32_v0, j, hostname); + j.ip4s = j32_v0.ip_number; + break; + } + + case 1: + /* + * Version 1 was used by multi-IPv4 jail implementations + * that never made it into the official kernel. + */ + return (EINVAL); + + case 2: /* JAIL_API_VERSION */ + { + /* FreeBSD multi-IPv4/IPv6,noIP jails. */ + struct jail32 j32; + + error = copyin(uap->jail, &j32, sizeof(struct jail32)); + if (error) + return (error); + CP(j32, j, version); + PTRIN_CP(j32, j, path); + PTRIN_CP(j32, j, hostname); + PTRIN_CP(j32, j, jailname); + CP(j32, j, ip4s); + CP(j32, j, ip6s); + PTRIN_CP(j32, j, ip4); + PTRIN_CP(j32, j, ip6); + break; + } + + default: + /* Sci-Fi jails are not supported, sorry. */ + return (EINVAL); + } + return (kern_jail(td, &j)); +} + +int freebsd32_sigaction(struct thread *td, struct freebsd32_sigaction_args *uap) { struct sigaction32 s32; Modified: stable/7/sys/compat/freebsd32/syscalls.master ============================================================================== --- stable/7/sys/compat/freebsd32/syscalls.master Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/compat/freebsd32/syscalls.master Sat Feb 7 13:19:08 2009 (r188281) @@ -587,7 +587,7 @@ off_t *sbytes, int flags); } 337 AUE_NULL NOPROTO { int kldsym(int fileid, int cmd, \ void *data); } -338 AUE_JAIL NOPROTO { int jail(struct jail *jail); } +338 AUE_JAIL STD { int freebsd32_jail(struct jail32 *jail); } 339 AUE_NULL UNIMPL pioctl 340 AUE_SIGPROCMASK NOPROTO { int sigprocmask(int how, \ const sigset_t *set, sigset_t *oset); } Modified: stable/7/sys/kern/kern_cpuset.c ============================================================================== --- stable/7/sys/kern/kern_cpuset.c Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/kern/kern_cpuset.c Sat Feb 7 13:19:08 2009 (r188281) @@ -53,6 +53,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include /* Must come after sys/proc.h */ #include @@ -208,7 +209,7 @@ cpuset_rel_complete(struct cpuset *set) * Find a set based on an id. Returns it with a ref. */ static struct cpuset * -cpuset_lookup(cpusetid_t setid) +cpuset_lookup(cpusetid_t setid, struct thread *td) { struct cpuset *set; @@ -221,6 +222,28 @@ cpuset_lookup(cpusetid_t setid) if (set) cpuset_ref(set); mtx_unlock_spin(&cpuset_lock); + + KASSERT(td != NULL, ("[%s:%d] td is NULL", __func__, __LINE__)); + if (set != NULL && jailed(td->td_ucred)) { + struct cpuset *rset, *jset; + struct prison *pr; + + rset = cpuset_refroot(set); + + pr = td->td_ucred->cr_prison; + mtx_lock(&pr->pr_mtx); + cpuset_ref(pr->pr_cpuset); + jset = pr->pr_cpuset; + mtx_unlock(&pr->pr_mtx); + + if (jset->cs_id != rset->cs_id) { + cpuset_rel(set); + set = NULL; + } + cpuset_rel(jset); + cpuset_rel(rset); + } + return (set); } @@ -414,12 +437,38 @@ cpuset_which(cpuwhich_t which, id_t id, set = cpuset_refbase(curthread->td_cpuset); thread_unlock(curthread); } else - set = cpuset_lookup(id); + set = cpuset_lookup(id, curthread); if (set) { *setp = set; return (0); } return (ESRCH); + case CPU_WHICH_JAIL: + { + /* Find `set' for prison with given id. */ + struct prison *pr; + + sx_slock(&allprison_lock); + pr = prison_find(id); + sx_sunlock(&allprison_lock); + if (pr == NULL) + return (ESRCH); + if (jailed(curthread->td_ucred)) { + if (curthread->td_ucred->cr_prison == pr) { + cpuset_ref(pr->pr_cpuset); + set = pr->pr_cpuset; + } + } else { + cpuset_ref(pr->pr_cpuset); + set = pr->pr_cpuset; + } + mtx_unlock(&pr->pr_mtx); + if (set) { + *setp = set; + return (0); + } + return (ESRCH); + } default: return (EINVAL); } @@ -668,6 +717,59 @@ cpuset_thread0(void) } /* + * Create a cpuset, which would be cpuset_create() but + * mark the new 'set' as root. + * + * We are not going to reparent the td to it. Use cpuset_reparentproc() for that. + * + * In case of no error, returns the set in *setp locked with a reference. + */ +int +cpuset_create_root(struct thread *td, struct cpuset **setp) +{ + struct cpuset *root; + struct cpuset *set; + int error; + + KASSERT(td != NULL, ("[%s:%d] invalid td", __func__, __LINE__)); + KASSERT(setp != NULL, ("[%s:%d] invalid setp", __func__, __LINE__)); + + thread_lock(td); + root = cpuset_refroot(td->td_cpuset); + thread_unlock(td); + + error = cpuset_create(setp, td->td_cpuset, &root->cs_mask); + cpuset_rel(root); + if (error) + return (error); + + KASSERT(*setp != NULL, ("[%s:%d] cpuset_create returned invalid data", + __func__, __LINE__)); + + /* Mark the set as root. */ + set = *setp; + set->cs_flags |= CPU_SET_ROOT; + + return (0); +} + +int +cpuset_setproc_update_set(struct proc *p, struct cpuset *set) +{ + int error; + + KASSERT(p != NULL, ("[%s:%d] invalid proc", __func__, __LINE__)); + KASSERT(set != NULL, ("[%s:%d] invalid set", __func__, __LINE__)); + + cpuset_ref(set); + error = cpuset_setproc(p->p_pid, set, NULL); + if (error) + return (error); + cpuset_rel(set); + return (0); +} + +/* * This is called once the final set of system cpus is known. Modifies * the root set and all children and mark the root readonly. */ @@ -732,7 +834,7 @@ cpuset_setid(struct thread *td, struct c */ if (uap->which != CPU_WHICH_PID) return (EINVAL); - set = cpuset_lookup(uap->setid); + set = cpuset_lookup(uap->setid, td); if (set == NULL) return (ESRCH); error = cpuset_setproc(uap->id, set, NULL); @@ -771,6 +873,7 @@ cpuset_getid(struct thread *td, struct c PROC_UNLOCK(p); break; case CPU_WHICH_CPUSET: + case CPU_WHICH_JAIL: break; } switch (uap->level) { @@ -831,6 +934,7 @@ cpuset_getaffinity(struct thread *td, st thread_unlock(ttd); break; case CPU_WHICH_CPUSET: + case CPU_WHICH_JAIL: break; } if (uap->level == CPU_LEVEL_ROOT) @@ -857,6 +961,7 @@ cpuset_getaffinity(struct thread *td, st PROC_SUNLOCK(p); break; case CPU_WHICH_CPUSET: + case CPU_WHICH_JAIL: CPU_COPY(&set->cs_mask, mask); break; } @@ -934,6 +1039,7 @@ cpuset_setaffinity(struct thread *td, st PROC_UNLOCK(p); break; case CPU_WHICH_CPUSET: + case CPU_WHICH_JAIL: break; } if (uap->level == CPU_LEVEL_ROOT) @@ -953,7 +1059,8 @@ cpuset_setaffinity(struct thread *td, st error = cpuset_setproc(uap->id, NULL, mask); break; case CPU_WHICH_CPUSET: - error = cpuset_which(CPU_WHICH_CPUSET, uap->id, &p, + case CPU_WHICH_JAIL: + error = cpuset_which(uap->which, uap->id, &p, &ttd, &set); if (error == 0) { error = cpuset_modify(set, mask); Modified: stable/7/sys/kern/kern_exit.c ============================================================================== --- stable/7/sys/kern/kern_exit.c Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/kern/kern_exit.c Sat Feb 7 13:19:08 2009 (r188281) @@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -461,6 +462,10 @@ retry: p->p_xstat = rv; p->p_xthread = td; + /* In case we are jailed tell the prison that we are gone. */ + if (jailed(p->p_ucred)) + prison_proc_free(p->p_ucred->cr_prison); + #ifdef KDTRACE_HOOKS /* * Tell the DTrace fasttrap provider about the exit if it Modified: stable/7/sys/kern/kern_fork.c ============================================================================== --- stable/7/sys/kern/kern_fork.c Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/kern/kern_fork.c Sat Feb 7 13:19:08 2009 (r188281) @@ -54,6 +54,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -442,6 +443,11 @@ again: __rangeof(struct proc, p_startzero, p_endzero)); p2->p_ucred = crhold(td->td_ucred); + + /* In case we are jailed tell the prison that we exist. */ + if (jailed(p2->p_ucred)) + prison_proc_hold(p2->p_ucred->cr_prison); + PROC_UNLOCK(p2); /* Modified: stable/7/sys/kern/kern_jail.c ============================================================================== --- stable/7/sys/kern/kern_jail.c Sat Feb 7 11:40:47 2009 (r188280) +++ stable/7/sys/kern/kern_jail.c Sat Feb 7 13:19:08 2009 (r188281) @@ -1,5 +1,7 @@ /*- - * Copyright (c) 1999 Poul-Henning Kamp. All rights reserved. + * Copyright (c) 1999 Poul-Henning Kamp. + * Copyright (c) 2008 Bjoern A. Zeeb. + * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -26,6 +28,9 @@ #include __FBSDID("$FreeBSD$"); +#include "opt_ddb.h" +#include "opt_inet.h" +#include "opt_inet6.h" #include "opt_mac.h" #include @@ -51,6 +56,12 @@ __FBSDID("$FreeBSD$"); #include #include #include +#ifdef DDB +#include +#ifdef INET6 +#include +#endif /* INET6 */ +#endif /* DDB */ #include @@ -67,7 +78,7 @@ SYSCTL_INT(_security_jail, OID_AUTO, set int jail_socket_unixiproute_only = 1; SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW, &jail_socket_unixiproute_only, 0, - "Processes in jail are limited to creating UNIX/IPv4/route sockets only"); + "Processes in jail are limited to creating UNIX/IP/route sockets only"); int jail_sysvipc_allowed = 0; SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW, @@ -94,6 +105,11 @@ SYSCTL_INT(_security_jail, OID_AUTO, mou &jail_mount_allowed, 0, "Processes in jail can mount/unmount jail-friendly file systems"); +int jail_max_af_ips = 255; +SYSCTL_INT(_security_jail, OID_AUTO, jail_max_af_ips, CTLFLAG_RW, + &jail_max_af_ips, 0, + "Number of IP addresses a jail may have at most per address family"); + /* allprison, lastprid, and prisoncount are protected by allprison_lock. */ struct prisonlist allprison; struct sx allprison_lock; @@ -119,6 +135,12 @@ struct prison_service { static void init_prison(void *); static void prison_complete(void *context, int pending); static int sysctl_jail_list(SYSCTL_HANDLER_ARGS); +#ifdef INET +static int _prison_check_ip4(struct prison *, struct in_addr *); +#endif +#ifdef INET6 +static int _prison_check_ip6(struct prison *, struct in6_addr *); +#endif static void init_prison(void *data __unused) @@ -130,6 +152,278 @@ init_prison(void *data __unused) SYSINIT(prison, SI_SUB_INTRINSIC, SI_ORDER_ANY, init_prison, NULL); +#ifdef INET +static int +qcmp_v4(const void *ip1, const void *ip2) +{ + in_addr_t iaa, iab; + + /* + * We need to compare in HBO here to get the list sorted as expected + * by the result of the code. Sorting NBO addresses gives you + * interesting results. If you do not understand, do not try. + */ + iaa = ntohl(((const struct in_addr *)ip1)->s_addr); + iab = ntohl(((const struct in_addr *)ip2)->s_addr); + + /* + * Do not simply return the difference of the two numbers, the int is + * not wide enough. + */ + if (iaa > iab) + return (1); + else if (iaa < iab) + return (-1); + else + return (0); +} +#endif + +#ifdef INET6 +static int +qcmp_v6(const void *ip1, const void *ip2) +{ + const struct in6_addr *ia6a, *ia6b; + int i, rc; + + ia6a = (const struct in6_addr *)ip1; + ia6b = (const struct in6_addr *)ip2; + + rc = 0; + for (i=0; rc == 0 && i < sizeof(struct in6_addr); i++) { + if (ia6a->s6_addr[i] > ia6b->s6_addr[i]) + rc = 1; + else if (ia6a->s6_addr[i] < ia6b->s6_addr[i]) + rc = -1; + } + return (rc); +} +#endif + +#if defined(INET) || defined(INET6) +static int +prison_check_conflicting_ips(struct prison *p) +{ + struct prison *pr; + int i; + + sx_assert(&allprison_lock, SX_LOCKED); + + if (p->pr_ip4s == 0 && p->pr_ip6s == 0) + return (0); + + LIST_FOREACH(pr, &allprison, pr_list) { + /* + * Skip 'dying' prisons to avoid problems when + * restarting multi-IP jails. + */ + if (pr->pr_state == PRISON_STATE_DYING) + continue; + + /* + * We permit conflicting IPs if there is no + * more than 1 IP on eeach jail. + * In case there is one duplicate on a jail with + * more than one IP stop checking and return error. + */ +#ifdef INET + if ((p->pr_ip4s >= 1 && pr->pr_ip4s > 1) || + (p->pr_ip4s > 1 && pr->pr_ip4s >= 1)) { + for (i = 0; i < p->pr_ip4s; i++) { + if (_prison_check_ip4(pr, &p->pr_ip4[i])) + return (EINVAL); + } + } +#endif +#ifdef INET6 + if ((p->pr_ip6s >= 1 && pr->pr_ip6s > 1) || + (p->pr_ip6s > 1 && pr->pr_ip6s >= 1)) { + for (i = 0; i < p->pr_ip6s; i++) { + if (_prison_check_ip6(pr, &p->pr_ip6[i])) + return (EINVAL); + } + } +#endif + } + + return (0); +} + +static int +jail_copyin_ips(struct jail *j) +{ +#ifdef INET + struct in_addr *ip4; +#endif +#ifdef INET6 + struct in6_addr *ip6; +#endif + int error, i; + + /* + * Copy in addresses, check for duplicate addresses and do some + * simple 0 and broadcast checks. If users give other bogus addresses + * it is their problem. + * + * IP addresses are all sorted but ip[0] to preserve the primary IP + * address as given from userland. This special IP is used for + * unbound outgoing connections as well for "loopback" traffic. + */ +#ifdef INET + ip4 = NULL; +#endif +#ifdef INET6 + ip6 = NULL; +#endif +#ifdef INET + if (j->ip4s > 0) { + ip4 = (struct in_addr *)malloc(j->ip4s * sizeof(struct in_addr), + M_PRISON, M_WAITOK | M_ZERO); + error = copyin(j->ip4, ip4, j->ip4s * sizeof(struct in_addr)); + if (error) + goto e_free_ip; + /* Sort all but the first IPv4 address. */ + if (j->ip4s > 1) + qsort((ip4 + 1), j->ip4s - 1, + sizeof(struct in_addr), qcmp_v4); + + /* + * We do not have to care about byte order for these checks + * so we will do them in NBO. + */ + for (i=0; iip4s; i++) { + if (ip4[i].s_addr == htonl(INADDR_ANY) || + ip4[i].s_addr == htonl(INADDR_BROADCAST)) { + error = EINVAL; + goto e_free_ip; + } + if ((i+1) < j->ip4s && + (ip4[0].s_addr == ip4[i+1].s_addr || + ip4[i].s_addr == ip4[i+1].s_addr)) { + error = EINVAL; + goto e_free_ip; + } + } + + j->ip4 = ip4; + } else + j->ip4 = NULL; +#endif +#ifdef INET6 + if (j->ip6s > 0) { + ip6 = (struct in6_addr *)malloc(j->ip6s * sizeof(struct in6_addr), + M_PRISON, M_WAITOK | M_ZERO); + error = copyin(j->ip6, ip6, j->ip6s * sizeof(struct in6_addr)); + if (error) + goto e_free_ip; + /* Sort all but the first IPv6 address. */ + if (j->ip6s > 1) + qsort((ip6 + 1), j->ip6s - 1, + sizeof(struct in6_addr), qcmp_v6); + for (i=0; iip6s; i++) { + if (IN6_IS_ADDR_UNSPECIFIED(&ip6[i])) { + error = EINVAL; + goto e_free_ip; + } + if ((i+1) < j->ip6s && + (IN6_ARE_ADDR_EQUAL(&ip6[0], &ip6[i+1]) || + IN6_ARE_ADDR_EQUAL(&ip6[i], &ip6[i+1]))) { + error = EINVAL; + goto e_free_ip; + } + } + + j->ip6 = ip6; + } else + j->ip6 = NULL; +#endif + return (0); + +e_free_ip: +#ifdef INET6 + free(ip6, M_PRISON); +#endif +#ifdef INET + free(ip4, M_PRISON); +#endif + return (error); +} +#endif /* INET || INET6 */ + +static int +jail_handle_ips(struct jail *j) +{ +#if defined(INET) || defined(INET6) + int error; +#endif + + /* + * Finish conversion for older versions, copyin and setup IPs. + */ + switch (j->version) { + case 0: + { +#ifdef INET + /* FreeBSD single IPv4 jails. */ + struct in_addr *ip4; + + if (j->ip4s == INADDR_ANY || j->ip4s == INADDR_BROADCAST) + return (EINVAL); + ip4 = (struct in_addr *)malloc(sizeof(struct in_addr), + M_PRISON, M_WAITOK | M_ZERO); + + /* + * Jail version 0 still used HBO for the IPv4 address. + */ + ip4->s_addr = htonl(j->ip4s); + j->ip4s = 1; + j->ip4 = ip4; + break; +#else + return (EINVAL); +#endif + } + + case 1: + /* + * Version 1 was used by multi-IPv4 jail implementations + * that never made it into the official kernel. + * We should never hit this here; jail() should catch it. + */ + return (EINVAL); + + case 2: /* JAIL_API_VERSION */ + /* FreeBSD multi-IPv4/IPv6,noIP jails. */ +#if defined(INET) || defined(INET6) +#ifdef INET + if (j->ip4s > jail_max_af_ips) + return (EINVAL); +#else + if (j->ip4s != 0) + return (EINVAL); +#endif +#ifdef INET6 + if (j->ip6s > jail_max_af_ips) + return (EINVAL); +#else + if (j->ip6s != 0) + return (EINVAL); +#endif + error = jail_copyin_ips(j); + if (error) + return (error); +#endif + break; + + default: + /* Sci-Fi jails are not supported, sorry. */ + return (EINVAL); + } + + return (0); +} + + /* * struct jail_args { * struct jail *jail; @@ -138,23 +432,73 @@ SYSINIT(prison, SI_SUB_INTRINSIC, SI_ORD int jail(struct thread *td, struct jail_args *uap) { + uint32_t version; + int error; + struct jail j; + + error = copyin(uap->jail, &version, sizeof(uint32_t)); + if (error) + return (error); + + switch (version) { + case 0: + /* FreeBSD single IPv4 jails. */ + { + struct jail_v0 j0; + + bzero(&j, sizeof(struct jail)); + error = copyin(uap->jail, &j0, sizeof(struct jail_v0)); + if (error) + return (error); + j.version = j0.version; + j.path = j0.path; + j.hostname = j0.hostname; + j.ip4s = j0.ip_number; + break; + } + + case 1: + /* + * Version 1 was used by multi-IPv4 jail implementations + * that never made it into the official kernel. + */ + return (EINVAL); + + case 2: /* JAIL_API_VERSION */ + /* FreeBSD multi-IPv4/IPv6,noIP jails. */ + error = copyin(uap->jail, &j, sizeof(struct jail)); + if (error) + return (error); + break; + + default: + /* Sci-Fi jails are not supported, sorry. */ + return (EINVAL); + } + return (kern_jail(td, &j)); +} + +int +kern_jail(struct thread *td, struct jail *j) +{ struct nameidata nd; struct prison *pr, *tpr; struct prison_service *psrv; - struct jail j; struct jail_attach_args jaa; int vfslocked, error, tryprid; - error = copyin(uap->jail, &j, sizeof(j)); + KASSERT(j != NULL, ("%s: j is NULL", __func__)); + + /* Handle addresses - convert old structs, copyin, check IPs. */ + error = jail_handle_ips(j); if (error) return (error); - if (j.version != 0) - return (EINVAL); - MALLOC(pr, struct prison *, sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO); + /* Allocate struct prison and fill it with life. */ + pr = malloc(sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO); mtx_init(&pr->pr_mtx, "jail mutex", NULL, MTX_DEF); pr->pr_ref = 1; - error = copyinstr(j.path, &pr->pr_path, sizeof(pr->pr_path), 0); + error = copyinstr(j->path, &pr->pr_path, sizeof(pr->pr_path), NULL); if (error) goto e_killmtx; NDINIT(&nd, LOOKUP, MPSAFE | FOLLOW | LOCKLEAF, UIO_SYSSPACE, @@ -167,10 +511,25 @@ jail(struct thread *td, struct jail_args VOP_UNLOCK(nd.ni_vp, 0, td); NDFREE(&nd, NDF_ONLY_PNBUF); VFS_UNLOCK_GIANT(vfslocked); - error = copyinstr(j.hostname, &pr->pr_host, sizeof(pr->pr_host), 0); + error = copyinstr(j->hostname, &pr->pr_host, sizeof(pr->pr_host), NULL); if (error) goto e_dropvnref; - pr->pr_ip = j.ip_number; + if (j->jailname != NULL) { + error = copyinstr(j->jailname, &pr->pr_name, + sizeof(pr->pr_name), NULL); + if (error) + goto e_dropvnref; + } + if (j->ip4s > 0) { + pr->pr_ip4 = j->ip4; + pr->pr_ip4s = j->ip4s; + } +#ifdef INET6 + if (j->ip6s > 0) { + pr->pr_ip6 = j->ip6; + pr->pr_ip6s = j->ip6s; + } +#endif pr->pr_linux = NULL; pr->pr_securelevel = securelevel; if (prison_service_slots == 0) @@ -180,8 +539,29 @@ jail(struct thread *td, struct jail_args M_PRISON, M_ZERO | M_WAITOK); } - /* Determine next pr_id and add prison to allprison list. */ + /* + * Pre-set prison state to ALIVE upon cration. This is needed so we + * can later attach the process to it, etc (avoiding another extra + * state for ther process of creation, complicating things). *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***