Date: Sun, 7 Jun 2015 03:39:30 +0200
From: Mateusz Guzik
To: kikuchan
Cc: freebsd-jail@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: [patch] separate SysV IPC namespace for jail
Message-ID: <20150607013929.GA9182@dft-labs.eu>
References: <20150605235348.GA9965@dft-labs.eu>

On Sun, Jun 07, 2015 at 12:04:17AM +0900, kikuchan wrote:
> Sorry for cross-post to freebsd-stable, but I want to get more
> feedback for my patch.
> (The patch is; http://lists.freebsd.org/pipermail/freebsd-jail/attachments/20150606/7736309b/attachment.bin)
>
>
> I believe this patch FIXES current SysV IPC for jail WITHOUT changing
> current kernel architecture.
> (so I hope it will be merged into stable/10)
>
> Let me explain what happens currently, with and without my patch,
> since it's little confusing.
>
>
> I use SysV IPC shared memory (SYSVSHM) as an example here, because
> it's easy to understand.
> Remember shmget / shmat / shmdt / shmctl, are syscalls of SYSVSHM.
>
> All normal processes have its own virtual memory space, it is done by kernel.
> A backend component of virtual memory is a page, is on real memory or
> on swap devices.
>
> SYSVSHM provides a way to share memory segments on the page between
> processes on userland.
> A process can load the page into its own virtual memory space with
> shmat syscall.
> Once the page is loaded into the virtual memory space, the page is
> accessible until further shmdt syscall or exit of process.
>
> Another process can obtain the exact same page, by calling shmat syscall.
> So, permission of shmat syscall is very important.
>
>
> > Address space can be shared between multiple jails
>

This was a typo. Let me quote the fixed version:
"Address space can be shared between multiple PROCESSES, what happens if
such a pair ends up in different jails? Preferably such a scenario would
be prohibited to avoid future accidents."

However, sysvipc namespace sharing is an ok feature, esp. with
multi-level jails. In the simplest scenario, upon jail creation you
decide whether it gets its own namespace or inherits it.

> > What about existing sysvshm mappings when jailing?
>
> Real (not jailed) environment is treated as a jail with jid=0 in kernel.
> If you create sysvshm memory segment before entering a jail, the
> segment simply owned by jid=0.
>

The point is you get a process with sysvshm segments from 2 different
jails. Looks like solid trouble potential.

>
> > Extending struct prison with relevant pointers and updating the code to
>
> You don't need to extend the struct to separate IPC namespaces.
> The word "namespaces" means a key (key_t) of IPC syscall, here.
>
> Whether the struct should be extended or not, depends on how we want
> to control IPC resources for each jail.
> If you want to control SysV IPC resources by changing sysctl
> parameters from inside of jail for each jail,
> then it might be yes.
> But I think per-jail resource control should be done with RACCT, and
> it might be applied to my implementation too.
>
> > The one missing feature is how to export information to userland.
> This should be discuss separately, even if my patch is rejected.
> (If visibility control is needed for ipcs, maybe it should use similar
> technique to ps or netstat?)
>
>
> Conclusion;
> I think my patch is better than broken.
> (SysV IPC + jail is buggy over 10 years!)
>

The feature in question is definitely desirable, but your patch is a
hack, with the "hack" part visible to userspace.

As mentioned earlier, there are some things to do before any kind of
jail-aware ipcs lands in the tree. At a minimum these are:
single-threading when jailing, and preventing the jailing of processes
with shared virtual address spaces or with existing sysvshm mappings.
All this is to reduce the number of bugs one would have to deal with.

After the work is completed there is no problem whatsoever with
providing per-jail sysvipcs. This avoids information leaks (no id list
to look at) and conflicts.

Exporting is not a problem either - a dedicated sysctl grabs a JID and
dumps its ipcs. It also gets a 'recursive' flag to know whether ipcs
for its own jails should be dumped as well (if different).

-- 
Mateusz Guzik
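
The export scheme described in the last paragraph (a per-JID dump with a
'recursive' flag) can be sketched as a small userland model. This is only
an illustration under stated assumptions: every name below (toy_jail,
toy_seg, toy_dump_ipcs, the sample tables) is hypothetical, and nothing
here is an existing FreeBSD interface or the patch under discussion. Each
jail either owns a SysV IPC namespace or inherits its parent's; the dump
lists the segments in the target jail's namespace and, when 'recursive'
is set, those in different namespaces used by its descendant jails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical userland model; none of these names exist in FreeBSD. */
struct toy_jail {
	int jid;        /* jail id; 0 models the host */
	int parent_jid; /* -1 for the host */
	int ns_owner;   /* jid whose SysV IPC namespace this jail uses */
};

struct toy_seg {
	int ns_owner;   /* namespace the segment lives in */
	long key;       /* stands in for key_t */
};

/* Assumed sample data: jail 2 inherits from jail 1, jail 3 has its own. */
static const struct toy_jail sample_jails[] = {
	{ 0, -1, 0 }, { 1, 0, 1 }, { 2, 1, 1 }, { 3, 1, 3 },
};
static const struct toy_seg sample_segs[] = {
	{ 0, 0x1234 }, { 1, 0x1234 }, { 3, 0x1234 }, { 1, 0x5678 },
};

static const struct toy_jail *
find_jail(const struct toy_jail *jails, size_t njails, int jid)
{
	for (size_t i = 0; i < njails; i++)
		if (jails[i].jid == jid)
			return (&jails[i]);
	return (NULL);
}

/* Return 1 if jid is root itself or one of its descendants. */
static int
is_within(const struct toy_jail *jails, size_t njails, int jid, int root)
{
	while (jid != -1) {
		if (jid == root)
			return (1);
		const struct toy_jail *j = find_jail(jails, njails, jid);
		if (j == NULL)
			return (0);
		jid = j->parent_jid;
	}
	return (0);
}

/* Print the segments visible for jid; return how many were printed. */
static size_t
toy_dump_ipcs(const struct toy_jail *jails, size_t njails,
    const struct toy_seg *segs, size_t nsegs, int jid, int recursive)
{
	assert(jails != NULL && segs != NULL);
	const struct toy_jail *target = find_jail(jails, njails, jid);
	size_t dumped = 0;

	if (target == NULL)
		return (0);
	for (size_t s = 0; s < nsegs; s++) {
		/* Segments in the target's own (or inherited) namespace. */
		int match = (segs[s].ns_owner == target->ns_owner);
		/* With 'recursive', also namespaces used by descendants. */
		for (size_t i = 0; !match && recursive && i < njails; i++)
			match = jails[i].ns_owner == segs[s].ns_owner &&
			    is_within(jails, njails, jails[i].jid, jid);
		if (match) {
			printf("ns=%d key=0x%lx\n", segs[s].ns_owner,
			    (unsigned long)segs[s].key);
			dumped++;
		}
	}
	return (dumped);
}
```

With the sample tables, a non-recursive dump of jid 1 lists the two
segments in namespace 1, and the recursive dump additionally lists the
one in namespace 3. The same key (0x1234) exists in three namespaces
without conflict, which is the property per-jail namespaces are meant
to provide.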