Date: Mon, 15 Jun 2015 23:44:28 +0200
From: Mateusz Guzik
To: kikuchan@uranus.dti.ne.jp
Cc: freebsd-jail@freebsd.org, freebsd-virtualization@freebsd.org
Subject: Re: How to implement jail-aware SysV IPC (with my nasty patch)
Message-ID: <20150615214427.GB18004@dft-labs.eu>
References: <2B7AA933-CB74-4737-8330-6E623A31C6DA@lists.zabbadoz.net> <20150615104915.GA18004@dft-labs.eu> <3681b69c41fd9352fef30afed901661a@imap.cm.dream.jp>
In-Reply-To: <3681b69c41fd9352fef30afed901661a@imap.cm.dream.jp>

On Tue, Jun 16, 2015 at 03:45:34AM +0900, kikuchan@uranus.dti.ne.jp wrote:
> On Mon, 15 Jun 2015 12:49:16 +0200, Mateusz Guzik wrote:
> > Fundamentally the basic question is how does the implementation cope
> > with processes having sysvshm mappings obtained from 2 different jails
> > (provided they use different sysvshms).
> >
> > Preferably the whole business would be /prevented/. Prevention mechanism
> > would have to deal with shared address spaces (rfork(2) + RFMEM),
> > threads and pre-existing mappings.
> >
> > The patch posted here just puts permission checks in several places,
> > while leaving the namespace shared, which I find to be a user-visible
> > hack with no good justification. There is also no analysis how this
> > behaves when presented with aforementioned scenario. Even if it turns
> > out the result is harmless with resulting code, this leaves us with a
> > very error-prone scheme.
> >
> > There is no technical problem adding a pointer to struct prison and
> > dereferencing it instead of current global vars. Adding proper sysctls
> > dumping the content for given jail is trivial and so is providing
> > resource limits when creating a first-level jail with a separate
> > sysvshm. Something which cannot be as easily achieved with the patch in
> > question.
>
> Could you try the latest patch, please?
> I justify user-visibility, make it hierarchical jail friendly, and use EINVAL instead of EACCES to conceal information leak.
> https://bz-attachments.freebsd.org/attachment.cgi?id=157661 (typo fixed)
>
>
> I realized my method is a bit better, when I'm trying to port/write the real namespace separation.
> Let me explain (again) why I choose this method for sysv ipc, and could you tell me how it should be, please?
>
> struct shmmap_state {
>     vm_offset_t va;
>     int shmid;
> };
>
> In sysv_shm.c, struct shmmap_state, exist per process as p->p_vmspace->vm_shm, is a lookup-table for va -> shm object lookup.
> The shmmap_state entry holds a reference (here, shmid) to shm object for further detach, and entries are simply copied on fork.
>
> If you split namespace (includes shmid space) completely, shmid would be no longer a unique identifier for IPC object in kernel.
> To make it unique, adding a reference to prison into shmmap_state like this;
>
> struct shmmap_state {
>     vm_offset_t va;
>     struct prison *prison;
>     int shmid;
> };
>
> would be bad idea, because after a process calls jail_attach(), the process holds a reference to another (creator) prison, or copy the IPC object completely on every jail_attach() occurs?

As I explained in the previous thread, with a separate namespace it is a
strict requirement to prevent sharing of sysvshm mappings. With the
requirement met, there is no issue. As you will see later in the mail,
even your approach would benefit greatly from having such a restriction.

> How do you deal with hierarchical jail?
>

If proper resource limiting for hierarchical jails is implemented, the
new jail either inherits or gets a new namespace, depending on used
options. With only simplistic support first level jails can inherit or
get a new namespace, the rest must inherit.

There is no issue here due to sharing prevention.

> My method didn't touch anything about the mapping stuff, thus it behaves exactly the same as current FreeBSD behave on this point.
>

Sure it did. As you noticed yourself it makes sense to clean up sysvshms
on jail destruction, which you do in sysvshm_cleanup_for_prison_myhook.

Your code does:
    if ((shmseg->u.shm_perm.mode & SHMSEG_ALLOCATED) &&
        shmseg->cred->cr_prison == pr) {
            shm_remove(shmseg, i);
    ....

which differs from what is executed by kern_shmdt_locked.

Now let's consider a process which rforks and shares the address space
with its child. The child enters a jail and grabs a sysvshm mapping,
then exits and we kill the jail. In effect we got a process with an
address space which used a mapping created in a now-destroyed jail.

Is this situation problematic? I don't see any analysis provided. Maybe
it is, maybe it so happens it is not.

The mere possibility of this scenario needlessly complicates maintenance,
and such a scenario has likely no practical purpose. As such, it is best
/prevented/. With it prevented there is nothing positive about your
approach that I could see.

> I'm not sure I could understand properly what the shared address space problem is, (Could someone help me to understand, perhaps in code?)
> and, I'm not sure whether the current FreeBSD has the shared address space problem for sysvshm combined with jails.
> If it has the problem, unfortunately my patch doesn't provide any solution for that,
> but if not, my patch doesn't have the problem either, because I didn't change code structure.
>

As I mentioned, you sure did. I don't know if there are any serious
problems /as it is/ and I'm too lazy to check.
I surely expect any patch doing sysvshm for jails to be provided with an
analysis of its behaviour in that regard though.

> The patch just fixes key_t collision for jails, nothing more.
> So, the patch is harmless for non-jail user, and I believe it's useful for jail user using allow.sysvipc=true.
>
>
> BTW, What do you think about the following design for jail-aware sysvipc?
>
> > - IPC objects created on parent jail, are invisible to children.
> > - IPC objects created on neighbor jail, are also invisible each other.
> > - IPC objects created on child jail, are VISIBLE from parent.
> > - IPC key_t spaces are separated between jails. If you see the key_t named object from parent, it's shown as IPC_PRIVATE.
>

How about the following: the jail decides whether it wants to share a
namespace with a particular child (and by extension grandchildren and so
on). Done. There is nothing complicated to do here unless you want to
try out named namespaces which you e.g. assign to different jails on the
same level.

-- 
Mateusz Guzik
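
Since the thread asks for the idea "perhaps in code", here is a minimal
userspace sketch of the two points made in this mail: a per-jail namespace
reached through a pointer in the prison structure (inherited or freshly
created, as the parent decides), and a check that refuses to move a process
into a different namespace while its address space could carry sysvshm
mappings across. This is a toy model under stated assumptions, not FreeBSD
kernel code: every toy_* name, the fixed slot table, and the exact refusal
rule are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 8

/* Hypothetical per-jail SysV shm namespace; not a FreeBSD structure. */
struct toy_shm_ns {
    int  refs;              /* jails currently using this namespace */
    int  used[TOY_SLOTS];   /* nonzero when the slot holds a segment */
    long keys[TOY_SLOTS];   /* key of the segment in each slot */
};

/* Toy stand-in for struct prison with only the fields the sketch needs. */
struct toy_prison {
    struct toy_prison *parent;
    struct toy_shm_ns *shm_ns;  /* dereferenced instead of global vars */
};

/* Toy process state, just enough to express the prevention rule. */
struct toy_proc {
    struct toy_prison *jail;
    int vmspace_users;  /* > 1 models an rfork(RFMEM)-style shared vmspace */
    int nthreads;
    int nmappings;      /* sysvshm segments currently attached */
};

/* Create a jail; the parent decides whether the child shares its namespace. */
static struct toy_prison *
toy_jail_new(struct toy_prison *parent, int share)
{
    struct toy_prison *pr = calloc(1, sizeof(*pr));

    if (pr == NULL)
        abort();
    pr->parent = parent;
    if (parent != NULL && share) {
        pr->shm_ns = parent->shm_ns;
        pr->shm_ns->refs++;
    } else {
        pr->shm_ns = calloc(1, sizeof(*pr->shm_ns));
        if (pr->shm_ns == NULL)
            abort();
        pr->shm_ns->refs = 1;
    }
    return (pr);
}

/* Destroy a jail; the namespace is freed together with its last user. */
static void
toy_jail_free(struct toy_prison *pr)
{
    assert(pr->shm_ns->refs > 0);
    if (--pr->shm_ns->refs == 0)
        free(pr->shm_ns);
    free(pr);
}

/* Find or create `key` in the caller's namespace only; -1 when full. */
static int
toy_shmget(struct toy_prison *pr, long key)
{
    struct toy_shm_ns *ns = pr->shm_ns;
    int i, free_slot = -1;

    for (i = 0; i < TOY_SLOTS; i++) {
        if (ns->used[i] && ns->keys[i] == key)
            return (i);
        if (!ns->used[i] && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        ns->used[free_slot] = 1;
        ns->keys[free_slot] = key;
    }
    return (free_slot);
}

/* Report whether `key` exists in the namespace that `pr` sees. */
static int
toy_has_key(const struct toy_prison *pr, long key)
{
    int i;

    for (i = 0; i < TOY_SLOTS; i++)
        if (pr->shm_ns->used[i] && pr->shm_ns->keys[i] == key)
            return (1);
    return (0);
}

/*
 * Assumed prevention rule: refuse to enter a jail with a different
 * namespace if the address space is shared, multithreaded, or already
 * holds mappings. Returns 0 on success and -1 on refusal.
 */
static int
toy_jail_attach(struct toy_proc *p, struct toy_prison *target)
{
    if (target->shm_ns != p->jail->shm_ns &&
        (p->vmspace_users > 1 || p->nthreads > 1 || p->nmappings > 0))
        return (-1);
    p->jail = target;
    return (0);
}
```

In this model a child created with share set to 1 sees the keys of its
parent, a child created with share set to 0 starts with an empty table,
and the same key_t value can exist independently in both. A process whose
vmspace_users is greater than 1 cannot toy_jail_attach() into a jail with
a different namespace, which is what rules out the rfork scenario from
this mail.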