Date: Tue, 8 Jul 2014 06:45:07 -0500
From: Nathan Dautenhahn
To: George Neville-Neil
Cc: "hackers@freebsd.org"
Subject: Re: Kernel Privilege Separation Policy
Message-ID: <20140708114506.GA20687@trypticon.cs.illinois.edu>
References: <100A360A-DF5E-46D5-83F0-BCAE672D1D6C@illinois.edu> <53B5ADBE.1020905@freebsd.org> <20140704204630.GC16358@trypticon.cs.illinois.edu>
List-Id: Technical Discussions relating to FreeBSD

On Sat, Jul 05, 2014 at 12:24:31PM -0400, George Neville-Neil wrote:
> On 4 Jul 2014, at 16:46, Nathan Dautenhahn wrote:
>
> >On Fri, Jul 04, 2014 at 10:27:57AM -0400, George Neville-Neil wrote:
> >>On 3 Jul 2014, at 15:23, Julian Elischer wrote:
> >>
> >>>On 7/2/14, 10:52 PM, Dautenhahn, Nathan Daniel wrote:
> >>>>Hi All-
> >>>>
> >>>>I am a graduate student at UIUC and am currently working on a system
> >>>>that isolates the MMU from the rest of the FreeBSD kernel, for the
> >>>>purpose of enabling privilege separation within the kernel.
> >>>>
> >>>[...]
> >>>
> >>>it does sound interesting.. I think the dearth of answers is that
> >>>everyone is waiting for someone-else to answer, because the topic
> >>>sounds a bit intimidating,
> >>
> >>I also think we'd be interested in seeing the code itself, and what
> >>APIs it exposes.
> >>That would probably focus thinking on what can be done with it.
> >
> >Hi George-
> >
> >I will start working on getting the code available for view on a github
> >repository. It is currently in a research prototype state, but moving it
> >into a more production level is a goal.
>
> That sounds great and will definitely help.
>
> >The base system effectively splits the kernel into two privilege levels:
> >1) a very small component that mediates access to the MMU to enforce
> >system wide memory access policies, and 2) the lower privilege part of
> >the kernel.
> >
> >The initial set of policies that I'm investigating are write protect
> >policies within the kernel itself (we can do read protect too). In other
> >words place specific data structures into the secure region (thus
> >protected from the rest of the kernel), and then mediate write access
> >through a *write_secure_data()* interface by some to be determined policy.
> >
> >Effectively the type of interface I have is:
> >- Some data structure allocated to write protected pages
> >- A mediation policy for that data structure -- set of checks the write
> >  must pass
> >- A write_ function for such data structures
> >
>
> How small a region can you target? A whole structure? A field?

Currently the implementation works at page granularity in terms of detecting a
write to a particular memory object. Therefore, any protected data structure
needs to be allocated on a secured page, so the protection region is fairly
coarse-grained.

The challenge here is identifying the particular object that a write is
targeting. The current implementation can detect a write to a page and, given
the processor state, the exact location (virtual address) within the page
being written. (I haven't built this piece yet, so I don't know which state
gives the address being written, but I assume it is somewhere in the hardware
state.)

I see two directions to go for this:

1) Allow some type of static annotation to label a data structure for
   recording and fire a secure handler on writes. I currently don't have any
   idea how to achieve this given the granularity of write protection.

2) Enable a runtime specifier to identify the virtual address and range to
   watch, along with an associated event handler. This would allow the
   developer to target a memory region of any size, at structure,
   array-of-structs, or field granularity.

The latter sounds vaguely like debugging watchpoints. I'm not too familiar
with how debuggers do this type of work.

>
> The more I think about it the more I think that there are two possible
> interesting applications.
>
> 1) Debugging
>
> Can a pre-allocated region be moved or adjusted in such a way that, say,
> under debugger control, you could say, "Do X if anyone touches region Y?"

I think this could be achieved with (2) from above. Very cool idea!

>
> 2) Run time protection
>
> Track every change to, say, the proc structure or perhaps the routing
> table etc.
>
> Do you have an example of code of how you use this now?

I am currently applying this idea to protecting the system call sysent data
structures, effectively stopping syscall hooking by protecting the function
pointers.
Once I get it working I will make it available on GitHub for perusal. These
sysent structures are intriguing because they also have a field that
identifies an audit handler. It might give a template for your suggestion
here.

I also plan on protecting the proc structure so that either 1) any
modifications are denied by some yet-to-be-defined policy, or 2) modifications
are allowed but are recorded in an immutable log.

>
> >I am not well versed in how to translate this to the "interface" you
> >asked for, but the basic idea is to apply some type of access policy to
> >critical data structures to improve the security of the system.
> >
> >Another example benefit or application of such a *write_protect*
> >mechanism is that even if the *write_secure_data* function does not
> >include an access control policy, the data structures being protected
> >will not be subject to memory corruption in the insecure kernel. For
> >example, the UMA allocator vulnerability mentioned on Phrack [1] could be
> >defended by write_protecting the critical allocator metadata (slab
> >header).
>
> Yes, also interesting. Do you have any data on the cost of this
> protection in terms of run time overhead?

I have not run any numbers that specifically measure the impact of the
protection mechanism on normal writes to single objects.
However, I have the following microbenchmarks for the system as a whole:

----------------------------+---------+-------------+--------------------
Test                        | Native  | PerspicuOS  | PerspicuOS Overhead
----------------------------+---------+-------------+--------------------
null syscall                | 0.094   | 0.164       | 1.75x
----------------------------+---------+-------------+--------------------
open/close                  | 2.02    | 2.16        | 1.07x
----------------------------+---------+-------------+--------------------
mmap                        | 7.01    | 19          | 2.62x
----------------------------+---------+-------------+--------------------
page fault                  | 30.8    | 33          | 1.04x
----------------------------+---------+-------------+--------------------
signal handler install      | 0.168   | 0.239       | 1.42x
----------------------------+---------+-------------+--------------------
signal handler delivery     | 1.27    | 1.33        | 1.05x
----------------------------+---------+-------------+--------------------
fork + exit                 | 64.4    | 184         | 2.86x
----------------------------+---------+-------------+--------------------
fork + exec                 | 99.7    | 253         | 2.54x
----------------------------+---------+-------------+--------------------

The base system applies a write protection policy to MMU updates and also
manages some secure state on interrupts. The latter cost could be eliminated,
but the mechanisms for doing so are unexplored at this point.

Thanks,
::nathan::

>
> >I appreciate any ideas and even questions. I find that these help me to
> >understand the system with greater clarity.
>
> Glad to help.
>
> Best,
> George
>
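
For readers following the thread, the three-part interface discussed above (a
data structure on write-protected pages, a mediation policy, and a write_
function) can be sketched in user space. This is only an illustration under
stated assumptions, not the PerspicuOS code: it uses POSIX mmap/mprotect as a
stand-in for the kernel mechanism, and the names secure_region, policy_fn,
secure_alloc, first_64_only, and the signature given to write_secure_data are
all hypothetical.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical policy callback: returns nonzero if the write is allowed. */
typedef int (*policy_fn)(size_t offset, const void *src, size_t len);

/* Hypothetical descriptor for one page-granular protected region. */
struct secure_region {
	void     *base;    /* start of the read-only mapping */
	size_t    size;    /* mapping size, a multiple of the page size */
	policy_fn policy;  /* set of checks every write must pass */
};

/* Allocate whole pages and leave them read-only to ordinary code. */
int
secure_alloc(struct secure_region *r, size_t len, policy_fn policy)
{
	size_t page, size;
	void *p;

	assert(r != NULL);
	page = (size_t)sysconf(_SC_PAGESIZE);
	size = (len + page - 1) / page * page;   /* round up to whole pages */
	p = mmap(NULL, size, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	r->base = p;
	r->size = size;
	r->policy = policy;
	return (0);
}

/* Mediated write: bounds check, policy check, then a brief writable window. */
int
write_secure_data(struct secure_region *r, size_t offset, const void *src,
    size_t len)
{
	if (offset > r->size || len > r->size - offset)
		return (-1);
	if (r->policy != NULL && !r->policy(offset, src, len))
		return (-1);
	if (mprotect(r->base, r->size, PROT_READ | PROT_WRITE) != 0)
		return (-1);
	memcpy((char *)r->base + offset, src, len);
	if (mprotect(r->base, r->size, PROT_READ) != 0)
		return (-1);
	return (0);
}

/* Example policy (assumption): only the first 64 bytes may be modified. */
int
first_64_only(size_t offset, const void *src, size_t len)
{
	(void)src;
	return (len <= 64 && offset <= 64 - len);
}
```

A caller would allocate a region with secure_alloc(&r, 128, first_64_only) and
then route every update through write_secure_data(); a direct store to r.base
faults, which is the user-space analogue of the page-granular protection
described in the thread. Unlike the kernel design, nothing here stops the
caller from invoking mprotect itself, so the sketch shows the shape of the
interface rather than its security properties.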