From: David Chisnall
Date: Thu, 31 Mar 2022 11:24:43 +0100
To: freebsd-hackers@freebsd.org
Subject: Re: curtain: WIP sandboxing mechanism with pledge()/unveil() support
Message-ID: <16ab7cdb-32b4-5ffe-f6a8-a657383b3078@FreeBSD.org>
In-Reply-To: <2d103b77-84d4-fbd7-d957-21b9aa4d5d79@gmail.com>
References: <25b5c60f-b9cc-78af-86d7-1cc714232364@gmail.com> <01320c49-fa7e-99d2-5840-3c61bb8c0d57@FreeBSD.org> <2d103b77-84d4-fbd7-d957-21b9aa4d5d79@gmail.com>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On 29/03/2022 18:32, Mathieu wrote:
> On 3/29/22 04:34, David Chisnall wrote:
>> Hi,
>>
>> Does pledge actually require kernel support?  I'd have thought that it
>> could be implemented on top of Capsicum as a purely userland
>> abstraction (more easily with libc help, but even with an LD_PRELOADed
>> library along the lines of libpreopen).  In Verona, we're able to use
>> Capsicum to run unmodified libraries in a sandbox, for example,
>> including handling raw system calls:
>>
>> https://github.com/microsoft/verona/tree/master/experiments/process_sandbox
>>
>>
>> It would be good to understand why this needs more kernel attack surface.
>>
>> David
>
> If it can work like that then it's pretty cool.  It could be a lot more
> secure.  But it's just not the way I went with. Re-implementing so much
> kernel functionality in userland seems like a lot of work. Because I
> wanted my module to be able to sandbox (almost) everything that the OS
> can run.  Including whole process hierarchies that execute other
> programs and use process management and shared memory, etc.  That's a
> lot of little details to get right...  So I went with the same route
> that jails, other MAC modules and even Capsicum are implemented: with
> access checks in the kernel itself.  And most of these checks were
> already in place with MAC hooks.

My concern with adding it to the kernel is that anything that does
path-based checks is *incredibly* hard to get right and it will fail
open. To date, there are zero examples of path-based sandboxing
mechanisms deployed in the wild that have not had vulnerabilities
arising from the nature of the problem.

The filesystem is, inherently, concurrent.
A process can mutate the shape of the filesystem graph while you are
doing path-based checks, and the resulting races mostly involve the
handling of '..' in paths. Jails and Capsicum sidestep this in
different ways:

Jails effectively punt the problem to the jail orchestration code. They
provide very strong restrictions on the paths, with a single root and
allowing all access within this. There are a few restrictions on what
you can do from outside of a jail to avoid allowing the jailed process
to exploit TOCTOU differences and escape, but fortunately these align
with the use of jails as isolated containers containing a (minimal)
base system.

Capsicum simply disallows '..' in paths. If you want to support it in
user code then you must do path resolution in userspace. You may still
have TOCTOU bugs, but they'll all fail closed: you will try to resolve
the result, discover that you don't have a file descriptor
corresponding to the path, and fail.

> pledge()/unveil() are usually used for fairly well-disciplined
> applications that either don't run other programs or run very specific
> programs that are also well-disciplined and don't expect too much
> (unless you just drop the pledges on execve()).

The execve hole is the reason that I have little interest in pledge as
an enforcement mechanism. If a process can just execve itself to
escape, then that's a trivial hole to exploit unless you're incredibly
careful to make sure that the process does not have the ability to
create or read files with executable privilege on the filesystem.

In contrast, something using Capsicum can create child processes but
they inherit the same limitations. A child can inherit file descriptors
from the parent, so if it is using something like libpreopen then it
can inherit a large number of file descriptors for any of the files /
directories that it should be permitted to open. Since rtld was
extended to allow direct execution mode, you can launch dynamically
linked binaries in Capsicum mode.
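
As a rough illustration of the userspace path resolution mentioned above
for Capsicum, here is a minimal sketch of a lexical resolver that fails
closed. It is not taken from libpreopen or from any code in this thread:
the name resolve_beneath() and the choice of EPERM are assumptions made
for the example, and it deliberately ignores symlinks, which the kernel
would still have to police.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Capsicum or libpreopen): lexically
 * normalise a relative path so that the result contains no "." or ".."
 * components.  Returns 0 on success.  Returns -1 with errno set if the
 * path is absolute, would climb above the starting directory, or does
 * not fit in the output buffer.  Symlinks are NOT examined.
 */
int
resolve_beneath(const char *path, char *out, size_t outlen)
{
	size_t len = 0;		/* bytes currently used in out */

	if (outlen < 2) {
		errno = ENAMETOOLONG;
		return (-1);
	}
	if (path[0] == '/') {
		errno = EPERM;	/* absolute paths are refused outright */
		return (-1);
	}
	out[0] = '\0';

	while (*path != '\0') {
		size_t n = strcspn(path, "/");	/* next component length */

		if (n == 0 || (n == 1 && path[0] == '.')) {
			/* Empty component or ".": nothing to record. */
		} else if (n == 2 && path[0] == '.' && path[1] == '.') {
			if (len == 0) {
				errno = EPERM;	/* would escape: fail closed */
				return (-1);
			}
			/* Drop the last recorded component. */
			while (len > 0 && out[len - 1] != '/')
				len--;
			if (len > 0)
				len--;	/* and the '/' before it */
			out[len] = '\0';
		} else {
			size_t need = n + (len > 0 ? 1 : 0);

			if (len + need >= outlen) {
				errno = ENAMETOOLONG;
				return (-1);
			}
			if (len > 0)
				out[len++] = '/';
			memcpy(out + len, path, n);
			len += n;
			out[len] = '\0';
		}
		path += n;
		if (*path == '/')
			path++;
	}
	if (len == 0) {
		/* Everything cancelled out: the starting directory itself. */
		out[0] = '.';
		out[1] = '\0';
	}
	return (0);
}
```

The '..'-free result could then be handed to openat() together with a
pre-opened directory descriptor. If the filesystem changes between the
resolution and the open, the lookup is still relative to a descriptor
the sandbox already holds, which is the fail-closed property described
above.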

With the SIGCAP things in https://reviews.freebsd.org/D33248, it
becomes easy to write a signal handler that intercepts blocked system
calls and handles them (I'm running with this applied and doing exactly
that), so this can be transparent to any dynamically linked binary.

> Pledged applications usually reduce the kernel attack surface a lot, but
> you don't run arbitrary programs with pledge (and that wasn't one of its
> goals AFAIK).  But that's what I wanted my module to be able to do.  I'd
> say it has become a bit of a weird hybrid between a "container"
> framework and an exploit mitigation framework at this point.  You can
> run a `make buildworld` with it, build/install/run random programs
> isolated in your project directories, sandbox shell/desktop sessions as
> a whole, etc.  And then within those sandboxes, nested applications can
> do their own sandboxing on top of it (with this module (and its
> pledge/unveil compat) or Capsicum (and possibly other compat layers
> built on top of it)).  The "inner" programs can use more restrictive
> sandboxes that don't expose as much kernel functionality.  But for the
> "outer" programs the whole thing slides more towards being
> "containers"/"jails" (and the more complex it would have been to do
> purely in userland I believe).

So how do you avoid TOCTOU bugs in your path logic?  I don't disagree
with the goals, I worry that you're doing something that is
intrinsically almost impossible to get right.

David