From owner-freebsd-hackers@FreeBSD.ORG Tue Apr 8 12:12:33 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 93DEB4E5 for ; Tue, 8 Apr 2014 12:12:33 +0000 (UTC) Received: from mail-ee0-x22e.google.com (mail-ee0-x22e.google.com [IPv6:2a00:1450:4013:c00::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 281291955 for ; Tue, 8 Apr 2014 12:12:33 +0000 (UTC) Received: by mail-ee0-f46.google.com with SMTP id t10so599689eei.33 for ; Tue, 08 Apr 2014 05:12:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=/OkMlT/JvT+KmuHGsTPLmnKNRJeWjudDAVmjMomemGw=; b=N7jeSwLWlV77q2qJE+rhIz6MrglmtXMKfuHr0+pKiyy8PK+q8quCThVr1MmzCQo3tK 6ujrg9BotVzey4OPz2k2ztKP82fc+BFzdA3qhjKh8pyO8sbjP4H2SzAelaenkg1wjkgS kkWewzXrnZCLBrVpNYc71cJLLsyxS5RERnY06EkxPBo34BLJewE9r+X2tGnmh81tZ6jx VYN7mt+rqr6/y/qPjsw9aPmuRmBo6QD84bjMkpNsFl1a+B7Vuk92OIAOgpV8aGdw8OWU wtFtfLFUIbg+Q+eRBHWF440x/3WrZadlZJUj7SPDn4CObWeKAfil6kOVU832dMm7PccF HEmw== X-Received: by 10.14.184.66 with SMTP id r42mr1850552eem.84.1396959151429; Tue, 08 Apr 2014 05:12:31 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id q49sm4256946eem.34.2014.04.08.05.12.29 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Tue, 08 Apr 2014 05:12:29 -0700 (PDT) Date: Tue, 8 Apr 2014 14:12:22 +0200 From: Mateusz Guzik To: Eduardo Morras Subject: Re: pipe() resource exhaustion Message-ID: <20140408121222.GB30326@dft-labs.eu> Mail-Followup-To: Mateusz Guzik , Eduardo Morras , freebsd-hackers@freebsd.org References: <20140408130206.e75f3bf6c6df28b6e4839e70@yahoo.es> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20140408130206.e75f3bf6c6df28b6e4839e70@yahoo.es> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Apr 2014 12:12:33 -0000 On Tue, Apr 08, 2014 at 01:02:06PM +0200, Eduardo Morras wrote: > On Mon, 7 Apr 2014 07:25:22 -0500 > Mark Felder wrote: > > > On 2014-04-07 06:02, Ivan Voras wrote: > > > Hello, > > > > > > Last time I mentioned this it didn't get any attention, so I'll try > > > again. By accident (via a buggy synergy server process) I found > > > that a simple userland process can exhaust kernel pipe memory > > > (kern.ipc.pipekva > > > sysctl) which as a consequence has that new processes which use pipe > > > cannot be started, which includes "su", by which an administrator > > > could kill such a process. > > > > > > > That's a pretty painful local denial of service :( > > Yes it is. Perhaps there should be 8% fd reserved for root, su and setuid family syscalls like in filesystem space or postgresql reserved connections for db admin. > There is an fd reserve already. Report talks about problems with creating a new *pipe*, not any fd in general. That said, supporting a reserve for this one sounds like a good idea and not that hard to implement - one can either play with atomics and double check or just place a mutex-protected check in pipespace_new (before vm_map_find). I implemented the second one, which turned out to be surprisingly ugly. For now it abuses PRIV_MAXPROC and has a reserve taken out of the blue. Thoughts? diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c index 6ba52e3..04c4559 100644 --- a/sys/kern/sys_pipe.c +++ b/sys/kern/sys_pipe.c @@ -102,6 +102,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -200,6 +201,7 @@ static struct filterops pipe_wfiltops = { #define MAXPIPESIZE (2*PIPE_SIZE/3) static long amountpipekva; +static struct mtx amountpipekva_mtx; static int pipefragretry; static int pipeallocfail; static int piperesizefail; @@ -256,6 +258,8 @@ pipeinit(void *dummy __unused) KASSERT(pipeino_unr != NULL, ("pipe fake inodes not initialized")); pipedev_ino = devfs_alloc_cdp_inode(); KASSERT(pipedev_ino > 0, ("pipe dev inode not initialized")); + + mtx_init(&amountpipekva_mtx, "pipe kva mutex", NULL, MTX_DEF); } static int @@ -515,24 +519,29 @@ pipespace_new(cpipe, size) KASSERT(!mtx_owned(PIPE_MTX(cpipe)), ("pipespace: pipe mutex locked")); KASSERT(!(cpipe->pipe_state & PIPE_DIRECTW), ("pipespace: resize of direct writes not allowed")); + + mtx_lock(&amountpipekva_mtx); retry: cnt = cpipe->pipe_buffer.cnt; if (cnt > size) size = cnt; size = round_page(size); - buffer = (caddr_t) vm_map_min(pipe_map); - error = vm_map_find(pipe_map, NULL, 0, - (vm_offset_t *) &buffer, size, 0, VMFS_ANY_SPACE, - VM_PROT_ALL, VM_PROT_ALL, 0); - if (error != KERN_SUCCESS) { + if (amountpipekva + size + 10 * PAGE_SIZE >= maxpipekva) { + if (priv_check_cred(curthread->td_ucred, PRIV_MAXPROC, 0) == 0 && + amountpipekva + size <= maxpipekva) + goto alloc; +recalc: if ((cpipe->pipe_buffer.buffer == NULL) && (size > SMALL_PIPE_SIZE)) { size = SMALL_PIPE_SIZE; pipefragretry++; goto retry; } + + mtx_unlock(&amountpipekva_mtx); + if (cpipe->pipe_buffer.buffer == NULL) { pipeallocfail++; if (ppsratecheck(&lastfail, &curfail, 1)) @@ -542,6 +551,20 @@ retry: } return (ENOMEM); } +alloc: + atomic_add_long(&amountpipekva, size); + mtx_unlock(&amountpipekva_mtx); + + buffer = (caddr_t) vm_map_min(pipe_map); + + error = vm_map_find(pipe_map, NULL, 0, + (vm_offset_t *) &buffer, size, 0, VMFS_ANY_SPACE, + VM_PROT_ALL, VM_PROT_ALL, 0); + if (error != KERN_SUCCESS) { + mtx_lock(&amountpipekva_mtx); + atomic_subtract_long(&amountpipekva, size); + goto recalc; + } /* copy data, then free old resources if we're resizing */ if (cnt > 0) { @@ -563,7 +586,6 @@ retry: cpipe->pipe_buffer.in = cnt; cpipe->pipe_buffer.out = 0; cpipe->pipe_buffer.cnt = cnt; - atomic_add_long(&amountpipekva, cpipe->pipe_buffer.size); return (0); }