From owner-freebsd-arch@FreeBSD.ORG Wed Aug 29 04:39:12 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0708F1065686; Wed, 29 Aug 2012 04:39:12 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9F14F8FC14; Wed, 29 Aug 2012 04:39:11 +0000 (UTC) Received: by ialo14 with SMTP id o14so396045ial.13 for ; Tue, 28 Aug 2012 21:39:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=N+jR5z5xlIiHFBXQaQCpiRwml3lAhhhbLl8Fk9SsuOc=; b=O2zLp2Z+iTs3P+tJftisCJMkG95Nv4S2faHRKU06k+pnBbcHPjIIUZ6g2k0y//gxkl HeHGr3jfO3Ao+3GfgC4EJJ1zP0I4Ti/TnFZQw3iGRTTAcGG4zzexWLDFGZkvBD/PwagY vefK8m7jvhX/8ethttMHYKvp85JAFvrMGiMvg+u/hY3raDcZq7lcvQDbl6WJgkBtpBvx dZ1n1QITrhX6f1gHJvdbl2SC9L+h/WoNeL8djrWYEtSjD9sHVkOcGD6GWYbf0SHAZYLr kxkMAM9VtNWoBZFB/Rrj+5+b/o4Of+4AYjQ7z95aDX5RjR1WSv6gdzLWI47n7uIBHCEZ RJMg== MIME-Version: 1.0 Received: by 10.50.47.200 with SMTP id f8mr319047ign.6.1346215151078; Tue, 28 Aug 2012 21:39:11 -0700 (PDT) Received: by 10.64.72.165 with HTTP; Tue, 28 Aug 2012 21:39:11 -0700 (PDT) In-Reply-To: <503D34DB.3090000@FreeBSD.org> References: <503CF3B1.3050604@FreeBSD.org> <503D08D6.1040004@shatow.net> <503D281A.3080107@FreeBSD.org> <503D34DB.3090000@FreeBSD.org> Date: Tue, 28 Aug 2012 23:39:11 -0500 Message-ID: From: Alan Cox To: Andrey Zonov Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Bryan Drewery , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: [patch] unprivileged mlock(2) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: alc@freebsd.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Aug 2012 04:39:12 -0000 On Tue, Aug 28, 2012 at 4:15 PM, Andrey Zonov wrote: > On 8/29/12 12:20 AM, Andriy Gapon wrote: > > on 28/08/2012 21:07 Bryan Drewery said the following: > >> On 8/28/2012 11:37 AM, Andrey Zonov wrote: > >>> Hi, > >>> > >>> We've got RLIMIT_MEMLOCK for years, but this limit is useless, because > >>> only root may call mlock(2), and root may raise any limits. > >>> > >>> I suggest patch that allows to call mlock(2) for unprivileged users. > >>> Are there any objections to got it in tree? > >>> > >> > >> FYI, see previous recent thread on this here: > >> > >> http://lists.freebsd.org/pipermail/freebsd-arch/2012-May/012552.html > >> and > >> http://lists.freebsd.org/pipermail/freebsd-arch/2012-June/012606.html > > > > Yes, Andrey, I highly suggest that you read those threads completely. > > > > Here are some observations. > > > > It doesn't look like mlockall and mlockall(MCL_FUTURE) in particular > properly > > honor RLIMIT_MEMLOCK. If this is not fixed, then it would be premature > to > > enable the privilege for non-privileged users. > > > > This should be surely fixed, but I don't know how. Any suggestions are > welcome. > > > I am against adding the sysctl knob. If RLIMIT_MEMLOCK limit is properly > > implemented then it is sufficient to effectively deny the privilege (and > with > > much finer granularity). > > > > Until all bugs around this problem will be fixed, to have such sysctl > would be nice, and even keep it turned off to not change default > behavior (not like in patch). > > > When the privilege is allowed to ordinary users, then memorylocked in the > > default login.conf would need to be set to something much lower than the > current > > 'unlimited' :-) > > > > It's not a problem to set it to a new reasonable value in the tree, but > it will be a problem to fix this everywhere. > > > Also, note that currently RLIMIT_MEMLOCK is abused at least in vslock() > (used at > > least by sysctl's kernel side). The temporary wirings performed as an > > implementation detail or side-effect should not be accounted against the > limit. > > The limit is for wirings that a user makes and controls explicitly. It > should > > not be applied to something that kernel does behind the scenes without > user's > > knowledge. > > > > I was surprised when I stepped on this few years ago on machine with > thousands processes. top(8) ate 100% CPU in a forever loop, ps(1) > didn't work, and that is because memorylocked limit was set to low. > Later I submitted two patches which fixed kvm (r230873) and sockstat > (r230874), but I totally agree with you here, we shouldn't check for > limits in vslock(). > > I agree with Andriy's argument for making the following change. Please go ahead and commit it. Alan Index: sys/vm/vm_glue.c > =================================================================== > --- sys/vm/vm_glue.c (revision 239772) > +++ sys/vm/vm_glue.c (working copy) > @@ -183,7 +183,6 @@ int > vslock(void *addr, size_t len) > { > vm_offset_t end, last, start; > - unsigned long nsize; > vm_size_t npages; > int error; > > @@ -195,18 +194,6 @@ vslock(void *addr, size_t len) > npages = atop(end - start); > if (npages > vm_page_max_wired) > return (ENOMEM); > - PROC_LOCK(curproc); > - nsize = ptoa(npages + > - pmap_wired_count(vm_map_pmap(&curproc->p_vmspace->vm_map))); > - if (nsize > lim_cur(curproc, RLIMIT_MEMLOCK)) { > - PROC_UNLOCK(curproc); > - return (ENOMEM); > - } > - if (racct_set(curproc, RACCT_MEMLOCK, nsize)) { > - PROC_UNLOCK(curproc); > - return (ENOMEM); > - } > - PROC_UNLOCK(curproc); > #if 0 > /* > * XXX - not yet > @@ -222,14 +209,6 @@ vslock(void *addr, size_t len) > #endif > error = vm_map_wire(&curproc->p_vmspace->vm_map, start, end, > VM_MAP_WIRE_SYSTEM | VM_MAP_WIRE_NOHOLES); > -#ifdef RACCT > - if (error != KERN_SUCCESS) { > - PROC_LOCK(curproc); > - racct_set(curproc, RACCT_MEMLOCK, > - > ptoa(pmap_wired_count(vm_map_pmap(&curproc->p_vmspace->vm_map)))); > - PROC_UNLOCK(curproc); > - } > -#endif > /* > * Return EFAULT on error to match copy{in,out}() behaviour > * rather than returning ENOMEM like mlock() would. > > -- > Andrey Zonov > >