From: Alan Cox
To: John Baldwin
Cc: freebsd-hackers@freebsd.org, Sergey Kandaurov
Date: Fri, 21 Jan 2011 14:58:30 -0600
Subject: Re: [rfc] allow to boot with >= 256GB physmem
In-Reply-To: <201101211244.13830.jhb@freebsd.org>
References: <201101211244.13830.jhb@freebsd.org>
Reply-To: alc@freebsd.org

On Fri, Jan 21, 2011 at 11:44 AM, John Baldwin wrote:

> On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote:
> > Hello.
> >
> > Some time ago I faced with a problem booting with 400GB physmem.
> > The problem is that vm.max_proc_mmap type overflows with
> > such high value, and that results in a broken mmap() syscall.
> > The max_proc_mmap value is a signed int and roughly calculated
> > at vmmapentry_rsrc_init() as u_long vm_kmem_size quotient:
> > vm_kmem_size / sizeof(struct vm_map_entry) / 100.
> >
> > Although at the time it was introduced at svn r57263 the value
> > was quite low (f.e. the related commit log stands:
> > "The value defaults to around 9000 for a 128MB machine."),
> > the problem is observed on amd64 where KVA space after
> > r212784 is factually bound to the only physical memory size.
> >
> > With INT_MAX here is 0x7fffffff, and sizeof(struct vm_map_entry)
> > is 120, it's enough to have sligthly less than 256GB to be able
> > to reproduce the problem.
> >
> > I rewrote vmmapentry_rsrc_init() to set large enough limit for
> > max_proc_mmap just to protect from integer type overflow.
> > As it's also possible to live tune this value, I also added a
> > simple anti-shoot constraint to its sysctl handler.
> > I'm not sure though if it's worth to commit the second part.
> >
> > As this patch may cause some bikeshedding,
> > I'd like to hear your comments before I will commit it.
> >
> > http://plukky.net/~pluknet/patches/max_proc_mmap.diff
>
> Is there any reason we can't just make this variable and sysctl a long?
>

Or just delete it.

1. Contrary to what the commit message says, this sysctl does not
effectively limit the number of vm map entries. It only limits the
number that are created by one system call, mmap().
Other system calls create vm map entries just as easily, for example,
mprotect(), madvise(), mlock(), and minherit(): basically, anything
that alters the properties of a mapping. Thus, in 2000, after this
sysctl was added, the same resource-exhaustion-induced crash could
still have been reproduced by trivially changing the program in
PR/16573 to do an mprotect() or two. In a nutshell, if you really want
to limit the number of vm map entries that a process can allocate, the
implementation is a bit more involved than what was done for this
sysctl.

2. UMA implements M_WAITOK, whereas the old zone allocator in 2000 did
not. Moreover, vm map entries for user maps are allocated with
M_WAITOK. So, the exact crash reported in PR/16573 can no longer
happen.

3. We now have the "vmemoryuse" resource limit. When this sysctl was
defined, we didn't. Limiting virtual memory indirectly, but
effectively, limits the number of vm map entries that a process can
allocate.

In summary, I would do a little due diligence: for example, run the
program from PR/16573 with the limit disabled. If you can't reproduce
the crash, in other words, if nothing contradicts point #2 above, then
I would just delete this sysctl.

Alan