From owner-freebsd-hackers@FreeBSD.ORG Mon Feb  7 15:11:05 2011
Date: Mon, 7 Feb 2011 18:11:03 +0300
From: Sergey Kandaurov
To: alc@freebsd.org
Cc: freebsd-hackers@freebsd.org, Konstantin Belousov
Subject: Re: [rfc] allow to boot with >= 256GB physmem
References: <201101211244.13830.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=0016363b8848ed2be3049bb2a3d8
List-Id: Technical Discussions relating to FreeBSD
--0016363b8848ed2be3049bb2a3d8
Content-Type: text/plain; charset=ISO-8859-1

On 22 January 2011 00:43, Alan Cox wrote:
> On Fri, Jan 21, 2011 at 2:58 PM, Alan Cox wrote:
>>
>> On Fri, Jan 21, 2011 at 11:44 AM, John Baldwin wrote:
>>>
>>> On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote:
>>> > Hello.
>>> >
>>> > Some time ago I faced a problem booting with 400GB physmem.
>>> > The problem is that vm.max_proc_mmap overflows its signed int type
>>> > with such a high value, and that results in a broken mmap() syscall.
>>> > The max_proc_mmap value is roughly calculated in
>>> > vmmapentry_rsrc_init() as a quotient of the u_long vm_kmem_size:
>>> > vm_kmem_size / sizeof(struct vm_map_entry) / 100.
>>> >
>>> > Although the value was quite low when it was introduced in svn
>>> > r57263 (the commit log says: "The value defaults to around 9000
>>> > for a 128MB machine."), the problem is observed on amd64, where
>>> > after r212784 KVA space is effectively bound only by the physical
>>> > memory size.
>>> >
>>> > With INT_MAX at 0x7fffffff and sizeof(struct vm_map_entry) at 120,
>>> > slightly less than 256GB is enough to reproduce the problem.
>>> >
>>> > I rewrote vmmapentry_rsrc_init() to set a limit for max_proc_mmap
>>> > large enough to protect against integer overflow. As this value
>>> > can also be tuned at run time, I also added a simple
>>> > anti-foot-shooting constraint to its sysctl handler. I'm not sure,
>>> > though, whether the second part is worth committing.
>>> >
>>> > As this patch may cause some bikeshedding, I'd like to hear your
>>> > comments before I commit it.
>>> >
>>> > http://plukky.net/~pluknet/patches/max_proc_mmap.diff
>>>
>>> Is there any reason we can't just make this variable and sysctl a long?
>>>
>>
>> Or just delete it.
>>
>> 1. Contrary to what the commit message says, this sysctl does not
>> effectively limit the number of vm map entries.  It only limits the
>> number that are created by one system call, mmap().  Other system calls
>> create vm map entries just as easily, for example, mprotect(), madvise(),
>> mlock(), and minherit().  Basically, anything that alters the properties
>> of a mapping.  Thus, in 2000, after this sysctl was added, the same
>> resource exhaustion induced crash could have been reproduced by trivially
>> changing the program in PR/16573 to do an mprotect() or two.
>>
>> In a nutshell, if you want to really limit the number of vm map entries
>> that a process can allocate, the implementation is a bit more involved
>> than what was done for this sysctl.
>>
>> 2. UMA implements M_WAITOK, whereas the old zone allocator in 2000 did
>> not.  Moreover, vm map entries for user maps are allocated with M_WAITOK.
>> So, the exact crash reported in PR/16573 couldn't happen any longer.
>>
>
> Actually, I take back part of what I said here.  The old zone allocator
> did implement something like M_WAITOK, and that appears to have been used
> for user maps.  However, the crash described in PR/16573 was actually on
> the allocation of a vm map entry within the *kernel* address space for a
> process U area.  This type of allocation did not use the old zone
> allocator's equivalent to M_WAITOK.  However, we no longer have U areas,
> so the exact crash scenario is clearly no longer possible.  Interestingly,
> the sysctl in question has no direct effect on the allocation of kernel
> vm map entries.
>
> So, I remain skeptical that this sysctl is preventing any resource
> exhaustion based panics in the current kernel.  Again, I would be
> thrilled to see one or more people do some testing, such as rerunning
> the program from PR/16573.
>
>
>> 3.
>> We now have the "vmemoryuse" resource limit.  When this sysctl was
>> defined, we didn't.  Limiting the virtual memory indirectly but
>> effectively limits the number of vm map entries that a process can
>> allocate.
>>
>> In summary, I would do a little due diligence, for example, run the
>> program from PR/16573 with the limit disabled.  If you can't reproduce
>> the crash, in other words, nothing contradicts point #2 above, then I
>> would just delete this sysctl.
>>

I tried the test from PR/16573 running as root. Unmodified, it just
quickly bounds on the kern.maxproc limit, so to give it more workload I
added signal(SIGCHLD, SIG_IGN); so that no zombie processes are created
at all. With this change it also survived. The submitter reported that
it crashes after 10000 iterations; after increasing the limit to
1000000 I still couldn't get it to crash.
* The testing was done with the max_proc_mmap part commented out.

The attached change effectively reverts r57263.

-- 
wbr,
pluknet

--0016363b8848ed2be3049bb2a3d8
Content-Type: text/plain; name="vm_mmap_maxprocmmap.diff"
Content-Disposition: attachment; filename="vm_mmap_maxprocmmap.diff"

Index: /sys/vm/vm_mmap.c
===================================================================
--- /sys/vm/vm_mmap.c	(revision 218026)
+++ /sys/vm/vm_mmap.c	(working copy)
@@ -48,7 +48,6 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
-#include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/sysproto.h>
@@ -66,7 +65,6 @@
 #include <sys/stat.h>
 #include <sys/sysent.h>
 #include <sys/vmmeter.h>
-#include <sys/sysctl.h>
 
 #include <security/mac/mac_framework.h>
 
@@ -80,7 +78,6 @@
 #include <vm/vm_pageout.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_page.h>
-#include <vm/vm_kern.h>
 
 #ifdef HWPMC_HOOKS
 #include <sys/pmckern.h>
@@ -92,30 +89,6 @@
 };
 #endif
 
-static int max_proc_mmap;
-SYSCTL_INT(_vm, OID_AUTO, max_proc_mmap, CTLFLAG_RW, &max_proc_mmap, 0,
-    "Maximum number of memory-mapped files per process");
-
-/*
- * Set the maximum number of vm_map_entry structures per process.  Roughly
- * speaking vm_map_entry structures are tiny, so allowing them to eat 1/100
- * of our KVM malloc space still results in generous limits.  We want a
- * default that is good enough to prevent the kernel running out of resources
- * if attacked from compromised user account but generous enough such that
- * multi-threaded processes are not unduly inconvenienced.
- */
-static void vmmapentry_rsrc_init(void *);
-SYSINIT(vmmersrc, SI_SUB_KVM_RSRC, SI_ORDER_FIRST, vmmapentry_rsrc_init,
-    NULL);
-
-static void
-vmmapentry_rsrc_init(dummy)
-        void *dummy;
-{
-    max_proc_mmap = vm_kmem_size / sizeof(struct vm_map_entry);
-    max_proc_mmap /= 100;
-}
-
 static int vm_mmap_vnode(struct thread *, vm_size_t, vm_prot_t, vm_prot_t *,
     int *, struct vnode *, vm_ooffset_t *, vm_object_t *);
 static int vm_mmap_cdev(struct thread *, vm_size_t, vm_prot_t, vm_prot_t *,
@@ -377,18 +350,6 @@
 		handle_type = OBJT_VNODE;
 	}
 map:
-
-	/*
-	 * Do not allow more then a certain number of vm_map_entry structures
-	 * per process.  Scale with the number of rforks sharing the map
-	 * to make the limit reasonable for threads.
-	 */
-	if (max_proc_mmap &&
-	    vms->vm_map.nentries >= max_proc_mmap * vms->vm_refcnt) {
-		error = ENOMEM;
-		goto done;
-	}
-
 	td->td_fpop = fp;
 	error = vm_mmap(&vms->vm_map, &addr, size, prot, maxprot,
 	    flags, handle_type, handle, pos);

--0016363b8848ed2be3049bb2a3d8--