Date:      Fri, 21 Jan 2011 14:58:30 -0600
From:      Alan Cox <alan.l.cox@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Sergey Kandaurov <pluknet@gmail.com>
Subject:   Re: [rfc] allow to boot with >= 256GB physmem
Message-ID:  <AANLkTinWBkd7BuO40DhuRNgKx=5dyEUP9wMesMV_zx2J@mail.gmail.com>
In-Reply-To: <201101211244.13830.jhb@freebsd.org>
References:  <AANLkTikt5=2L0rHyGbsjvG8eV6Ve4JkRM_pcyNiAsPu8@mail.gmail.com> <201101211244.13830.jhb@freebsd.org>

On Fri, Jan 21, 2011 at 11:44 AM, John Baldwin <jhb@freebsd.org> wrote:

> On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote:
> > Hello.
> >
> > Some time ago I ran into a problem booting with 400GB of physical
> > memory. The type of vm.max_proc_mmap overflows with such a large
> > value, and that results in a broken mmap() syscall.
> > max_proc_mmap is a signed int, and vmmapentry_rsrc_init() computes it
> > roughly as a quotient of the u_long vm_kmem_size:
> > vm_kmem_size / sizeof(struct vm_map_entry) / 100.
> >
> > When it was introduced in svn r57263 the value was quite low (e.g.
> > the commit log states: "The value defaults to around 9000 for a
> > 128MB machine."), but the problem is now observable on amd64, where
> > since r212784 the KVA space is effectively bounded only by the
> > physical memory size.
> >
> > With INT_MAX being 0x7fffffff and sizeof(struct vm_map_entry) being
> > 120, slightly less than 256GB is enough to reproduce the problem.
> >
> > I rewrote vmmapentry_rsrc_init() to set a large enough limit for
> > max_proc_mmap, just to protect against integer overflow.
> > As the value can also be tuned at run time, I also added a
> > simple foot-shooting guard to its sysctl handler.
> > I'm not sure, though, whether the second part is worth committing.
> >
> > As this patch may cause some bikeshedding,
> > I'd like to hear your comments before I commit it.
> >
> > http://plukky.net/~pluknet/patches/max_proc_mmap.diff
>
> Is there any reason we can't just make this variable and sysctl a long?
>
>
Or just delete it.

1. Contrary to what the commit message says, this sysctl does not
effectively limit the number of vm map entries.  It only limits the number
that are created by one system call, mmap().  Other system calls create vm
map entries just as easily, for example, mprotect(), madvise(), mlock(), and
minherit().  Basically, anything that alters the properties of a mapping.
Thus, in 2000, after this sysctl was added, the same
resource-exhaustion-induced crash could have been reproduced by trivially
changing the program in PR/16573 to do an mprotect() or two.

In a nutshell, if you want to really limit the number of vm map entries that
a process can allocate, the implementation is a bit more involved than what
was done for this sysctl.

2. UMA implements M_WAITOK, whereas the old zone allocator in 2000 did not.
Moreover, vm map entries for user maps are allocated with M_WAITOK.  So, the
exact crash reported in PR/16573 couldn't happen any longer.

3. We now have the "vmemoryuse" resource limit.  When this sysctl was
defined, we didn't.  Limiting the virtual memory indirectly but effectively
limits the number of vm map entries that a process can allocate.

In summary, I would do a little due diligence, for example, run the program
from PR/16573 with the limit disabled.  If you can't reproduce the crash
(in other words, if nothing contradicts point #2 above), then I would just
delete this sysctl.

Alan
