Date: Sun, 21 Jul 2013 23:50:38 +0200 From: Jeremie Le Hen <jlh@FreeBSD.org> To: Alan Cox <alc@rice.edu> Cc: Jeremie Le Hen <jlh@FreeBSD.org>, Alan Cox <alc@FreeBSD.org>, freebsd-current@FreeBSD.org, trasz@FreeBSD.org, Konstantin Belousov <kib@freebsd.org> Subject: Re: Fix for sys_munlock(2) with racct Message-ID: <20130721215038.GE13628@caravan.chchile.org> In-Reply-To: <115EDC04-4B95-4FB5-9092-515188D8ADA7@rice.edu> References: <20130720112218.GD13628@caravan.chchile.org> <115EDC04-4B95-4FB5-9092-515188D8ADA7@rice.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Jul 20, 2013 at 08:33:35PM -0700, Alan Cox wrote: > > On Jul 20, 2013, at 4:22 AM, Jeremie Le Hen wrote: > > > Hi Edward, Alan, > > > > I plan to commit the following patch: > > http://people.freebsd.org/~jlh/racct_munlock.diff > > > > This solves the following panic: > > > > panic: racct_sub: freeing 301989888 of resource 5, which is more than allocated 73728 for pwsafe (pid 4177) > > > > The problem is that the racct code in sys_munlock() trusts too much the > > user's input. vm_map_unwire_count() now returns how much memory has > > really been unwired. > > > > Any objection? > > > Can you elaborate on what you mean by "sys_munlock() trusts too much > the user's input." munlock(2) is supposed to return ENOMEM if any > addresses within the specified range are not backed by valid mappings. > (Valid mappings with PROT_NONE access are something of a gray area > here.) However, it is not error for a to call munlock() on a range > that isn't locked, or to call it a second, third, etc. time on the > same range. Is that what is provoking your panic? I'm using sysutils/pwsafe and it seems to mlock(2) 18 pages (end - start = 73728 bytes) but then munlock(2) 73728 pages :-). vm_map_unwire() seems to do the right thing with those buggy boundaries, but then the racct code assumes that those boundaries were correct as long as vm_map_unwire() returned successfully. This is what causes the assertion failure. > By the way, sys_mlock() uses a simpler approach to deal with a similar > situation: > > error = vm_map_wire(map, start, end, > VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES); > #ifdef RACCT > if (error != KERN_SUCCESS) { > PROC_LOCK(proc); > racct_set(proc, RACCT_MEMLOCK, > ptoa(pmap_wired_count(map->pmap))); > PROC_UNLOCK(proc); > } > #endif > > However, the code in sys_mlock() for maintaining RACCT_MEMLOCK, > including the above snippet, is race-y. Two simultaneous callers to > sys_mlock() will produce incorrect results. (I haven't looked at > sys_mlockall() or vm_map_growstack().) Ok, I will commit that as the first bandaid then. > Also, a wired mapping can be destroyed by calling munmap(2) without > first calling munlock(2), in which case, RACCT_MEMLOCK will be > incorrect. So I think the right way to tackle this is to handle racct in the vm layer rather than at the syscall layer. vm_fault_{wire,unwire}() look like natural places for this, but racct functions require the struct proc to be locked. Any idea how to handle this? -- Jeremie Le Hen Scientists say the world is made up of Protons, Neutrons and Electrons. They forgot to mention Morons.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130721215038.GE13628>