Date: Tue, 30 Jul 2013 10:40:45 -0700 From: Alan Cox <alc@rice.edu> To: Jeremie Le Hen <jlh@FreeBSD.org> Cc: Alan Cox <alc@FreeBSD.org>, Konstantin Belousov <kib@freebsd.org>, freebsd-current@FreeBSD.org, trasz@FreeBSD.org Subject: Re: Fix for sys_munlock(2) with racct Message-ID: <EDF1EFE1-E9A9-44CF-8007-A642971572BE@rice.edu> In-Reply-To: <20130721215038.GE13628@caravan.chchile.org> References: <20130720112218.GD13628@caravan.chchile.org> <115EDC04-4B95-4FB5-9092-515188D8ADA7@rice.edu> <20130721215038.GE13628@caravan.chchile.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Jul 21, 2013, at 2:50 PM, Jeremie Le Hen wrote: > On Sat, Jul 20, 2013 at 08:33:35PM -0700, Alan Cox wrote: >>=20 >> On Jul 20, 2013, at 4:22 AM, Jeremie Le Hen wrote: >>=20 >>> Hi Edward, Alan, >>>=20 >>> I plan to commit the following patch: >>> http://people.freebsd.org/~jlh/racct_munlock.diff >>>=20 >>> This solves the following panic: >>>=20 >>> panic: racct_sub: freeing 301989888 of resource 5, which is more = than allocated 73728 for pwsafe (pid 4177) >>>=20 >>> The problem is that the racct code in sys_munlock() trusts too much = the >>> user's input. vm_map_unwire_count() now returns how much memory has >>> really been unwired. >>>=20 >>> Any objection? >>=20 >>=20 >> Can you elaborate on what you mean by "sys_munlock() trusts too much >> the user's input." munlock(2) is supposed to return ENOMEM if any >> addresses within the specified range are not backed by valid = mappings. >> (Valid mappings with PROT_NONE access are something of a gray area >> here.) However, it is not error for a to call munlock() on a range >> that isn't locked, or to call it a second, third, etc. time on the >> same range. Is that what is provoking your panic? >=20 > I'm using sysutils/pwsafe and it seems to mlock(2) 18 pages (end - = start > =3D 73728 bytes) but then munlock(2) 73728 pages :-). vm_map_unwire() > seems to do the right thing with those buggy boundaries, but then the > racct code assumes that those boundaries were correct as long as > vm_map_unwire() returned successfully. This is what causes the > assertion failure. >=20 It sounds like the application is calling munlock(2) on valid but = unwired memory. That's not considered an error under the various Unix = standards that specify the behavior of munlock(2). It's sloppy = programming but not buggy programming.=20 >=20 >> By the way, sys_mlock() uses a simpler approach to deal with a = similar >> situation: >>=20 >> error =3D vm_map_wire(map, start, end,=20 >> VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES); >> #ifdef RACCT >> if (error !=3D KERN_SUCCESS) { >> PROC_LOCK(proc); >> racct_set(proc, RACCT_MEMLOCK, >> ptoa(pmap_wired_count(map->pmap))); >> PROC_UNLOCK(proc); >> } >> #endif >>=20 >> However, the code in sys_mlock() for maintaining RACCT_MEMLOCK, >> including the above snippet, is race-y. Two simultaneous callers to >> sys_mlock() will produce incorrect results. (I haven't looked at >> sys_mlockall() or vm_map_growstack().) >=20 > Ok, I will commit that as the first bandaid then. >=20 >=20 >> Also, a wired mapping can be destroyed by calling munmap(2) without >> first calling munlock(2), in which case, RACCT_MEMLOCK will be >> incorrect. >=20 > So I think the right way to tackle this is to handle racct in the vm > layer rather than at the syscall layer. >=20 The VM system already maintains counters equivalent to RACCT_VMEM and = RACCT_MEMLOCK. They are "map->size" and "pmap_wired_count(map->pmap)". = Instead of maintaining duplicate counters, could the resource accounting = framework be extended to support callbacks to obtain a value when it's = actually needed? > vm_fault_{wire,unwire}() look like natural places for this, but racct > functions require the struct proc to be locked. Any idea how to = handle > this? >=20 > --=20 > Jeremie Le Hen >=20 > Scientists say the world is made up of Protons, Neutrons and = Electrons. > They forgot to mention Morons. >=20
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?EDF1EFE1-E9A9-44CF-8007-A642971572BE>