Date: Fri, 21 Nov 2014 11:22:18 +0000
From: "Robert N. M. Watson" <rwatson@FreeBSD.org>
To: Marko Zec <zec@fer.hr>
Cc: Craig Rodrigues <rodrigc@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>, "Bjoern A. Zeeb" <bz@FreeBSD.org>
Subject: Re: VIMAGE UDP memory leak fix
Message-ID: <597BD146-88B1-47E6-A373-7004CFF8AEBA@FreeBSD.org>
In-Reply-To: <20141121120201.6c77ea5b@x23>
References: <CAG=rPVehky00X4MuQQ-_Oe5ezWg52ZZrPASAh9GBy7baYv78CA@mail.gmail.com> <20141121002937.4f82daea@x23> <A4D676B3-6C50-47F7-8CFD-50B44FF4BE98@FreeBSD.org> <9300CB5F-6140-4C49-B026-EB69B0E8B37E@FreeBSD.org> <20141121120201.6c77ea5b@x23>
On 21 Nov 2014, at 11:02, Marko Zec <zec@fer.hr> wrote:

> Now that we've found ourselves in this discussion, I'm really
> becoming curious why exactly do we need UMA_ZONE_NOFREE for network
> stack zones at all? Admittedly, I always thought that the primary
> purpose of UMA_ZONE_NOFREE was to prevent uma_reclaim() from paging out
> _used_ zone pages, but reviewing the uma code reveals that this might
> not be the case, i.e. that NOFREE only prevents _unused_ pages to be
> freed by uma_reclaim().
>
> Moreover, all uma_zalloc() calls as far as I can see are flagged as
> M_NOWAIT and are followed by checks for allocation failures, so that
> part seems to be covered.
>
> So, what's really the problem which UMA_ZONE_NOFREE flagging is supposed
> to solve these days? (you claim that we clearly need it for TCP - why)?

UMA_ZONE_NOFREE tells UMA that it can't reclaim unused slabs from the zone and return them to the VM system for reuse elsewhere under memory pressure. UMA memory isn't pageable, so there's no link to paging policy. Soft-TLB systems might take TLB miss exceptions on UMA-allocated kernel memory, but you should never take a page fault against it (in the absence of a bug).

Reclaim of unused slabs can happen, for example, if VM discovers it is low on free pages. In that case it notifies various kernel subsystems that it is feeling a bit cramped. This is the same mechanism that, for example, triggers TCP to throw away reassembly buffers that haven't yet been ACK'd (although they might have been SACK'd). You might expect this to happen when a large load spike first hits a particular UMA type (e.g., a DDoS opens lots of TCP connections) and those objects are then freed. That leaves lots of socket/inpcb slabs lying around unused, and VM will eventually ask for them to be returned.
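The reclaim behaviour described above can be modelled in user space. The following is a toy sketch only. Every name in it is hypothetical (toy_zone, toy_zalloc, toy_zfree, toy_zone_reclaim, TOY_ZONE_NOFREE, run_spike), and it is not UMA's API or implementation. A single array of retained free items stands in for UMA's unused slabs and per-CPU caches. The sketch shows that a low-memory reclaim only ever touches unused items, and that the NOFREE flag simply makes the zone keep them forever.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_ZONE_NOFREE 0x1   /* hypothetical stand-in for UMA_ZONE_NOFREE */
#define TOY_CACHE_MAX   64    /* hypothetical limit on retained free items */

struct toy_zone {
    size_t size;                  /* item size */
    int flags;
    void *cache[TOY_CACHE_MAX];   /* freed items retained for reuse ("unused slabs") */
    int ncached;
    int nused;                    /* items currently handed out */
};

static void *
toy_zalloc(struct toy_zone *z)
{
    void *p;

    if (z->ncached > 0)
        p = z->cache[--z->ncached];
    else
        p = malloc(z->size);      /* may fail: callers must check, as with M_NOWAIT */
    if (p != NULL)
        z->nused++;
    return (p);
}

static void
toy_zfree(struct toy_zone *z, void *p)
{
    assert(z->nused > 0);
    z->nused--;
    if (z->ncached < TOY_CACHE_MAX)
        z->cache[z->ncached++] = p;   /* keep it around for reuse */
    else
        free(p);
}

/* Low-memory hook: give retained, unused items back unless NOFREE is set. */
static int
toy_zone_reclaim(struct toy_zone *z)
{
    int released = 0;

    if (z->flags & TOY_ZONE_NOFREE)
        return (0);
    while (z->ncached > 0) {
        free(z->cache[--z->ncached]);
        released++;
    }
    return (released);
}

/* Tear down the zone: free everything it retains, regardless of flags. */
static void
toy_zone_destroy(struct toy_zone *z)
{
    while (z->ncached > 0)
        free(z->cache[--z->ncached]);
}

/*
 * Demo: a load spike allocates n items, all but `keep` are freed, then VM
 * asks for memory back. Returns the number of items reclaimed (or -1 on
 * allocation failure) and reports how many items are still in use.
 */
static int
run_spike(int flags, int n, int keep, int *still_used)
{
    struct toy_zone z = { .size = 256, .flags = flags };
    void *items[TOY_CACHE_MAX];
    int i, released;

    assert(n <= TOY_CACHE_MAX && keep <= n);
    for (i = 0; i < n; i++) {
        if ((items[i] = toy_zalloc(&z)) == NULL) {
            while (i-- > 0)
                toy_zfree(&z, items[i]);
            toy_zone_destroy(&z);
            return (-1);
        }
    }
    for (i = keep; i < n; i++)
        toy_zfree(&z, items[i]);
    released = toy_zone_reclaim(&z);  /* in-use items are never touched */
    *still_used = z.nused;
    for (i = 0; i < keep; i++)
        toy_zfree(&z, items[i]);
    toy_zone_destroy(&z);
    return (released);
}
```

Call run_spike(0, 32, 2, &used) and then run_spike(TOY_ZONE_NOFREE, 32, 2, &used). The first call hands 30 idle items back to the allocator. The second call hands back none. In both cases the 2 live items are left alone.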
It is highly desirable to remove UMA_ZONE_NOFREE from zones wherever possible, so that memory can be returned under such circumstances. It is not a good thing that the flag is present anywhere.

Subsystems pick up a dependence on UMA_ZONE_NOFREE if freed objects might be referenced after free. My understanding is that this is pretty old behaviour inherited from earlier kernel memory allocators that didn't know how to return memory to VM. Given that property, it was safe to write code that assumed, for efficiency, that it could walk data structures of the type with less synchronisation overhead. The same assumption covered cases where synchronisation isn't possible at all (e.g., direct access to kernel memory via /dev/kmem).

We have been attempting to expunge those assumptions wherever possible. These days, netstat uses sysctl()s that acquire references to all live inpcbs, keeping them valid while they are copied out. You can't hold low-level locks during copyout(), as sysctl might encounter a paging event writing to user memory. Convincing yourself that all such assumptions have been removed is a moderate amount of work. If you get it wrong, you get use-after-free races that occur only in low-memory conditions, which are quite hard to figure out (read: almost impossible).

Bjoern can say more about what motivated his specific comment -- I had hoped that we'd quietly lost our dependence on NOFREE over the last decade and could finally garbage collect it, but perhaps not!

Robert
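The reference-holding export pattern described above can be sketched as a toy user-space model: take a reference on every live inpcb, drop the lock, copy out, then release. All names here (toy_pcb, toy_pcb_new, toy_pcb_rele, toy_pcb_close, toy_pcb_export, close_pcb2) are hypothetical. This is not FreeBSD's inpcb code, and locking is only indicated in comments. The point is that an object closed while the copy-out "sleeps" stays allocated until the exporter drops its own reference. The reader therefore never depends on freed memory remaining valid, and so does not need NOFREE.

```c
#include <assert.h>
#include <stdlib.h>

#define SNAP_MAX 64   /* hypothetical cap on pcbs exported in one pass */

/* Hypothetical, simplified stand-in for an inpcb; not the FreeBSD struct. */
struct toy_pcb {
    int id;
    int refcount;            /* the live list holds one reference */
    struct toy_pcb *next;
};

static struct toy_pcb *pcb_list;   /* live pcbs; a kernel would lock this list */
static int pcb_frees;              /* counts real frees, for the demo */

static struct toy_pcb *
toy_pcb_new(int id)
{
    struct toy_pcb *p = calloc(1, sizeof(*p));

    if (p == NULL)
        return (NULL);
    p->id = id;
    p->refcount = 1;         /* the list's reference */
    p->next = pcb_list;
    pcb_list = p;
    return (p);
}

static void
toy_pcb_rele(struct toy_pcb *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);             /* memory really goes back: no NOFREE needed */
        pcb_frees++;
    }
}

/* Close: unlink from the live list and drop the list's reference. */
static void
toy_pcb_close(int id)
{
    struct toy_pcb **pp;

    for (pp = &pcb_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct toy_pcb *p = *pp;
            *pp = p->next;
            toy_pcb_rele(p); /* other holders keep it allocated */
            return;
        }
    }
}

/*
 * Phase 1 (list lock would be held): take a reference on each live pcb.
 * Phase 2 (lock dropped, copyout may sleep): read each pcb, which is safe
 * because our reference keeps it allocated, then release that reference.
 */
static int
toy_pcb_export(int *out, int max, void (*during_copyout)(void))
{
    struct toy_pcb *snap[SNAP_MAX];
    struct toy_pcb *p;
    int n = 0, i;

    for (p = pcb_list; p != NULL && n < max && n < SNAP_MAX; p = p->next) {
        p->refcount++;
        snap[n++] = p;
    }
    for (i = 0; i < n; i++) {
        if (during_copyout != NULL)
            during_copyout();      /* e.g. another thread closes a pcb */
        out[i] = snap[i]->id;      /* still valid: we hold a reference */
        toy_pcb_rele(snap[i]);
    }
    return (n);
}

/* Demo hook simulating a concurrent close of pcb 2 during copyout. */
static void
close_pcb2(void)
{
    toy_pcb_close(2);
}
```

Create pcbs 1, 2 and 3, then export with close_pcb2 as the hook. The exporter still reads pcb 2 safely after it has been closed. pcb 2 is freed exactly once, when the exporter releases the last reference.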