Date: Thu, 5 Mar 2020 01:39:06 +0200
From: Konstantin Belousov <kib@freebsd.org>
To: Keno Fischer <keno@juliacomputing.com>
Cc: freebsd-hackers@freebsd.org, Elliot Saba <elliot.saba@juliacomputing.com>
Subject: Re: FreeBSD Pipe behavior in pipe OOM situations
Message-ID: <20200304233906.GB98340@kib.kiev.ua>
In-Reply-To: <CABV8kRy2Uu6fZwQR37135LvgUCxYFd6eiNt4NMQLg_jpHq42Lg@mail.gmail.com>
References: <CABV8kRy2Uu6fZwQR37135LvgUCxYFd6eiNt4NMQLg_jpHq42Lg@mail.gmail.com>

On Wed, Mar 04, 2020 at 04:42:56PM -0500, Keno Fischer wrote:
> Greetings,
>
> I am debugging intermittent failures we see on the CI system for the Julia
> programming language on FreeBSD, but not elsewhere. The Julia ticket
> for this issue can be found at
> https://github.com/JuliaLang/julia/issues/23143.
>
> The symptom is an ENOMEM error on a write to a pipe,
> together with the following message in dmesg:
>
> kern.ipc.maxpipekva exceeded; see tuning(7)
>
> Now, as far as I understand it, what's happening here is that FreeBSD has a
> hard limit on the amount of kernel memory that can be used for pipe buffers,
> which we are exceeding by creating too many pipes (not entirely surprising,
> our test suites spawns many processes and uses lots of pipes).
>
> I understand that we can likely work around this issue by increasing the
> referenced sysctl. However, I am a bit puzzled by the ENOMEM behavior.
> I don't have very much experience with the FreeBSD kernel, but from my
> experience from working on other operating systems,
> I would have expected that either:
>
> 1) Some minimal buffer is allocated anyway and exempt from such
> pipe-specific memory limits (e.g. a few bytes of the pipe struct), or,
No.

> 2) The writing process is blocked until pipe buffer space becomes available
> (e.g. by a different pipe draining and freeing up space), or,
Yes, but only for space inside the already allocated buffer, i.e. the
bytes consumed by the reader, not for the space that backs the buffer
itself.

> 3) The writing process is blocked until a reader comes along, at which point
> the write is performed directly without intermediate kernel buffer.
First, there is a requirement that an atomic write size exists, i.e.
writes smaller than SC_PIPE_BUF are guaranteed not to interleave if they
succeed. Our PIPE_BUF is 512 bytes.

We pre-allocate some buffers at pipe creation, and then might adjust
them at the start of a write. The buffers initially consume only kernel
virtual address space (KVA). Physical memory is instantiated when touched
and can be swapped out (this is somewhat simplified, but the details are
not important).

The atomicity requirement means that we must not allocate less than
PIPE_BUF, but since we are using VM interfaces, we make the lowest limit
4K (actually the page size). When there is enough space, we might go up
to 64K per pipe, but retract down when pipe KVA fills up.

The KVA used for pipe buffers is shared by all pipes in the system, among
all users. Currently, allocation of pipe buffers does not wait for space:
if there is no space, it fails with ENOMEM. Waiting for space would mean
that the writer is blocked until some unrelated process does something
that frees a pipe buffer, perhaps closing its pipe. I think that
unexplained blocking (it is very hard to track down such a state) is
worse than the ENOMEM outcome.

>
> I.e. I would have expected such an OOM situation for pipe buffers to
> degrade pipe performance, but not to have it exposed to the user. Indeed, a
> cursory read of the FreeBSD kernel source seems to reinforce this notion.
> In pipe_create, we see the following comment:
>
> ```
> /*
>  * Note that these functions can fail if pipe map is exhausted
>  * (as a result of too many pipes created), but we ignore the
>  * error as it is not fatal and could be provoked by
>  * unprivileged users. The only consequence is worse performance
>  * with given pipe.
>  */
> if (amountpipekva > maxpipekva / 2)
>         (void)pipespace_new(pipe, SMALL_PIPE_SIZE);
> else
>         (void)pipespace_new(pipe, PIPE_SIZE);
> ```
This happens at pipe open. As you see, we might preallocate only
SMALL_PIPE_SIZE (4K) if low on KVA, or not preallocate at all if KVA
is exhausted, hoping that the situation has changed by the time of
write(2).

>
> But then later, in pipe_write, we see:
> ```
> if (wpipe->pipe_buffer.size == 0) {
>         /*
>          * This can only happen for reverse direction use of pipes
>          * in a complete OOM situation.
>          */
>         error = ENOMEM;
> ```
>
> From my (admittedly limited) understanding of the code, it doesn't
> seem that either comment is accurate. If the pipe buffer allocation
> fails, then `write`s will return `ENOMEM`, even in the forward direction
> (the buffer for the reverse direction isn't allocated by default, but
> as indicated by the first comment, the allocation for the forward
> direction can certainly fail).
Yes, this comment is confusing: if both the preallocation at pipe
creation time and the allocation at first write fail, we return ENOMEM.

We reserve 1/64 of the physical memory for pipekva. It costs nothing to
increase this number initially on 64bit systems because it is only KVA,
but note that eventually this memory will be instantiated with physical
backing pages. E.g. on my workstation with 32G RAM I see
kern.ipc.maxpipekva: 534261760
(512M) and I do not want to make it larger.

What is the amount of memory on the machine where you see ENOMEM?

>
> I was hoping a FreeBSD kernel developer could shed some light on
> whether the kernel behavior we're experiencing here is indeed expected
> on FreeBSD, or whether it would be expected that the kernel would try
> harder to service the pipe request in such a situation.
>
> Thanks,
> Keno
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
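
For a rough sense of scale, the figures in the reply can be turned into a small back-of-the-envelope calculation. This is only a sketch: the function and macro names below are hypothetical, not the kernel's own, and it assumes one buffer per pipe and ignores how the kernel actually rounds or accounts for KVA. The 1/64 fraction and the 4K and 64K buffer sizes are taken from the message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative constants. The 4K and 64K figures come from the message;
 * the names are hypothetical and are not the kernel's own macros.
 */
#define MIN_PIPE_BUF_BYTES	(4u * 1024u)	/* lowest per-pipe buffer: one 4K page */
#define MAX_PIPE_BUF_BYTES	(64u * 1024u)	/* largest per-pipe buffer: 64K */

/* KVA limit when 1/64 of physical memory is reserved for pipe buffers. */
uint64_t
pipekva_limit(uint64_t physmem_bytes)
{
	return physmem_bytes / 64;
}

/*
 * Upper bound on how many buffers of a given size fit under the limit.
 * Assumption: one buffer per pipe and no other overhead.
 */
uint64_t
buffers_that_fit(uint64_t limit_bytes, uint64_t per_buffer_bytes)
{
	assert(per_buffer_bytes > 0);
	return limit_bytes / per_buffer_bytes;
}
```

With the kern.ipc.maxpipekva value quoted above (534261760), buffers_that_fit() gives about 8152 buffers at 64K each and about 130435 at 4K each. pipekva_limit() for exactly 32 GiB gives 536870912, slightly above the reported value, which would be consistent with usable physical memory being a little under 32 GiB.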