Date: Thu, 7 Jul 2011 12:21:25 +1000 (EST)
From: Bruce Evans
To: Tom Hukins
Cc: freebsd-bugs@FreeBSD.org, FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/158641: Writing > 8192 bytes to a pipe blocks signal handling

On Mon, 4 Jul 2011, Tom Hukins wrote:

>> Description:
>
> When a pipe has more than 8192 bytes written to it, the current process
> hangs and does not handle signals correctly.

It just blocks and does handle signals correctly. If a pipe is open in
not-O_NONBLOCK mode (as is the case here), write()s of as little as 1 byte
may block, depending on the pipe's buffering mechanisms and how much is
already buffered.
The first blockage occurs when more than 8191 (not 8192) bytes are written
starting from empty, at least when there is only 1 write. The details are
undocumented, but have something to do with the undocumented implementation
detail PIPE_MINDIRECT being 8192 (I'm surprised that blocking doesn't start
at the undocumented implementation limit PIPE_SIZE = 16384).

The write() can be terminated by a signal, except of course by one sent
from the thread doing the write(), since that thread is blocked. Use fcntl()
to set O_NONBLOCK on the pipe write fd if you don't want the writing thread
to block. Then the write() can fail much more easily, or return a short
count, so checking its return value is much more important.

>> How-To-Repeat:
> if ( pipe(pdes) != 0) {
>     return 1;
> }
> signal(SIGALRM, catch_alrm);
>
> int mypid = getpid();

Use fcntl here or earlier.

> write( pdes[1], argv[1], strlen(argv[1]) );

This blocks when the write() size is more than 8191...

> kill(mypid, SIGALRM);

...so this is never reached. Sending SIGALRM from another thread works
correctly (except for bugs in the signal handling: (1) it is unsafe to use
printf() in a signal handler; (2) the external alarm invokes the signal
handler and also terminates the write(), and after write() returns the
program sends itself an alarm signal, which invokes the signal handler
again).

With fcntl() to O_NONBLOCK, the write of course doesn't block, but the
behaviour is still surprising. I expected a write of 8192 bytes to return
8191 (since it would have blocked at 8192), but it actually returned 8192.
This behaviour persists up to a write() size of PIPE_SIZE (65536) -- the
full amount is written. It takes a write size of (PIPE_SIZE + 1) for things
to work unsurprisingly -- this writes PIPE_SIZE and returns that since
writing 1 more would block. So O_NONBLOCK not only prevents blocking but
also changes the buffering so that the buffer has the full size. A review
of the source code shows that this behaviour is intentional.
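The fcntl() suggestion above could look roughly like the following. This is
only a sketch: set_nonblocking() and write_some() are hypothetical helper
names that do not appear in the PR, and the choice to report a full pipe as
a return value of 0 is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the file status flags of fd (hypothetical helper).
 * Returns 0 on success, -1 on error with errno set by fcntl(). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Make one non-blocking write attempt (hypothetical helper).
 * Returns the number of bytes accepted, which may be less than len,
 * 0 if the pipe cannot take any data right now (EAGAIN), or -1 on
 * any other error. The caller must check the count. */
ssize_t write_some(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = write(fd, buf, len);

        if (n >= 0)
            return n;                 /* full or short write */
        if (errno == EINTR)
            continue;                 /* interrupted before writing: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                 /* would have blocked */
        return -1;
    }
}
```

In the test program, set_nonblocking(pdes[1]) would be called right after
pipe() succeeds, and the result of write_some() compared against
strlen(argv[1]) before the kill().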
There are 2 completely different buffering methods. Write()s of < 8192
bytes use a simple buffering method. Normally, writes of between 8192 and
65536 bytes (inclusive) use a sophisticated "direct" "zero-copy" method
involving vm and no kernel buffers. This tends to be faster, but mainly in
silly benchmarks. But O_NONBLOCK turns off the direct writing, so that
writes of up to PIPE_SIZE (65536) are buffered simply in kernel buffers of
that size, and write() can write that much before returning immediately.

I think the direct method is not used for these writes simply because it
cannot work for them -- for it to work there must be a reader to own the
buffers that are copied directly to, but there may be no such reader, and
write() cannot (or should not) simply fail since it is required to (or
should) supply at least PIPE_BUF (512) bytes of buffering. The critical
test is whether a non-blocking write of 8192 bytes is permitted to fail and
return -1 with errno = EAGAIN just because the implementation wants to use
direct buffering but no direct buffering is available (because there is no
reader yet). Note that there is no problem for blocking write()s -- the
kernel simply blocks waiting for a reader.

This explains at a lower level why your program blocks, and why the
blockage is at 8192 and not at 65536: the kernel wants to use direct
writes, but can't do that since there is no reader owning the read buffers;
so the kernel waits for a reader; but no reader ever arrives since the
program is single-threaded and never supplies one.

Apart from being able to write PIPE_BUF bytes atomically, nothing is
guaranteed for the buffering of pipes. The man page doesn't even document
PIPE_BUF. A review of the POSIX spec shows that there seems to be no
requirement that writing of PIPE_BUF bytes ever succeeds. Atomicity just
means that the write is of either nothing or >= PIPE_BUF bytes. Of course
such a write may fail if the buffer is too full to hold PIPE_BUF more bytes.
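The point about blocking write()s waiting for a reader can be illustrated
with a sketch in which the program supplies its own reader. It assumes that
forking a child to drain the pipe is acceptable; write_with_reader() is a
hypothetical name, not something from the PR or from the kernel sources.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write len bytes to a fresh blocking pipe while a forked child drains it
 * (hypothetical helper). Returns len on success, -1 on any failure. */
long write_with_reader(size_t len)
{
    int pdes[2];
    int status;
    size_t off = 0;
    pid_t pid;
    char *data;

    if (pipe(pdes) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(pdes[0]);
        close(pdes[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the reader. Drain until EOF and report via exit status
         * whether exactly len bytes arrived. */
        char buf[4096];
        size_t total = 0;
        ssize_t n;

        close(pdes[1]);
        while ((n = read(pdes[0], buf, sizeof(buf))) > 0)
            total += (size_t)n;
        _exit(n == 0 && total == len ? 0 : 1);
    }

    /* Parent: the writer. A blocking write() may still be interrupted or
     * return a short count, so loop until everything has been written. */
    close(pdes[0]);
    data = malloc(len ? len : 1);
    if (data != NULL) {
        memset(data, 'x', len);
        while (off < len) {
            ssize_t n = write(pdes[1], data + off, len - off);

            if (n < 0) {
                if (errno == EINTR)
                    continue;
                break;
            }
            off += (size_t)n;
        }
        free(data);
    }
    close(pdes[1]);                   /* gives the child its EOF */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (off != len || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (long)off;
}
```

With a reader present, a call such as write_with_reader(8192) should return
instead of hanging, whichever buffering method the kernel picks.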
In the direct case, we can weaselly say that the buffer is always too full
if there is no reader, so as not to have to switch to the slower indirect
(kernel buffering) method. This is technically justified since the buffer
is too full if it doesn't exist, but is probably too surprising.

Bruce