Date: Fri, 27 Jul 2012 12:07:07 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Garrett Cooper <yanegomi@gmail.com>
Cc: freebsd-bugs@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org
Subject: Re: kern/170203: [kern] piped dd's don't behave sanely when dealing with a fifo
Message-ID: <20120727103622.B933@besplex.bde.org>
In-Reply-To: <201207262256.q6QMurVf077480@red.freebsd.org>
References: <201207262256.q6QMurVf077480@red.freebsd.org>
On Thu, 26 Jul 2012, Garrett Cooper wrote:

>> Description:
> Creating a fifo and then dd'ing across the fifo using /dev/zero
> doesn't seem to yield the behavior one would expect to have; dd
> should either exit thanks to SIGPIPE being sent or the count being
> completed.
>
> Furthermore, the count is bogus:
>
> Terminal 1:
>
> $ dd if=fifo bs=512k count=4
> 0+4 records in
> 0+4 records out
> 32768 bytes transferred in 0.002121 secs (15449523 bytes/sec)
> $ dd if=fifo bs=512k count=4
> 0+4 records in
> 0+4 records out
> 32768 bytes transferred in 0.001483 secs (22096295 bytes/sec)
> ...

I think it's working almost as expected.  Large blocks give non-atomic
I/O, so the reader sees small blocks, then EOF when it gets ahead of
the writer.  This always happens without SMP.  Not is a bug (debugged
below).  There is no SIGPIPE at the start of write() because there is
a reader then, and no SIGPIPE for the next write() because there is no
next write() -- the current one doesn't notice when the reader goes
away.

This is what happens under FreeBSD-~5.2 with the old fifo
implementation, at least.  It also shows a bug in truss(1) -- the
current write() is not shown, because it hasn't returned.  kdump shows
that the write() has started but not returned.

> $ dd if=fifo bs=512M count=4
> 0+4 records in
> 0+4 records out
> 32768 bytes transferred in 0.003908 secs (8384514 bytes/sec)
>
> Terminal 2:
>
> $ dd if=/dev/zero bs=512k count=4 of=fifo
> ^T
> load: 0.40 cmd: dd 1779 [sbwait] 2.63r 0.00u 0.00s 0% 1800k

FreeBSD-~5.2 shows [runnable] for the wait channel.  This is strange.
dd should be blocked waiting for a reader, and only sbwait makes sense
for that.  FreeBSD-9 apparently doesn't have the new named pipe
implementation either.  -current shows [pipdwt].  This makes it clearer
that it is waiting in write() and not in open().

dd probably does the wrong thing for fifos, by always trying to open
files in O_RDWR mode first.  This breaks the normal synchronization of
readers and writers.
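A minimal sketch of the O_RDWR effect, written in Python rather than dd's C to keep it short.  The function name and the scratch path are hypothetical.  It assumes a kernel that lets a FIFO be opened O_RDWR without blocking (Linux documents this; POSIX leaves the result undefined), so treat it as an illustration, not as a description of dd's code:

```python
import os
import tempfile


def open_fifo_rdwr_demo():
    """Open a fresh FIFO with O_RDWR and talk to ourselves through it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")  # hypothetical scratch path
        os.mkfifo(path)
        # An O_WRONLY open would block here until some reader appeared.
        # An O_RDWR open returns at once (assumption: Linux-like kernel),
        # because this descriptor counts as both a reader and a writer.
        fd = os.open(path, os.O_RDWR)
        try:
            # Nobody else has the FIFO open, yet the write succeeds:
            # no SIGPIPE and no EPIPE, since we are our own reader.
            written = os.write(fd, b"hello")
            # The bytes are still queued in the FIFO and we can read them.
            echoed = os.read(fd, written)
        finally:
            os.close(fd)
        return written, echoed


if __name__ == "__main__":
    print(open_fifo_rdwr_demo())
```

The write can never fail for lack of a reader while that descriptor stays open, which is the situation described for the unmodified dd.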
In fact, this explains why there is no SIGPIPE -- there is always a
reader since dd can always talk to itself.  First the open succeeds
without blocking as expected.

After changing the O_RDWR to O_WRONLY in FreeBSD-~5.2, dd almost works
as expected.  The reader reads 4 blocks of size 8K and then exits.  The
writer first blocks in open.  Then it is killed by SIGPIPE.  Its
SIGPIPE handling is broken (nonexistent), and the signal kills it
without it printing a status message:

% 1266 dd RET read 524288/0x80000
% 1266 dd CALL write(0x4,0x8063000,0x80000)
% 1266 dd RET write -1 errno 32 Broken pipe
% 1266 dd PSIG SIGPIPE SIG_DFL

The read is from /dev/zero.  The write is of 512K to the fifo.  The
write delivers 4*8K and then dd is killed.  If dd caught the signal
like it should, then we would expect to see either a short write().
The signal handling should clear SA_RESTART, else the write() would be
restarted and would deliver endless SIGPIPEs, now for failing writes.
Reporting of short writes is quite broken and this is an interesting
test for it.

-current delivers 4*64K instead of 4*8K.  This is because the i/o unit
is BIG_PIPE_SIZE = 64K for nameless pipes and now for named pipes.
Apparently the unit is 8K for sockets.  I think the unit of atomicity
is only 512 bytes for both.  Certainly, PIPE_BUF is still 512 in
limits.h.  I think limits.h is broken since the unit isn't actually 512
bytes for _all_ file types.  For sockets, you can control the
watermarks and I think this changes the unit of atomicity.  I wonder if
the socket ioctls for this worked with the old named pipe
implementation.

The pipe wait channel names are less than perfect.  "pipdw" means "pipe
direct write".  "wt" looks like an abbreviation for "write", but there
are 3 waits in pipe_direct_write() and they are distinguished by the
suffixes "w", "c" and "t".  It isn't clear what these mean.
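The O_WRONLY case can be sketched the same way.  This is a Python illustration under stated assumptions, not dd's code: the names `writer`, `run_demo` and `BLOCK` are hypothetical, and a thread stands in for the second terminal.  Python ignores SIGPIPE by default, so the failing write surfaces as EPIPE (BrokenPipeError) instead of killing the process, which is roughly what a dd that handled the signal would see.  The read sizes depend on the kernel's pipe buffering, and some kernels may return a short count from write() before the call that fails, so the sketch only counts bytes accepted before the error:

```python
import os
import tempfile
import threading

BLOCK = 512 * 1024  # same block size as the report (bs=512k)


def writer(path, result):
    """Write one BLOCK to the FIFO and record how far we got."""
    fd = os.open(path, os.O_WRONLY)  # blocks until the reader opens
    data = bytes(BLOCK)
    sent = 0
    try:
        while sent < len(data):
            # May return a short count; loop until done or EPIPE.
            sent += os.write(fd, data[sent:])
    except BrokenPipeError:
        # Reader went away while data was still pending.
        result["epipe"] = True
    finally:
        os.close(fd)
    result["sent"] = sent


def run_demo():
    """Reader asks for 4 blocks of BLOCK bytes, then exits early."""
    result = {"epipe": False, "sent": 0}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")  # hypothetical scratch path
        os.mkfifo(path)
        t = threading.Thread(target=writer, args=(path, result))
        t.start()
        fd = os.open(path, os.O_RDONLY)
        # Like count=4: four reads, each may return far less than BLOCK.
        sizes = [len(os.read(fd, BLOCK)) for _ in range(4)]
        os.close(fd)  # the reader goes away
        t.join()
    return sizes, result


if __name__ == "__main__":
    print(run_demo())
```

The four read sizes show the short blocks the reader gets, and the writer's record shows that it was stopped by EPIPE partway through its single 512K block.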
>> How-To-Repeat:
> mkfifo fifo
>
> Terminal 1:
>
> dd if=fifo bs=512k count=4
>
> Terminal 2:
>
> dd if=/dev/zero bs=512k count=4 of=fifo

Remember to kill the writing dd if you stop it with ^Z.  Otherwise,
since the unhacked version is talking to itself, the fifo acts
strangely for other tests.

conv=block and conv=noerror (with cbs=512k) change the behaviour only
slightly (slightly worse).  What works easily is omitting the count.
dd then reads until EOF, in 256 records of size exactly 8K each under
FreeBSD-~5.2.  Not giving the count is normal practice, since you
rarely know the block size for pipes and many other file types.  If
there is another bug here, then it is conv=foo not working.  But
reblocking is confusing, and I probably did it wrong.

Another thing that doesn't work well here is trying to control the
writer with SIGPIPE from the reader.  Even if you can get the
reblocking right and read precisely 2MB, and fix SIGPIPE, then the
SIGPIPE may be delivered after the writer has dirtied the fifo with a
little more than 2MB.  The unread data then remains to bite the next
reader.

Bruce