Date: Thu, 1 May 2014 11:59:56 +1000 (EST)
From: Bruce Evans
To: Matthew Fleming
Cc: "svn-src-head@freebsd.org", "svn-src-all@freebsd.org", "src-committers@freebsd.org", Eitan Adler, Ian Lepore
Subject: Re: svn commit: r265132 - in head: share/man/man4 sys/dev/null
Message-ID: <20140501094737.J1261@besplex.bde.org>
References: <201404300620.s3U6Kmn6074492@svn.freebsd.org> <1398869319.22079.54.camel@revolution.hippie.lan>

On Wed, 30 Apr 2014, Matthew Fleming wrote:

> On Wed, Apr 30, 2014 at 7:48 AM, Ian Lepore wrote:
>> For some reason this reminded me of something I've been wanting for a
>> while but never get around to writing... /dev/ones, it's just
>> like /dev/zero except it returns 0xff bytes. Useful for dd'ing to wipe
>> out flash-based media.
>
> dd if=/dev/zero | tr "\000" "\377" | dd of=

Why all these processes and i/o's?

tr

But it's not quite the same. It is better, since it is not limited to
0xff bytes :-).

Oops, perhaps not. tr not only uses stdio to pessimize the i/o; it uses
wide characters 1 at a time. It used to use only characters 1 at a time.

yes(1) is limited to newline bytes, or newlines mixed with strings. It
also uses stdio to pessimize the i/o, but not wide characters yet.

stdio's pessimizations begin with naively believing that st_blksize gives
a good i/o size. For most non-regular files, including all (?) devices
and all (?) pipes, st_blksize is PAGE_SIZE. For disks, this has been
broken significantly since FreeBSD-4 where it was the disk's si_bsize_best
(usually 64K). For pipes, this has been broken significantly since
FreeBSD-4 where it was pipe_buffer.size (either PIPE_SIZE = 16K or
BIG_PIPE_SIZE = 64K).

So standard utilities tend to be too slow to use on disks. You have to
use dd and relatively complicated pipelines to get adequate block sizes.
Sometimes dd or a special utility is needed to get adequate control and
error handling. I have such a special utility for copying disks with bad
sectors, but prefer to use just cp for copying disks. cp doesn't use
stdio, and doesn't use mmap() above a certain small size; it uses
read/write() with a fixed block size of 64K or maybe larger in -current,
so it works OK for copying disks.

The most broken utilities that I use often for disk devices are:

- md5. This (really libmd/mdXhl.c) has been broken on all devices (really
  on all non-regular files) since ~2001. It is broken by misusing st_size
  instead of by trusting st_blksize. st_size is only valid for regular
  files, but is used on other file types to break them.
  For example:

    pts/21:bde@freefall:~> md5 /dev/null
    MD5 (/dev/null) = d41d8cd98f00b204e9800998ecf8427e
    pts/21:bde@freefall:~> md5 /dev/zero
    MD5 (/dev/zero) = d41d8cd98f00b204e9800998ecf8427e

  Similarly for disk devices. All devices are seen as empty by md5. The
  workaround is to use a pipeline, or just stdin. "cat /dev/zero | md5"
  and even "md5 , but don't seem to use stdio except for bugs like this.
  The result is that the i/o is especially pessimized for the usual
  regular file case. Buffering in the kernel limits this pessimization. )

The device file case for cmp just uses getc()/putc(). This first gets the
st_blksize pessimization. Then it gets the slow per-char i/o from using
getc()/putc(). For disks, the first pessimization tends to dominate but
the second one is noticeable. For fast input devices it is very
noticeable. On freefall now:

    "dd if=/dev/zero bs=1m count=4k of=/dev/null": speed is 21GB/sec;
    "dd if=/dev/zero bs=1m count=4k | cmp - /dev/zero": speed is 187MB/sec.

The overhead is a factor of 110. With iron disks, the overhead would be a
factor of about 1/2.

The loop in cmp for regular files is slow too, but only in comparison with
the memcpy() that is (essentially) used for reading /dev/zero and with the
memcmp() that should be used by cmp. It just compares bytewise and has
mounds of bookkeeping to count characters and lines for the rare cases
that fail. The usual case should just use mmap() of the whole file (if
not read()) and memcmp() on that.

I recently noticed a very bad case for cmp on regular files too. I was
comparing large files on a cd9660 file system on a DVD, under an old
version of FreeBSD. cmp mmap()s the whole file. The i/o for this is done
by vm, and vm generated only minimal i/o's with the cd9660 block size of
2K. read() would have done clustering to a block size of 64K. Perhaps vm
is better now, but it is hard to see how it could do as well as read()
without doing the same clustering as read().
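A minimal sketch of the read()-plus-memcmp() style of comparison that the cmp paragraphs above argue for. This is not cmp's actual code. The names blockcmp(), read_full() and CMP_BLOCK are hypothetical, and the 64K block size is an assumption borrowed from the figure quoted for cp. It only reports whether two descriptors yield the same bytes, with none of cmp's character and line bookkeeping:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define CMP_BLOCK (64 * 1024)	/* assumed block size, not taken from cmp */

/*
 * Read up to len bytes, retrying after short reads (pipes, devices) and
 * EINTR.  Returns the byte count (less than len only at EOF) or -1.
 */
static ssize_t
read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	assert(buf != NULL && len > 0);
	while (done < len) {
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n == 0)
			break;			/* EOF */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		done += (size_t)n;
	}
	return ((ssize_t)done);
}

/*
 * Compare two descriptors a block at a time.
 * Returns 0 if identical, 1 if they differ, -1 on a read error.
 */
static int
blockcmp(int fd1, int fd2)
{
	static char b1[CMP_BLOCK], b2[CMP_BLOCK];

	for (;;) {
		ssize_t n1 = read_full(fd1, b1, sizeof(b1));
		ssize_t n2 = read_full(fd2, b2, sizeof(b2));

		if (n1 < 0 || n2 < 0)
			return (-1);
		if (n1 != n2 || memcmp(b1, b2, (size_t)n1) != 0)
			return (1);
		if (n1 == 0)
			return (0);		/* both at EOF, no difference */
	}
}
```

Because the loop never consults st_size or st_blksize, it behaves the same on regular files, pipes and devices. Finding the exact offset of the first difference would need a bytewise pass over only the one block that failed memcmp().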
One workaround for these minimal vm i/o's is to prefetch files into the
buffer (vmio) cache using read(). It is hard to avoid thrashing of the
cache with this, so I used workarounds like diff'ing the files instead of
cmp'ing them. diff is much heavier weight, but it runs faster since it
doesn't use mmap() (gnu diff seems to use fread() and suffers from stdio
using st_blksize).

Bruce
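
The prefetching workaround mentioned above (pulling a file into the buffer cache with read() before something else mmap()s it) could look roughly like the following. This is a sketch under stated assumptions, not code from the message: prefetch_file() and PREFETCH_BLOCK are hypothetical names, and 64K is assumed only because it matches the clustering size quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define PREFETCH_BLOCK (64 * 1024)	/* assumed read size */

/*
 * Read a file sequentially to EOF and throw the data away, so that the
 * kernel has had a chance to cache it.  Reads until read() returns 0
 * rather than trusting st_size.  Returns the number of bytes read, or
 * -1 on error.
 */
static off_t
prefetch_file(const char *path)
{
	static char buf[PREFETCH_BLOCK];
	off_t total = 0;
	ssize_t n;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	for (;;) {
		n = read(fd, buf, sizeof(buf));
		if (n == 0)
			break;			/* EOF */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return (-1);
		}
		total += n;
	}
	close(fd);
	return (total);
}
```

Calling prefetch_file() on each file just before comparing it keeps the prefetched data close to its use. Nothing here prevents the cache thrashing described above when the files are larger than the cache.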