Date: Thu, 1 May 2014 11:59:56 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Matthew Fleming <mdf@freebsd.org>
Cc: "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, Eitan Adler <eadler@freebsd.org>, Ian Lepore <ian@freebsd.org>
Subject: Re: svn commit: r265132 - in head: share/man/man4 sys/dev/null
Message-ID: <20140501094737.J1261@besplex.bde.org>
In-Reply-To: <CAMBSHm9mocqTVBeC0WUwg8=t_5aRcWXQV0eb=jYAqavmS1Z-Cw@mail.gmail.com>
References: <201404300620.s3U6Kmn6074492@svn.freebsd.org> <1398869319.22079.54.camel@revolution.hippie.lan> <CAMBSHm9mocqTVBeC0WUwg8=t_5aRcWXQV0eb=jYAqavmS1Z-Cw@mail.gmail.com>

On Wed, 30 Apr 2014, Matthew Fleming wrote:

> On Wed, Apr 30, 2014 at 7:48 AM, Ian Lepore <ian@freebsd.org> wrote:
>> For some reason this reminded me of something I've been wanting for a
>> while but never get around to writing... /dev/ones, it's just
>> like /dev/zero except it returns 0xff bytes.  Useful for dd'ing to wipe
>> out flash-based media.
>
> dd if=/dev/zero | tr "\000" "\377" | dd of=<xxx>

Why all these processes and i/o's?

    tr </dev/zero "\000" "\377"

The dd's may be needed for controlling the block sizes.

> But it's not quite the same.

It is better, since it is not limited to 0xff bytes :-).

Oops, perhaps not.  tr not only uses stdio to pessimize the i/o; it uses
wide characters 1 at a time.  It used to use only characters 1 at a time.

yes(1) is limited to newline bytes, or newlines mixed with strings.  It
also uses stdio to pessimize the i/o, but not wide characters yet.

stdio's pessimizations begin with naively believing that st_blksize gives
a good i/o size.  For most non-regular files, including all (?) devices
and all (?) pipes, st_blksize is PAGE_SIZE.  For disks, this has been
broken significantly since FreeBSD-4 where it was the disk's si_bsize_best
(usually 64K).  For pipes, this has been broken significantly since
FreeBSD-4 where it was pipe_buffer.size (either PIPE_SIZE = 16K or
BIG_PIPE_SIZE = 64K).

So standard utilities tend to be too slow to use on disks.  You have to
use dd and relatively complicated pipelines to get adequate block sizes.
Sometimes dd or a special utility is needed to get adequate control and
error handling.  I have such a special utility for copying disks with bad
sectors, but prefer to use just cp for copying disks.  cp doesn't use
stdio, and doesn't use mmap() above a certain small size; it uses
read/write() with a fixed block size of 64K or maybe larger in -current,
so it works OK for copying disks.

The most broken utilities that I use often for disk devices are:

- md5.
  This (really libmd/mdXhl.c) has been broken on all devices (really on
  all non-regular files) since ~2001.  It is broken by misusing st_size
  instead of by trusting st_blksize.  st_size is only valid for regular
  files, but is used on other file types to break them.  For example:

  pts/21:bde@freefall:~> md5 /dev/null
  MD5 (/dev/null) = d41d8cd98f00b204e9800998ecf8427e
  pts/21:bde@freefall:~> md5 /dev/zero
  MD5 (/dev/zero) = d41d8cd98f00b204e9800998ecf8427e

  Similarly for disk devices.  All devices are seen as empty by md5.

  The workaround is to use a pipeline, or just stdin.  "cat /dev/zero | md5"
  and even "md5 </dev/zero" confuse md5 into using a different input method
  that works.  OTOH, "md5 /dev/fd/0" sees an empty device file, and
  "cat /dev/zero | md5 /dev/fd/0" fails immediately with a seek error.
  Pipes have st_size == 0 too, so the input method that stats the file
  would see an empty file too, so it must not be reached in the working
  case.  "md5 /dev/fd/0" apparently just stats the device file, and this
  appears to be empty.  I'm not sure if it is the tty device file or
  /dev/fd/0 that is seen.  "cat /dev/zero | md5 /dev/fd/0" apparently
  reaches the buggy code, but somehow gets further and fails trying to
  seek.

  To get adequate block sizes for disks, use dd in the pipeline that must
  be used for other reasons.  I only recently noticed that pipes have
  st_blksize = PAGE_SIZE, so that if you pipe to stdio utilities then the
  i/o will be pessimized and will need reblocking using another dd in a
  pipeline to get back to an adequate size.  PAGE_SIZE is large enough to
  not be very pessimal for some uses.

- cmp.  cmp uses mmap() excessively for regular files, but for device
  files it uses per-char stdio excessively.

  (
  More on md5.  The i/o routine for the working case is in the application
  (md5/md5.c).  This uses fread() with the bad block size BUFSIZ.  This is
  still 1024.  It is more broken than st_blksize.  However, fread() is not
  per-char, so it is reasonably efficient.
  stdio uses st_blksize for read() from the file.  When the file is
  regular, the block size is again relatively unimportant provided the
  file system has a large enough block size or does clustering.  For
  device files, clustering might occur at levels below the file system,
  but usually doesn't for disks.  Instead, small i/o's get relatively
  slower with time except on high-end SSDs with high transactions per
  second, because clustering at low levels takes too many transactions.

  The i/o routine for the non-working case is in the library
  (libmd/mdXhl.c).  It uses read(), but with the silly stdio block size of
  BUFSIZ.  libmd files have several includes of <stdio.h>, but don't seem
  to use stdio except for bugs like this.  The result is that the i/o is
  especially pessimized for the usual regular file case.  Buffering in the
  kernel limits this pessimization.
  )

  The device file case for cmp just uses getc()/putc().  This first gets
  the st_blksize pessimization.  Then it gets the slow per-char i/o from
  using getc()/putc().  For disks, the first pessimization tends to
  dominate but the second one is noticeable.  For fast input devices it is
  very noticeable.  On freefall now:

      "dd if=/dev/zero bs=1m count=4k of=/dev/null": speed is 21GB/sec;
      "dd if=/dev/zero bs=1m count=4k | cmp - /dev/zero": speed is 187MB/sec.

  The overhead is a factor of 110.  With iron disks, the overhead would be
  a factor of about 1/2.

  The loop in cmp for regular files is slow too, but only in comparison
  with the memcpy() that is (essentially) used for reading /dev/zero and
  with the memcmp() that should be used by cmp.  It just compares bytewise
  and has mounds of bookkeeping to count characters and lines for the rare
  cases that fail.  The usual case should just use mmap() of the whole
  file (if not read()) and memcmp() on that.

  I recently noticed a very bad case for cmp on regular files too.  I was
  comparing large files on a cd9660 file system on a DVD, under an old
  version of FreeBSD.  cmp mmap()s the whole file.
  The i/o for this is done by vm, and vm generated only minimal i/o's with
  the cd9660 block size of 2K.  read() would have done clustering to a
  block size of 64K.  Perhaps vm is better now, but it is hard to see how
  it could do as well as read() without doing the same clustering as
  read().  One workaround for this is to prefetch files into the buffer
  (vmio) cache using read().  It is hard to avoid thrashing of the cache
  with this, so I used workarounds like diff'ing the files instead of
  cmp'ing them.  diff is much heavier weight, but it runs faster since it
  doesn't use mmap() (gnu diff seems to use fread() and suffers from stdio
  using st_blksize).

Bruce
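
A minimal sketch of the md5 point made in the message, written in Python
rather than the C of libmd and not taken from the FreeBSD sources: hash a
file by calling read() until EOF with a fixed block size, never consulting
st_size or st_blksize.  The names describe(), md5_until_eof(), BLOCK_SIZE
and the max_bytes cap are hypothetical, introduced only for illustration;
the 64K default is an assumption borrowed from the block size the message
attributes to cp.

```python
import hashlib
import os
import stat

# Assumption: 64K, the block size the message says cp uses.
BLOCK_SIZE = 64 * 1024


def describe(path):
    """Return (is_regular, st_size, st_blksize) as reported by stat()."""
    st = os.stat(path)
    return stat.S_ISREG(st.st_mode), st.st_size, st.st_blksize


def md5_until_eof(path, block_size=BLOCK_SIZE, max_bytes=None):
    """Hash path by reading until EOF; st_size is never consulted.

    max_bytes is a hypothetical cap so that endless devices such as
    /dev/zero can be sampled; None means read all the way to EOF.
    """
    digest = hashlib.md5()
    remaining = max_bytes
    fd = os.open(path, os.O_RDONLY)
    try:
        while remaining is None or remaining > 0:
            want = block_size if remaining is None else min(block_size, remaining)
            chunk = os.read(fd, want)
            if not chunk:  # read() returned 0 bytes: real EOF
                break
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)  # tolerate short reads
    finally:
        os.close(fd)
    return digest.hexdigest()
```

With this approach /dev/null still hashes to
d41d8cd98f00b204e9800998ecf8427e (the MD5 of empty input), while
md5_until_eof("/dev/zero", max_bytes=1 << 20) hashes a megabyte of zero
bytes instead of treating the device as empty.  describe() is only there
to show what stat() reports for a given path, so the values can be
compared with the ones discussed in the message.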