Date: Tue, 4 Oct 2005 00:21:15 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Patrick Proniewski <patpro@patpro.net>
Cc: freebsd-performance@freebsd.org, Eric Anderson <anderson@centtech.com>, Arne "Wörner" <arne_woerner@yahoo.com>
Subject: Re: dd(1) performance when copiing a disk to another
Message-ID: <20051003222844.R44500@delplex.bde.org>
In-Reply-To: <618C4F4D-A3F6-4F34-9352-C7C86DC1DD9E@patpro.net>
References: <20051002170446.78674.qmail@web30303.mail.mud.yahoo.com> <43401C62.2040606@centtech.com> <618C4F4D-A3F6-4F34-9352-C7C86DC1DD9E@patpro.net>

On Mon, 3 Oct 2005, Patrick Proniewski wrote:

>>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)

Many wrong answers to the original question have been given.  dd with a
block size of 1m between (separate) disk devices is much slower just because
that block size is far too large...

The above is a fairly normal speed.  The expected speed depends mainly on
the disk technology generation and the placement of the sectors being read.
I get the following speeds for _sequential_ _reading_ from the outer
(fastest) tracks of 6- and 3-year-old drives which are about 2 generations
apart:

%%%
Sep 25 21:52:35 besplex kernel: ad0: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata0-master UDMA100
Sep 25 21:52:35 besplex kernel: ad2: 58644MB <IC35L060AVV207-0> [119150/16/63] at ata1-master UDMA100

ad0 bs 512: 16777216 bytes transferred in 2.788209 secs (6017201 bytes/sec)
ad0 bs 1024: 16777216 bytes transferred in 1.433675 secs (11702245 bytes/sec)
ad0 bs 2048: 16777216 bytes transferred in 0.787466 secs (21305320 bytes/sec)
ad0 bs 4096: 16777216 bytes transferred in 0.479757 secs (34970249 bytes/sec)
ad0 bs 8192: 16777216 bytes transferred in 0.477803 secs (35113250 bytes/sec)
ad0 bs 16384: 16777216 bytes transferred in 0.462006 secs (36313842 bytes/sec)
ad0 bs 32768: 16777216 bytes transferred in 0.462038 secs (36311331 bytes/sec)
ad0 bs 65536: 16777216 bytes transferred in 0.486850 secs (34460748 bytes/sec)
ad0 bs 131072: 16777216 bytes transferred in 0.462046 secs (36310693 bytes/sec)
ad0 bs 262144: 16777216 bytes transferred in 0.469866 secs (35706382 bytes/sec)
ad0 bs 524288: 16777216 bytes transferred in 0.462035 secs (36311555 bytes/sec)
ad0 bs 1048576: 16777216 bytes transferred in 0.478534 secs (35059612 bytes/sec)
ad2 bs 512: 16777216 bytes transferred in 4.115675 secs (4076419 bytes/sec)
ad2 bs 1024: 16777216 bytes transferred in 2.105451 secs (7968466 bytes/sec)
ad2 bs 2048: 16777216 bytes transferred in 1.132157 secs (14818809 bytes/sec)
ad2 bs 4096: 16777216 bytes transferred in 0.662452 secs (25325935 bytes/sec)
ad2 bs 8192: 16777216 bytes transferred in 0.454654 secs (36901065 bytes/sec)
ad2 bs 16384: 16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
ad2 bs 32768: 16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
ad2 bs 65536: 16777216 bytes transferred in 0.304765 secs (55049683 bytes/sec)
ad2 bs 131072: 16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
ad2 bs 262144: 16777216 bytes transferred in 0.304760 secs (55050588 bytes/sec)
ad2 bs 524288: 16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
ad2 bs 1048576: 16777216 bytes transferred in 0.304757 secs (55051148 bytes/sec)
%%%
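Figures in that format can be collected with nothing more than a loop of dd
reads over a range of block sizes.  The following is only a sketch (the
device name /dev/ad0 and the 16MB transfer size here are placeholder
choices, not necessarily what was used above):

%%%
#!/bin/sh
# Read the same 16MB from the start of a disk at each block size and report
# dd's own throughput line.  /dev/ad0 is a placeholder for the disk under
# test; this only reads, nothing is written to the disk.
dev=/dev/ad0
for bs in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576; do
    count=$((16777216 / bs))
    printf '%s bs %s: ' "${dev#/dev/}" "$bs"
    dd if="$dev" of=/dev/null bs="$bs" count="$count" 2>&1 | grep 'bytes transferred'
done
%%%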
Drive technology hit a speed plateau a few years ago, so newer single drives
aren't much faster unless they are more expensive and/or smaller.

The speed is low for small block sizes because the device has to be talked
to too much and the protocol and firmware are not very good.  (Another
drive, a WDC 120GB with more cache (8MB instead of 2), ramps up to about
half speed (26MB/sec) for a block size of 4K but sticks at that speed for
block sizes of 8K and 16K, then jumps up to full speed for block sizes of
32K and larger.  This indicates some firmware stupidity.)  Most drives ramp
up almost logarithmically (doubling the block size almost doubles the
speed).  This behaviour is especially evident on slow SCSI drives like some
(most?) ZIP and dvd/cd drives.  The command overhead can be 20 msec, so you
had better not do 512 bytes of i/o per command or you will get a speed of
25K/sec.  The command overhead of a new ATA drive is more like 50 usec, but
that is still far too much for high speed with a block size of 512 bytes.
The speed is insignificantly different for block sizes larger than a certain
limit, because the drive's physical limits dominate except possibly with old
(slow) CPUs.

>>> That seems to be about 2 times faster than disc->disc transfer...
>>> But still slower than I would have expected...
>>> SATA150 sounds like the drive can do 150MB/sec...
>
> As Eric pointed out, you just can't reach 150 MB/s with one disk, it's a
> technological maximum for the bus, but real world performance is well below
> this max.
> In fact, I thought I would reach about 50 to 60 MB/s.

50-60 MB/s is about right.  I haven't benchmarked any SATA or very new
drives.  Apparently they are not much faster.  ISTR that WDC Raptors are
spec'd for 70-80MB/sec.  You pay twice as much to get a tiny drive with only
25% more throughput plus faster seeks.

>>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>>> without destroying the previous work... :-))
>>>>
>>>> well, not very easy, both disks are the same size ;)
>
>>> I thought of the first 1000 1MB blocks... :-)
>
> damn, I misread this one... :)
> I'm gonna try this asap.

I divide disks into equally sized (fairly small, or half the disk size)
partitions, and cp between them.  dd is too hard to use for me ;-).  cp is
easier to type and automatically picks a reasonable block size.  Of course I
use dd if the block size needs to be controlled, but mostly I only use it in
preference to cp to get its timing info.

>...
>> Have you tried a smaller block size?  What does 8k, 16k, or 512k do for
>> you?  There really isn't much room for improvement here on a single device.
>
> nope, I'll try one of them, but I can't do many experiments, the box is in
> my living room, it's a 1U rack, and it's VERY VERY noisy.  My girlfriend
> will kill me if it's running more than an hour a day :))

Smaller block sizes will go much faster, except for copying from a disk to
itself.  Large block sizes are normally a pessimization, and the
pessimization is especially noticeable for dd.  Just use the smallest block
size that gives an almost-maximal throughput (e.g., 16K for reading ad2
above, possibly different for writing).

Large block sizes are pessimal for the synchronous i/o that dd does.  The
timing for dd'ing blocks of size N MB at R MB/sec between ad0 and ad2 is
something like:

time in secs     activity on ad0      activity on ad2
------------     ---------------      ---------------
0                start read of N MB   idle
N/R              finish read; idle    start write of N MB
2*N/R-epsilon    start read of N MB   pretend to complete write
2*N/R            continue read        complete write
3*N/R-epsilon    finish read; idle    start write of N MB
4*N/R-2*epsilon  ...                  ...

After the first block (which takes a little longer), it takes 2*N/R-epsilon
seconds to copy each block, where epsilon is the time between the writer's
pretending to complete the write and actually completing it.  This time is
obviously not very dependent on the block size, since it is limited by the
drive's resources and policies (in particular, if the drive doesn't do write
caching, perhaps because write caching is not enabled, then epsilon is 0;
and if our block size is large compared with the drive's cache, then the
drive won't be able to signal completion until no more than the drive's
cache size is left to do).  Thus epsilon becomes small relative to the N/R
term when N is large.  Apparently, in your case the speed drops from
59MB/sec to 35MB/sec, so with N == 1 and R == 59, epsilon is about 1/200.

With large block sizes, the speed can be increased using asynchronous
output.  There is a utility (in ports) named team that fakes async output
using separate processes.  I have never used it.  Something as simple as 2
dd's in a pipe should work OK.
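For example, something like this (only a sketch; ad4 and ad6 stand for the
source and target disks):

%%%
# Sketch: put the reads and the writes in separate processes so that the
# kernel's pipe buffer lets some of the reading overlap the writing, instead
# of the two disks strictly taking turns as with a single synchronous dd.
# The second dd reassembles the pipe's smaller reads into full 1m writes.
dd if=/dev/ad4 bs=1m | dd of=/dev/ad6 ibs=64k obs=1m
%%%

The overlap here is limited by the size of the pipe buffer; a buffering
program like team (mentioned above) can keep more data in flight between the
reader and the writer.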
For copying from a disk to itself, a large block size is needed to limit the
number of seeks, and concurrent reads and writes are exactly what is not
needed (since they would give competing seeks).  The i/o must be serialized,
and dd does the right things for this, though the drive might not (you would
prefer epsilon == 0, since if the drive signals write completion early then
it might get confused when you flood it with the next read and seek to start
the read before it completes the write, then thrash back and forth between
writing and reading).

It is interesting that writing large sequential files to at least the ffs
file system (not mounted with -sync) in FreeBSD is slightly faster than
writing directly to the raw disk using write(2), even if the device driver
sees almost the same block sizes for these different operations.  This is
because write(2) on the raw disk is synchronous, and sync writes always
cause idle periods (the idle periods are just much smaller for writing data
that is already in memory), while the kernel uses async writes for file
data.

Bruce