Date:      Mon, 03 Oct 2005 12:08:31 -0300
From:      Tulio Guimarães da Silva <tuliogs@pgt.mpt.gov.br>
To:        freebsd-performance@freebsd.org
Subject:   Re: dd(1) performance when copying a disk to another
Message-ID:  <4341496F.9020703@pgt.mpt.gov.br>
In-Reply-To: <20051003222844.R44500@delplex.bde.org>
References:  <20051002170446.78674.qmail@web30303.mail.mud.yahoo.com>	<43401C62.2040606@centtech.com>	<618C4F4D-A3F6-4F34-9352-C7C86DC1DD9E@patpro.net> <20051003222844.R44500@delplex.bde.org>

  Phew, thanks for that. :) This seems to answer my question in the
other "leg" of the thread, though it hadn't reached me yet when I
wrote that message.
  Now THAT's quite a good explanation. ;) Thanks again,

Tulio G. da Silva

Bruce Evans wrote:

> On Mon, 3 Oct 2005, Patrick Proniewski wrote:
>
>>>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>>>> 1000+0 records in
>>>>> 1000+0 records out
>>>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)
>>>>
>
> Many wrong answers to the original question have been given.  dd with
> a block size of 1m between (separate) disk devices is much slower
> just because that block size is far too large...
>
> The above is a fairly normal speed.  The expected speed depends mainly
> on the disk technology generation and the placement of the sectors being
> read.  I get the following speeds for _sequential_ _reading_ from the
> outer (fastest) tracks of 6- and 3-year-old drives which are about 2
> generations apart:
>
> %%%
> Sep 25 21:52:35 besplex kernel: ad0: 29314MB <IBM-DTLA-307030> [59560/16/63] at ata0-master UDMA100
> Sep 25 21:52:35 besplex kernel: ad2: 58644MB <IC35L060AVV207-0> [119150/16/63] at ata1-master UDMA100
> ad0 bs 512: 16777216 bytes transferred in 2.788209 secs (6017201 bytes/sec)
> ad0 bs 1024: 16777216 bytes transferred in 1.433675 secs (11702245 bytes/sec)
> ad0 bs 2048: 16777216 bytes transferred in 0.787466 secs (21305320 bytes/sec)
> ad0 bs 4096: 16777216 bytes transferred in 0.479757 secs (34970249 bytes/sec)
> ad0 bs 8192: 16777216 bytes transferred in 0.477803 secs (35113250 bytes/sec)
> ad0 bs 16384: 16777216 bytes transferred in 0.462006 secs (36313842 bytes/sec)
> ad0 bs 32768: 16777216 bytes transferred in 0.462038 secs (36311331 bytes/sec)
> ad0 bs 65536: 16777216 bytes transferred in 0.486850 secs (34460748 bytes/sec)
> ad0 bs 131072: 16777216 bytes transferred in 0.462046 secs (36310693 bytes/sec)
> ad0 bs 262144: 16777216 bytes transferred in 0.469866 secs (35706382 bytes/sec)
> ad0 bs 524288: 16777216 bytes transferred in 0.462035 secs (36311555 bytes/sec)
> ad0 bs 1048576: 16777216 bytes transferred in 0.478534 secs (35059612 bytes/sec)
> ad2 bs 512: 16777216 bytes transferred in 4.115675 secs (4076419 bytes/sec)
> ad2 bs 1024: 16777216 bytes transferred in 2.105451 secs (7968466 bytes/sec)
> ad2 bs 2048: 16777216 bytes transferred in 1.132157 secs (14818809 bytes/sec)
> ad2 bs 4096: 16777216 bytes transferred in 0.662452 secs (25325935 bytes/sec)
> ad2 bs 8192: 16777216 bytes transferred in 0.454654 secs (36901065 bytes/sec)
> ad2 bs 16384: 16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
> ad2 bs 32768: 16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
> ad2 bs 65536: 16777216 bytes transferred in 0.304765 secs (55049683 bytes/sec)
> ad2 bs 131072: 16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
> ad2 bs 262144: 16777216 bytes transferred in 0.304760 secs (55050588 bytes/sec)
> ad2 bs 524288: 16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
> ad2 bs 1048576: 16777216 bytes transferred in 0.304757 secs (55051148 bytes/sec)
> %%%
>
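
One way to read the numbers above, following the advice further down in
Bruce's message to use the smallest block size that gives almost-maximal
throughput, is sketched below in Python. The rates are copied from the
listing; the function name smallest_near_max and the 95% cutoff are
assumptions made for illustration, not anything from the thread.

```python
# Sequential-read rates in bytes/sec, keyed by block size, copied from the
# benchmark listing for the two drives.
AD0 = {
    512: 6017201, 1024: 11702245, 2048: 21305320, 4096: 34970249,
    8192: 35113250, 16384: 36313842, 32768: 36311331, 65536: 34460748,
    131072: 36310693, 262144: 35706382, 524288: 36311555, 1048576: 35059612,
}
AD2 = {
    512: 4076419, 1024: 7968466, 2048: 14818809, 4096: 25325935,
    8192: 36901065, 16384: 55050416, 32768: 55050416, 65536: 55049683,
    131072: 55050200, 262144: 55050588, 524288: 55050200, 1048576: 55051148,
}


def smallest_near_max(rates, fraction=0.95):
    """Return the smallest block size whose rate reaches fraction * best rate.

    The 0.95 default is an arbitrary illustrative cutoff for "almost maximal".
    """
    best = max(rates.values())
    return min(bs for bs, rate in rates.items() if rate >= fraction * best)


if __name__ == "__main__":
    for name, rates in (("ad0", AD0), ("ad2", AD2)):
        print(name, smallest_near_max(rates))
```

With this cutoff the sketch picks 16K for ad2, matching the figure quoted
later in the message; a stricter or looser cutoff may pick a different
size for ad0, whose curve flattens earlier.
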
> Drive technology hit a speed plateau a few years ago, so newer single
> drives aren't much faster unless they are more expensive and/or smaller.
>
> The speed is low for small block sizes because the device has to be
> talked to too much and the protocol and firmware are not very good.
> (Another drive, a WDC 120GB with more cache (8MB instead of 2), ramps
> up to about half speed (26MB/sec) for a block size of 4K but sticks
> at that speed for block sizes 8K and 16K, then jumps up to full speed
> for block sizes of 32K and larger.  This indicates some firmware
> stupidness).  Most drives ramp up almost linearly (doubling
> the block size almost doubles the speed).  This behaviour is especially
> evident on slow SCSI drives like some (most?) ZIP and dvd/cd.  The
> command overhead can be 20 msec, so you had better not do only 512 bytes
> of i/o per command or you will get a speed of 25K/sec.  The command
> overhead of a new ATA drive is more like 50 usec, but that is still
> far too much for high speed with a block size of 512 bytes.
>
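
The arithmetic behind those two figures can be sketched as follows. This
is a simple model assumed for illustration (each command costs a fixed
overhead plus the transfer time at the media rate); the function names
are hypothetical, and the 55 MB/sec media rate is just ad2's plateau
from the listing earlier in the message.

```python
def overhead_bound(block_bytes, overhead_s):
    """Upper bound on bytes/sec if only the per-command overhead counted."""
    return block_bytes / overhead_s


def throughput(block_bytes, overhead_s, media_rate):
    """Bytes/sec when each command costs overhead plus transfer time.

    Assumed model: time per command = overhead_s + block_bytes / media_rate.
    """
    return block_bytes / (overhead_s + block_bytes / media_rate)


if __name__ == "__main__":
    # 20 msec of overhead per 512-byte command: roughly 25K/sec.
    print(overhead_bound(512, 20e-3))
    # 50 usec of overhead per 512-byte command: roughly 10 MB/sec at best.
    print(overhead_bound(512, 50e-6))
    # Same 50 usec overhead, 16K blocks, 55 MB/sec media rate: the overhead
    # is now a small fraction of each command.
    print(throughput(16384, 50e-6, 55e6))
```

In this model the rate grows almost in proportion to the block size
while the overhead dominates, then levels off at the media rate, which
is the shape of the measured curves.
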
> The speed is insignificantly different for block sizes larger than a
> limit because the drive's physical limits dominate except possibly
> with old (slow) CPUs.
>
>>>> That seems to be 2 or about 2 times faster than disc->disc
>>>> transfer... But still slower, than I would have expected...
>>>> SATA150 sounds like the drive can do 150MB/sec...
>>>
>>
>> As Eric pointed out, you just can't reach 150 MB/s with one disk,
>> it's a technological maximum for the bus, but real world performance
>> is well below this max.
>> In fact, I thought I would reach about 50 to 60 MB/s.
>
>
> 50-60 MB/s is about right.  I haven't benchmarked any SATA or very new
> drives.  Apparently they are not much faster.  ISTR that WDC Raptors are
> speced for 70-80MB/sec.  You pay twice as much to get a tiny drive with
> only 25% more throughput plus faster seeks.
>
>>>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>>>> without destroying the previous work... :-))
>>>>>
>>>>>
>>>>> well, not very easy: both disks are the same size ;)
>>>>
>>
>>>> I thought of the first 1000 1MB blocks... :-)
>>>
>>
>> damn, I misread this one... :)
>> I'm gonna try this asap.
>
>
> I divide disks into equally sized (fairly small, or half the disk size)
> partitions, and cp between them.  dd is too hard to use for me ;-).  cp
> is easier to type and automatically picks a reasonable block size.  Of
> course I use dd if the block size needs to be controlled, but mostly I
> only use it in preference to cp to get its timing info.
>
>> ...
>>
>>> Have you tried a smaller block size?  What does 8k, 16k, or 512k do 
>>> for you?  There really isn't much room for improvement here on a 
>>> single device.
>>
>>
>> nop, I'll try one of them, but I can't do many experiments, the box 
>> is in my living room, it's a 1U rack, and it's VERY VERY noisy. My 
>> girlfriend will kill me if it's running more than an hour a day :))
>
>
> Smaller block sizes will go much faster, except for copying from a disk
> to itself.  Large block sizes are normally a pessimization and the
> pessimization is especially noticeable for dd.  Just use the smallest
> block size that gives an almost-maximal throughput (e.g., 16K for reading
> ad2 above, possibly different for writing).  Large block sizes are
> pessimal for synchronous i/o like dd does.  The timing for dd'ing blocks
> of size N MB at R MB/sec between ad0 and ad2 is something like:
>
>     time in secs       activity on ad0      activity on ad2
>     ------------       ---------------      ---------------
>     0                  start read of N MB   idle
>     N/R                finish read; idle    start write of N MB
>     2*N/R-epsilon      start read of N MB   pretend to complete write
>     2*N/R              continue read        complete write
>     3*N/R-epsilon      finish read; idle    start write of N MB
>     4*N/R-2*epsilon    ...                  ...
>
> After the first block (which takes a little longer), it takes
> 2*N/R-epsilon seconds to copy 1 block (one read and one write of N/R
> each, less the overlap), where epsilon is the time between the writer's
> pretending to complete the write and actually completing it.  This time
> is obviously not very dependent on the block size since it is limited by
> the drive's resources and policies (in particular, if the drive doesn't
> do write caching, perhaps because write caching is not enabled, then
> epsilon is 0, and if our block size is large compared with the drive's
> cache then the drive won't be able to signal completion until no more
> than the drive's cache size is left to do).  Thus epsilon becomes small
> relative to the N/R term when N is large.  Apparently, in your case the
> speed drops from 59MB/sec to 35MB/sec, so with N == 1 and R == 59,
> epsilon is about 1/200 (2/59 - 1/35 is about 0.0053).
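
The estimate of epsilon can be reproduced with a few lines of Python.
This sketch assumes that each block costs one read of N/R seconds plus
one write that is acknowledged epsilon seconds early, i.e. 2*N/R -
epsilon per block, which is the reading of the timeline that yields the
quoted figure of about 1/200. The function names are hypothetical.

```python
def per_block_time(n_mb, r_mb_s, epsilon_s):
    """Seconds per block for a synchronous read-then-write copy.

    Assumed model: read takes N/R, write takes N/R, and the drive reports
    the write complete epsilon seconds before it really is.
    """
    return 2 * n_mb / r_mb_s - epsilon_s


def epsilon_from_speeds(n_mb, r_mb_s, copy_mb_s):
    """Solve N / copy_speed = 2*N/R - epsilon for epsilon."""
    return 2 * n_mb / r_mb_s - n_mb / copy_mb_s


if __name__ == "__main__":
    # Figures from the thread: 1 MB blocks, 59 MB/sec read, 35 MB/sec copy.
    eps = epsilon_from_speeds(1, 59, 35)
    print(eps)        # a little over 0.005 sec, i.e. roughly 1/200
    # Feeding epsilon back in recovers the observed copy speed.
    print(1 / per_block_time(1, 59, eps))
    # Without any early acknowledgement the copy would run at R/2.
    print(1 / per_block_time(1, 59, 0.0))
```

Under the same assumption, a drive with no write caching (epsilon == 0)
would copy at exactly half the read speed, about 29.5 MB/sec here.
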
>
> With large block sizes, the speed can be increased using asynchronous
> output.  There is a utility (in ports) named team that fakes async output
> using separate processes.  I have never used it.  Something as simple as
> 2 dd's in a pipe should work OK.
>
> For copying from a disk to itself, a large block size is needed to limit
> the number of seeks, and concurrent reads and writes are exactly what is
> not needed (since they would give competing seeks).  The i/o must be
> sequentialized, and dd does the right things for this, though the drive
> might not (you would prefer epsilon == 0, since if the drive signals
> write completion early then it might get confused when you flood it
> with the next read and seek to start the read before it completes the
> write, then thrash back and forth between writing and reading).
>
> It is interesting that writing large sequential files to at least the
> ffs file system (not mounted with -sync) in FreeBSD is slightly faster
> than writing directly to the raw disk using write(2), even if the
> device driver sees almost the same block sizes for these different
> operations.  This is because write(2) is synchronous and sync writes
> always cause idle periods (the idle periods are just much smaller for
> writing data that is already in memory), while the kernel uses async
> writes for data.
>
> Bruce
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to 
> "freebsd-performance-unsubscribe@freebsd.org"
>
>
