From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 14:57:12 2005
From: Patrick Proniewski <patpro@patpro.net>
Date: Sun, 2 Oct 2005 16:57:09 +0200
To: freebsd-performance@freebsd.org
Subject: dd(1) performance when copying a disk to another

Hi,

I run FreeBSD 5.4 on a PIV 3GHz (SuperMicro motherboard, Intel 6300ESB SATA
chipset) with 2 SATA HDDs. I'm in the process of duplicating the boot HDD
to the second HDD. I run dd for that:

# dd if=/dev/ad4 of=/dev/ad6 bs=1m

It yields poor performance:

$ iostat -dhKw 1
(...)
             ad4               ad6
  KB/t  tps  MB/s    KB/t  tps  MB/s
124.49  252 30.69  128.00  246 30.69
128.00  285 35.64  128.00  279 34.90
128.00  282 35.27  128.00  283 35.40
(...)

Is it normal that the data rate won't go above 35-38 MB/s?

HDDs are:
ad4 -> Maxtor 80 GB 7200 rpm
ad6 -> Hitachi 80 GB 7200 rpm

One more question: is dd(1) a good way to duplicate a boot drive to make a
bootable spare disk?

patpro
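One minimal way to sanity-check such a clone, assuming the source disk stays
quiescent during and after the copy (ideally the copy is made from a
rescue/live boot) and both devices are exactly the same size, is to checksum
both devices afterwards; a rough sketch:

# dd if=/dev/ad4 bs=64k | md5
# dd if=/dev/ad6 bs=64k | md5

If the target is larger than the source, limit both reads to the source's
size with count=; any write to ad4 in the meantime will make the digests
differ.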
From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 15:17:05 2005
From: "Steven Hartland" <killing@multiplay.co.uk>
Date: Sun, 2 Oct 2005 16:15:58 +0100
Subject: Re: dd(1) performance when copying a disk to another

That's actually pretty good for a sustained read / write on a single disk.

Steve
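As a back-of-envelope check of what that rate means in practice (assuming
roughly 80 GB to copy at the observed ~35 MB/s), the whole clone should take
on the order of forty minutes:

$ echo "80 * 1024 / 35 / 60" | bc -l     # minutes; about 39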
From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 15:59:32 2005
From: Arne Wörner <arne_woerner@yahoo.com>
Date: Sun, 2 Oct 2005 08:59:26 -0700 (PDT)
Subject: Re: dd(1) performance when copying a disk to another

--- Steven Hartland wrote:
> From: "Patrick Proniewski"
>> # dd if=/dev/ad4 of=/dev/ad6 bs=1m
>>
>> It yields poor performance:
>>
> That's actually pretty good for a sustained read / write on a
> single disk.
>
Does somebody know why this is "pretty good"? I mean: where is the
bottleneck? As far as I know, SATA is quite fast... And memory to memory
copies are quite fast... disc<->memory should be quite fast, too.

>> Is it normal that the data rate won't go above 35-38 MB/s?
>>
Hmm... Can u find out if DMA transfers are enabled for those discs? What
does dmesg say? What does "sysctl hw.ata.ata_dma" say? Maybe atacontrol(8)
says something useful about SATA discs, too (e. g. atacontrol mode 0)?

Can u try the following commands, when the system (especially the discs)
is idle?
# dd if=/dev/ad4 of=/dev/null bs=1m count=1000
# dd if=/dev/zero of=/dev/null bs=1m count=1000

(Maybe you could find a way to copy /dev/zero to /dev/ad6 without
destroying the previous work... :-)) E. g.:
# dd if=/dev/ad6 of=/tmp/arne bs=1m count=1000
# dd if=/dev/zero of=/dev/ad6 bs=1m count=1000
# dd if=/tmp/arne of=/dev/ad6 bs=1m count=1000
)

>> one more question: is dd(1) a good way to duplicate a boot
>> drive to make a bootable spare disk ?
>
Say, is the file system on /dev/ad4 read-only during the "dd"? If /dev/ad4
changes before "dd" completes, ad6 might need a fsck or ad6 might be
useless...

Btw.: I use gmirror(8)... But then an unintentional, fatal change to ad4
would be fatal for ad6, too... :-)) So I have to hope that I do not type
things I shall not type (luckily I have some boot CDs for that unlikely
case ;-)) )...

-Arne
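The SATA drives in this box sit on channels ata2 and ata3 rather than
channel 0, as the dmesg later in the thread shows, so the per-channel query
has to name those channels. A sketch, assuming the atacontrol(8) in 5.4
offers the list and mode subcommands:

# atacontrol list        # shows which ATA channel each disk is attached to
# atacontrol mode 2      # ad4 is ata2-master here
# atacontrol mode 3      # ad6 is ata3-master here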
From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 16:34:34 2005
From: Patrick Proniewski <patpro@patpro.net>
Date: Sun, 2 Oct 2005 18:34:29 +0200
Subject: Re: dd(1) performance when copying a disk to another

Hi,

> Can u find out if DMA transfers are enabled for those discs?
> What does dmesg say?

see end of mail for full dmesg output,

> What does "sysctl hw.ata.ata_dma" say?

hw.ata.ata_dma: 1

> Maybe atacontrol(8) says something useful about SATA discs, too
> (e. g. atacontrol mode 0)?

# atacontrol mode 0
Master = BIOSPIO
Slave  = BIOSPIO

> Can u try the following commands, when the system (especially the
> discs) is idle?
> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000

# dd if=/dev/ad4 of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)

> # dd if=/dev/zero of=/dev/null bs=1m count=1000

# dd if=/dev/zero of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 0.199381 secs (5259154109 bytes/sec)

> (Maybe you could find a way to copy /dev/zero to /dev/ad6 without
> destroying the previous work... :-))

well, not very easy, both disks are the same size ;)

>> one more question: is dd(1) a good way to duplicate a boot
>> drive to make a bootable spare disk ?
> Say, is the file system on /dev/ad4 read-only during the "dd"?
> If /dev/ad4 changes before "dd" completes, ad6 might need a fsck
> or ad6 might be useless...

well, ad4 is not read-only, but I've shut down every unnecessary service,
and finally the ad6 HDD is bootable! It boots OK and everything seems to
work as well as on the ad4 disk. It's OK for me, it's just a spare
emergency disk.

thanks,
Pat

dmesg:
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p6 #0: Mon Aug 29 15:58:58 CEST 2005
    root@toto.patpro.net:/usr/obj/usr/src/sys/PATPRO-20050829
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2994.90-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 1072562176 (1022 MB)
avail memory = 1044230144 (995 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 2
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: on acpi0
cpu1: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 3.0 on pci0
pci1: on pcib1
em0: port 0xc000-0xc01f mem 0xf2000000-0xf201ffff irq 18 at device 1.0 on pci1
em0: Ethernet address: 00:30:48:83:ef:8c
em0:  Speed:N/A  Duplex:N/A
pcib2: at device 28.0 on pci0
pci2: on pcib2
uhci0: port 0xe100-0xe11f irq 16 at device 29.0 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xe000-0xe01f irq 19 at device 29.1 on pci0
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: at device 29.4 (no driver attached)
pci0: at device 29.5 (no driver attached)
ehci0: mem 0xf2100000-0xf21003ff irq 23 at device 29.7 on pci0
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: on ehci0
usb2: USB revision 2.0
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pcib3: at device 30.0 on pci0
pci3: on pcib3
pci3: at device 9.0 (no driver attached)
em1: port 0xd100-0xd13f mem 0xf1000000-0xf101ffff irq 19 at device 10.0 on pci3
em1: Ethernet address: 00:30:48:83:ef:8d
em1:  Speed:N/A  Duplex:N/A
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0xf000-0xf00f, 0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: port 0xe600-0xe60f, 0xe500-0xe503,0xe400-0xe407,0xe300-0xe303,0xe200-0xe207 irq 18 at device 31.2 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pci0: at device 31.3 (no driver attached)
acpi_tz0: on acpi0
fdc0: port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: port 0x778-0x77b,0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
ppi0: on ppbus0
atkbdc0: port 0x64,0x60 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
em0: Link is up 100 Mbps Full Duplex
ad4: 78167MB [158816/16/63] at ata2-master SATA150
ad6: 194481MB [395136/16/63] at ata3-master SATA150
    <-- this is _not_ the ad6 I've used dd on, this is my regular ad6 storage disk.
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/ad4s1a
em0: Link is up 100 Mbps Full Duplex
Accounting enabled
pflog0: promiscuous mode enabled
em0: Link is up 100 Mbps Full Duplex
em0: Link is up 100 Mbps Full Duplex

From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 17:04:47 2005
From: Arne Wörner <arne_woerner@yahoo.com>
Date: Sun, 2 Oct 2005 10:04:46 -0700 (PDT)
Subject: Re: dd(1) performance when copying a disk to another

Hi!

--- Patrick Proniewski wrote:
>> Can u find out if DMA transfers are enabled for those discs?
>> What does dmesg say?
>
> see end of mail for full dmesg output,
>
Looks good... :-)) But I never saw FBSD's kernel messages about SATA
drives... ;-)

>> Maybe atacontrol(8) says something useful about SATA discs,
>> too (e. g. atacontrol mode 0)?
>
> # atacontrol mode 0
> Master = BIOSPIO
> Slave  = BIOSPIO
>
Hmm... 0 seems to be the wrong ata... That's why the output does not fit
the SATA drives, I think...

> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)
>
That seems to be about 2 times faster than the disc->disc transfer... But
still slower than I would have expected... SATA150 sounds like the drive
can do 150MB/sec...

As far as I know, SATA busses are independent from each other (no
master/slave; every drive gets its own cable)... Maybe "dd" cannot issue a
read request while the write isn't completed? DMA shouldn't be the problem,
since the memory interface is quite fast in your case...

So there remain the questions:
1. Why does the read speed drop in ur setting (maybe writing to ad6 takes
   more time than reading from ad4? u could try to run two dd processes,
   one with if=ad4 and the other with if=ad6)?
2. Why can't we reach 150MB/sec?

>> (Maybe you could find a way to copy /dev/zero to /dev/ad6 without
>> destroying the previous work... :-))
>
> well, not very easy, both disks are the same size ;)
>
I thought of the first 1000 1MB blocks... :-)
The write speed might be interesting...

-Arne

From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 17:44:16 2005
From: Eric Anderson <anderson@centtech.com>
Date: Sun, 02 Oct 2005 12:44:02 -0500
Subject: Re: dd(1) performance when copying a disk to another

Arne Wörner wrote:
> So there remain the questions:
> 1. Why does the read speed drop in ur setting (maybe writing to ad6
>    takes more time than reading from ad4?)?
> 2. Why can't we reach 150MB/sec?

The reason why 35-40MB/s is good is because the drive itself cannot
stream any faster.
The SATA-150 interface is rated at 150MB/s, but the disk cannot get close.
Look at the specs for the drive, and you'll see that the sustained rate is
much lower than the burst speed. If you want fast performance on a SATA
disk, you'll need to buy a WD Raptor drive (74GB) - that will get you more
speed, but still not the 150MB/s.

>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>> without destroying the previous work... :-))
>>
>> well, not very easy, both disks are the same size ;)
>>
> I thought of the first 1000 1MB blocks... :-)
> The write speed might be interesting...

Instead of dd, why not use gmirror?

Also - reads can be faster since the drive can read ahead a number of
blocks into the cache in an efficient manner, but writes have to be
streamed to disk as they come in (going through the cache, and buffering,
but you get the idea).

Have you tried a smaller block size? What does 8k, 16k, or 512k do for
you? There really isn't much room for improvement here on a single device.

Eric

From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 18:25:37 2005
From: "Steven Hartland" <killing@multiplay.co.uk>
Date: Sun, 2 Oct 2005 19:25:20 +0100
Subject: Re: dd(1) performance when copying a disk to another

----- Original Message ----- From: "Arne Wörner"
> That seems to be about 2 times faster than the disc->disc
> transfer... But still slower than I would have expected...
> SATA150 sounds like the drive can do 150MB/sec...

LOL, you might want to read up on what SATA150 means.
In short, it is the max throughput the interface can sustain. It is NOT
what you can get out of a single disk, which is still far from that; SATA
disk transfer rates are typically 30 -> 50MB/s sustained.

Steve

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 07:55:58 2005
From: Patrick Proniewski <patpro@patpro.net>
Date: Mon, 3 Oct 2005 09:55:49 +0200
Subject: Re: dd(1) performance when copying a disk to another

Hi Arne and Eric,

>>> # atacontrol mode 0
>>> Master = BIOSPIO
>>> Slave  = BIOSPIO
>> Hmm... 0 seems to be the wrong ata... That's why the output does
>> not fit the SATA drives, I think...

oops... I'll have to do it again with channels 2 and 3

>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)
>> That seems to be about 2 times faster than the disc->disc
>> transfer... But still slower than I would have expected...
>> SATA150 sounds like the drive can do 150MB/sec...

As Eric pointed out, you just can't reach 150 MB/s with one disk; it's a
technological maximum for the bus, but real-world performance is well
below this max. In fact, I thought I would reach about 50 to 60 MB/s.
>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>> without destroying the previous work... :-))
>>>
>>> well, not very easy, both disks are the same size ;)
>> I thought of the first 1000 1MB blocks... :-)

damn, I misread this one... :)
I'm gonna try this asap.

> Instead of dd, why not use gmirror?

I had no idea gmirror existed, but I'll continue with dd. It's a one-time
experiment.

> Have you tried a smaller block size? What does 8k, 16k, or 512k do
> for you? There really isn't much room for improvement here on a
> single device.

nope, I'll try one of them, but I can't do many experiments: the box is in
my living room, it's a 1U rack, and it's VERY VERY noisy. My girlfriend
will kill me if it's running more than an hour a day :))

Pat

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 14:21:29 2005
From: Bruce Evans <bde@zeta.org.au>
Date: Tue, 4 Oct 2005 00:21:15 +1000 (EST)
Subject: Re: dd(1) performance when copying a disk to another

On Mon, 3 Oct 2005, Patrick Proniewski wrote:

>>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)

Many wrong answers to the original question have been given. dd with a
block size of 1m between (separate) disk devices is much slower just
because that block size is far too large...

The above is a fairly normal speed. The expected speed depends mainly on
the disk technology generation and the placement of the sectors being
read.
I get the following speeds for _sequential_ _reading_ from the outer
(fastest) tracks of 6- and 3-year-old drives which are about 2 generations
apart:

%%%
Sep 25 21:52:35 besplex kernel: ad0: 29314MB [59560/16/63] at ata0-master UDMA100
Sep 25 21:52:35 besplex kernel: ad2: 58644MB [119150/16/63] at ata1-master UDMA100
ad0 bs 512:     16777216 bytes transferred in 2.788209 secs (6017201 bytes/sec)
ad0 bs 1024:    16777216 bytes transferred in 1.433675 secs (11702245 bytes/sec)
ad0 bs 2048:    16777216 bytes transferred in 0.787466 secs (21305320 bytes/sec)
ad0 bs 4096:    16777216 bytes transferred in 0.479757 secs (34970249 bytes/sec)
ad0 bs 8192:    16777216 bytes transferred in 0.477803 secs (35113250 bytes/sec)
ad0 bs 16384:   16777216 bytes transferred in 0.462006 secs (36313842 bytes/sec)
ad0 bs 32768:   16777216 bytes transferred in 0.462038 secs (36311331 bytes/sec)
ad0 bs 65536:   16777216 bytes transferred in 0.486850 secs (34460748 bytes/sec)
ad0 bs 131072:  16777216 bytes transferred in 0.462046 secs (36310693 bytes/sec)
ad0 bs 262144:  16777216 bytes transferred in 0.469866 secs (35706382 bytes/sec)
ad0 bs 524288:  16777216 bytes transferred in 0.462035 secs (36311555 bytes/sec)
ad0 bs 1048576: 16777216 bytes transferred in 0.478534 secs (35059612 bytes/sec)
ad2 bs 512:     16777216 bytes transferred in 4.115675 secs (4076419 bytes/sec)
ad2 bs 1024:    16777216 bytes transferred in 2.105451 secs (7968466 bytes/sec)
ad2 bs 2048:    16777216 bytes transferred in 1.132157 secs (14818809 bytes/sec)
ad2 bs 4096:    16777216 bytes transferred in 0.662452 secs (25325935 bytes/sec)
ad2 bs 8192:    16777216 bytes transferred in 0.454654 secs (36901065 bytes/sec)
ad2 bs 16384:   16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
ad2 bs 32768:   16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
ad2 bs 65536:   16777216 bytes transferred in 0.304765 secs (55049683 bytes/sec)
ad2 bs 131072:  16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
ad2 bs 262144:  16777216 bytes transferred in 0.304760 secs (55050588 bytes/sec)
ad2 bs 524288:  16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
ad2 bs 1048576: 16777216 bytes transferred in 0.304757 secs (55051148 bytes/sec)
%%%

Drive technology hit a speed plateau a few years ago, so newer single
drives aren't much faster unless they are more expensive and/or smaller.

The speed is low for small block sizes because the device has to be talked
to too much and the protocol and firmware are not very good. (Another
drive, a WDC 120GB with more cache (8MB instead of 2), ramps up to about
half speed (26MB/sec) for a block size of 4K but sticks at that speed for
block sizes 8K and 16K, then jumps up to full speed for block sizes of 32K
and larger. This indicates some firmware stupidness.) Most drives ramp up
almost logarithmically (doubling the block size almost doubles the speed).
This behaviour is especially evident on slow SCSI drives like some (most?)
ZIP and dvd/cd drives. The command overhead can be 20 msec, so you had
better not do one 512-byte i/o per command or you will get a speed of
25K/sec. The command overhead of a new ATA drive is more like 50 usec, but
that is still far too much for high speed with a block size of 512 bytes.

The speed is insignificantly different for block sizes larger than a limit
because the drive's physical limits dominate, except possibly with old
(slow) CPUs.

>>> That seems to be about 2 times faster than the disc->disc
>>> transfer... But still slower than I would have expected...
>>> SATA150 sounds like the drive can do 150MB/sec...
>
> As Eric pointed out, you just can't reach 150 MB/s with one disk; it's a
> technological maximum for the bus, but real-world performance is well
> below this max.
> In fact, I thought I would reach about 50 to 60 MB/s.

50-60 MB/s is about right. I haven't benchmarked any SATA or very new
drives. Apparently they are not much faster. ISTR that WDC Raptors are
speced for 70-80MB/sec. You pay twice as much to get a tiny drive with
only 25% more throughput plus faster seeks.

>>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>>> without destroying the previous work... :-))
>>>>
>>>> well, not very easy, both disks are the same size ;)
>>> I thought of the first 1000 1MB blocks... :-)
>
> damn, I misread this one... :)
> I'm gonna try this asap.

I divide disks into equally sized (fairly small, or half the disk size)
partitions, and cp between them. dd is too hard to use for me ;-). cp is
easier to type and automatically picks a reasonable block size. Of course
I use dd if the block size needs to be controlled, but mostly I only use
it in preference to cp to get its timing info.

>> Have you tried a smaller block size? What does 8k, 16k, or 512k do
>> for you? There really isn't much room for improvement here on a
>> single device.
>
> nope, I'll try one of them, but I can't do many experiments: the box is
> in my living room, it's a 1U rack, and it's VERY VERY noisy. My
> girlfriend will kill me if it's running more than an hour a day :))

Smaller block sizes will go much faster, except for copying from a disk to
itself. Large block sizes are normally a pessimization, and the
pessimization is especially noticeable for dd. Just use the smallest block
size that gives an almost-maximal throughput (e.g., 16K for reading ad2
above, possibly different for writing). Large block sizes are pessimal for
the synchronous i/o that dd does. The timing for dd'ing blocks of size
N MB at R MB/sec between ad0 and ad2 is something like:

time in secs     activity on ad0        activity on ad2
------------     ---------------        ---------------
0                start read of 1MB      idle
N/R              finish read; idle      start write of 1MB
N/R-epsilon      start read of 1MB      pretend to complete write
N/R              continue read          complete write
N/R-epsilon      finish read; idle      start write of 1MB
N/R-2*epsilon    ...                    ...

After the first block (which takes a little longer), it takes N/R-epsilon
seconds to copy 1 block, where epsilon is the time between the writer's
pretending to complete the write and actually completing it. This time is
obviously not very dependent on the block size since it is limited by
drive resources and policies (in particular, if the drive doesn't do write
caching, perhaps because write caching is not enabled, then epsilon is 0,
and if our block size is large compared with the drive's cache then the
drive won't be able to signal completion until no more than the drive's
cache size is left to do). Thus epsilon becomes small relative to the N/R
term when N is large. Apparently, in your case the speed drops from
59MB/sec to 35MB/sec, so with N == 1 and R == 59, epsilon is about 1/200.

With large block sizes, the speed can be increased using asynchronous
output. There is a utility (in ports) named team that fakes async output
using separate processes. I have never used it. Something as simple as
2 dd's in a pipe should work OK.
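A minimal sketch of that pipe: the reading and the writing dd run as
separate processes, so the read of the next block can overlap the write of
the previous one, and obs= makes the writer collect the pipe's short reads
into full-size output blocks.

# dd if=/dev/ad4 bs=64k | dd of=/dev/ad6 obs=64k

The extra memory-to-memory copy through the pipe is cheap compared to the
disk transfers here.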
For copying from a disk to itself, a large block size is needed to limit
the number of seeks, and concurrent reads and writes are exactly what is
not needed (since they would give competing seeks). The i/o must be
sequentialized, and dd does the right things for this, though the drive
might not (you would prefer epsilon == 0, since if the drive signals write
completion early then it might get confused when you flood it with the
next read, and seek to start the read before it completes the write, then
thrash back and forth between writing and reading).

It is interesting that writing large sequential files to at least the ffs
file system (not mounted with -sync) in FreeBSD is slightly faster than
writing directly to the raw disk using write(2), even if the device driver
sees almost the same block sizes for these different operations. This is
because write(2) is synchronous and sync writes always cause idle periods
(the idle periods are just much smaller for writing data that is already
in memory), while the kernel uses async writes for data.

Bruce

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 14:55:43 2005
From: Tulio Guimarães da Silva <tuliogs@pgt.mpt.gov.br>
Date: Mon, 03 Oct 2005 11:57:14 -0300
Subject: Re: dd(1) performance when copying a disk to another

Steven Hartland wrote:
> ----- Original Message ----- From: "Arne Wörner"
>> That seems to be about 2 times faster than the disc->disc
>> transfer... But still slower than I would have expected...
>> SATA150 sounds like the drive can do 150MB/sec...
>
> LOL, you might want to read up on what SATA150 means.
> In short, it is the max throughput the interface can sustain. It is NOT
> what you can get out of a single disk, which is still far from that;
> SATA disk transfer rates are typically 30 -> 50MB/s sustained.
>
> Steve

Indeed.
In other words, that represents the max transfer rate between the SATA
controller and the disk's controller (at best, you'll get close to it when
reading from the disk's onboard cache), but the media will always be much
slower.

But just to clear out some questions...
1) Maxtor's full specifications for the Diamond Max+ 9 Series refer to
maximum *sustained* transfer rates of 37MB/s and 67MB/s for "ID" and "OD",
respectively (though I couldn't find exactly what that means, I deduced it
represents the rates for the center and border parts of the disk - please
correct me if I'm wrong), so your tests show you're getting the best out
of it ;) ;
2) Mr. Hartland mentioned the numbers to be good for a single drive,
therefore it's a bit better for a disk-to-disk copy, where the limit
should be the slower disk's performance. I couldn't look up the specs of
the Hitachi since I didn't have the exact model, but I would expect it to
be equal to or faster than the Maxtor, since it does not appear to be a
bottleneck.

One last thought, though, for the specialists: iostat showed a maximum of
128KB/transfer, even though dd should be using 1MB blocks... is that an
expected behaviour? Shouldn't iostat show 1024KB/t, then?

Thanks for your attention,

Tulio G. da Silva

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 15:14:29 2005
From: Tulio Guimarães da Silva <tuliogs@pgt.mpt.gov.br>
Date: Mon, 03 Oct 2005 12:08:31 -0300
Subject: Re: dd(1) performance when copying a disk to another

Phew, thanks for that. :) This seems to answer my question in the other
"leg" of the thread, although it hadn't yet arrived when I wrote that
message.

Now THAT's a quite good explanation. ;)

Thanks again,

Tulio G. da Silva
From owner-freebsd-performance@FreeBSD.ORG Tue Oct 4 00:48:55 2005
From: Bruce Evans <bde@zeta.org.au>
Date: Tue, 4 Oct 2005 10:48:48 +1000 (EST)
Subject: Re: dd(1) performance when copying a disk to another

On Mon, 3 Oct 2005, Tulio Guimarães da Silva wrote:

> But just to clear out some questions...
> 1) Maxtor's full specifications for the Diamond Max+ 9 Series refer to
> maximum *sustained* transfer rates of 37MB/s and 67MB/s for "ID" and
> "OD", respectively (though I couldn't find exactly what that means, I
> deduced it represents the rates for the center and border parts of the
> disk - please correct me if I'm wrong), so your tests show you're
> getting the best out of it ;) ;

Another interesting point is that you can often get closer to the maximum
rate than the average of the maximum and minimum rate. The outer tracks
contain more sectors (about 67/37 times as many with the above spec), so
the average rate over all sectors is larger than the average of the max
and min, significantly so since 67/37 is a fairly large fraction. Also,
you can often partition disks to put less-often accessed stuff in the
slow parts.

> One last thought, though, for the specialists: iostat showed a maximum
> of 128KB/transfer, even though dd should be using 1MB blocks... is that
> an expected behaviour?
> One last thought, though, for the specialists: iostat showed maximum of > 128KB/transfer, even though dd should be using 1MB blocks... is that an > expected behaviour? Shouldn't iostat show 1024KB/t, then? The expected size is 64K. 128KB is due to a bug in GEOM, one that was fixed a couple of days ago by tegge@. iostat shows the size that reaches the disk driver. The best size to show is the size that reaches the disk hardware, but several layers of abstraction, some excessive, make it impossible to show that size: First there is the disk firmware layer above the disk hardware layer. There is no way for the driver to know exactly what the firmware layer is doing. Good firmware will cluster i/o's and otherwise cache things to minimize seeks and other disk accesses, in much the same way that a good OS will do, but hopefully better because it can understand the hardware better and use more specialized algorithms. Next there is the driver layer. Drivers shouldn't split up i/o, but some at least used to, and they now cannot report such splitting to devstat. I can't see any splitting in the ad driver now -- I can only see reduction of the max size from 255 to 128 sectors in the non-DMA case, and the misnamed struct member atadev->max_iosize in this case (this actually gives the max transfer size; in the DMA case, the max transfer size is the same as the max i/o size, but in the non-DMA case it is the number of sectors transferred per interrupt, which is usually much smaller than the max i/o size of DFLTPHYS = 64K). The fd driver at least used to split up i/o into single sectors. 20-25 years ago when CPUs were slow even compared with floppies, this used to be a good way to pessimize i/o. A few years later, starting with about 386's, CPUs became fast enough to easily generate new requests in the sector gap time, so even poorly written fd drivers could keep floppies streaming except across seeks to another track. The fd driver never reported this internal splitting to devstat, and maybe never should have, since it is close enough to the hardware to know that this splitting is normal and/or doesn't affect efficiency. Next there is the GEOM layer. It splits up i/o's requested by the next layer up according to the max size advertised by the driver. The latter is typically DFLTPHYS = 64K and often unrelated to the hardware; MAXPHYS = 128K would be better if the hardware can handle it. Until a couple of days ago, reporting of this splitting was broken. GEOM reported to devstat the size passed to it and not the size that it passed to drivers. tegge@ fixed this. For writes to raw disks, the next layer up is physread(). (Other cases are even more complicated :-).) physread() splits up i/o's into blocks of max size dev->si_iosize_max. This splitting is wrong for tape-like devices but is almost harmless for disk-like devices. Another bug in GEOM is bitrot in the setting of dev->si_iosize_max. This should normally be the same as the driver max size, and used to be set to the same in individual drivers in many cases including the ad driver, but now most drivers don't set it and GEOM normally defaults it to the bogus value MAXPHYS = 128K. physread() also defaults it, but to the different, safer value DFLTPHYS = 64K. The different max sizes cause excessive splitting. See below for examples. For writes by dd, there are a few more layers (driver read, devfs read, and write(2) at least).
So for writes of 1M from dd to an ad device with DMA enabled and the normal DMA size of 64K, the following reblocking occurs: 1M is split into 8*128K by physio() since dev->si_iosize_max is 128K, and 8*128K is split into 16*64K by GEOM since dp->d_maxsize is mismatched (64K). dp->max_size is 63K for a couple of controllers in the DMA case and possibly always for the acd driver (see the magic 65534 in atapi-cd.c). Then the bogus splitting is more harmful: 1M is split into 8*128K by physio() (no difference), and 8*128K is split into 8 * (2*63K + 1*2K) by GEOM. The 1*2K splitting is especially pessimal. The afd driver used to have this bug internally, and still has it in RELENG_4. Its max i/o (DMA) size was 32K for ZIP disks that seem to be IOMEGA ones and 126K for other drives. dd'ing to ZIP drives was fast enough if you used a size smaller than the max i/o size (but not very small), or with nice power of 2 sizes for disks that seem to be IOMEGA ones, but a nice size of 128K caused the following bad splitting for non-IOMEGA ones: 128K = 1*126K + 1*2K. Since accesses to ZIP disks take about 20 msec per access, the 2K block almost halved the transfer speed. The normal ata DMA size of 64*1024 is also too magic -- it just happens to equal DFLTPHYS, so it only causes 1 bogus splitting in combination with the other bugs. For writes by dd, these bugs are easy to avoid if you know about them, or if you just fear them and test all reasonable block sizes to find the best one. Just use a block size large enough to be efficient but small enough to not cause splitting, or in cases where the mismatches are only off by a factor of 2^n, large enough to cause even splitting. For cases other than writes by dd, the bugs cause pessimal splitting. E.g., file system clustering uses yet another bogusly initialized max i/o size, vp->v_mount->mnt_iosize_max. This defaults to DFLTPHYS = 64K in the top vfs layer, but many file systems, including ffs, set it to devvp->v_rdev->si_iosize_max, so it is normally set to the wrong default set for the latter by GEOM, MAXPHYS = 128K. This normally causes excessive splitting, which is especially harmful if the driver's max is not a divisor of MAXPHYS. E.g., when the driver's max is 63K, writing a 256KB file to an ffs file system with the default fs-block size of 16K causes the following bogus splitting even if ffs allocates all the blocks optimally (contiguously): at the ffs level, 12*16K (direct data blocks) + 1*16K (indirect block; but ffs usually gets this wrong and doesn't allocate it contiguously) + 4*16K (data blocks indirected through the indirect block); at the clustering level, 17*16K reblocked to 2*128K + 1*16K; at the device driver level, 2*128K + 1*16K split into 63K, 63K, 2K, 63K, 63K, 2K, 16K. So splitting almost half undoes the gathering done by the clustering level (we start with 17 blocks and end with 7). Ideally we would end with 5 (4*63K + 1*20K). Caching in not-very-old drives (but not ZIP or CD/DVD ones) makes stupid blocking not very harmful for reads, but doesn't help so much for writes.
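One way to "test all reasonable block sizes" is a small loop around dd. A rough sketch using the device names from this thread; it reads to /dev/null so nothing is overwritten (pointing of= at the spare disk instead would exercise the write path, and destroy its contents), and the odd sizes 63k/126k are included because of the 63K limits discussed above:

  #!/bin/sh
  # Each pair is "block size, count", chosen so every run moves about 1 GB,
  # matching the amounts used elsewhere in this thread.
  for spec in "8k 128000" "63k 16256" "64k 16000" "126k 8128" "128k 8000" "512k 2000" "1m 1000"; do
          set -- $spec
          echo "bs=$1:"
          dd if=/dev/ad4 of=/dev/null bs=$1 count=$2 2>&1 | tail -1
  done

Comparing the bytes/sec figures from run to run shows where the splitting (and any driver limit) starts to hurt.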
Bruce --0-580949596-1128386928=:45947-- From owner-freebsd-performance@FreeBSD.ORG Tue Oct 4 12:14:51 2005 Return-Path: X-Original-To: performance@freebsd.org Delivered-To: freebsd-performance@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3FF8816A41F for ; Tue, 4 Oct 2005 12:14:51 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id D371143D48 for ; Tue, 4 Oct 2005 12:14:50 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id CD914BC6D for ; Tue, 4 Oct 2005 12:14:48 +0000 (UTC) To: performance@freebsd.org From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 04 Oct 2005 12:25:21 BST." <20051004122459.E69774@fledge.watson.org> Date: Tue, 04 Oct 2005 14:14:48 +0200 Message-ID: <205.1128428088@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: Subject: Re: dd(1) performance when copiing a disk to another (fwd) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Oct 2005 12:14:51 -0000 Robert forwarded this message. >---------- Forwarded message ---------- >Date: Tue, 4 Oct 2005 10:48:48 +1000 (EST) >From: Bruce Evans >To: Tulio Guimarães da Silva >Cc: freebsd-performance@FreeBSD.org >Subject: Re: dd(1) performance when copiing a disk to another I raised this subject early in the GEOM era but got very little feedback, so I decided to sit back and wait until it came up again, and that seems to be now. First issue: chopping requests. In the future we will have even larger I/O requests because (at least we hope) bio requests will get rid of the antique requirement to be mapped into sequentially mapped kernel VM. That means that somebody will have to cut I/O requests up somewhere, and it stands to reason that this happens as far down as possible for reasons of memory management and workload avoidance. So in the future, device drivers will have to accept for all practical purposes infinite bio requests and service them in pieces as best they can. In addition to chopping, drivers/classes which need to access the data in the I/O request will need to request VM mapping of it. Second issue: issuing intelligently sized/aligned requests. Notwithstanding the above, it makes sense to issue requests that work as efficiently as possible further down the GEOM mesh. The chopping is one case, and it can (and will) be solved by propagating a non-mandatory size-hint upwards. physio will be able to use this to send down requests that require minimal chopping later on. But the other issue is alignment. For a RAID-5 implementation it is paramount for performance that requests try to align themselves with the stripe size. Other transformations have similar requirements, striping and (gbde) encryption for instance. Therefore in addition to the size hint, a stripe width and stripe alignment hint needs to be passed up, and then physio can start to send requests that not only have the right size, but also the right alignment for downstream processing.
The outline of this was committed to src/sys/geom/notes around 2½ years ago and the only thing that has changed is that after some consideration I have concluded that the hints should be non-binding for performance reasons. Third issue: The problem extends all the way up to sysinstall. Currently we do systematically shoot RAID-5 performance down by our strict adherence to MBR formatting rules. We reserve the first track of typically 63 sectors to the MBR. The first slice therefore starts in sector number 63. All partitions in that slice inherit that alignment and therefore unless the RAID-5 implementation has a stripe size of 63 sectors, a (too) large fraction of the requests will have one sector in one raid-stripe and the rest in another, which they often fail to fill by exactly one sector. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Wed Oct 5 11:17:57 2005 Return-Path: X-Original-To: performance@freebsd.org Delivered-To: freebsd-performance@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 54CEC16A41F for ; Wed, 5 Oct 2005 11:17:57 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9E43443D45 for ; Wed, 5 Oct 2005 11:17:56 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 2F88350A3E; Wed, 5 Oct 2005 13:17:55 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 6EAC1509F1; Wed, 5 Oct 2005 13:17:45 +0200 (CEST) Date: Wed, 5 Oct 2005 13:17:32 +0200 From: Pawel Jakub Dawidek To: Poul-Henning Kamp Message-ID: <20051005111732.GB17298@garage.freebsd.pl> References: <20051004122459.E69774@fledge.watson.org> <205.1128428088@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="U+BazGySraz5kW0T" Content-Disposition: inline In-Reply-To: <205.1128428088@critter.freebsd.dk> X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng devel (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: performance@freebsd.org Subject: Re: dd(1) performance when copiing a disk to another (fwd) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2005 11:17:57 -0000 --U+BazGySraz5kW0T Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 04, 2005 at 02:14:48PM +0200, Poul-Henning Kamp wrote: +> Second issue: issuing intelligently sized/aligned requests. +>=20 +> Notwithstanding the above, it makes sense to issue requests that +> work as efficient as possible further down the GEOM mesh. 
+> +> The chopping is one case, and it can (and will) be solved by +> propagating a non-mandatory size-hint upwards. physio will +> be able to use this to send down requests that require minimal +> chopping later on. +> +> But the other issue is alignment. For a RAID-5 implementation it +> is paramount for performance that requests try to align themselves +> with the stripe size. Other transformations have similar +> requirements, striping and (gbde) encryption for instance. That's true. When I worked on gstripe I was wondering for a moment about an additional class which would cut up the I/Os for me, so that gstripe would only decide where to send all the pieces. In this case it was overkill of course. On the other hand, I implemented a 'fast' mode in gstripe which is intended to work fast even for very small stripe sizes, i.e. when the stripe size is equal to 1kB and we receive a 128kB request, we don't send 128 requests down, but only as many requests as we have components, and do all the shuffle magic when reading is done (or before writing). Not sure how this can be achieved when some upper layer will split the request for me. How can I avoid sending 128 requests then? +> Third issue: The problem extends all the way up to sysinstall. +> +> Currently we do systematically shoot RAID-5 performance down by our +> strict adherence to MBR formatting rules. We reserve the first +> track of typically 63 sectors to the MBR. +> +> The first slice therefore starts in sector number 63. All partitions +> in that slice inherit that alignment and therefore unless the RAID-5 +> implementation has a stripe size of 63 sectors, a (too) large +> fraction of the requests will have one sector in one raid-stripe +> and the rest in another, which they often fail to fill by exactly +> one sector. Just to be sure I understand it correctly: you're talking about hardware RAID, so basically the exported provider is attached to a rank#1 geom? If so, this is not the case for software implementations which are configured on top of slices/partitions, instead of raw disks. As a workaround we can configure the 'a' partition to start at an offset equal to the stripe size, right? Of course it's not a solution for anything, but I want to be sure I get things right. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
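On the alignment arithmetic: for the case Poul-Henning describes, where the stripes are laid out from the start of the raw disk, the partition sitting above them has to begin on a stripe boundary, so the in-slice offset has to make the absolute starting sector a multiple of the stripe size. A back-of-the-envelope sketch (the 64 KB stripe and 512-byte sectors are assumptions, not numbers from this thread):

  #!/bin/sh
  # How far into the slice the 'a' partition would have to start so that
  # its absolute starting sector lands on a stripe boundary.
  stripe_sectors=128        # assumed 64 KB stripe / 512-byte sectors
  slice_start=63            # first slice after the reserved MBR track
  part_offset=$(( (stripe_sectors - slice_start % stripe_sectors) % stripe_sectors ))
  echo "in-slice offset: $part_offset sectors"
  echo "absolute start:  $(( slice_start + part_offset )) sectors"

With these numbers the partition would start 65 sectors into the slice, at absolute sector 128. For the software case, where the stripes are laid out from the start of the partition itself, the alignment question instead moves up to whatever is stacked on top of the striped provider.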
--U+BazGySraz5kW0T Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (FreeBSD) iD8DBQFDQ7ZMForvXbEpPzQRAi20AJ4zoFKbteSKqDBQpG7T9nkGx7K+IACeJUlH 3idA9HUvRyLNq4g5e8DFyjo= =9DAw -----END PGP SIGNATURE----- --U+BazGySraz5kW0T-- From owner-freebsd-performance@FreeBSD.ORG Wed Oct 5 16:12:15 2005 Return-Path: X-Original-To: performance@FreeBSD.org Delivered-To: freebsd-performance@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EAE4516A41F; Wed, 5 Oct 2005 16:12:15 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7C07643D45; Wed, 5 Oct 2005 16:12:15 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by cyrus.watson.org (Postfix) with ESMTP id 83C5C46BA4; Wed, 5 Oct 2005 12:12:14 -0400 (EDT) Date: Wed, 5 Oct 2005 17:12:14 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: performance@FreeBSD.org Message-ID: <20051005133730.R87201@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: net@FreeBSD.org Subject: Call for performance evaluation: net.isr.direct X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2005 16:12:16 -0000 In 2003, Jonathan Lemon added initial support for direct dispatch of netisr handlers from the calling thread, as part of his DARPA/NAI Labs contract in the DARPA CHATS research program. Over the last two years since then, Sam Leffler and I have worked to refine this implementation, removing a number of ordering related issues, opportunities for excessive parallelism, recursion issues, and testing with a broad range of network components. There has also been a significant effort to complete MPSAFE locking work throughout the network stack. Combined with the earlier move to ithreads and a functional direct dispatch ("process to completion" implementation), there are a number of exciting possible benefits. - Possible parallelism by packet source -- ithreads can dispatch simultaenously into the higher level network stack layers. Since ithreads can execute in parallel on different CPU, so can code they invoke directly. - Elimination of context switches in the network receive path -- rather than context switching to the netisr thread from the ithread, we can now directly execute netisr code from the ithread. - A CPU-bound netisr thread on a multi-processor system will no longer rate limit traffic to the available resources on one CPU. - Eliminating the additional queueing in the handoff reduces the opportunity for queues to overfill as a result of scheduling delays. There are, however, some possible downsides and/or trade-offs: - Higher level network processing will now compete with the interrupt handler for CPU resources available to the ithread. This means less time for the interrupt code to execute in the thread if the thread is CPU-bound. - Lower levels of parallelism between portions of the inbound packet processing path. Without direct dispatch, there is possible parallelism between receive network driver execution and higher level stack layers, whereas with direct dispatch they can no longer execute in parallel. 
- Re-queued packets from tunnel and encapsulation processing will now require a context switch to process, since they will be processed in the netisr proper rather than in the ithread, whereas before the netisr thread would pick them up immediately after completing the current processing without a context switch. - Code that previously ran in the SWI at a SWI priority now runs in the ithread at an ithread priority, elevating the general priority at which network processing takes place. And there are a few mixed things, that can offer good and bad elements: - Less queueing takes place in the network stack in in-bound processing: packets are taken directly from the driver and processed to completion one by one, rather than queued for batch processing. Packets will be dropped before the link layer, rather than on the boundary between the link and protocol layers. This is good in that we invest less work in packets we were going to drop anyway, but bad in that less queueing means less room for scheduling delays. In previous FreeBSD releases, such as several 5.x series releases, net.isr.enable could not be turned on by default because there was insufficient synchronization in the network stack. As of 5.5 and 6.0, I believe there is sufficient synchronization, especially given that we force non-MPSAFE protocol handlers to run in the netisr without direct dispatch. As such, there has been a gradual conversation going on about making direct dispatch the default behavior in the 7.x development series, and more publically documenting and supporting the use of direct dispatch in the 6.x release engineering series. Obviously, this is about two things: performance, and stability. Many of us have been running with direct dispatch on by default for quite some time, so it passes some of the basic "does it run" tests. However, since it significantly increases the opportunity for parallelism in the receive path of the network stack, it likely will trigger otherwise latent or infrequent races and bugs to occur more frequently. The second aspect is performance: many results suggest that direct dispatch has a significant performance benefit. However, evaluating the impact on a broad range of results is required in order for us to go ahead with what is effectively a significant architectural change in how we perform network stack processing. To give you a sense of some of the performance effect I've measured recently, using the netperf measurement tool (with -DHISTOGRAM removed from the FreeBSD port build), here are some results. In each case, I've put parenthesis around host or router to indicate which is the host where the configuration change is being tested. 
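(For reference, a single data point in the comparisons below can be collected with something along these lines; the peer address is only a placeholder, -H names the remote netserver and -t the test type, and net.isr.dispatch is the sysctl discussed further down:

  # sysctl net.isr.dispatch=1
  $ netperf -H 192.0.2.1 -t TCP_RR

The before/after comparison between a set of runs with the sysctl at 0 and a set with it at 1 is then what ministat, shown below, is for.)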
These tests were performed using dual Xeon systems, and using back-to-back gigabit ethernet cards and the if_em driver: TCP round trip benchmark (TCP_RR), host-(host): 7.x UP: 0.9% performance improvement 7.x SMP: 0.7% performance improvement TCP round trip benchmark (TCP_RR), host-(router)-host: 7.x UP: 2.4% performance improvement 7.x SMP: 2.9% performance improvement UDP round trip benchmark (UDP_RR), host-(host): 7.x UP: 0.7% performance improvement 7.x SMP: 0.6% performance improvement UDP round trip benchmark (UDP_RR), host-(router)-host: 7.x UP: 2.2% performance improvement 7.x SMP: 3.0% performance improvement TCP stream banchmark (TCP_STREAM), host-(host): 7.x UP: 0.8% performance improvement 7.x SMP: 1.8% performance improvement TCP stream benchmark (TCP_STREAM), host-(router)-host: 7.x UP: 13.6% performance improvement 7.x SMP: 15.7% performance improvement UDP stream benchmark (UDP_STREAM), host-(host): 7.x UP: none 7.x SMP: none UDP stream benchmark (UDP_STREAM), host-(router)-host: 7.x UP: none 7.x SMP: none TCP connect benchmark (src/tools/tools/netrate/tcpconnect) 7.x UP: 7.90383% +/- 0.553773% 7.x SMP: 12.2391% +/- 0.500561% So in some cases, the impact is negligible -- in other places, it is quite significant. So far, I've not measured a case where performance has gotten worse, but that's probably because I've only been measuring a limited number of cases, and with a fairly limited scope of configurations, especially given that the hardware I have is pushing the limits of what the wire supports, so minor changes in latency are possible, but not large changes in throughput. So other than a summary of the status quo, this is also a call to action. I would like to get more widespread benchmarking of the impact of direct dispatch on network-related workloads. This means a variety of things: (1) Performance of low level network services, such as routing, bridging, and filtering. (2) Performance of high level application servces, such as web and database. (3) Performance of integrated kernel network services, such as the NFS client and server. (4) Performance of user space distributed file systems, such as Samba and AFS. All you need to do to switch to direct dispatch mode is set the sysctl or tunable "net.isr.dispatch" to 1. To disable it again, remove the setting, or set it to 0. It can be modified at run-time, although during the transition from one mode to the other, there may be a small quantity of packet misordering, so benchmarking over the transition is discouraged. FYI: as of 6.0-RC1 and recent 7.0, net.isr.dispatch is the name of the variable. In earlier releases, the name of this variable was net.isr.enable. Some important details: - Only non-local protocol traffic is affected: loopback traffic still goes via the netisr to avoid issues of recursion and lock order. - In the general case, only in-bound traffic is directly affected by this change. As such, send-only benchmarks may reveal little change. They are still interesting, however. - However, the send path is indirectly affected due to changes in scheduling, workload, interrupt handling, and so on. - Because network benchmarks, especially micro-benchmarks, are especially sensitive to minor perturbations, I highly recommend running in a minimal multi-user or ideally single-user environment, and suggest isolating undesired sources of network traffic from segments where testing is occuring. For macro-benchmarks this can be less important, but should be paid attention to. 
- Please make sure debugging features are turned off when running tests -- especially WITNESS, INVARIANTS, INVARIANT_SUPPORT, and user space malloc debugging. These can have a significant impact on performance, both potentially overshadowing changes, and in some cases, actually reversing results (due to higher overhead under locks, for example). - Do not use net.isr.enable in the 5.x line unless you know what you are doing. While it is reasonably safe with 5.4 forwards, it is not a supported configuration, and may cause stability issues with specific workloads. - What we're particularly interested in is a statistically meaningful comparison of the "before" and "after" case. When doing measurements, I like to run 10-12 samples, and usually discard the first one or two, depending on the details of the benchmark. I'll then use src/tools/tools/ministat to compare the data sets. Running a number of samples is quite important, because the variance in many tests can be significant, and if the two sample sets overlap, you can quite easily draw the entirely wrong conclusion about the results from a small number of measurements in a sample. Assuming you have a fixed width font, typicaly output from ministat looks something like the following and may be human readable: x 7SMP/tcpconnect_queue + 7SMP/tcpconnect_direct +--------------------------------------------------------------------------+ |x xx + +| |xxxxx xx ++ +++++ +| ||__A__| |___A__| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 5425 5503 5460 5456.3 26.284977 + 10 6074 6169 6126 6124.1 31.606785 Difference at 95.0% confidence 667.8 +/- 27.3121 12.2391% +/- 0.500561% (Student's t, pooled s = 29.0679) Of particular interest is if changing to direct dispatch hurts performance in your environment, and understanding why that is. Thanks, Robert N M Watson From owner-freebsd-performance@FreeBSD.ORG Wed Oct 5 17:30:28 2005 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DED2216A41F for ; Wed, 5 Oct 2005 17:30:28 +0000 (GMT) (envelope-from patpro@patpro.net) Received: from smtp4-g19.free.fr (smtp4-g19.free.fr [212.27.42.30]) by mx1.FreeBSD.org (Postfix) with ESMTP id 823C143D45 for ; Wed, 5 Oct 2005 17:30:28 +0000 (GMT) (envelope-from patpro@patpro.net) Received: from [10.0.2.2] (boleskine.patpro.net [82.235.12.223]) by smtp4-g19.free.fr (Postfix) with ESMTP id 56EA72CB1F for ; Wed, 5 Oct 2005 19:30:27 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v734) In-Reply-To: References: Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <3509E78F-E90F-4056-AD4C-FDDDC41A4A46@patpro.net> Content-Transfer-Encoding: 7bit From: Patrick Proniewski Date: Wed, 5 Oct 2005 19:30:24 +0200 To: freebsd-performance@freebsd.org X-Mailer: Apple Mail (2.734) Subject: Re: dd(1) performance when copiing a disk to another X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2005 17:30:29 -0000 Hi, thank you all for these interesting explanations. I've made some more tests with my disks : As you'll see, for block size greater than 64k, the HDD ad6 (hitachi) is the bottleneck. 
bs of 1m and 512k yield the best transfer rates between ad4 and ad6, and using a pipe between two dd's will lower the performance. best regards, and thank you again, Pat, #### /dev/zero to ad6 # dd if=/dev/zero of=/dev/ad6 bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 31.047655 secs (33773114 bytes/sec) # dd if=/dev/zero of=/dev/ad6 bs=8k count=128000 128000+0 records in 128000+0 records out 1048576000 bytes transferred in 31.580223 secs (33203565 bytes/sec) #### ad4 (SATA150) to ad6 (SATA150) # dd if=/dev/ad4 of=/dev/ad6 bs=8k count=128000 128000+0 records in 128000+0 records out 1048576000 bytes transferred in 50.916216 secs (20594146 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=64k count=16000 16000+0 records in 16000+0 records out 1048576000 bytes transferred in 30.925397 secs (33906630 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=128k count=8000 8000+0 records in 8000+0 records out 1048576000 bytes transferred in 31.462153 secs (33328170 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=256k count=4000 4000+0 records in 4000+0 records out 1048576000 bytes transferred in 30.819234 secs (34023428 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=512k count=2000 2000+0 records in 2000+0 records out 1048576000 bytes transferred in 30.589651 secs (34278783 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 30.660553 secs (34199514 bytes/sec) # dd if=/dev/ad4 bs=1m count=1000 | dd of=/dev/ad6 bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 33.998716 secs (30841635 bytes/sec) 0+16000 records in 0+16000 records out 1048576000 bytes transferred in 34.001099 secs (30839474 bytes/sec)
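A read-only baseline of the source disk on its own would separate the two drives' contributions (nothing is written, so it is safe to run; the block size and count mirror the runs above):

  # dd if=/dev/ad4 of=/dev/null bs=1m count=1000

If that figure comes out well above the ~34 MB/s seen for the copy, the Hitachi's write side is confirmed as the limit, which is what the /dev/zero test above already suggests.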