From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 14:57:12 2005
From: Patrick Proniewski <patpro@patpro.net>
Date: Sun, 2 Oct 2005 16:57:09 +0200
To: freebsd-performance@freebsd.org
Subject: dd(1) performance when copying a disk to another

Hi,

I run FreeBSD 5.4 on a PIV 3GHz (SuperMicro motherboard, Intel 6300ESB SATA
chipset) with 2 SATA HDDs. I'm in the process of duplicating the boot HDD
to the second HDD. I run dd for that:

# dd if=/dev/ad4 of=/dev/ad6 bs=1m

It yields poor performance:

$ iostat -dhKw 1
(...)
             ad4               ad6
  KB/t  tps  MB/s    KB/t  tps  MB/s
124.49  252 30.69  128.00  246 30.69
128.00  285 35.64  128.00  279 34.90
128.00  282 35.27  128.00  283 35.40
(...)

Is it normal that the data rate won't go above 35-38 MB/s?

HDDs are:
ad4 -> Maxtor 80 GB 7200 rpm
ad6 -> Hitachi 80 GB 7200 rpm

One more question: is dd(1) a good way to duplicate a boot drive to make a
bootable spare disk?

patpro
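One minimal way to sanity-check such a clone, assuming the source disk stays
quiescent during and after the copy (ideally the copy is made from a
rescue/live boot) and both devices are exactly the same size, is to checksum
both devices afterwards; a rough sketch:

# dd if=/dev/ad4 bs=64k | md5
# dd if=/dev/ad6 bs=64k | md5

If the target is larger than the source, limit both reads to the source's
size with count=; any write to ad4 in the meantime will make the digests
differ.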
From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 15:17:05 2005
From: "Steven Hartland" <killing@multiplay.co.uk>
Date: Sun, 2 Oct 2005 16:15:58 +0100
Subject: Re: dd(1) performance when copying a disk to another

That's actually pretty good for a sustained read / write on a single disk.

Steve
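As a back-of-envelope check of what that rate means in practice (assuming
roughly 80 GB to copy at the observed ~35 MB/s), the whole clone should take
on the order of forty minutes:

$ echo "80 * 1024 / 35 / 60" | bc -l     # minutes; about 39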
From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 15:59:32 2005
From: Arne Wörner <arne_woerner@yahoo.com>
Date: Sun, 2 Oct 2005 08:59:26 -0700 (PDT)
Subject: Re: dd(1) performance when copying a disk to another

--- Steven Hartland wrote:
> From: "Patrick Proniewski"
>> # dd if=/dev/ad4 of=/dev/ad6 bs=1m
>>
>> It yields poor performance:
>>
> That's actually pretty good for a sustained read / write on a
> single disk.
>
Does somebody know why this is "pretty good"? I mean: where is the
bottleneck? As far as I know, SATA is quite fast... And memory to memory
copies are quite fast... disc<->memory should be quite fast, too.

>> Is it normal that the data rate won't go above 35-38 MB/s?
>>
Hmm... Can u find out if DMA transfers are enabled for those discs? What
does dmesg say? What does "sysctl hw.ata.ata_dma" say? Maybe atacontrol(8)
says something useful about SATA discs, too (e. g. atacontrol mode 0)?

Can u try the following commands, when the system (especially the discs)
is idle?
# dd if=/dev/ad4 of=/dev/null bs=1m count=1000
# dd if=/dev/zero of=/dev/null bs=1m count=1000

(Maybe you could find a way to copy /dev/zero to /dev/ad6 without
destroying the previous work... :-)) E. g.:
# dd if=/dev/ad6 of=/tmp/arne bs=1m count=1000
# dd if=/dev/zero of=/dev/ad6 bs=1m count=1000
# dd if=/tmp/arne of=/dev/ad6 bs=1m count=1000
)

>> one more question: is dd(1) a good way to duplicate a boot
>> drive to make a bootable spare disk ?
>
Say, is the file system on /dev/ad4 read-only during the "dd"? If /dev/ad4
changes before "dd" completes, ad6 might need a fsck or ad6 might be
useless...

Btw.: I use gmirror(8)... But then an unintentional, fatal change to ad4
would be fatal for ad6, too... :-)) So I have to hope that I do not type
things I shall not type (luckily I have some boot CDs for that unlikely
case ;-)) )...

-Arne
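The SATA drives in this box sit on channels ata2 and ata3 rather than
channel 0, as the dmesg later in the thread shows, so the per-channel query
has to name those channels. A sketch, assuming the atacontrol(8) in 5.4
offers the list and mode subcommands:

# atacontrol list        # shows which ATA channel each disk is attached to
# atacontrol mode 2      # ad4 is ata2-master here
# atacontrol mode 3      # ad6 is ata3-master here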
From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 16:34:34 2005
From: Patrick Proniewski <patpro@patpro.net>
Date: Sun, 2 Oct 2005 18:34:29 +0200
Subject: Re: dd(1) performance when copying a disk to another

Hi,

> Can u find out if DMA transfers are enabled for those discs?
> What does dmesg say?

see end of mail for full dmesg output,

> What does "sysctl hw.ata.ata_dma" say?

hw.ata.ata_dma: 1

> Maybe atacontrol(8) says something useful about SATA discs, too
> (e. g. atacontrol mode 0)?

# atacontrol mode 0
Master = BIOSPIO
Slave  = BIOSPIO

> Can u try the following commands, when the system (especially the
> discs) is idle?
> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000

# dd if=/dev/ad4 of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)

> # dd if=/dev/zero of=/dev/null bs=1m count=1000

# dd if=/dev/zero of=/dev/null bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 0.199381 secs (5259154109 bytes/sec)

> (Maybe you could find a way to copy /dev/zero to /dev/ad6 without
> destroying the previous work... :-))

well, not very easy, both disks are the same size ;)

>> one more question: is dd(1) a good way to duplicate a boot
>> drive to make a bootable spare disk ?
> Say, is the file system on /dev/ad4 read-only during the "dd"?
> If /dev/ad4 changes before "dd" completes, ad6 might need a fsck
> or ad6 might be useless...

well, ad4 is not read-only, but I've shut down every unnecessary service,
and finally the ad6 HDD is bootable! It boots OK and everything seems to
work as well as on the ad4 disk. It's OK for me, it's just a spare
emergency disk.

thanks,
Pat

dmesg:
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p6 #0: Mon Aug 29 15:58:58 CEST 2005
    root@toto.patpro.net:/usr/obj/usr/src/sys/PATPRO-20050829
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2994.90-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 1072562176 (1022 MB)
avail memory = 1044230144 (995 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 2
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: on acpi0
cpu1: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 3.0 on pci0
pci1: on pcib1
em0: port 0xc000-0xc01f mem 0xf2000000-0xf201ffff irq 18 at device 1.0 on pci1
em0: Ethernet address: 00:30:48:83:ef:8c
em0:  Speed:N/A  Duplex:N/A
pcib2: at device 28.0 on pci0
pci2: on pcib2
uhci0: port 0xe100-0xe11f irq 16 at device 29.0 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xe000-0xe01f irq 19 at device 29.1 on pci0
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: at device 29.4 (no driver attached)
pci0: at device 29.5 (no driver attached)
ehci0: mem 0xf2100000-0xf21003ff irq 23 at device 29.7 on pci0
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: on ehci0
usb2: USB revision 2.0
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pcib3: at device 30.0 on pci0
pci3: on pcib3
pci3: at device 9.0 (no driver attached)
em1: port 0xd100-0xd13f mem 0xf1000000-0xf101ffff irq 19 at device 10.0 on pci3
em1: Ethernet address: 00:30:48:83:ef:8d
em1:  Speed:N/A  Duplex:N/A
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0xf000-0xf00f, 0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: port 0xe600-0xe60f, 0xe500-0xe503,0xe400-0xe407,0xe300-0xe303,0xe200-0xe207 irq 18 at device 31.2 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pci0: at device 31.3 (no driver attached)
acpi_tz0: on acpi0
fdc0: port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: port 0x778-0x77b,0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: on ppc0
ppi0: on ppbus0
atkbdc0: port 0x64,0x60 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 10.000 msec
em0: Link is up 100 Mbps Full Duplex
ad4: 78167MB [158816/16/63] at ata2-master SATA150
ad6: 194481MB [395136/16/63] at ata3-master SATA150
    <-- this is _not_ the ad6 I've used dd on, this is my regular ad6 storage disk.
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/ad4s1a
em0: Link is up 100 Mbps Full Duplex
Accounting enabled
pflog0: promiscuous mode enabled
em0: Link is up 100 Mbps Full Duplex
em0: Link is up 100 Mbps Full Duplex

From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 17:04:47 2005
From: Arne Wörner <arne_woerner@yahoo.com>
Date: Sun, 2 Oct 2005 10:04:46 -0700 (PDT)
Subject: Re: dd(1) performance when copying a disk to another

Hi!

--- Patrick Proniewski wrote:
>> Can u find out if DMA transfers are enabled for those discs?
>> What does dmesg say?
>
> see end of mail for full dmesg output,
>
Looks good... :-)) But I never saw FBSD's kernel messages about SATA
drives... ;-)

>> Maybe atacontrol(8) says something useful about SATA discs,
>> too (e. g. atacontrol mode 0)?
>
> # atacontrol mode 0
> Master = BIOSPIO
> Slave  = BIOSPIO
>
Hmm... 0 seems to be the wrong ata... That's why the output does not fit
the SATA drives, I think...

> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)
>
That seems to be about 2 times faster than the disc->disc transfer... But
still slower than I would have expected... SATA150 sounds like the drive
can do 150MB/sec...

As far as I know, SATA busses are independent from each other (no
master/slave; every drive gets its own cable)... Maybe "dd" cannot issue a
read request while the write isn't completed? DMA shouldn't be the problem,
since the memory interface is quite fast in your case...

So there remain the questions:
1. Why does the read speed drop in ur setting (maybe writing to ad6 takes
   more time than reading from ad4? u could try to run two dd processes,
   one with if=ad4 and the other with if=ad6)?
2. Why can't we reach 150MB/sec?

>> (Maybe you could find a way to copy /dev/zero to /dev/ad6 without
>> destroying the previous work... :-))
>
> well, not very easy, both disks are the same size ;)
>
I thought of the first 1000 1MB blocks... :-)
The write speed might be interesting...

-Arne

From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 17:44:16 2005
From: Eric Anderson <anderson@centtech.com>
Date: Sun, 02 Oct 2005 12:44:02 -0500
Subject: Re: dd(1) performance when copying a disk to another

Arne Wörner wrote:
> So there remain the questions:
> 1. Why does the read speed drop in ur setting (maybe writing to ad6
>    takes more time than reading from ad4?)?
> 2. Why can't we reach 150MB/sec?

The reason why 35-40MB/s is good is because the drive itself cannot
stream any faster.
The SATA-150 interface is rated at 150MB/s, but the disk cannot get close.
Look at the specs for the drive, and you'll see that the sustained rate is
much lower than the burst speed. If you want fast performance on a SATA
disk, you'll need to buy a WD Raptor drive (74GB) - that will get you more
speed, but still not the 150MB/s.

>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>> without destroying the previous work... :-))
>>
>> well, not very easy, both disks are the same size ;)
>>
> I thought of the first 1000 1MB blocks... :-)
> The write speed might be interesting...

Instead of dd, why not use gmirror?

Also - reads can be faster since the drive can read ahead a number of
blocks into the cache in an efficient manner, but writes have to be
streamed to disk as they come in (going through the cache, and buffering,
but you get the idea).

Have you tried a smaller block size? What does 8k, 16k, or 512k do for
you? There really isn't much room for improvement here on a single device.

Eric

From owner-freebsd-performance@FreeBSD.ORG Sun Oct 2 18:25:37 2005
From: "Steven Hartland" <killing@multiplay.co.uk>
Date: Sun, 2 Oct 2005 19:25:20 +0100
Subject: Re: dd(1) performance when copying a disk to another

----- Original Message ----- From: "Arne Wörner"
> That seems to be about 2 times faster than the disc->disc
> transfer... But still slower than I would have expected...
> SATA150 sounds like the drive can do 150MB/sec...

LOL, you might want to read up on what SATA150 means.
In short, it is the max throughput the interface can sustain. It is NOT
what you can get out of a single disk, which is still far from that; SATA
disk transfer rates are typically 30 -> 50MB/s sustained.

Steve

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 07:55:58 2005
From: Patrick Proniewski <patpro@patpro.net>
Date: Mon, 3 Oct 2005 09:55:49 +0200
Subject: Re: dd(1) performance when copying a disk to another

Hi Arne and Eric,

>>> # atacontrol mode 0
>>> Master = BIOSPIO
>>> Slave  = BIOSPIO
>> Hmm... 0 seems to be the wrong ata... That's why the output does
>> not fit the SATA drives, I think...

oops... I'll have to do it again with channels 2 and 3

>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)
>> That seems to be about 2 times faster than the disc->disc
>> transfer... But still slower than I would have expected...
>> SATA150 sounds like the drive can do 150MB/sec...

As Eric pointed out, you just can't reach 150 MB/s with one disk; it's a
technological maximum for the bus, but real-world performance is well
below this max. In fact, I thought I would reach about 50 to 60 MB/s.
>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>> without destroying the previous work... :-))
>>>
>>> well, not very easy, both disks are the same size ;)
>> I thought of the first 1000 1MB blocks... :-)

damn, I misread this one... :)
I'm gonna try this asap.

> Instead of dd, why not use gmirror?

I had no idea gmirror existed, but I'll continue with dd. It's a one-time
experiment.

> Have you tried a smaller block size? What does 8k, 16k, or 512k do
> for you? There really isn't much room for improvement here on a
> single device.

nope, I'll try one of them, but I can't do many experiments: the box is in
my living room, it's a 1U rack, and it's VERY VERY noisy. My girlfriend
will kill me if it's running more than an hour a day :))

Pat

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 14:21:29 2005
From: Bruce Evans <bde@zeta.org.au>
Date: Tue, 4 Oct 2005 00:21:15 +1000 (EST)
Subject: Re: dd(1) performance when copying a disk to another

On Mon, 3 Oct 2005, Patrick Proniewski wrote:

>>>> # dd if=/dev/ad4 of=/dev/null bs=1m count=1000
>>>> 1000+0 records in
>>>> 1000+0 records out
>>>> 1048576000 bytes transferred in 17.647464 secs (59417943 bytes/sec)

Many wrong answers to the original question have been given. dd with a
block size of 1m between (separate) disk devices is much slower just
because that block size is far too large...

The above is a fairly normal speed. The expected speed depends mainly on
the disk technology generation and the placement of the sectors being
read.
I get the following speeds for _sequential_ _reading_ from the outer
(fastest) tracks of 6- and 3-year-old drives which are about 2 generations
apart:

%%%
Sep 25 21:52:35 besplex kernel: ad0: 29314MB [59560/16/63] at ata0-master UDMA100
Sep 25 21:52:35 besplex kernel: ad2: 58644MB [119150/16/63] at ata1-master UDMA100
ad0 bs 512:     16777216 bytes transferred in 2.788209 secs (6017201 bytes/sec)
ad0 bs 1024:    16777216 bytes transferred in 1.433675 secs (11702245 bytes/sec)
ad0 bs 2048:    16777216 bytes transferred in 0.787466 secs (21305320 bytes/sec)
ad0 bs 4096:    16777216 bytes transferred in 0.479757 secs (34970249 bytes/sec)
ad0 bs 8192:    16777216 bytes transferred in 0.477803 secs (35113250 bytes/sec)
ad0 bs 16384:   16777216 bytes transferred in 0.462006 secs (36313842 bytes/sec)
ad0 bs 32768:   16777216 bytes transferred in 0.462038 secs (36311331 bytes/sec)
ad0 bs 65536:   16777216 bytes transferred in 0.486850 secs (34460748 bytes/sec)
ad0 bs 131072:  16777216 bytes transferred in 0.462046 secs (36310693 bytes/sec)
ad0 bs 262144:  16777216 bytes transferred in 0.469866 secs (35706382 bytes/sec)
ad0 bs 524288:  16777216 bytes transferred in 0.462035 secs (36311555 bytes/sec)
ad0 bs 1048576: 16777216 bytes transferred in 0.478534 secs (35059612 bytes/sec)
ad2 bs 512:     16777216 bytes transferred in 4.115675 secs (4076419 bytes/sec)
ad2 bs 1024:    16777216 bytes transferred in 2.105451 secs (7968466 bytes/sec)
ad2 bs 2048:    16777216 bytes transferred in 1.132157 secs (14818809 bytes/sec)
ad2 bs 4096:    16777216 bytes transferred in 0.662452 secs (25325935 bytes/sec)
ad2 bs 8192:    16777216 bytes transferred in 0.454654 secs (36901065 bytes/sec)
ad2 bs 16384:   16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
ad2 bs 32768:   16777216 bytes transferred in 0.304761 secs (55050416 bytes/sec)
ad2 bs 65536:   16777216 bytes transferred in 0.304765 secs (55049683 bytes/sec)
ad2 bs 131072:  16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
ad2 bs 262144:  16777216 bytes transferred in 0.304760 secs (55050588 bytes/sec)
ad2 bs 524288:  16777216 bytes transferred in 0.304762 secs (55050200 bytes/sec)
ad2 bs 1048576: 16777216 bytes transferred in 0.304757 secs (55051148 bytes/sec)
%%%

Drive technology hit a speed plateau a few years ago, so newer single
drives aren't much faster unless they are more expensive and/or smaller.

The speed is low for small block sizes because the device has to be talked
to too much and the protocol and firmware are not very good. (Another
drive, a WDC 120GB with more cache (8MB instead of 2), ramps up to about
half speed (26MB/sec) for a block size of 4K but sticks at that speed for
block sizes 8K and 16K, then jumps up to full speed for block sizes of 32K
and larger. This indicates some firmware stupidness.) Most drives ramp up
almost logarithmically (doubling the block size almost doubles the speed).
This behaviour is especially evident on slow SCSI drives like some (most?)
ZIP and dvd/cd drives. The command overhead can be 20 msec, so you had
better not do one 512-byte i/o per command or you will get a speed of
25K/sec. The command overhead of a new ATA drive is more like 50 usec, but
that is still far too much for high speed with a block size of 512 bytes.

The speed is insignificantly different for block sizes larger than a limit
because the drive's physical limits dominate, except possibly with old
(slow) CPUs.

>>> That seems to be about 2 times faster than the disc->disc
>>> transfer... But still slower than I would have expected...
>>> SATA150 sounds like the drive can do 150MB/sec...
>
> As Eric pointed out, you just can't reach 150 MB/s with one disk; it's a
> technological maximum for the bus, but real-world performance is well
> below this max.
> In fact, I thought I would reach about 50 to 60 MB/s.

50-60 MB/s is about right. I haven't benchmarked any SATA or very new
drives. Apparently they are not much faster. ISTR that WDC Raptors are
speced for 70-80MB/sec. You pay twice as much to get a tiny drive with
only 25% more throughput plus faster seeks.

>>>>> (Maybe you could find a way to copy /dev/zero to /dev/ad6
>>>>> without destroying the previous work... :-))
>>>>
>>>> well, not very easy, both disks are the same size ;)
>>> I thought of the first 1000 1MB blocks... :-)
>
> damn, I misread this one... :)
> I'm gonna try this asap.

I divide disks into equally sized (fairly small, or half the disk size)
partitions, and cp between them. dd is too hard to use for me ;-). cp is
easier to type and automatically picks a reasonable block size. Of course
I use dd if the block size needs to be controlled, but mostly I only use
it in preference to cp to get its timing info.

>> Have you tried a smaller block size? What does 8k, 16k, or 512k do
>> for you? There really isn't much room for improvement here on a
>> single device.
>
> nope, I'll try one of them, but I can't do many experiments: the box is
> in my living room, it's a 1U rack, and it's VERY VERY noisy. My
> girlfriend will kill me if it's running more than an hour a day :))

Smaller block sizes will go much faster, except for copying from a disk to
itself. Large block sizes are normally a pessimization, and the
pessimization is especially noticeable for dd. Just use the smallest block
size that gives an almost-maximal throughput (e.g., 16K for reading ad2
above, possibly different for writing). Large block sizes are pessimal for
the synchronous i/o that dd does. The timing for dd'ing blocks of size
N MB at R MB/sec between ad0 and ad2 is something like:

time in secs     activity on ad0        activity on ad2
------------     ---------------        ---------------
0                start read of 1MB      idle
N/R              finish read; idle      start write of 1MB
N/R-epsilon      start read of 1MB      pretend to complete write
N/R              continue read          complete write
N/R-epsilon      finish read; idle      start write of 1MB
N/R-2*epsilon    ...                    ...

After the first block (which takes a little longer), it takes N/R-epsilon
seconds to copy 1 block, where epsilon is the time between the writer's
pretending to complete the write and actually completing it. This time is
obviously not very dependent on the block size since it is limited by
drive resources and policies (in particular, if the drive doesn't do write
caching, perhaps because write caching is not enabled, then epsilon is 0,
and if our block size is large compared with the drive's cache then the
drive won't be able to signal completion until no more than the drive's
cache size is left to do). Thus epsilon becomes small relative to the N/R
term when N is large. Apparently, in your case the speed drops from
59MB/sec to 35MB/sec, so with N == 1 and R == 59, epsilon is about 1/200.

With large block sizes, the speed can be increased using asynchronous
output. There is a utility (in ports) named team that fakes async output
using separate processes. I have never used it. Something as simple as
2 dd's in a pipe should work OK.
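A minimal sketch of that pipe: the reading and the writing dd run as
separate processes, so the read of the next block can overlap the write of
the previous one, and obs= makes the writer collect the pipe's short reads
into full-size output blocks.

# dd if=/dev/ad4 bs=64k | dd of=/dev/ad6 obs=64k

The extra memory-to-memory copy through the pipe is cheap compared to the
disk transfers here.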
For copying from a disk to itself, a large block size is needed to limit
the number of seeks, and concurrent reads and writes are exactly what is
not needed (since they would give competing seeks). The i/o must be
sequentialized, and dd does the right things for this, though the drive
might not (you would prefer epsilon == 0, since if the drive signals write
completion early then it might get confused when you flood it with the
next read, and seek to start the read before it completes the write, then
thrash back and forth between writing and reading).

It is interesting that writing large sequential files to at least the ffs
file system (not mounted with -sync) in FreeBSD is slightly faster than
writing directly to the raw disk using write(2), even if the device driver
sees almost the same block sizes for these different operations. This is
because write(2) is synchronous and sync writes always cause idle periods
(the idle periods are just much smaller for writing data that is already
in memory), while the kernel uses async writes for data.

Bruce

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 14:55:43 2005
From: Tulio Guimarães da Silva <tuliogs@pgt.mpt.gov.br>
Date: Mon, 03 Oct 2005 11:57:14 -0300
Subject: Re: dd(1) performance when copying a disk to another

Steven Hartland wrote:
> ----- Original Message ----- From: "Arne Wörner"
>> That seems to be about 2 times faster than the disc->disc
>> transfer... But still slower than I would have expected...
>> SATA150 sounds like the drive can do 150MB/sec...
>
> LOL, you might want to read up on what SATA150 means.
> In short, it is the max throughput the interface can sustain. It is NOT
> what you can get out of a single disk, which is still far from that;
> SATA disk transfer rates are typically 30 -> 50MB/s sustained.
>
> Steve

Indeed.
In other words, that represents the max transfer rate between the SATA
controller and the disk's controller (at best, you'll get close to it when
reading from the disk's onboard cache), but the media will always be much
slower.

But just to clear out some questions...
1) Maxtor's full specifications for the Diamond Max+ 9 Series refer to
maximum *sustained* transfer rates of 37MB/s and 67MB/s for "ID" and "OD",
respectively (though I couldn't find exactly what that means, I deduced it
represents the rates for the center and border parts of the disk - please
correct me if I'm wrong), so your tests show you're getting the best out
of it ;) ;
2) Mr. Hartland mentioned the numbers to be good for a single drive,
therefore it's a bit better for a disk-to-disk copy, where the limit
should be the slower disk's performance. I couldn't look up the specs of
the Hitachi since I didn't have the exact model, but I would expect it to
be equal to or faster than the Maxtor, since it does not appear to be a
bottleneck.

One last thought, though, for the specialists: iostat showed a maximum of
128KB/transfer, even though dd should be using 1MB blocks... is that an
expected behaviour? Shouldn't iostat show 1024KB/t, then?

Thanks for your attention,

Tulio G. da Silva

From owner-freebsd-performance@FreeBSD.ORG Mon Oct 3 15:14:29 2005
From: Tulio Guimarães da Silva <tuliogs@pgt.mpt.gov.br>
Date: Mon, 03 Oct 2005 12:08:31 -0300
Subject: Re: dd(1) performance when copying a disk to another

Phew, thanks for that. :) This seems to answer my question in the other
"leg" of the thread, although it hadn't yet arrived when I wrote that
message.

Now THAT's a quite good explanation. ;)

Thanks again,

Tulio G. da Silva
From owner-freebsd-performance@FreeBSD.ORG Tue Oct 4 00:48:55 2005
From: Bruce Evans <bde@zeta.org.au>
Date: Tue, 4 Oct 2005 10:48:48 +1000 (EST)
Subject: Re: dd(1) performance when copying a disk to another

On Mon, 3 Oct 2005, Tulio Guimarães da Silva wrote:

> But just to clear out some questions...
> 1) Maxtor's full specifications for the Diamond Max+ 9 Series refer to
> maximum *sustained* transfer rates of 37MB/s and 67MB/s for "ID" and
> "OD", respectively (though I couldn't find exactly what that means, I
> deduced it represents the rates for the center and border parts of the
> disk - please correct me if I'm wrong), so your tests show you're
> getting the best out of it ;) ;

Another interesting point is that you can often get closer to the maximum
rate than the average of the maximum and minimum rate. The outer tracks
contain more sectors (about 67/37 times as many with the above spec), so
the average rate over all sectors is larger than the average of the max
and min, significantly so since 67/37 is a fairly large fraction. Also,
you can often partition disks to put less-often accessed stuff in the
slow parts.

> One last thought, though, for the specialists: iostat showed a maximum
> of 128KB/transfer, even though dd should be using 1MB blocks... is that
> an expected behaviour?
> One last thought, though, for the specialists: iostat showed maximum of > 128KB/transfer, even though dd should be using 1MB blocks... is that an > expected behaviour? Shouldn't iostat show 1024KB/t, then? The expected size is 64K. 128KB is due to a bug in GEOM, one that was fixed a couple of days ago by tegge@. iostat shows the size that reaches the disk driver. The best size to show is the size that reaches the disk hardware, but several layers of abstraction, some excessive, make it impossible to show that size: First there is the disk firmware layer above the disk hardware layer. There is no way for the driver to know exactly what the firmware layer is doing. Good firmware will cluster i/o's and otherwise cache things to minimize seeks and other disk accesses, in much the same way that a good OS will do, but hopefully better because it can understand the hardware better and use more specialized algorithms. Next there is the driver layer. Drivers shouldn't split up i/o, but some at least used to, and they now cannot report such splitting to devstat. I can't see any splitting in the ad driver now -- I can only see reduction of the max size from 255 to 128 sectors in the non-DMA case, and the misnamed struct member atadev->max_iosize in this case (this actually gives the max transfer size; in the DMA case, the max transfer size is the same as the max i/o size, but in the non-DMA case it is the number of sectors transferred per interrupt, which is usually much smaller than the max i/o size of DFLTPHYS = 64K). The fd driver at least used to split up i/o into single sectors. 20-25 years ago when CPUs were slow even compared with floppies, this used to be a good way to pessimize i/o. A few years later, starting with about 386's, CPUs became fast enough to easily generate new requests in the sector gap time, so even poorly written fd drivers could keep floppies streaming except across seeks to another track. The fd driver never reported this internal splitting to devstat, and maybe never should have, since it is close enough to the hardware to know that this splitting is normal and/or doesn't affect efficiency. Next there is the GEOM layer. It splits up i/o's requested by the next layer up according to the max size advertised by the driver. The latter is typically DFLTPHYS = 64K and often unrelated to the hardware; MAXPHYS = 128K would be better if the hardware can handle it. Until a couple of days ago, reporting of this splitting was broken. GEOM reported to devstat the size passed to it and not the size that it passed to drivers. tegge@ fixed this. For writes to raw disks, the next layer up is physread(). (Other cases are even more complicated :-).) physread() splits up i/o's into blocks of max size dev->si_iosize_max. This splitting is wrong for tape-like devices but is almost harmless for disk-like devices. Another bug in GEOM is bitrot in the setting of dev->si_iosize_max. This should normally be the same as the driver max size, and used to be set to the same in individual drivers in many cases including the ad driver, but now most drivers don't set it and GEOM normally defaults it to the bogus value MAXPHYS = 128K. physread() also defaults it, but to the different, safer value DFLTPHYS = 64K. The different max sizes cause excessive splitting. See below for examples. For writes by dd, there are a few more layers (driver read, devfs read, and write(2) at least).
So for writes of 1M from dd to an ad device with DMA enabled and the normal DMA size of 64K, the following reblocking occurs: 1M is split into 8*128K by physio() since dev->si_iosize_max is 128K, and 8*128K is split into 16*64K by GEOM since dp->d_maxsize is mismatched (64K). dp->max_size is 63K for a couple of controllers in the DMA case and possibly always for the acd driver (see the magic 65534 in atapi-cd.c). Then the bogus splitting is more harmful: 1M is split into 8*128K by physio() (no difference), and 8*128K is split into 8 * (2*63K + 1*2K) by GEOM. The 1*2K splitting is especially pessimal. The afd driver used to have this bug internally, and still has it in RELENG_4. Its max i/o (DMA) size was 32K for ZIP disks that seem to be IOMEGA ones and 126K for other drives. dd'ing to ZIP drives was fast enough if you used a size smaller than the max i/o size (but not very small), or with nice power of 2 sizes for disks that seem to be IOMEGA ones, but a nice size of 128K caused the following bad splitting for non-IOMEGA ones: 128K = 1*126K + 1*2K. Since accesses to ZIP disks take about 20 msec per access, the 2K block almost halved the transfer speed. The normal ata DMA size of 64*1024 is also too magic -- it just happens to equal DFLTPHYS, so it only causes 1 bogus splitting in combination with the other bugs. For writes by dd, these bugs are easy to avoid if you know about them, or if you just fear them and test all reasonable block sizes to find the best one. Just use a block size large enough to be efficient but small enough to not cause splitting, or in cases where the mismatches are only off by a factor of 2^n, large enough to cause even splitting. For cases other than writes by dd, the bugs cause pessimal splitting. E.g., file system clustering uses yet another bogusly initialized max i/o size, vp->v_mount->mnt_iosize_max. This defaults to DFLTPHYS = 64K in the top vfs layer, but many file systems, including ffs, set it to devvp->v_rdev->si_iosize_max, so it is normally set to the wrong default set for the latter by GEOM, MAXPHYS = 128K. This normally causes excessive splitting, which is especially harmful if the driver's max is not a divisor of MAXPHYS. E.g., when the driver's max is 63K, writing a 256KB file to an ffs file system with the default fs-block size of 16K causes the following bogus splitting even if ffs allocates all the blocks optimally (contiguously): at the ffs level, 12*16K (direct data blocks) + 1*16K (indirect block; but ffs usually gets this wrong and doesn't allocate it contiguously) + 4*16K (data blocks indirected through the indirect block); at the clustering level, 17*16K reblocked to 2*128K + 1*16K; at the device driver level, 2*128K + 1*16K split into 63K, 63K, 2K, 63K, 63K, 2K, 16K. So splitting almost half undoes the gathering done by the clustering level (we start with 17 blocks and end with 7). Ideally we would end with 5 (4*63K + 1*20K). Caching in not-very-old drives (but not ZIP or CD/DVD ones) makes stupid blocking not very harmful for reads, but doesn't help so much for writes.
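One way to "test all reasonable block sizes" is a small loop around dd. A rough sketch using the device names from this thread; it reads to /dev/null so nothing is overwritten (pointing of= at the spare disk instead would exercise the write path, and destroy its contents), and the odd sizes 63k/126k are included because of the 63K limits discussed above:

  #!/bin/sh
  # Each pair is "block size, count", chosen so every run moves about 1 GB,
  # matching the amounts used elsewhere in this thread.
  for spec in "8k 128000" "63k 16256" "64k 16000" "126k 8128" "128k 8000" "512k 2000" "1m 1000"; do
          set -- $spec
          echo "bs=$1:"
          dd if=/dev/ad4 of=/dev/null bs=$1 count=$2 2>&1 | tail -1
  done

Comparing the bytes/sec figures from run to run shows where the splitting (and any driver limit) starts to hurt.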
Bruce --0-580949596-1128386928=:45947-- From owner-freebsd-performance@FreeBSD.ORG Tue Oct 4 12:14:51 2005 Return-Path: X-Original-To: performance@freebsd.org Delivered-To: freebsd-performance@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3FF8816A41F for ; Tue, 4 Oct 2005 12:14:51 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id D371143D48 for ; Tue, 4 Oct 2005 12:14:50 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id CD914BC6D for ; Tue, 4 Oct 2005 12:14:48 +0000 (UTC) To: performance@freebsd.org From: "Poul-Henning Kamp" In-Reply-To: Your message of "Tue, 04 Oct 2005 12:25:21 BST." <20051004122459.E69774@fledge.watson.org> Date: Tue, 04 Oct 2005 14:14:48 +0200 Message-ID: <205.1128428088@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: Subject: Re: dd(1) performance when copiing a disk to another (fwd) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 04 Oct 2005 12:14:51 -0000 Robert forwarded this message. >---------- Forwarded message ---------- >Date: Tue, 4 Oct 2005 10:48:48 +1000 (EST) >From: Bruce Evans >To: Tulio Guimarães da Silva >Cc: freebsd-performance@FreeBSD.org >Subject: Re: dd(1) performance when copiing a disk to another I raised this subject early in the GEOM era but got very little feedback, so I decided to sit back and wait until it came up again, and that seems to be now. First issue: chopping requests. In the future we will have even larger I/O requests because (at least we hope) bio requests will get rid of the antique requirement to be mapped into sequentially mapped kernel VM. That means that somebody will have to cut I/O requests up somewhere, and it stands to reason that this happens as far down as possible for reasons of memory management and workload avoidance. So in the future, device drivers will have to accept for all practical purposes infinite bio requests and service them in pieces as best they can. In addition to chopping, drivers/classes which need to access the data in the I/O request will need to request VM mapping of it. Second issue: issuing intelligently sized/aligned requests. Notwithstanding the above, it makes sense to issue requests that work as efficiently as possible further down the GEOM mesh. The chopping is one case, and it can (and will) be solved by propagating a non-mandatory size-hint upwards. physio will be able to use this to send down requests that require minimal chopping later on. But the other issue is alignment. For a RAID-5 implementation it is paramount for performance that requests try to align themselves with the stripe size. Other transformations have similar requirements, striping and (gbde) encryption for instance. Therefore in addition to the size hint, a stripe width and stripe alignment hint needs to be passed up, and then physio can start to send requests that not only have the right size, but also the right alignment for downstream processing.
The outline of this was committed to src/sys/geom/notes around 2½ years ago and the only thing that has changed is that after some consideration I have concluded that the hints should be non-binding for performance reasons. Third issue: The problem extends all the way up to sysinstall. Currently we do systematically shoot RAID-5 performance down by our strict adherence to MBR formatting rules. We reserve the first track of typically 63 sectors to the MBR. The first slice therefore starts in sector number 63. All partitions in that slice inherit that alignment and therefore unless the RAID-5 implementation has a stripe size of 63 sectors, a (too) large fraction of the requests will have one sector in one raid-stripe and the rest in another, which they often fail to fill by exactly one sector. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-performance@FreeBSD.ORG Wed Oct 5 11:17:57 2005 Return-Path: X-Original-To: performance@freebsd.org Delivered-To: freebsd-performance@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 54CEC16A41F for ; Wed, 5 Oct 2005 11:17:57 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (arm132.internetdsl.tpnet.pl [83.17.198.132]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9E43443D45 for ; Wed, 5 Oct 2005 11:17:56 +0000 (GMT) (envelope-from pjd@garage.freebsd.pl) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 2F88350A3E; Wed, 5 Oct 2005 13:17:55 +0200 (CEST) Received: from localhost (pjd.wheel.pl [10.0.1.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 6EAC1509F1; Wed, 5 Oct 2005 13:17:45 +0200 (CEST) Date: Wed, 5 Oct 2005 13:17:32 +0200 From: Pawel Jakub Dawidek To: Poul-Henning Kamp Message-ID: <20051005111732.GB17298@garage.freebsd.pl> References: <20051004122459.E69774@fledge.watson.org> <205.1128428088@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="U+BazGySraz5kW0T" Content-Disposition: inline In-Reply-To: <205.1128428088@critter.freebsd.dk> X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 7.0-CURRENT i386 User-Agent: mutt-ng devel (FreeBSD) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-5.9 required=3.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.0.4 Cc: performance@freebsd.org Subject: Re: dd(1) performance when copiing a disk to another (fwd) X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2005 11:17:57 -0000 --U+BazGySraz5kW0T Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Oct 04, 2005 at 02:14:48PM +0200, Poul-Henning Kamp wrote: +> Second issue: issuing intelligently sized/aligned requests. +>=20 +> Notwithstanding the above, it makes sense to issue requests that +> work as efficient as possible further down the GEOM mesh. 
+> +> The chopping is one case, and it can (and will) be solved by +> propagating a non-mandatory size-hint upwards. physio will +> be able to use this to send down requests that require minimal +> chopping later on. +> +> But the other issue is alignment. For a RAID-5 implementation it +> is paramount for performance that requests try to align themselves +> with the stripe size. Other transformations have similar +> requirements, striping and (gbde) encryption for instance. That's true. When I worked on gstripe I was wondering for a moment about an additional class which would cut up the I/Os for me, so that gstripe would only decide where to send all the pieces. In this case it was overkill of course. On the other hand, I implemented a 'fast' mode in gstripe which is intended to work fast even for very small stripe sizes, i.e. when the stripe size is equal to 1kB and we receive a 128kB request, we don't send 128 requests down, but only as many requests as we have components, and do all the shuffle magic when reading is done (or before writing). Not sure how this can be achieved when some upper layer will split the request for me. How can I avoid sending 128 requests then? +> Third issue: The problem extends all the way up to sysinstall. +> +> Currently we do systematically shoot RAID-5 performance down by our +> strict adherence to MBR formatting rules. We reserve the first +> track of typically 63 sectors to the MBR. +> +> The first slice therefore starts in sector number 63. All partitions +> in that slice inherit that alignment and therefore unless the RAID-5 +> implementation has a stripe size of 63 sectors, a (too) large +> fraction of the requests will have one sector in one raid-stripe +> and the rest in another, which they often fail to fill by exactly +> one sector. Just to be sure I understand it correctly: you're talking about hardware RAID, so basically the exported provider is attached to a rank#1 geom? If so, this is not the case for software implementations which are configured on top of slices/partitions, instead of raw disks. As a workaround we can configure the 'a' partition to start at an offset equal to the stripe size, right? Of course it's not a solution for anything, but I want to be sure I get things right. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
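On the alignment arithmetic: for the case Poul-Henning describes, where the stripes are laid out from the start of the raw disk, the partition sitting above them has to begin on a stripe boundary, so the in-slice offset has to make the absolute starting sector a multiple of the stripe size. A back-of-the-envelope sketch (the 64 KB stripe and 512-byte sectors are assumptions, not numbers from this thread):

  #!/bin/sh
  # How far into the slice the 'a' partition would have to start so that
  # its absolute starting sector lands on a stripe boundary.
  stripe_sectors=128        # assumed 64 KB stripe / 512-byte sectors
  slice_start=63            # first slice after the reserved MBR track
  part_offset=$(( (stripe_sectors - slice_start % stripe_sectors) % stripe_sectors ))
  echo "in-slice offset: $part_offset sectors"
  echo "absolute start:  $(( slice_start + part_offset )) sectors"

With these numbers the partition would start 65 sectors into the slice, at absolute sector 128. For the software case, where the stripes are laid out from the start of the partition itself, the alignment question instead moves up to whatever is stacked on top of the striped provider.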
--U+BazGySraz5kW0T Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (FreeBSD) iD8DBQFDQ7ZMForvXbEpPzQRAi20AJ4zoFKbteSKqDBQpG7T9nkGx7K+IACeJUlH 3idA9HUvRyLNq4g5e8DFyjo= =9DAw -----END PGP SIGNATURE----- --U+BazGySraz5kW0T-- From owner-freebsd-performance@FreeBSD.ORG Wed Oct 5 16:12:15 2005 Return-Path: X-Original-To: performance@FreeBSD.org Delivered-To: freebsd-performance@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EAE4516A41F; Wed, 5 Oct 2005 16:12:15 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7C07643D45; Wed, 5 Oct 2005 16:12:15 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by cyrus.watson.org (Postfix) with ESMTP id 83C5C46BA4; Wed, 5 Oct 2005 12:12:14 -0400 (EDT) Date: Wed, 5 Oct 2005 17:12:14 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: performance@FreeBSD.org Message-ID: <20051005133730.R87201@fledge.watson.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: net@FreeBSD.org Subject: Call for performance evaluation: net.isr.direct X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2005 16:12:16 -0000 In 2003, Jonathan Lemon added initial support for direct dispatch of netisr handlers from the calling thread, as part of his DARPA/NAI Labs contract in the DARPA CHATS research program. Over the last two years since then, Sam Leffler and I have worked to refine this implementation, removing a number of ordering related issues, opportunities for excessive parallelism, recursion issues, and testing with a broad range of network components. There has also been a significant effort to complete MPSAFE locking work throughout the network stack. Combined with the earlier move to ithreads and a functional direct dispatch ("process to completion" implementation), there are a number of exciting possible benefits. - Possible parallelism by packet source -- ithreads can dispatch simultaenously into the higher level network stack layers. Since ithreads can execute in parallel on different CPU, so can code they invoke directly. - Elimination of context switches in the network receive path -- rather than context switching to the netisr thread from the ithread, we can now directly execute netisr code from the ithread. - A CPU-bound netisr thread on a multi-processor system will no longer rate limit traffic to the available resources on one CPU. - Eliminating the additional queueing in the handoff reduces the opportunity for queues to overfill as a result of scheduling delays. There are, however, some possible downsides and/or trade-offs: - Higher level network processing will now compete with the interrupt handler for CPU resources available to the ithread. This means less time for the interrupt code to execute in the thread if the thread is CPU-bound. - Lower levels of parallelism between portions of the inbound packet processing path. Without direct dispatch, there is possible parallelism between receive network driver execution and higher level stack layers, whereas with direct dispatch they can no longer execute in parallel. 
- Re-queued packets from tunnel and encapsulation processing will now require a context switch to process, since they will be processed in the netisr proper rather than in the ithread, whereas before the netisr thread would pick them up immediately after completing the current processing without a context switch. - Code that previously ran in the SWI at a SWI priority now runs in the ithread at an ithread priority, elevating the general priority at which network processing takes place. And there are a few mixed things, that can offer good and bad elements: - Less queueing takes place in the network stack in in-bound processing: packets are taken directly from the driver and processed to completion one by one, rather than queued for batch processing. Packets will be dropped before the link layer, rather than on the boundary between the link and protocol layers. This is good in that we invest less work in packets we were going to drop anyway, but bad in that less queueing means less room for scheduling delays. In previous FreeBSD releases, such as several 5.x series releases, net.isr.enable could not be turned on by default because there was insufficient synchronization in the network stack. As of 5.5 and 6.0, I believe there is sufficient synchronization, especially given that we force non-MPSAFE protocol handlers to run in the netisr without direct dispatch. As such, there has been a gradual conversation going on about making direct dispatch the default behavior in the 7.x development series, and more publically documenting and supporting the use of direct dispatch in the 6.x release engineering series. Obviously, this is about two things: performance, and stability. Many of us have been running with direct dispatch on by default for quite some time, so it passes some of the basic "does it run" tests. However, since it significantly increases the opportunity for parallelism in the receive path of the network stack, it likely will trigger otherwise latent or infrequent races and bugs to occur more frequently. The second aspect is performance: many results suggest that direct dispatch has a significant performance benefit. However, evaluating the impact on a broad range of results is required in order for us to go ahead with what is effectively a significant architectural change in how we perform network stack processing. To give you a sense of some of the performance effect I've measured recently, using the netperf measurement tool (with -DHISTOGRAM removed from the FreeBSD port build), here are some results. In each case, I've put parenthesis around host or router to indicate which is the host where the configuration change is being tested. 
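(For reference, a single data point in the comparisons below can be collected with something along these lines; the peer address is only a placeholder, -H names the remote netserver and -t the test type, and net.isr.dispatch is the sysctl discussed further down:

  # sysctl net.isr.dispatch=1
  $ netperf -H 192.0.2.1 -t TCP_RR

The before/after comparison between a set of runs with the sysctl at 0 and a set with it at 1 is then what ministat, shown below, is for.)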
These tests were performed using dual Xeon systems, and using back-to-back gigabit ethernet cards and the if_em driver: TCP round trip benchmark (TCP_RR), host-(host): 7.x UP: 0.9% performance improvement 7.x SMP: 0.7% performance improvement TCP round trip benchmark (TCP_RR), host-(router)-host: 7.x UP: 2.4% performance improvement 7.x SMP: 2.9% performance improvement UDP round trip benchmark (UDP_RR), host-(host): 7.x UP: 0.7% performance improvement 7.x SMP: 0.6% performance improvement UDP round trip benchmark (UDP_RR), host-(router)-host: 7.x UP: 2.2% performance improvement 7.x SMP: 3.0% performance improvement TCP stream banchmark (TCP_STREAM), host-(host): 7.x UP: 0.8% performance improvement 7.x SMP: 1.8% performance improvement TCP stream benchmark (TCP_STREAM), host-(router)-host: 7.x UP: 13.6% performance improvement 7.x SMP: 15.7% performance improvement UDP stream benchmark (UDP_STREAM), host-(host): 7.x UP: none 7.x SMP: none UDP stream benchmark (UDP_STREAM), host-(router)-host: 7.x UP: none 7.x SMP: none TCP connect benchmark (src/tools/tools/netrate/tcpconnect) 7.x UP: 7.90383% +/- 0.553773% 7.x SMP: 12.2391% +/- 0.500561% So in some cases, the impact is negligible -- in other places, it is quite significant. So far, I've not measured a case where performance has gotten worse, but that's probably because I've only been measuring a limited number of cases, and with a fairly limited scope of configurations, especially given that the hardware I have is pushing the limits of what the wire supports, so minor changes in latency are possible, but not large changes in throughput. So other than a summary of the status quo, this is also a call to action. I would like to get more widespread benchmarking of the impact of direct dispatch on network-related workloads. This means a variety of things: (1) Performance of low level network services, such as routing, bridging, and filtering. (2) Performance of high level application servces, such as web and database. (3) Performance of integrated kernel network services, such as the NFS client and server. (4) Performance of user space distributed file systems, such as Samba and AFS. All you need to do to switch to direct dispatch mode is set the sysctl or tunable "net.isr.dispatch" to 1. To disable it again, remove the setting, or set it to 0. It can be modified at run-time, although during the transition from one mode to the other, there may be a small quantity of packet misordering, so benchmarking over the transition is discouraged. FYI: as of 6.0-RC1 and recent 7.0, net.isr.dispatch is the name of the variable. In earlier releases, the name of this variable was net.isr.enable. Some important details: - Only non-local protocol traffic is affected: loopback traffic still goes via the netisr to avoid issues of recursion and lock order. - In the general case, only in-bound traffic is directly affected by this change. As such, send-only benchmarks may reveal little change. They are still interesting, however. - However, the send path is indirectly affected due to changes in scheduling, workload, interrupt handling, and so on. - Because network benchmarks, especially micro-benchmarks, are especially sensitive to minor perturbations, I highly recommend running in a minimal multi-user or ideally single-user environment, and suggest isolating undesired sources of network traffic from segments where testing is occuring. For macro-benchmarks this can be less important, but should be paid attention to. 
- Please make sure debugging features are turned off when running tests -- especially WITNESS, INVARIANTS, INVARIANT_SUPPORT, and user space malloc debugging. These can have a significant impact on performance, both potentially overshadowing changes, and in some cases, actually reversing results (due to higher overhead under locks, for example). - Do not use net.isr.enable in the 5.x line unless you know what you are doing. While it is reasonably safe with 5.4 forwards, it is not a supported configuration, and may cause stability issues with specific workloads. - What we're particularly interested in is a statistically meaningful comparison of the "before" and "after" case. When doing measurements, I like to run 10-12 samples, and usually discard the first one or two, depending on the details of the benchmark. I'll then use src/tools/tools/ministat to compare the data sets. Running a number of samples is quite important, because the variance in many tests can be significant, and if the two sample sets overlap, you can quite easily draw the entirely wrong conclusion about the results from a small number of measurements in a sample. Assuming you have a fixed width font, typicaly output from ministat looks something like the following and may be human readable: x 7SMP/tcpconnect_queue + 7SMP/tcpconnect_direct +--------------------------------------------------------------------------+ |x xx + +| |xxxxx xx ++ +++++ +| ||__A__| |___A__| | +--------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 5425 5503 5460 5456.3 26.284977 + 10 6074 6169 6126 6124.1 31.606785 Difference at 95.0% confidence 667.8 +/- 27.3121 12.2391% +/- 0.500561% (Student's t, pooled s = 29.0679) Of particular interest is if changing to direct dispatch hurts performance in your environment, and understanding why that is. Thanks, Robert N M Watson From owner-freebsd-performance@FreeBSD.ORG Wed Oct 5 17:30:28 2005 Return-Path: X-Original-To: freebsd-performance@freebsd.org Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DED2216A41F for ; Wed, 5 Oct 2005 17:30:28 +0000 (GMT) (envelope-from patpro@patpro.net) Received: from smtp4-g19.free.fr (smtp4-g19.free.fr [212.27.42.30]) by mx1.FreeBSD.org (Postfix) with ESMTP id 823C143D45 for ; Wed, 5 Oct 2005 17:30:28 +0000 (GMT) (envelope-from patpro@patpro.net) Received: from [10.0.2.2] (boleskine.patpro.net [82.235.12.223]) by smtp4-g19.free.fr (Postfix) with ESMTP id 56EA72CB1F for ; Wed, 5 Oct 2005 19:30:27 +0200 (CEST) Mime-Version: 1.0 (Apple Message framework v734) In-Reply-To: References: Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <3509E78F-E90F-4056-AD4C-FDDDC41A4A46@patpro.net> Content-Transfer-Encoding: 7bit From: Patrick Proniewski Date: Wed, 5 Oct 2005 19:30:24 +0200 To: freebsd-performance@freebsd.org X-Mailer: Apple Mail (2.734) Subject: Re: dd(1) performance when copiing a disk to another X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Oct 2005 17:30:29 -0000 Hi, thank you all for these interesting explanations. I've made some more tests with my disks : As you'll see, for block size greater than 64k, the HDD ad6 (hitachi) is the bottleneck. 
bs of 1m and 512k yield the best transfer rates between ad4 and ad6, and using a pipe between two dd's will lower the performance. best regards, and thank you again, Pat, #### /dev/zero to ad6 # dd if=/dev/zero of=/dev/ad6 bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 31.047655 secs (33773114 bytes/sec) # dd if=/dev/zero of=/dev/ad6 bs=8k count=128000 128000+0 records in 128000+0 records out 1048576000 bytes transferred in 31.580223 secs (33203565 bytes/sec) #### ad4 (SATA150) to ad6 (SATA150) # dd if=/dev/ad4 of=/dev/ad6 bs=8k count=128000 128000+0 records in 128000+0 records out 1048576000 bytes transferred in 50.916216 secs (20594146 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=64k count=16000 16000+0 records in 16000+0 records out 1048576000 bytes transferred in 30.925397 secs (33906630 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=128k count=8000 8000+0 records in 8000+0 records out 1048576000 bytes transferred in 31.462153 secs (33328170 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=256k count=4000 4000+0 records in 4000+0 records out 1048576000 bytes transferred in 30.819234 secs (34023428 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=512k count=2000 2000+0 records in 2000+0 records out 1048576000 bytes transferred in 30.589651 secs (34278783 bytes/sec) # dd if=/dev/ad4 of=/dev/ad6 bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 30.660553 secs (34199514 bytes/sec) # dd if=/dev/ad4 bs=1m count=1000 | dd of=/dev/ad6 bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 33.998716 secs (30841635 bytes/sec) 0+16000 records in 0+16000 records out 1048576000 bytes transferred in 34.001099 secs (30839474 bytes/sec)
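A read-only baseline of the source disk on its own would separate the two drives' contributions (nothing is written, so it is safe to run; the block size and count mirror the runs above):

  # dd if=/dev/ad4 of=/dev/null bs=1m count=1000

If that figure comes out well above the ~34 MB/s seen for the copy, the Hitachi's write side is confirmed as the limit, which is what the /dev/zero test above already suggests.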