From owner-freebsd-stable@FreeBSD.ORG Tue Oct 5 06:47:22 2010
Date: Mon, 4 Oct 2010 23:47:20 -0700
From: Jeremy Chadwick
To: Dan Langille
Cc: freebsd-stable, Martin Matuska, Artem Belevich
Subject: Re: zfs send/receive: is this slow?
Message-ID: <20101005064720.GA25231@icarus.home.lan>
In-Reply-To: <4CAA65A4.9080108@langille.org>

On Mon, Oct 04, 2010 at 07:39:16PM -0400, Dan Langille wrote:
> On 10/4/2010 2:10 PM, Jeremy Chadwick wrote:
> >On Mon, Oct 04, 2010 at 01:31:07PM -0400, Dan Langille wrote:
> >>
> >>On Mon, October 4, 2010 3:27 am, Martin Matuska wrote:
> >>>Try using zfs receive with the -v flag (gives you some stats at the end):
> >>># zfs send storage/bacula@transfer | zfs receive -v storage/compressed/bacula
> >>>
> >>>And use the following sysctl (you may set that in /boot/loader.conf, too):
> >>># sysctl vfs.zfs.txg.write_limit_override=805306368
> >>>
> >>>I have good results with the 768MB write limit on systems with at least
> >>>8GB RAM.  With 4GB RAM, you might want to try setting the TXG write limit
> >>>to a lower threshold (e.g. 256MB):
> >>># sysctl vfs.zfs.txg.write_limit_override=268435456
> >>>
> >>>You can experiment with that setting to get the best results on your
> >>>system.  A value of 0 means using the calculated default (which is very high).
> >>
> >>I will experiment with the above.  In the meantime:
> >>
> >>>During the operation you can observe what your disks actually do:
> >>>a) via ZFS pool I/O statistics:
> >>># zpool iostat -v 1
> >>>b) via GEOM:
> >>># gstat -a
> >>
> >>The following output was produced while the original copy was underway.
> >>
> >>$ sudo gstat -a -b -I 20s
> >>dT: 20.002s  w: 20.000s
> >> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >>    7    452    387  24801    9.5     64   2128    7.1   79.4  ada0
> >>    7    452    387  24801    9.5     64   2128    7.2   79.4  ada0p1
> >>    4    492    427  24655    6.7     64   2128    6.6   63.0  ada1
> >>    4    494    428  24691    6.9     65   2127    6.6   66.9  ada2
> >>    8    379    313  24798   13.5     65   2127    7.5   78.6  ada3
> >>    5    372    306  24774   14.2     64   2127    7.5   77.6  ada4
> >>   10    355    291  24741   15.9     63   2127    7.4   79.6  ada5
> >>    4    380    316  24807   13.2     64   2128    7.7   77.0  ada6
> >>    7    452    387  24801    9.5     64   2128    7.4   79.7  gpt/disk06-live
> >>    4    492    427  24655    6.7     64   2128    6.7   63.1  ada1p1
> >>    4    494    428  24691    6.9     65   2127    6.6   66.9  ada2p1
> >>    8    379    313  24798   13.5     65   2127    7.6   78.6  ada3p1
> >>    5    372    306  24774   14.2     64   2127    7.6   77.6  ada4p1
> >>   10    355    291  24741   15.9     63   2127    7.5   79.6  ada5p1
> >>    4    380    316  24807   13.2     64   2128    7.8   77.0  ada6p1
> >>    4    492    427  24655    6.8     64   2128    6.9   63.4  gpt/disk01-live
> >>    4    494    428  24691    6.9     65   2127    6.8   67.2  gpt/disk02-live
> >>    8    379    313  24798   13.5     65   2127    7.7   78.8  gpt/disk03-live
> >>    5    372    306  24774   14.2     64   2127    7.8   77.8  gpt/disk04-live
> >>   10    355    291  24741   15.9     63   2127    7.7   79.8  gpt/disk05-live
> >>    4    380    316  24807   13.2     64   2128    8.0   77.2  gpt/disk07-live
> >>
> >>$ zpool iostat 10
> >>               capacity     operations    bandwidth
> >>pool         used  avail   read  write   read  write
> >>----------  -----  -----  -----  -----  -----  -----
> >>storage     8.08T  4.60T    364    161  41.7M  7.94M
> >>storage     8.08T  4.60T    926    133   112M  5.91M
> >>storage     8.08T  4.60T    738    164  89.0M  9.75M
> >>storage     8.08T  4.60T  1.18K    179   146M  8.10M
> >>storage     8.08T  4.60T  1.09K    193   135M  9.94M
> >>storage     8.08T  4.60T   1010    185   122M  8.68M
> >>storage     8.08T  4.60T  1.06K    184   131M  9.65M
> >>storage     8.08T  4.60T    867    178   105M  11.8M
> >>storage     8.08T  4.60T  1.06K    198   131M  12.0M
> >>storage     8.08T  4.60T  1.06K    185   131M  12.4M
> >>
> >>Yesterday's write bandwidth was more like 80-90M.  It's down, a lot.
> >>
> >>I'll look closer this evening.
> >>
> >>>mm
> >>>
> >>>On 4. 10. 2010 4:06, Artem Belevich wrote:
> >>>>On Sun, Oct 3, 2010 at 6:11 PM, Dan Langille wrote:
> >>>>>I'm rerunning my test after I had a drive go offline[1].  But I'm not
> >>>>>getting anything like the previous test:
> >>>>>
> >>>>>time zfs send storage/bacula@transfer | mbuffer | zfs receive
> >>>>>storage/compressed/bacula-buffer
> >>>>>
> >>>>>$ zpool iostat 10 10
> >>>>>               capacity     operations    bandwidth
> >>>>>pool         used  avail   read  write   read  write
> >>>>>----------  -----  -----  -----  -----  -----  -----
> >>>>>storage     6.83T  5.86T      8     31  1.00M  2.11M
> >>>>>storage     6.83T  5.86T    207    481  25.7M  17.8M
> >>>>
> >>>>It may be worth checking individual disk activity using gstat -f 'da.$'
> >>>>
> >>>>Some time back I had one drive that was noticeably slower than the
> >>>>rest of the drives in a RAID-Z2 vdev and was holding everything back.
> >>>>SMART looked OK, there were no obvious errors and yet performance was
> >>>>much worse than what I'd expect.  gstat clearly showed that one drive
> >>>>was almost constantly busy with a much lower number of reads and writes
> >>>>per second than its peers.
> >>>>
> >>>>Perhaps previously fast transfer rates were due to caching effects.
> >>>>I.e. if all metadata already made it into ARC, subsequent "zfs send"
> >>>>commands would avoid a lot of random seeks and would show much better
> >>>>throughput.
> >>>>
> >>>>--Artem

> >Please read all of the following items before responding in-line.  Some
> >are just informational items for other people reading the thread.
>
> Thanks for the help.

:)

> >1) I don't see any indication in the thread of what "zpool status" looks
> >like on this machine.  I'd like to assume it's the same machine as in
> >another thread you started, but I can't reliably make that assumption.
>
> It is the same machine as the one that had the ada0 problem over the
> weekend (as shown in the other thread).
>
> $ zpool status -v
>   pool: storage
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed after 13h48m with 0 errors on Mon Oct 4 16:54:15 2010
> config:
>
>         NAME                 STATE     READ WRITE CKSUM
>         storage              ONLINE       0     0     0
>           raidz2             ONLINE       0     0     0
>             gpt/disk01-live  ONLINE       0     0     0
>             gpt/disk02-live  ONLINE       0     0     0
>             gpt/disk03-live  ONLINE       0     0     0
>             gpt/disk04-live  ONLINE       0     0     0
>             gpt/disk05-live  ONLINE       0     0     0
>             gpt/disk06-live  ONLINE       0     0  141K  3.47G repaired
>             gpt/disk07-live  ONLINE       0     0     0
>
> errors: No known data errors

NOTE: an automatic scrub just ended.  Scrubbing is a highly intensive
process, so I'm not too surprised that your throughput was garbage while
a scrub was running simultaneously.  This is normal/expected, and happens
on both FreeBSD ZFS and Solaris ZFS.  You shouldn't be testing the
performance of things during a scrub.
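If you want to double-check that no scrub (or resilver) is running before
the next test, a quick look beforehand is cheap, and a scrub in progress
can be cancelled and re-run later.  Rough sketch -- double-check the
syntax on your ZFS version:

$ zpool status storage | grep scrub     # "scrub in progress" vs. "scrub completed ..."
$ zpool scrub -s storage                # cancel an in-progress scrub, if you really need to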
> >2) All we know about the controller/disks is from your original dmesg.
> >I've taken the below from it, including only the stuff relevant to
> >disks ada[0-6]:
> >
> >siis0: port 0xdc00-0xdc0f mem 0xfbeffc00-0xfbeffc7f,0xfbef0000-0xfbef7fff irq 17 at device 4.0 on pci6
> >siisch0: at channel 0 on siis0
> >siisch1: at channel 1 on siis0
> >siisch2: at channel 2 on siis0
> >siisch3: at channel 3 on siis0
> >siis1: port 0xbc00-0xbc0f mem 0xfbcffc00-0xfbcffc7f,0xfbcf0000-0xfbcf7fff irq 19 at device 4.0 on pci3
> >siisch4: at channel 0 on siis1
> >siisch5: at channel 1 on siis1
> >siisch6: at channel 2 on siis1
> >siisch7: at channel 3 on siis1
> >ada0 at siisch0 bus 0 scbus0 target 0 lun 0
> >ada0: ATA-8 SATA 2.x device
> >ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada0: Command Queueing enabled
> >ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >ada1 at siisch2 bus 0 scbus2 target 0 lun 0
> >ada1: ATA-8 SATA 2.x device
> >ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada1: Command Queueing enabled
> >ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >ada2 at siisch3 bus 0 scbus3 target 0 lun 0
> >ada2: ATA-8 SATA 2.x device
> >ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada2: Command Queueing enabled
> >ada2: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >ada3 at siisch4 bus 0 scbus4 target 0 lun 0
> >ada3: ATA-8 SATA 2.x device
> >ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada3: Command Queueing enabled
> >ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >ada4 at siisch5 bus 0 scbus5 target 0 lun 0
> >ada4: ATA-8 SATA 2.x device
> >ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada4: Command Queueing enabled
> >ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >ada5 at siisch6 bus 0 scbus6 target 0 lun 0
> >ada5: ATA-8 SATA 2.x device
> >ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada5: Command Queueing enabled
> >ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >ada6 at siisch7 bus 0 scbus7 target 0 lun 0
> >ada6: ATA-8 SATA 2.x device
> >ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >ada6: Command Queueing enabled
> >ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> >
> >3) There are no other devices on the machine which share IRQ 17 or 19,
> >but that aside, please provide "vmstat -i" output during the time when
> >the transfer is happening.  I doubt this is the problem though.
> >"pciconf -lvc" would also be useful (include only the siis0 and siis1
> >devices), just to rule out any oddities there (I doubt any).
>
> $ vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           5          0
> irq16: ohci0 ohci1                     3          0
> irq18: ohci2 ohci3+                    4          0
> irq20: ahc0                           15          0
> irq22: ahci0                      849710          5
> cpu0: timer                    328126052       1999
> irq256: em0                     72404742        441
> irq257: siis0                  107712551        656
> irq259: siis1                  119319384        727
> cpu1: timer                    328107354       1999
> cpu2: timer                    328107355       1999
> cpu3: timer                    328107353       1999
> Total                         1612734528       9829
>
> $ sudo pciconf -lvc
> [...]
> siis0@pci0:6:4:0:  class=0x010400 card=0x71241095 chip=0x31241095 rev=0x01 hdr=0x00
>     vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
>     device     = 'PCI-X to Serial ATA Controller (SiI 3124)'
>     class      = mass storage
>     subclass   = RAID
>     cap 01[64] = powerspec 2  supports D0 D1 D2 D3  current D0
>     cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
>     cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message
> siis1@pci0:3:4:0:  class=0x010400 card=0x71241095 chip=0x31241095 rev=0x01 hdr=0x00
>     vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
>     device     = 'PCI-X to Serial ATA Controller (SiI 3124)'
>     class      = mass storage
>     subclass   = RAID
>     cap 01[64] = powerspec 2  supports D0 D1 D2 D3  current D0
>     cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 12 split transactions
>     cap 05[54] = MSI supports 1 message, 64 bit enabled with 1 message

I asked for just the siis0 and siis1 devices, sigh.  Sorry to bark, but
when I outline something explicitly and someone can't follow directions,
it's irritating.

Thanks for the vmstat -i output.  When the transfer is finished, run the
command again and look at the interrupt rates associated with siis0 and
siis1.  They should be significantly lower than they are above.  If not,
something odd is going on interrupt-wise.

Translation: you shouldn't be seeing 656 ints/sec or 727 ints/sec when
there's little to no I/O on the device.  If there is, there's a problem
(driver bug, etc.).
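One other note: the "rate" column in vmstat -i is, if I remember right,
just the total divided by uptime, so it reacts slowly.  For a true
reading over a short window, take two samples and diff the totals --
a rough sketch:

$ vmstat -i | egrep 'siis[01]'
$ sleep 30
$ vmstat -i | egrep 'siis[01]'

Subtract the two "total" values for each controller and divide by 30;
that's the actual interrupt rate during that window.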
> >4) Have you done "sysctl kstat.zfs.misc.arcstats" during the transfer to
> >see which counters get incremented?  If I remember right, the ARC being
> >"starved" or under high utilisation can cause slow I/O, especially in
> >the case where you were seeing decent I/O in the past but not now.  I
> >think kstat.zfs.misc.arcstats.memory_throttle_count monitors this, but
> >it'd be best to watch everything.
>
> The transfer is still underway, so I just ran the command:
>
> $ sysctl kstat.zfs.misc.arcstats
> kstat.zfs.misc.arcstats.hits: 19654617
> kstat.zfs.misc.arcstats.misses: 18760397
> kstat.zfs.misc.arcstats.demand_data_hits: 17919024
> kstat.zfs.misc.arcstats.demand_data_misses: 142871
> kstat.zfs.misc.arcstats.demand_metadata_hits: 1734974
> kstat.zfs.misc.arcstats.demand_metadata_misses: 534816
> kstat.zfs.misc.arcstats.prefetch_data_hits: 0
> kstat.zfs.misc.arcstats.prefetch_data_misses: 17941342
> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 619
> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 141368
> kstat.zfs.misc.arcstats.mru_hits: 19057650
> kstat.zfs.misc.arcstats.mru_ghost_hits: 85075
> kstat.zfs.misc.arcstats.mfu_hits: 596480
> kstat.zfs.misc.arcstats.mfu_ghost_hits: 64187
> kstat.zfs.misc.arcstats.allocated: 40949126
> kstat.zfs.misc.arcstats.deleted: 38057232
> kstat.zfs.misc.arcstats.stolen: 31511274
> kstat.zfs.misc.arcstats.recycle_miss: 248009
> kstat.zfs.misc.arcstats.mutex_miss: 17349
> kstat.zfs.misc.arcstats.evict_skip: 1981764
> kstat.zfs.misc.arcstats.evict_l2_cached: 0
> kstat.zfs.misc.arcstats.evict_l2_eligible: 2504097772032
> kstat.zfs.misc.arcstats.evict_l2_ineligible: 2377474798592
> kstat.zfs.misc.arcstats.hash_elements: 54548
> kstat.zfs.misc.arcstats.hash_elements_max: 90455
> kstat.zfs.misc.arcstats.hash_collisions: 21246248
> kstat.zfs.misc.arcstats.hash_chains: 14690
> kstat.zfs.misc.arcstats.hash_chain_max: 11
> kstat.zfs.misc.arcstats.p: 754974720
> kstat.zfs.misc.arcstats.c: 805306368
> kstat.zfs.misc.arcstats.c_min: 805306368
> kstat.zfs.misc.arcstats.c_max: 6442450944
> kstat.zfs.misc.arcstats.size: 805091968
> kstat.zfs.misc.arcstats.hdr_size: 11792592
> kstat.zfs.misc.arcstats.data_size: 791902208
> kstat.zfs.misc.arcstats.other_size: 1397168
> kstat.zfs.misc.arcstats.l2_hits: 0
> kstat.zfs.misc.arcstats.l2_misses: 0
> kstat.zfs.misc.arcstats.l2_feeds: 0
> kstat.zfs.misc.arcstats.l2_rw_clash: 0
> kstat.zfs.misc.arcstats.l2_read_bytes: 0
> kstat.zfs.misc.arcstats.l2_write_bytes: 0
> kstat.zfs.misc.arcstats.l2_writes_sent: 0
> kstat.zfs.misc.arcstats.l2_writes_done: 0
> kstat.zfs.misc.arcstats.l2_writes_error: 0
> kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
> kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
> kstat.zfs.misc.arcstats.l2_evict_reading: 0
> kstat.zfs.misc.arcstats.l2_free_on_write: 0
> kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
> kstat.zfs.misc.arcstats.l2_cksum_bad: 0
> kstat.zfs.misc.arcstats.l2_io_error: 0
> kstat.zfs.misc.arcstats.l2_size: 0
> kstat.zfs.misc.arcstats.l2_hdr_size: 0
> kstat.zfs.misc.arcstats.memory_throttle_count: 157695
> kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
> kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
> kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
> kstat.zfs.misc.arcstats.l2_write_in_l2: 0
> kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
> kstat.zfs.misc.arcstats.l2_write_not_cacheable: 18613226
> kstat.zfs.misc.arcstats.l2_write_full: 0
> kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
> kstat.zfs.misc.arcstats.l2_write_pios: 0
> kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
> kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
> kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0

Thanks.  Others more familiar with these counters will need to analyse
them and comment on what could be happening.
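In the meantime, if you want to keep an eye on a couple of the
interesting counters yourself while the send/receive runs, a simple loop
is enough (sh syntax; pick whichever counters you care about):

$ while true; do sysctl kstat.zfs.misc.arcstats.size \
      kstat.zfs.misc.arcstats.memory_throttle_count; sleep 5; done

If memory_throttle_count keeps climbing during the slow periods, that
points at the ARC being squeezed for memory.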
> >5) The machine has 4GB of RAM; that said, what /boot/loader.conf tunings
> >are in place on this system?
>
> $ cat /boot/loader.conf
> geom_mirror_load="YES"
> ahci_load="YES"
> siis_load="YES"
>
> vm.kmem_size=7G

Thanks.  How about output from "sysctl -a | egrep '^vfs.zfs.arc_'" when
the I/O rate is horrible?  I don't see vfs.zfs.arc_max being set, which
is why I ask.
> >6) The machine where "sudo gstat -a -b -I 20s" is being run from shows
> >both reads and writes happening at the same time (approximately
> >24MBytes/sec read and 2MBytes/sec write, on each individual device).
> >Worth noting.
> >
> >7) Are the abysmal performance numbers here with or without mbuffer?
> >Your numbers without mbuffer were around 40MB/read and 40MB/write (per
> >device), while with mbuffer (once tuned) were around 110MB/read and
> >80MB/write (per device).
>
> With mbuffer:
>
> time zfs send storage/bacula@transfer | mbuffer | zfs receive
> storage/compressed/bacula-buffer

> >8) ZFS compression is in use on the filesystems on this system.
>
> Compression is on the destination only.  Source and destination are
> on the same machine.
>
> $ zfs list
> NAME                                USED  AVAIL  REFER  MOUNTPOINT
> storage                            6.07T  2.81T  1.75G  /storage
> storage/bacula                     4.85T  2.81T  4.29T  /storage/bacula
> storage/compressed                 1.21T  2.81T  44.8K  /storage/compressed
> storage/compressed/bacula-mbuffer  1.21T  2.81T  1.21T  /storage/compressed/bacula-mbuffer
> storage/pgsql                      5.50G  2.81T  5.50G  /storage/pgsql

> >9) Since you're testing zfs send/receive, don't you need to show what's
> >happening on *both* systems (the sender and the recipient)?
>
> Sender = recipient.  It's all in the same machine.

Is this even a legitimate test when it comes to benchmarking I/O?  Your
disks will be thrashing heavily, since the same spindles are servicing
both the read and the write side (indicated by the %busy field in gstat).

> >Footnote: Please use "-f '^ada[0-9]+$'" in the future on your gstat, to
> >include only the adaXX devices.  The xxxp1 and gpt entries aren't of
> >interest.
>
> That confirms what I was playing with earlier.  Thanks.
>
> dT: 1.001s  w: 1.000s  filter: ^ada[0-9]+$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    395    116   8561   10.8    276   8830    5.7   77.7| ada0
>     1    408    107   8419   12.0    299   8851    5.7   64.0| ada1
>     1    400    111   8677   15.6    287   8663    8.6   77.9| ada2
>     1    384    107   8587   12.7    275   9246    8.2   70.6| ada3
>     1    393    109   8419   12.6    282   9292    7.8   68.7| ada4
>     1    385    107   8445   16.0    276   9385    7.7   72.8| ada5
>     1    381    101   8575   15.3    278   9428    8.0   67.1| ada6
>     0      0      0      0    0.0      0      0    0.0    0.0| ada7
>     0      0      0      0    0.0      0      0    0.0    0.0| ada8

I don't really have anything else to say at this point.  Benchmarks
being done during a scrub, in addition to send/receive being done on the
same box?  I'm not sure there will be much to investigate.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |