From: Kevin <kevin@your.org>
To: freebsd-fs@freebsd.org
Date: Wed, 2 Dec 2009 14:55:23 -0600
Subject: "zfs receive" lock time

I have two very fast systems (12-disk 15krpm RAID array, 16 cores, etc.). I'm using zfs send/receive to replicate a ZFS volume from the "master" box to the "slave" box. Every minute, the master takes a new snapshot, then uses "send -i" to send an incremental stream to the slave (the cycle is sketched below). Normally, no files have changed during the minute, so the operation is very fast (<1 second, and most of that is ssh negotiation time).
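Roughly, each minute's cycle looks like the following. This is a sketch, not the exact script: the dataset name, host name, and snapshot labels are placeholders; only the snapshot / "send -i" / ssh / "zfs receive" shape comes from the setup described above.

#!/bin/sh
# Sketch of the per-minute replication cycle. DATASET, SLAVE, and the
# timestamp snapshot labels are placeholders, not the real names.
DATASET="tank/data"
SLAVE="slave.example.org"

PREV=$(date -v-1M "+%Y%m%d%H%M")   # previous minute's snapshot label
NOW=$(date "+%Y%m%d%H%M")

# Take the new snapshot on the master...
zfs snapshot "${DATASET}@${NOW}"

# ...then send only the delta since the previous snapshot. The slave
# applies it with "zfs receive", which is where the long lock shows up.
zfs send -i "${DATASET}@${PREV}" "${DATASET}@${NOW}" | \
    ssh "${SLAVE}" zfs receive "${DATASET}"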
If the slave is completely idle, "zfs receive" takes a fraction of a second. If the slave has been very busy (lots of read activity, no writes; the slave has everything mounted read-only), "zfs receive" can suddenly take 30 seconds or more to complete, and it holds the filesystem locked the whole time. For example, I'd see:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
49345 root        1  76    0 13600K  1956K zio->i 9   0:01  1.37% zfs
48910 www         1  46    0 36700K 21932K rrl->r 3   0:24  0.00% lighttpd
48913 www         1  46    0 41820K 26108K rrl->r 2   0:24  0.00% lighttpd
48912 www         1  46    0 37724K 23484K rrl->r 0   0:24  0.00% lighttpd
48911 www         1  46    0 41820K 26460K rrl->r 10  0:23  0.00% lighttpd
48909 www         1  46    0 39772K 24488K rrl->r 5   0:22  0.00% lighttpd
48908 www         1  46    0 36700K 21460K rrl->r 14  0:19  0.00% lighttpd
48907 www         1  45    0 30556K 16216K rrl->r 13  0:14  0.00% lighttpd
48906 www         1  44    0 26460K 11452K rrl->r 6   0:06  0.00% lighttpd

At first, I thought it was possibly cache pressure: when the system was busy, whatever data was necessary to create the new snapshot was getting pushed out of the cache, so it had to be re-read. I increased arc_max and arc_meta_limit to very high values, and it seemed to have no effect, even when arc_meta_used was far below arc_meta_limit. Disabling cache flushes had no impact. Disabling the ZIL cut the time in half, but it's still too long for this application. (The tunables involved are sketched at the end of this message.)

ktrace on the "zfs receive" shows:

1062 zfs 0.000024 CALL  ioctl(0x3,0xcc285a11,0x7fffffffa320)
1062 zfs 0.000081 RET   ioctl 0
1062 zfs 0.000058 CALL  ioctl(0x3,0xcc285a05,0x7fffffffa2f0)
1062 zfs 0.000037 RET   ioctl 0
1062 zfs 0.000019 CALL  ioctl(0x3,0xcc285a11,0x7fffffffa320)
1062 zfs 0.000055 RET   ioctl 0
1062 zfs 0.000031 CALL  ioctl(0x3,0xcc285a11,0x7fffffff9f00)
1062 zfs 0.000053 RET   ioctl 0
1062 zfs 0.000020 CALL  ioctl(0x3,0xcc285a1c,0x7fffffffc930)
1062 zfs 24.837084 RET  ioctl 0
1062 zfs 0.000028 CALL  ioctl(0x3,0xcc285a11,0x7fffffff9f00)
1062 zfs 0.000074 RET   ioctl 0
1062 zfs 0.000037 CALL  close(0x6)
1062 zfs 0.000006 RET   close 0
1062 zfs 0.000007 CALL  close(0x3)
1062 zfs 0.000005 RET   close 0

The 24-second call to 0xcc285a1c is ZFS_IOC_RECV (decoded in the sketch below), so whatever is going on is happening in the kernel; it's not a delay in getting the kernel any data. "systat" shows the drives are 100% busy during the operation, so it's obviously doing something. :)

Does anyone know what "zfs receive" is doing while it has everything locked like this, and why a lot of read activity beforehand would so drastically affect its performance?

-- Kevin
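For reference, the tuning described above maps to the FreeBSD 8.x-era knobs below. A sketch only: the sizes are illustrative placeholders rather than the values actually used, and exact names vary by release.

# /boot/loader.conf (sizes are placeholders, not the values actually used)
vfs.zfs.arc_max="12G"            # raise the overall ARC cap
vfs.zfs.arc_meta_limit="4G"      # raise the metadata limit within the ARC
vfs.zfs.cache_flush_disable=1    # "disabling cache flushes"
vfs.zfs.zil_disable=1            # "disabling zil" (removed in later releases)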
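For reference, how 0xcc285a1c relates to ZFS_IOC_RECV: FreeBSD packs the direction, argument length, ioctl group, and command index into that one 32-bit word, so it can be unpacked with plain sh arithmetic. The ZFS_IOC_RECV identification is from the message above; the sketch only shows how the word is put together.

#!/bin/sh
# Unpack the slow ioctl word from the ktrace output above.
cmd=0xcc285a1c
echo "command index: $(( cmd & 0xff ))"            # 28 -> ZFS_IOC_RECV here
echo "ioctl group:   $(( (cmd >> 8) & 0xff ))"     # 90 == 'Z', the ZFS group
echo "arg length:    $(( (cmd >> 16) & 0x1fff ))"  # sizeof(zfs_cmd_t) bytes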