From owner-freebsd-stable@FreeBSD.ORG Sun Mar 11 19:00:11 2012
Date: Sun, 11 Mar 2012 19:54:57 +0100
From: Phil Regnauld
To: freebsd-stable@freebsd.org
Message-ID: <20120311185457.GB1684@macbook.bluepipe.net>
Subject: Issue with hast replication
List-Id: Production branch of FreeBSD source code

Hi,

I've got a fairly simple setup: two hosts running 9.0-R with ZFS and HAST
(I will upgrade to stable if told to, but want to check here first).
HAST is configured to run on top of zvols configured on each host, as
illustrated:

       FS                          FS
    +------+                    +------+
    | hvol | <---- hastd -----> | hvol |
    +------+                    +------+
    | zvol |                    | zvol |
    +------+                    +------+
    | zfs  |                    | zfs  |
    +------+                    +------+
       h1                          h2

Connection is gigabit to the same switch. No issues with large TCP
transfers such as SCP/FTP.

Config is vanilla:

    # zfs create -V 10G zfs/hvol

hast.conf:

    resource hvol {
            on h1 {
                    local /dev/zvol/zfs/hvol
                    remote tcp4://192.168.1.100
            }
            on h2 {
                    local /dev/zvol/zfs/hvol
                    remote tcp4://192.168.1.200
            }
    }

h1 is behaving fine as primary, either with h2 turned off or in init - but
as soon as I set the role to secondary for h2, the receiver repeatedly
crashes and restarts - see the traces below.

I've seen

    http://lists.freebsd.org/pipermail/freebsd-current/2011-May/024871.html
    http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2012-01/msg00510.html

... but in the first case the fix is in 9 since last year, and the second
is referring to async replication - I'm using the default (fullsync).

hastctl status on the primary shows the dirty size diminishing slowly, but
obviously this isn't optimal (and causes freezes on I/O to the primary
hvol, causing all kinds of issues with the consumers of the hvol).

Any idea? Am I doing something wrong?

Primary:

Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31642091520, 131072).
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:41 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31649693696, 131072).
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:48 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31691243520, 131072).
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:02:59 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31783256064, 131072).
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:13 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31782731776, 131072).
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:18 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31803441152, 131072).
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:28 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to send request (Cannot allocate memory): WRITE(31881953280, 131072).
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Disconnected from tcp4://192.168.1.200.
Mar 11 02:03:42 h1 hastd[2282]: [hvol] (primary) Unable to write synchronization data: Cannot allocate memory.

Secondary:

Mar 11 01:01:30 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2874, exitcode=75).
Mar 11 01:01:38 h2 hastd[2875]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:44 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2875, exitcode=75).
Mar 11 01:01:45 h2 hastd[2876]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:01:50 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2876, exitcode=75).
Mar 11 01:01:56 h2 hastd[2877]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:01 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2877, exitcode=75).
Mar 11 01:02:05 h2 hastd[2878]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:11 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2878, exitcode=75).
Mar 11 01:02:15 h2 hastd[2879]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:20 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2879, exitcode=75).
Mar 11 01:02:30 h2 hastd[2880]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:34 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2880, exitcode=75).
Mar 11 01:02:41 h2 hastd[2881]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:47 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2881, exitcode=75).
Mar 11 01:02:48 h2 hastd[2882]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:02:54 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2882, exitcode=75).
Mar 11 01:02:59 h2 hastd[2883]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:04 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2883, exitcode=75).
Mar 11 01:03:13 h2 hastd[2884]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:17 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2884, exitcode=75).
Mar 11 01:03:18 h2 hastd[2885]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:23 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2885, exitcode=75).
Mar 11 01:03:28 h2 hastd[2886]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:33 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2886, exitcode=75).
Mar 11 01:03:42 h2 hastd[2887]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
Mar 11 01:03:48 h2 hastd[2506]: [hvol] (secondary) Worker process exited ungracefully (pid=2887, exitcode=75).
Mar 11 01:03:48 h2 hastd[2888]: [hvol] (secondary) Unable to receive request header: Socket is not connected.
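
The traces above are long and repetitive, so a small script may make the pattern easier to see. What follows is only a rough sketch and not part of the setup described above: it assumes the syslog line layout shown in the traces, and the names LINE_RE and summarise are made up for illustration. It groups hastd messages per host and role, masking long numbers (pids, WRITE offsets) so that repeated failures fall into one bucket.

```python
import re
from collections import Counter

# Assumed layout, taken from the traces above:
#   "Mar 11 02:02:30 h1 hastd[2282]: [hvol] (primary) <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"hastd\[(?P<pid>\d+)\]: \[(?P<res>[^\]]+)\] \((?P<role>\w+)\) (?P<msg>.*)$"
)


def summarise(lines):
    """Count hastd messages per (host, role, message).

    Numbers of four or more digits (pids, offsets, lengths) are replaced
    by "N" so that otherwise identical messages are counted together.
    Short numbers such as exit codes and IP octets are left alone.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a hastd line in the assumed layout
        msg = re.sub(r"\d{4,}", "N", m.group("msg"))
        counts[(m.group("host"), m.group("role"), msg)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read a saved log excerpt from stdin and print the most frequent messages.
    for (host, role, msg), n in summarise(sys.stdin).most_common():
        print(f"{n:4d}  {host} ({role})  {msg}")
```

Feeding it a saved copy of the two traces (for example with `python3 script.py < traces.txt`, where both file names are placeholders) should show at a glance how often each failure repeats on each side.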