Message-ID: <4884AA04.7000404@dmv.com>
Date: Mon, 21 Jul 2008 11:23:48 -0400
From: Sven W
To: Pete French
Cc: koitsu@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: Multi-machine mirroring choices

Pete French presumably uttered the following on 07/21/08 07:08:
>> The *big* issue I have right now is dealing with the slave machine going
>> down. Once the master no longer has a connection to the ggated devices,
>> all processes trying to use the device hang in D status.
>> I have tried pkill'ing ggatec to no avail and ggatec destroy returns a
>> message of gctl being busy. Trying to ggatec destroy -f panics the machine.
>
> Oddly enough, this was the issue I had with iscsi which made me move
> to using ggated instead. On our machines I use '-t 10' as an argument to
> ggatec, and this makes it timeout once the connection has been down for
> a certain amount of time. I am using gmirror on top, not ZFS, and this
> handled the drive vanishing from the mirror quite happily. I haven't
> tried it with ZFS, which may not like having the device suddenly disappear.
>
> -pete.

What I have found is that the master machine will lock up if the slave
disappears during a large file transfer. I tested this by setting up a zpool
mirror on the master using a ggatec device from the slave. Then I:

pkill'ed ggated on the slave machine.
dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192 [128MB] on the master

The dd command finished and /var/log/messages showed I/O errors to the
slave drive as expected. Messages also showed ggatec trying to reconnect
every 10 seconds (ggatec was started with the -t 10 parameter). Finally zfs
marked the drive unavailable, which then allowed me to run ggatec destroy -u 0
without getting the "ioctl(/dev/ggctl): Device busy" error message.

(By the way, ggatec destroy does not kill the "ggatec create" process that
set up the device to begin with; I had to pkill ggatec to get that to stop.
Bug?)

The above behavior would be acceptable for multi-machine mirroring as it
would be scriptable. The problem comes with large writes. I tried to repeat
the above with

dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768 [512MB]

which then locks zfs, and ultimately the system itself. It seems that once the
write buffer is full, zfs is unable to fail/unavail the slave drive and
the entire system becomes unresponsive (I cannot even ssh into it).

The bottom line is that without some type of "timeout" or "time to fail"
(bad I/O to fail?)
zpool + ggate[cd] seems to be an unworkable solution. This is actually a
shame, as the recovery process of swapping from master to slave and back
again was so much cleaner and faster than using gmirror.

Sven
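
P.S. For what it's worth, the "scriptable" cleanup from the small-write case above could look roughly like the sketch below. It only covers the case where zfs does get as far as marking the ggate device UNAVAIL; it does nothing for the large-write lockup. The function names, the DRY_RUN switch, and the assumption that the device shows up in zpool status as ggateN are all mine, purely for illustration. Only "ggatec destroy -u" and "pkill ggatec" are taken from the test above.

```shell
#!/bin/sh
# Sketch only: tidy up a ggate device once zfs has marked it UNAVAIL.
# All function names and DRY_RUN are hypothetical, for illustration.

# Read "zpool status" output on stdin and print the name of every
# entry whose state column is UNAVAIL.
unavail_devices() {
    awk '$2 == "UNAVAIL" { print $1 }'
}

# Turn a device name such as "ggate0" into its unit number ("0").
# Returns non-zero for anything that is not a ggate device.
ggate_unit() {
    case "$1" in
        ggate[0-9]*) echo "${1#ggate}" ;;
        *) return 1 ;;
    esac
}

# Echo the command instead of running it unless DRY_RUN=0 is set.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# For each UNAVAIL ggate device in the given pool, destroy the unit and
# then kill the leftover "ggatec create" process.
# Note: pkill ggatec kills every ggatec process, not just one unit's.
cleanup_pool() {
    pool=$1
    zpool status "$pool" | unavail_devices | while read -r dev; do
        unit=$(ggate_unit "$dev") || continue
        run_cmd ggatec destroy -u "$unit"
        run_cmd pkill ggatec
    done
}
```

Run with the defaults, "cleanup_pool data1" (assuming the pool is called data1) only prints what it would do; "DRY_RUN=0 cleanup_pool data1" actually runs the commands.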