Date:      Mon, 21 Jul 2008 11:23:48 -0400
From:      Sven W <sven@dmv.com>
To:        Pete French <petefrench@ticketswitch.com>
Cc:        koitsu@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: Multi-machine mirroring choices
Message-ID:  <4884AA04.7000404@dmv.com>
In-Reply-To: <E1KKtFs-000J7B-Ow@dilbert.ticketswitch.com>
References:  <E1KKtFs-000J7B-Ow@dilbert.ticketswitch.com>



Pete French presumably uttered the following on 07/21/08 07:08:
>> The *big* issue I have right now is dealing with the slave machine going
>> down. Once the master no longer has a connection to the ggated devices,
>> all processes trying to use the device hang in D status. I have tried
>> pkill'ing ggatec to no avail and ggatec destroy returns a message of
>> gctl being busy. Trying to ggatec destroy -f panics the machine.
> 
> Oddly enough, this was the issue I had with iscsi which made me move
> to using ggated instead. On our machines I use '-t 10' as an argument to
> ggatec, and this makes it timeout once the connection has been down for
> a certain amount of time. I am using gmirror on top, not ZFS, and this
> handled the drive vanishing from the mirror quite happily. I haven't
> tried it with ZFS, which may not like having the device suddenly disappear.
> 
> -pete.
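For comparison, the gmirror-over-ggate arrangement Pete describes might look roughly like the sketch below on the master. Only the `-t 10` option comes from the thread. The host name `slave`, the disks `/dev/ad4` (exported by the slave) and `/dev/ad6` (local), the gg.exports line and the mirror name `gm0` are all hypothetical. Setting `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch: master side of a gmirror-over-ggate mirror (hypothetical names).
# The slave is assumed to export its disk in /etc/gg.exports with a line
# such as "192.168.0.1 RW /dev/ad4" (192.168.0.1 = the master, assumed)
# and to be running ggated.

SLAVE=${SLAVE:-slave}                # slave host name (assumed)
REMOTE_DISK=${REMOTE_DISK:-/dev/ad4} # disk exported by the slave (assumed)
LOCAL_DISK=${LOCAL_DISK:-/dev/ad6}   # local half of the mirror (assumed)

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_master_mirror() {
    # -t 10 is the ggatec timeout Pete uses, so that ggatec times out once
    # the connection has been down for a while instead of blocking forever.
    run ggatec create -t 10 -u 0 "$SLAVE" "$REMOTE_DISK" || return 1
    # Build gm0 from the local disk and the remote disk (ggate0).
    run gmirror label -v gm0 "$LOCAL_DISK" /dev/ggate0
}
```

Running `DRY_RUN=1 setup_master_mirror` first is a cheap way to review the exact commands before touching real disks.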

What I have found is that the master machine will lock up if the slave disappears
during a large file transfer. I tested this by setting up a zpool mirror on the master
using a ggatec device from the slave. Then I:

1. pkill'ed ggated on the slave machine.

2. ran this on the master:

   dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192 [128MB]

The dd command finished, and /var/log/messages showed I/O errors to the slave
drive, as expected. The messages also showed ggatec trying to reconnect every 10
seconds (ggatec was started with the -t 10 parameter).
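A rough script version of the test above might look like this. It assumes `ggate0` is already attached with `ggatec create -t 10`, that the local disk is `ad6`, that the slave is reachable over ssh as `slave` (in the test above, ggated was killed directly on the slave), and that the pool is named `data1` so ZFS mounts it at /data1. All of these names are hypothetical. `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch of the ggate/ZFS failure test (hypothetical names throughout).
# Assumes ggate0 is already attached with "ggatec create -t 10 ...".

SLAVE=${SLAVE:-slave}   # slave host, reachable over ssh (assumed)
POOL=${POOL:-data1}     # pool name; ZFS mounts it at /data1 by default

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Two-way ZFS mirror: local disk ad6 (assumed) plus the remote ggate0.
setup_pool() {
    run zpool create "$POOL" mirror ad6 ggate0
}

# Kill ggated on the slave, then write COUNT blocks of 16 KiB
# (8192 blocks = 128 MB, 32768 blocks = 512 MB) and show the pool state.
fail_slave_and_write() {
    count=$1
    run ssh "$SLAVE" pkill ggated || return 1
    run dd if=/dev/zero of="/$POOL/testfile2" bs=16k count="$count"
    run zpool status "$POOL"
}
```

`fail_slave_and_write 8192` corresponds to the 128MB case; `fail_slave_and_write 32768` is the 512MB case that hung the master, so it should only be tried on a machine that can be power-cycled.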

Finally, ZFS marked the drive unavailable, which then allowed me to run
"ggatec destroy -u 0" without getting the "ioctl(/dev/ggctl): Device busy" error
message. (By the way, ggatec destroy does not kill the "ggatec create" process that
set up the device in the first place; I had to pkill ggatec to get it to stop. Bug?)

The above behavior would be acceptable for multi-machine mirroring, as the
recovery would be scriptable.
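Something like the following could drive that recovery. It is only a sketch: it assumes `zpool status` lists the remote disk as `ggateN` (or `/dev/ggateN`) with state `UNAVAIL`, and, as in the test above, that a plain `pkill ggatec` is acceptable because only one ggate unit is in use. It can only help in the small-write case, where ZFS eventually gives up on the device; it does nothing once ZFS itself is stuck.

```sh
#!/bin/sh
# Sketch of a recovery helper for a zpool mirror with a ggate half.

# Read "zpool status" output on stdin and print the unit number of every
# ggate device whose state column is UNAVAIL.
unavail_ggate_units() {
    awk '$2 == "UNAVAIL" && $1 ~ /ggate[0-9]+$/ {
        unit = $1
        sub(/.*ggate/, "", unit)   # "ggate0" or "/dev/ggate0" -> "0"
        print unit
    }'
}

# Release one ggate unit once ZFS has stopped using it.
release_ggate_unit() {
    unit=$1
    # Before ZFS gives up on the device this fails with
    # "ioctl(/dev/ggctl): Device busy".
    ggatec destroy -u "$unit" || return 1
    # "ggatec destroy" leaves the original "ggatec create" process running,
    # so stop it explicitly (note: this kills *all* ggatec processes).
    pkill ggatec
    return 0
}

# Check one pool and release every ggate unit it reports as UNAVAIL.
check_pool() {
    pool=$1
    zpool status "$pool" | unavail_ggate_units | while read -r unit; do
        echo "ggate$unit is UNAVAIL in $pool; destroying it"
        release_ggate_unit "$unit"
    done
}
```

On the master, `check_pool data1` could be run periodically (from cron, for example); the hypothetical pool name must match the real one.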

The problem comes with large writes. I tried to repeat the above with

dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768 [512MB]

which locks up ZFS and, ultimately, the system itself. It seems that once the
write buffer is full, ZFS is unable to fail (mark unavailable) the slave drive, and the
entire system becomes unresponsive (I cannot even ssh into it).

The bottom line is that without some kind of "timeout" or "time to fail" (fail on
bad I/O?), zpool + ggate[cd] seems to be an unworkable solution. This is a real
shame, as the recovery process of swapping from master to slave and back again was
so much cleaner and faster than with gmirror.


Sven


