Date: Mon, 21 Jul 2008 11:23:48 -0400
From: Sven W <sven@dmv.com>
To: Pete French <petefrench@ticketswitch.com>
Cc: koitsu@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: Multi-machine mirroring choices
Message-ID: <4884AA04.7000404@dmv.com>
In-Reply-To: <E1KKtFs-000J7B-Ow@dilbert.ticketswitch.com>
References: <E1KKtFs-000J7B-Ow@dilbert.ticketswitch.com>
Pete French presumably uttered the following on 07/21/08 07:08:
>> The *big* issue I have right now is dealing with the slave machine going
>> down. Once the master no longer has a connection to the ggated devices,
>> all processes trying to use the device hang in D status. I have tried
>> pkill'ing ggatec to no avail and ggatec destroy returns a message of
>> gctl being busy. Trying to ggatec destroy -f panics the machine.
>
> Oddly enough, this was the issue I had with iscsi which made me move
> to using ggated instead. On our machines I use '-t 10' as an argument to
> ggatec, and this makes it time out once the connection has been down for
> a certain amount of time. I am using gmirror on top, not ZFS, and this
> handled the drive vanishing from the mirror quite happily. I haven't
> tried it with ZFS, which may not like having the device suddenly disappear.
>
> -pete.

What I have found is that the master machine will lock up if the slave
disappears during a large file transfer.

I tested this by setting up a zpool mirror on the master using a ggatec
device from the slave. Then I:

  1. pkill'ed ggated on the slave machine;
  2. ran dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192 [128MB]
     on the master.

The dd command finished, and /var/log/messages showed I/O errors to the
slave drive, as expected. The messages also showed ggatec trying to
reconnect every 10 seconds (ggatec was started with the -t 10 parameter).
Finally, ZFS marked the drive unavailable, which then allowed me to run
"ggatec destroy -u 0" without getting the "ioctl(/dev/ggctl): Device busy"
error message. (By the way, "ggatec destroy" does not kill the ggatec
process that "ggatec create" started; I had to pkill ggatec to stop it.
Bug?)

The above behavior would be acceptable for multi-machine mirroring, as the
recovery would be scriptable. The problem comes with large writes.
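For the small-write case, the recovery sequence described above (wait until ZFS marks the ggate drive unavailable, then run "ggatec destroy -u 0" and pkill the leftover ggatec process) might be scripted roughly as follows. This is a minimal sketch, not a tested tool: the pool name data1, ggate unit 0 (vdev "ggate0"), the 5-second polling interval, and the helper names dev_state, is_failed and watch_pool are all assumptions for illustration. It cannot help once ZFS itself has hung.

```shell
#!/bin/sh
# Sketch: watch a ZFS mirror whose second half is a ggatec device and tear
# the ggate device down once ZFS has stopped using it.
# Assumptions (hypothetical, adjust to your setup): pool "data1", ggate
# unit 0, i.e. the vdev appears as "ggate0" in `zpool status`.
POOL=${POOL:-data1}
UNIT=${UNIT:-0}
DEV="ggate${UNIT}"
INTERVAL=${INTERVAL:-5}

# Read `zpool status` output on stdin and print the STATE column of the
# vdev named $1 (prints nothing if the vdev is not listed).
dev_state() {
    awk -v dev="$1" '$1 == dev { print $2; exit }'
}

# Succeed if the state means ZFS no longer issues I/O to the vdev.
is_failed() {
    case "$1" in
        UNAVAIL|FAULTED|REMOVED|OFFLINE) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll until the ggate vdev has failed, then destroy the ggate unit and
# stop the ggatec process that "ggatec create" left running.
watch_pool() {
    while :; do
        state=$(zpool status "$POOL" | dev_state "$DEV")
        if is_failed "$state"; then
            logger -t ggate-watch "$DEV is $state in $POOL; destroying unit $UNIT"
            ggatec destroy -u "$UNIT" || return 1
            # Kills every ggatec process; acceptable with a single ggate unit.
            pkill -x ggatec
            return 0
        fi
        sleep "$INTERVAL"
    done
}

# Only start watching when asked, so the functions can be sourced or tested.
if [ "${1:-}" = watch ]; then
    watch_pool
fi
```

It would be run on the master as "sh ggate-watch.sh watch" (the file name is hypothetical). Polling zpool status is crude, but it relies only on what the 128MB test showed working: ZFS eventually marking the ggate drive unavailable, after which ggatec destroy no longer reports the device as busy.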
When I repeated the same test with a larger write,

  dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768 [512MB]

ZFS locked up, and ultimately so did the system itself. It seems that once
the write buffer is full, ZFS is unable to fail (mark UNAVAIL) the slave
drive, and the entire system becomes unresponsive (I cannot even ssh into
it).

The bottom line is that without some kind of timeout or "time to fail"
(failing the device after bad I/O?), zpool + ggate[cd] seems to be an
unworkable solution. This is a shame, as the recovery process of swapping
from master to slave and back again was so much cleaner and faster than
with gmirror.

Sven