Date: Mon, 21 Jul 2008 11:23:48 -0400
From: Sven W <sven@dmv.com>
To: Pete French <petefrench@ticketswitch.com>
Cc: koitsu@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: Multi-machine mirroring choices
Message-ID: <4884AA04.7000404@dmv.com>
In-Reply-To: <E1KKtFs-000J7B-Ow@dilbert.ticketswitch.com>

Pete French presumably uttered the following on 07/21/08 07:08:
>> The *big* issue I have right now is dealing with the slave machine going
>> down. Once the master no longer has a connection to the ggated devices,
>> all processes trying to use the device hang in D status. I have tried
>> pkill'ing ggatec to no avail and ggatec destroy returns a message of
>> gctl being busy. Trying to ggatec destroy -f panics the machine.
>
> Oddly enough, this was the issue I had with iscsi which made me move
> to using ggated instead. On our machines I use '-t 10' as an argument to
> ggatec, and this makes it time out once the connection has been down for
> a certain amount of time. I am using gmirror on top, not ZFS, and this
> handled the drive vanishing from the mirror quite happily. I haven't
> tried it with ZFS, which may not like having the device suddenly disappear.
>
> -pete.

What I have found is that the master machine will lock up if the slave
disappears during a large file transfer. I tested this by setting up a
zpool mirror on the master using a ggatec device from the slave. Then I:

  - pkill'ed ggated on the slave machine;
  - ran the following on the master [128MB]:

        dd if=/dev/zero of=/data1/testfile2 bs=16k count=8192

The dd command finished and /var/log/messages showed I/O errors to the
slave drive, as expected. Messages also showed ggatec trying to reconnect
every 10 seconds (ggatec was started with the -t 10 parameter).

Finally zfs marked the drive unavailable, which then allowed me to run
ggatec destroy -u 0 without getting the "ioctl(/dev/ggctl): Device busy"
error message. (By the way, using ggatec destroy does not kill the
"ggatec create" process that created the device to begin with; I had to
pkill ggatec to get that to stop - bug?)

The above behavior would be acceptable for multi-machine mirroring, as it
would be scriptable.

The problem comes with large writes.
I tried to repeat the above with a larger write [512MB]:

    dd if=/dev/zero of=/data1/testfile2 bs=16k count=32768

which then locks zfs, and ultimately the system itself. It seems that once
the write size/buffer is full, zfs is unable to fail/unavail the slave
drive, and the entire system becomes unresponsive (I cannot even ssh into
it).

The bottom line is that without some type of "timeout" or "time to fail"
(bad I/O to fail?), zpool + ggate[cd] seems to be an unworkable solution.
This is actually a shame, as the recovery process of swapping from master
to slave and back again was so much cleaner and faster than using gmirror.

Sven
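
For illustration only, the "scriptable" recovery from the small-write test
(wait until zfs marks the ggate device unavailable, then ggatec destroy the
unit and pkill the leftover ggatec process) might look roughly like the
sketch below. The function names, the UNIT and DRY_RUN variables, and the
way UNAVAIL is matched in the zpool status text are assumptions made for
the example, not something taken from the test above. It defaults to a dry
run that only prints the commands, and it would not help with the
large-write case, where the system locks up before any script could act.

```shell
#!/bin/sh
# Sketch only. Assumed (hypothetical) knobs: ggate unit number and dry-run flag.
UNIT="${UNIT:-0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Read `zpool status` text on stdin; succeed if the named device is UNAVAIL.
# Assumes the device name is the first column and its state the second.
device_unavail() {
    awk -v dev="$1" '$1 == dev && $2 == "UNAVAIL" { found = 1 } END { exit !found }'
}

# The two manual steps from the test: destroy the unit, then kill the
# ggatec process that "ggatec destroy" leaves behind.
teardown() {
    run ggatec destroy -u "$UNIT"
    run pkill ggatec
}

# Read `zpool status` text on stdin and tear down only once zfs has given up.
check_and_teardown() {
    if device_unavail "ggate$UNIT"; then
        teardown
    else
        echo "ggate$UNIT not UNAVAIL yet; leaving it alone"
    fi
}
```

It would be fed the pool status on stdin, for example
zpool status | check_and_teardown, with DRY_RUN=0 set once the printed
commands look right.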
