From owner-freebsd-questions@FreeBSD.ORG Tue Jul 14 11:55:20 2009
Message-ID: <4A5C7223.6070808@transip.nl>
Date: Tue, 14 Jul 2009 13:55:15 +0200
From: Rick van der Zwet
User-Agent: Thunderbird 2.0.0.22 (X11/20090608)
MIME-Version: 1.0
To: freebsd-questions@freebsd.org
References: <4A533CEA.7000007@transip.nl>
In-Reply-To: <4A533CEA.7000007@transip.nl>
Content-Type: multipart/mixed; boundary="------------010009030203030108080607"
Subject: Re: FreeBSD HA file cluster possibilities
X-BeenThere: freebsd-questions@freebsd.org
Precedence: list
List-Id: User questions

This is a multi-part message in MIME format.
--------------010009030203030108080607
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Rick van der Zwet wrote:
> I have been (re)searching and reading what the options are with regard to
> H(igh) A(vailability) file storage using FreeBSD, but cannot yet find a
> proper working solution. Any advice welcome!
>
> I would like to be able to mirror a fully identical disk between two
> servers, so that in case of hardware failure of server A (Master), server B
> (Slave) immediately takes over without any loss of data. The network
> configuration is easy using ucarp/vrrp, but the file system is the hard
> part. Paths I have investigated:
>
> a) ggate & gmirror: Export a disk from Server B to Server A. Use gmirror
> on Server A to keep the disks identical. When the ggated on Server B
> actually goes down, the whole setup freezes until the ggated is back up
> again. Second, on network delays gmirror loses the remote disk and has to
> sync all over again, leaving the machine at risk.

The freezing has come to an end with the attached patch, but is the
patch the right way to go (C coding is not my strongest point)?

To test:

# Create backup filesystem & export it
serverB$ truncate -s100m /root/ha-slave.img
serverB$ echo "192.168.33.41 RW /root/ha-slave.img" > /etc/gg.exports
serverB$ ggated

# Apply attached patch
serverA$ cd /usr/src/sbin/ggate/ggatec
serverA$ patch < %%ATTACHED_FILE%%
serverA$ make clean install

# Local file image
serverA$ truncate -s 100m /root/ha-master.img
serverA$ mdconfig -t vnode -f /root/ha-master.img

# Remote file image
serverA$ ggatec create 192.168.33.42 /root/ha-slave.img

# Mirror building
serverA$ gmirror label hamirror ggate0 md0
serverA$ newfs /dev/mirror/hamirror
serverA$ mount /dev/mirror/hamirror /mnt

Note: if you have _not_ applied the patch and you kill ggated on serverB,
you will notice serverA freeze when trying to write to something on /mnt
or when calling `gmirror status'. The same applies if you kill ggatec on
serverA without the patch.
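
For illustration, the takeover on serverB could be scripted roughly as
follows. This is a minimal sketch, not the setup from this mail: the script
itself, the DRY_RUN switch, the md unit number, the fsck step and the /mnt
mount point are all assumptions; only the image path comes from the test
above. It is meant to be called by ucarp as its up-script when serverB
becomes master. Whether to mount the md device directly or through a
degraded gmirror on serverB is left open; the sketch mounts it directly.

```shell
#!/bin/sh
# Hypothetical ucarp up-script for serverB (not part of the original mail).
# Runs when serverB takes over: stop exporting the image, then mount it.

IMG=/root/ha-slave.img   # image exported by ggated (path from the test)
MNT=/mnt                 # assumed mount point on serverB
MD_UNIT=0                # assumed free md(4) unit number

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

takeover() {
    # Stop ggated so serverA can no longer write to the image.
    # A failure here (ggated already gone) is deliberately not fatal.
    run pkill ggated
    # Attach the image, check the filesystem (it was in use when
    # serverA failed), and mount it. Stop at the first failing step.
    run mdconfig -a -t vnode -f "$IMG" -u "$MD_UNIT" &&
    run fsck_ffs -p "/dev/md$MD_UNIT" &&
    run mount "/dev/md$MD_UNIT" "$MNT"
}

takeover
```

With the default DRY_RUN=1 the script only prints the commands it would
run, so it can be tried safely before passing it to ucarp with --upscript;
set DRY_RUN=0 to actually execute them.
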
Using net/ucarp I detect failures on serverA and terminate ggated and
mount the image on serverB.

/Rick

--------------010009030203030108080607
Content-Type: text/x-patch; name="ggatec.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="ggatec.c.diff"

--- ggatec.c.orig	2009-07-09 18:27:12.000000000 +0200
+++ ggatec.c	2009-07-14 10:15:34.000000000 +0200
@@ -156,7 +156,7 @@
 			break;
 		if (data != sizeof(hdr)) {
 			g_gate_log(LOG_ERR, "Lost connection 1.");
-			reconnect = 1;
+			reconnect = 0;
 			pthread_kill(recvtd, SIGUSR1);
 			break;
 		}
@@ -168,7 +168,7 @@
 				break;
 			if (data != ggio.gctl_length) {
 				g_gate_log(LOG_ERR, "Lost connection 2 (%zd != %zd).", data, (ssize_t)ggio.gctl_length);
-				reconnect = 1;
+				reconnect = 0;
 				pthread_kill(recvtd, SIGUSR1);
 				break;
 			}
@@ -177,6 +177,7 @@
 		}
 	}
 	g_gate_log(LOG_DEBUG, "%s: Died.", __func__);
+	g_gate_destroy(unit, 1);
 	return (NULL);
 }
 
@@ -203,7 +204,7 @@
 			if (data == -1 && errno == EAGAIN)
 				continue;
 			g_gate_log(LOG_ERR, "Lost connection 3.");
-			reconnect = 1;
+			reconnect = 0;
 			pthread_kill(sendtd, SIGUSR1);
 			break;
 		}
@@ -223,7 +224,7 @@
 			g_gate_log(LOG_DEBUG, "Received data packet.");
 			if (data != ggio.gctl_length) {
 				g_gate_log(LOG_ERR, "Lost connection 4.");
-				reconnect = 1;
+				reconnect = 0;
 				pthread_kill(sendtd, SIGUSR1);
 				break;
 			}
@@ -235,6 +236,7 @@
 		g_gate_ioctl(G_GATE_CMD_DONE, &ggio);
 	}
 	g_gate_log(LOG_DEBUG, "%s: Died.", __func__);
+	g_gate_destroy(unit, 1);
 	pthread_exit(NULL);
 }
 
@@ -410,8 +412,7 @@
 static void
 signop(int sig __unused)
 {
-
-	/* Do nothing. */
+	g_gate_destroy(unit,1);
 }
 
 static void
@@ -420,6 +421,7 @@
 	struct g_gate_ctl_cancel ggioc;
 
 	signal(SIGUSR1, signop);
+	signal(SIGINT, signop);
 	for (;;) {
 		g_gatec_start();
 		g_gate_log(LOG_NOTICE, "Disconnected [%s %s]. Connecting...",

--------------010009030203030108080607--