From: Victor Balada Diaz
Date: Thu, 13 Dec 2012 13:05:32 +0100
To: stable@freebsd.org
Subject: gjournal + HAST data lost
Message-ID: <20121213120532.GW1414@equilibrium.bsdes.net>

Hello,

We've experienced a weird data "rollback" on our NFS servers.

We have two NFS servers, both running 8.3-RELEASE-p3. We've set up HAST
for one partition between them. To be able to switch over quickly, we
configured gjournal on top of HAST. At the time there was no UFS+J.

Yesterday one of the servers crashed and CARP promoted the slave to
master. During that operation we got the following errors:

GEOM_JOURNAL: Journal 2180207123: hast/shared contains data.
GEOM_JOURNAL: Journal 2180207123: hast/shared contains journal.
GEOM_JOURNAL: Cannot decode journal header from hast/shared.
GEOM_JOURNAL: Journal on hast/shared is broken/corrupted. Initializing.
GEOM_JOURNAL: clean=1 flags=0x40
GEOM_JOURNAL: File system hast/shared marked as dirty.

We ran a full fsck and no errors were detected. The filesystem was
working again. After looking at the data we saw that all the files from
the last few days were missing, as if the two servers had been
disconnected from each other, but that didn't happen. Even more: after
our first NFS server came back up, no split-brain condition was detected.

We're sure the first NFS server was working because all of the data is
on the backup servers, so it's not as if the data never got written.

What could explain that data rollback? If gjournal's journal is lost, is
it possible to lose the last few days of data? Is it not recommended to
use gjournal with HAST?

Thanks a lot.
Regards.
Victor.

hast.conf:

replication fullsync
#compression lzf
#checksum sha256

on nfs01 {
        listen 192.168.23.81
}
on nfs02 {
        listen 192.168.23.82
}
resource shared {
        name shared
        local /dev/mirror/oss1g
        on nfs01 {
                remote 192.168.23.82
        }
        on nfs02 {
                remote 192.168.23.81
        }
}

-- 
The most convincing proof that intelligent life exists on other planets
is that they have not tried to contact us.