From: Peter Steele
To: #freebsd-questions
Date: Fri, 24 Apr 2009 11:33:07 -0700 (PDT)
Subject: Unexpected gmirror behavior: Is this a bug?

We had a somewhat startling scenario occur with gmirror. We have systems with four drives, ad4, ad6, ad8, and ad10, with the OS set up on a mirrored slice across all four drives.
The ad4 drive failed at one point, due to a simple bad connection in its drive bay. While it was offline, the system continued to be used for a while and new data was added to the mirrored file system. We eventually took the box down to deal with ad4, and tried simply pulling and reinserting the drive. On reboot we saw that the BIOS detected the drive, so that was good.

However, when FreeBSD got to the point of starting up the GEOM driver, instead of reinserting ad4 into the more current mirror consisting of ad6/ad8/ad10 and resyncing it with that data, the GEOM driver assumed ad4 was the "good" mirror and ended up resyncing ad6/ad8/ad10 with the data from ad4, causing the new files we had added to those drives to be lost.

This only happens with ad4. If ad6, for example, goes offline in the same way, then when it is reinserted it does not become the dominant drive and resync its data to the other drives. Rather, its data is overwritten with the data from the three-member mirror, as you'd expect. So ad4, the first disk, is clearly treated specially.

The question: is this a bug or a feature? Is there any way to prevent this behavior? This would be a disastrous thing to happen in the field on one of our customer systems.
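
To make the behavior we expected concrete, here is a minimal Python sketch of the rule we assumed the mirror would follow: the members that kept running are newer, so one of them should be the sync source and the returning disk should be overwritten. This is a toy model only, not gmirror's actual code. The `Component` class, the `generation` counter, and both function names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    # Hypothetical counter: assumed to be bumped on the surviving members
    # each time the mirror keeps running after losing a member.
    generation: int


def pick_sync_source(components: List[Component]) -> Component:
    """Return the member with the highest generation (first one wins ties)."""
    return max(components, key=lambda c: c.generation)


def stale_components(components: List[Component]) -> List[str]:
    """Return the names of members that are older than the sync source."""
    source = pick_sync_source(components)
    return [c.name for c in components if c.generation < source.generation]


# ad4 dropped out at generation 1; the other three kept running and moved on.
mirror = [
    Component("ad4", 1),
    Component("ad6", 2),
    Component("ad8", 2),
    Component("ad10", 2),
]

print(pick_sync_source(mirror).name)  # ad6
print(stale_components(mirror))       # ['ad4']
```

Under this rule, position in the list never matters: ad4 being the first disk would not make it the source. What we observed instead was ad4 winning even though it was the member that had missed the writes.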