Date: Fri, 6 Jan 2012 14:43:30 -0800
From: Jeremy Chadwick
To: freebsd-stable@freebsd.org
Subject: Re: gmirror not synced
Message-ID: <20120106224330.GA26856@icarus.home.lan>
In-Reply-To: <4F0573B2.9070301@infracaninophile.co.uk>
References: <20120104194313.GA2558@lordcow.org> <4F0573B2.9070301@infracaninophile.co.uk>

On Thu, Jan 05, 2012 at 09:56:02AM +0000, Matthew Seaman wrote:
> On 04/01/2012 19:43, Gareth de Vaux wrote:
> > Hi all, I've noticed that the md5 hashes of a couple of files on
> > a gmirror change when I recalculate the hashes. The output usually
> > cycles between 2 hashes per file.
> >
> > I'm guessing this is because each calculation reads the file
> > randomly from 1 of 2 component drives, and the files in question
> > had a few bit flips during their original sync. I also assume
> > this is something you have to live with for gmirror? Is removing
> > and completely rebuilding the secondary drive the only thing you
> > can do (which might fix these bit flips but incur others elsewhere)?
>
> No, that's not something acceptable at all. Randomly flipping bits in
> files is a really nasty failure mode.
>
> What does 'gmirror list' tell you about the state of the gmirror? Is
> there any possibility that your hardware is failing? Check the SMART
> attributes of the disk in the first instance (it isn't brilliant for
> picking up impending failure, but it should be pretty accurate once the
> drive is actually generating errors.) Also try a few passes of
> memtest86 to try and spot problems with RAM. Cleaning dust out of air
> vents and heatsinks and generally making sure the machine is not
> overheating is a good idea too.

Another possibility is a disk with an intermittently faulty cache, or
a drive that has effectively stopped honouring ECC[1][2] when reading
or writing sectors (due to a firmware bug, design flaw, etc.).

For the former, SMART statistics from the drives could help determine
whether this is the case, but I stress the word "could". Such errors
are usually reported in Attribute 184 ("End-to-End_Error"), but very
few drives provide that attribute.

Gareth, please install ports/sysutils/smartmontools (make sure it's
version 5.42 or newer) and send the output of "smartctl -x /dev/disk"
for each disk in the mirror; I'll review it for you.

[1]: http://www.storagereview.com/guide/error.html (read all subsections too)
[2]: http://www.dewassoc.com/kbase/hard_drives/hard_disk_sector_structures.htm

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, US |
| Making life hard for others since 1977.              PGP 4BD6C0CB |
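If the two halves of the mirror really have diverged, as the cycling md5 hashes suggest, reading each component provider directly and comparing them block by block should show where. The Python script below is a minimal sketch of that check under stated assumptions, not a gmirror tool: the function names `read_full` and `differing_blocks` and the 1 MiB block size are hypothetical choices made for illustration, and the provider paths you pass (for example /dev/ada0 and /dev/ada1, but use whatever names `gmirror status` reports) are assumptions about your setup. It opens both devices read-only. Run it while the mirror is idle, since writes in flight can show up as false differences, and expect the final block to differ: gmirror keeps its metadata in the last sector of each component, and that metadata is per-disk.

```python
#!/usr/bin/env python3
"""Hypothetical helper: list blocks that differ between two mirror components.

Assumes the two paths are the component providers of a two-disk gmirror
(e.g. /dev/ada0 and /dev/ada1; these names are examples only). Both are
opened read-only; nothing is written.
"""
import sys

BLOCK_SIZE = 1024 * 1024  # 1 MiB, a multiple of common sector sizes (512/4096)


def read_full(f, size):
    """Read up to `size` bytes, retrying on short reads until EOF."""
    parts = []
    remaining = size
    while remaining > 0:
        data = f.read(remaining)
        if not data:  # EOF
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the starting byte offset of every block whose contents differ."""
    offsets = []
    offset = 0
    # buffering=0 gives raw, unbuffered reads of exactly the requested sizes
    with open(path_a, "rb", buffering=0) as a, open(path_b, "rb", buffering=0) as b:
        while True:
            chunk_a = read_full(a, block_size)
            chunk_b = read_full(b, block_size)
            if not chunk_a and not chunk_b:
                break  # both sides exhausted
            if chunk_a != chunk_b:  # also catches one side ending early
                offsets.append(offset)
            offset += block_size
    return offsets


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} PROVIDER_A PROVIDER_B")
    for off in differing_blocks(sys.argv[1], sys.argv[2]):
        print(f"blocks differ at byte offset {off}")
```

A reported offset is a position on the raw device, not in a file, and a mismatch does not say which disk holds the correct data; that still has to be settled with the SMART output and the memory and cooling checks discussed above.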