From owner-freebsd-stable@FreeBSD.ORG Thu Jul 19 17:26:19 2012
Delivered-To: freebsd-stable@freebsd.org
From: "Steven Hartland"
To: "James Snow"
References: <20120719152909.GL32960@teardrop.org>
Date: Thu, 19 Jul 2012 18:25:56 +0100
Subject: Re: Checksum errors across ZFS array
List-Id: Production branch of FreeBSD source code

----- Original Message -----
From: "James Snow"

> I have a ZFS server on which I've seen periodic
> checksum errors on almost every drive. While scrubbing the pool last
> night, it began to report unrecoverable data errors on a single file.
>
> I compared an md5 of the supposedly corrupted file to an md5 of the
> original copy, stored on different media. They were the same, suggesting
> no corruption.
...

We've had this before, and it has always turned out to be failing hardware.
It's been a mixture of faults for us:

1. Memory, even though it was ECC and was not reporting failures in use or
   via memtest.
2. CPU / northbridge on old AMDs, not 100% sure which. This started as ZFS
   checksum issues and then, weeks or months later, resulted in random
   untraceable panics and watchdog timeouts in the bge NIC. Disabling the
   cores on the second CPU fixed this for us on two separate machines,
   e.g. in /boot/loader.conf:

   hint.lapic.2.disabled="1"
   hint.lapic.3.disabled="1"

So although ZFS can report errors on files when the disks themselves are not
at fault, and the data, as you confirmed, is fine, don't ignore the errors.

    Regards
    Steve
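
For anyone wanting to repeat the md5 comparison James describes across many files, here is a minimal sketch in Python. The function names file_md5 and copies_match are made up for illustration and are not part of any ZFS or FreeBSD tooling; the paths in the usage note are placeholders. It only shows that two copies hash the same, which says nothing about whether the hardware underneath is healthy.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks.

    Hypothetical helper for illustration; chunked reads keep memory use
    flat even for very large files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(suspect_path, reference_path):
    """True if the suspect file and the reference copy have equal MD5 digests."""
    return file_md5(suspect_path) == file_md5(reference_path)
```

Usage would look like copies_match("/pool/data/file", "/mnt/backup/file"), with both paths replaced by the file ZFS flagged and the copy held on separate media. Reading the flagged file may itself fail with an I/O error if ZFS cannot return the data, in which case open() or read() raises OSError rather than returning a digest.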