From: Xin LI
To: rondzierwa@comcast.net
Cc: freebsd-fs@freebsd.org
Date: Thu, 21 Jun 2012 15:52:06 -0700
Subject: Re: ZFS Checksum errors
In-Reply-To: <1953965235.30115.1340315339964.JavaMail.root@sz0192a.westchester.pa.mail.comcast.net>
List-Id: Filesystems

Hi,

On Thu, Jun 21, 2012 at 2:48 PM, wrote:
>
> ok, i ran a verify on the raid, and it completed, so I believe that, from
> the hardware standpoint, da0 should be a functioning, 12TB disk.
>
> i did a zpool clear and re-ran the scrub, and the results were almost
> identical:
[...]
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zfsPool     ONLINE       0     0 6.20K
>           da0       ONLINE       0     0 12.5K  24K repaired

This is very likely a hardware issue, or (less likely) a driver issue,
since we have done extensive testing on this RAID card and the known
problems are believed to have been fixed years ago.

There are, however, a few errata from AMD that concern me:

http://support.amd.com/us/Embedded_TechDocs/41322.pdf

Specifically, #264 and #298 seem quite serious. How old is your
motherboard BIOS? Are you using ECC memory, by the way?

> errors: Permanent errors have been detected in the following files:
>
>         zfsPool/raid:<0x9e241>
>         zfsPool/Build:<0x0>
> phoenix#
>
> along with the 6,353 I/O errors, there were over 12,000 checksum mismatch
> errors on the console.
>
>
> The recommendation from ZFS is to restore the file in question.  At this
> point, I would just like to delete the two files.
> how do i do that?
>
> its these kind of antics that make me resistant to the thought of allowing
> ZFS to manage the raid.  it seems to be having problems just managing a big
> file system.  I don't want it to correct anything, or restore anything, just
> let me delete the files that hurt, fix up the free space list so it doesn't
> point outside the bounds of the disk, and get on with life.

Are you *really* sure that these are files? The second one doesn't
seem to be a file, but rather some metadata.

If hardware issues have been ruled out, what I would do is copy the
data over to a different dataset (e.g. Build.new), validate the copied
data, destroy the current Build dataset, and then rename Build.new to
Build.

> if its finding corrupted files that appear to not have a directory entry
> associated with them (unlinked files), why doesn't it just delete them?
> fsck asks you if you want to delete unlinked files, why doesn't zfs do the
> same, or at least give you the option of deleting bad files when it finds
> them?

Normally, ZFS does tell you which files are corrupted. Sometimes it
takes time, since the file might be present in multiple snapshots and
the current set of utilities only gives you one reference for the
file's name. You may need to remove the file (or the snapshot
containing it), scrub, then remove the newly revealed reference, and
so on.

Your case seems serious enough that I really think there is some
metadata corruption, and that it is already beyond repair. ZFS
replicates metadata into different locations, but that does not
prevent it from being corrupted in memory. In these situations you
will have to use a backup.

> this is causing a lot of down time, and its making linux look very
> attractive in my organization. how do I get this untangled short of
> reformatting and starting over?

Linux does not have end-to-end data validation comparable to what ZFS
offers. Use caution if you go that route.

> ron.
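
The copy-and-swap suggested above (copy to Build.new, validate, destroy
Build, rename) can be sketched as a dry run that only prints the commands.
This is a rough sketch, not a tested recovery procedure: the function name
plan_rebuild, the default mountpoints under /zfsPool, the use of tar for the
copy, and diff -r for validation are all assumptions that do not come from
the thread.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a copy-and-swap for one dataset.
# Assumptions, not from the thread: datasets are mounted at /<pool>/<name>,
# the copy is done with tar, and the comparison with diff -r.

plan_rebuild() {
    pool=$1
    ds=$2
    old="$pool/$ds"
    new="$pool/$ds.new"
    cat <<EOF
zfs create $new
(cd /$old && tar cf - .) | (cd /$new && tar xpf -)
diff -r /$old /$new
zfs destroy -r $old
zfs rename $new $old
EOF
}

# Print the plan for the dataset named in the thread.
plan_rebuild zfsPool Build
```

Reading damaged files will probably produce I/O errors during the copy, so
the diff step is the point at which to decide what is missing. Note that
zfs destroy -r also removes the old dataset's snapshots, so it should only
be run once the copy has been checked.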
>
>
> ________________________________
> From: "Xin LI"
> To: rondzierwa@comcast.net
> Cc: "Steven Hartland", freebsd-fs@freebsd.org
> Sent: Wednesday, June 20, 2012 6:56:09 PM
>
> Subject: Re: ZFS Checksum errors
>
> On Wed, Jun 20, 2012 at 1:55 PM, wrote:
>> Steve.
>>
>> well, it got done, and it found another anonymous file with errors . any
>> idea how to get rid of these?
>
> Normally you need to "zpool clear zfsPool", and rerun zpool scrub.  If
> you see these numbers growing again, it's likely that there are some
> other problems with your hardware.  The recommended configuration is
> to use ZFS to manage disks, or at least split your RAID volumes into
> smaller ones by the way, since otherwise the volume is seen as a
> "single disk" to ZFS, making it impossible to repair data errors
> unless you add additional redundancy (zfs set copies=2, etc).
>
>>
>> thanks,
>> ron.
>>
>>
>>
>> phoenix# zpool status -v zfsPool
>> pool: zfsPool
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: scrub completed after 8h29m with 6276 errors on Wed Jun 20 16:18:01
>> 2012
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zfsPool     ONLINE       0     0 6.17K
>>           da0       ONLINE       0     0 13.0K  1.34M repaired
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>         zfsPool/raid:<0x9e241>
>>         zfsPool/Build:<0x0>
>> phoenix#
>>
>>
>>
>>
>> ----- Original Message -----
>> From: "Steven Hartland"
>> To: rondzierwa@comcast.net, freebsd-fs@freebsd.org
>> Sent: Wednesday, June 20, 2012 1:58:20 PM
>> Subject: Re: ZFS Checksum errors
>>
>> ----- Original Message -----
>> From:
>> ..
>>
>>> zpool status indicates that a file has errors, but doesn't tell me its
>>> name:
>>>
>>> phoenix# zpool status -v zfsPool
>>> pool: zfsPool
>>> state: ONLINE
>>> status: One or more devices has experienced an error resulting in data
>>> corruption. Applications may be affected.
>>> action: Restore the file in question if possible. Otherwise restore the
>>> entire pool from backup.
>>> see: http://www.sun.com/msg/ZFS-8000-8A
>>> scrub: scrub in progress for 5h27m, 18.71% done, 23h42m to go
>>
>> Try waiting for the scrub to complete and see if its more helpful after
>> that.
>>
>> Regards
>> Steve

-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die
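
The quoted advice to run "zpool clear", re-run the scrub, and watch whether
the counters grow again can be checked mechanically. The sketch below reads
the CKSUM column for one device from zpool status text and converts the
abbreviated value to a rough integer. The helper names cksum_for and
to_count are made up for illustration, and treating the K/M/G suffixes as
powers of 1024 is an assumption. The sample text is the table posted in this
thread.

```shell
#!/bin/sh
# Print the CKSUM column (5th field) for a named vdev from
# `zpool status` text supplied on stdin.
cksum_for() {
    awk -v dev="$1" '$1 == dev { print $5; exit }'
}

# Convert an abbreviated counter such as "12.5K" to an approximate integer.
# Assumption: suffixes are powers of 1024; plain digits pass through.
to_count() {
    printf '%s\n' "$1" | awk '
        /K$/ { sub(/K$/, ""); printf "%d\n", $0 * 1024; next }
        /M$/ { sub(/M$/, ""); printf "%d\n", $0 * 1024 * 1024; next }
        /G$/ { sub(/G$/, ""); printf "%d\n", $0 * 1024 * 1024 * 1024; next }
        { printf "%d\n", $0 }'
}

# Sample input: the config table from the second scrub in this thread.
status='        NAME        STATE     READ WRITE CKSUM
        zfsPool     ONLINE       0     0 6.20K
          da0       ONLINE       0     0 12.5K  24K repaired'

raw=$(printf '%s\n' "$status" | cksum_for da0)
n=$(to_count "$raw")
if [ "$n" -gt 0 ]; then
    echo "da0: about $n checksum errors since the last clear"
fi
```

On a live system the sample text would be replaced by something like
"zpool status zfsPool | cksum_for da0". A nonzero count after a clear and a
fresh scrub is the "numbers growing again" case described above.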