From: Wes Morgan
Date: Sat, 23 Jan 2010 18:40:13 -0600 (CST)
To: Rich
Cc: freebsd-fs
Subject: Re: Errors on a file on a zpool: How to remove?

On Sat, 23 Jan 2010, Rich wrote:

> I have no files named 0x0.
>
> I have a number of files which, on attempting to do anything to them
> (stat, mv, rm), EIO occurs, the checksum error number on three of the
> disks in that pool ticks up, and /var/log/messages reports what I
> reported in my initial post. (i discovered this due to FreeBSD's daily
> check-for-setuid-bits-in-strange-places find command reporting EIO on
> some files.)
>
> My original post in this thread is about how to resolve this.

Do these bad files show up on "zpool status -v" after a scrub?

This really sounds much more like an issue of corrupt metadata. ZFS keeps
multiple copies of filesystem metadata even on non-redundant pools (ditto
blocks). You said there was bad ram in this machine at one point, which
may mean that *all* of the metadata was corrupt.

In my encounter with a bad stick of ram, the data was correct but the
stored checksums were wrong. I was able to "recover" the data by simply
changing zfs_read() to not report EIO when it encounters an ECKSUM error
from the zfs layer -- essentially ignoring the checksum error. I have no
idea what this might do if the metadata itself is corrupt, so that could
be risky. Another option is the zdb solution mentioned earlier.

>
> On Sat, Jan 23, 2010 at 6:34 PM, Wes Morgan wrote:
> > On Sat, 23 Jan 2010, Rich wrote:
> >
> >> On Sat, Jan 23, 2010 at 4:21 PM, Wes Morgan wrote:
> >> > On Sat, 23 Jan 2010, Rich wrote:
> >> >
> >> >> I already diagnosed the bad hardware - one of the two sticks of RAM
> >> >> had gone bad, and fails memtest in the other machine.
> >> >>
> >> >>   pool: rigatoni
> >> >>  state: ONLINE
> >> >> status: One or more devices has experienced an error resulting in data
> >> >>       corruption.
> >> >>       Applications may be affected.
> >> >> action: Restore the file in question if possible.  Otherwise restore the
> >> >>       entire pool from backup.
> >> >>    see: http://www.sun.com/msg/ZFS-8000-8A
> >> >>  scrub: scrub completed after 15h28m with 1 errors on Thu Jan 21 18:09:25 2010
> >> >> config:
> >> >>
> >> >>       NAME        STATE     READ WRITE CKSUM
> >> >>       rigatoni    ONLINE       0     0     1
> >> >>         da4       ONLINE       0     0     2
> >> >>         da5       ONLINE       0     0     2
> >> >>         da7       ONLINE       0     0     0
> >> >>         da6       ONLINE       0     0     0
> >> >>         da2       ONLINE       0     0     2
> >> >>
> >> >> errors: Permanent errors have been detected in the following files:
> >> >>
> >> >>         rigatoni/mirrors:<0x0>
> >> >
> >> > Can you post your entire pool filesystem structure? That message above
> >> > looks like an unreferenced block or corrupted metadata rather than an
> >> > actual file. Also, if it's part of a snapshot, you simply have to destroy
> >> > the snapshot.
> >> >
> >> > I had a pool become corrupted due to bad memory, and all of the files were
> >> > still able to be manipulated. The only time EIO popped up was on the
> >> > specific block that had a checksum error.
> >>
> >> # zfs list -r -t all rigatoni
> >> NAME                  USED  AVAIL  REFER  MOUNTPOINT
> >> rigatoni             5.73T   984G    19K  /rigatoni
> >> rigatoni/logs_bitch   269M   984G   269M  /rigatoni/logs_bitch
> >> rigatoni/mirrors     5.73T   984G  5.73T  /mirrors
> >>
> >> No snapshots here. :/
> >>
> >> EIO only pops up on the files I mentioned above - everything else in
> >> those directories, including renaming that directory, is fine.
> >
> > I must have missed it, what files is it showing besides the <0x0> address?
> > Or do you have a file named "<0x0>"?
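
The thread describes the affected files turning up because a periodic find
run hit EIO on them. As a rough illustration only, that kind of check could
be sketched in Python as below. The function name find_eio_files and the
command-line handling are hypothetical and do not come from the thread; the
sketch assumes a POSIX system and only uses standard-library calls. Note
that reading every file forces ZFS to verify each block, so on a damaged
pool the CKSUM counters in "zpool status" may tick up while it runs.

```python
import errno
import os
import stat
import sys


def find_eio_files(root):
    """Return (path, operation) pairs for entries under root that fail with EIO.

    Hypothetical helper for illustration; not part of any ZFS tooling.
    """
    bad = []

    def on_walk_error(err):
        # os.walk reports directory-listing failures through this callback.
        if err.errno == errno.EIO:
            bad.append((err.filename, "listdir"))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError as err:
                if err.errno == errno.EIO:
                    bad.append((path, "lstat"))
                continue
            # Only read regular files; FIFOs and devices could block forever.
            if not stat.S_ISREG(st.st_mode):
                continue
            try:
                with open(path, "rb") as fh:
                    # Read in 1 MiB chunks and discard the data.
                    while fh.read(1 << 20):
                        pass
            except OSError as err:
                if err.errno == errno.EIO:
                    bad.append((path, "read"))
    return bad


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, op in find_eio_files(root):
        print(f"EIO on {op}: {path}")
```

Run against a mountpoint such as /mirrors, this would list each path that
fails together with the operation that failed. That list could then be
compared with what "zpool status -v" reports after a scrub, as asked above.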