From owner-freebsd-fs@FreeBSD.ORG Fri Feb 8 23:15:57 2008 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C5FFF16A420; Fri, 8 Feb 2008 23:15:56 +0000 (UTC) (envelope-from gcorcoran@rcn.com) Received: from smtp02.lnh.mail.rcn.net (smtp02.lnh.mail.rcn.net [207.172.157.102]) by mx1.freebsd.org (Postfix) with ESMTP id A2A9B13C45B; Fri, 8 Feb 2008 23:15:55 +0000 (UTC) (envelope-from gcorcoran@rcn.com) Received: from mr02.lnh.mail.rcn.net ([207.172.157.22]) by smtp02.lnh.mail.rcn.net with ESMTP; 08 Feb 2008 17:46:28 -0500 Received: from smtp01.lnh.mail.rcn.net (smtp01.lnh.mail.rcn.net [207.172.4.11]) by mr02.lnh.mail.rcn.net (MOS 3.8.6-GA) with ESMTP id OKA67028; Fri, 8 Feb 2008 17:46:23 -0500 (EST) Received: from 207-172-55-230.c3-0.tlg-ubr5.atw-tlg.pa.cable.rcn.com (HELO [10.56.78.161]) ([207.172.55.230]) by smtp01.lnh.mail.rcn.net with ESMTP; 08 Feb 2008 17:45:19 -0500 Message-ID: <47ACDBB1.40604@rcn.com> Date: Fri, 08 Feb 2008 17:46:09 -0500 From: Gary Corcoran User-Agent: Thunderbird 2.0.0.9 (Windows/20071031) MIME-Version: 1.0 To: Joe Peterson References: <47ACD7D4.5050905@skyrush.com> In-Reply-To: <47ACD7D4.5050905@skyrush.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Junkmail-Whitelist: YES (by domain whitelist at mr02.lnh.mail.rcn.net) Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: Analysis of disk file block with ZFS checksum error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Feb 2008 23:15:57 -0000 Joe Peterson wrote: > In my experimentation with the ZFS filesystem, I encountered one case of > a file block with a checksum mismatch. Doing a "zpool scrub" revealed > it, and trying to read the file yielded an error - only the part of the > file before the bad block was read (ZFS aborts reading at this point, > which makes sense), resulting in a short file. The reason the CKSUM > error is not fixable is because my ZFS pool contains only one device (no > mirror or RAIDZ), but I do have the original/good version of the file > affected. Here's the output of zpool status (new scrub in process): > > pool: tank > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scrub: scrub in progress, 64.36% done, 0h18m to go > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 2 > hda6 ONLINE 0 0 2 > > errors: Permanent errors have been detected in the following files: > > /mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/ > Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3 > > > I was curious about what actually happened: was this a ZFS bug, trouble > with its metadata, or truly a bad block? In order to determine this, I > modified ZFS's source code temporarily to ignore the checksum mismatch > and let the file read fully. What I then got was the full-length file > and no errors, showing that there were no disk read errors associated > with the read (I already had assumed this from the fact that zpool > status showed only a non-zero CKSUM count), however, I may have seen > other error counts previously (ZFS resets them to zero on, e.g., > reboot). I received no errors when originally copying this file *to* > the ZFS pool - only on subsequent reads/scrubs. > > (Note that I have posted before about DMA errors in my log for the disk > I am using, but I have had nothing but successful SeaTools tests > (surface scans) of the drive. Jeremy Chadwick had similar issues, as > did others, so I think it is worth investigating if there is some > OS/software cause rather than real HW issues. This is one reason I > wanted to investigate my ZFS checksum issue more deeply.) > > I also have a good backup of the file in question, so I now have two > copies of the file: one good, and one with a bad block. The file is > 3575936 bytes long, and recordsize (in ZFS) is 128K, making the file > about 27 blocks long. Curiously, the bad section of the file is exactly > 65536 bytes long (1/2 a block). The bad block starts at exactly the 5th > 128K block (byte 65536 or hex a0000). > > I wanted to see the characteristics of the bad data. Was just one bit > flipped randomly? No. It is just one bit or set of bits in the bytes > that are affected? It doesn't seem so. Were there any other stange > patterns here? Well, yes, and maybe someout out there with more > knowledge/experience in disk modes of failure will recognize something > (I have included some data below). > > For one thing (as I mentioned), only 65536 bytes are bad (and it's > exactly this many, with a few "good" bytes thrown in, but not far from > what matches random chance would produce. Also, all bad bytes have a > zero in the high bit - interesting? Also, near the end of the block, > the bad bytes all go to zero, strangely coincident with the first "good" > zero in that bad block - not sure if that's coincidence or not. Also, I > calculated the number of "Bits same" (matching bits) in the good vs. bad > bytes, and it appears fairly random, so it appears that the bad bytes > are very random in nature and not correlated much at all with the good > bytes. > > So except for the fact that the 2nd half (65536 bytes) of the ZFS block > are good, the bad block seems to consist of random data, except for the > string of zero bytes near the end and the zero high-bit. It's not as if > one bit on the disk flipped - it affects the whole (1/2) block. Does > this seem like a disk error, controller error/bug, cable problem (I > recently put a new cable on, so I doubt this). It seems to me something > more systemic rather than a random bit error - opinions are more than > welcome. > > Here is some info from a python program I wrote to look at the data > (I've left out spans of essentially uninteresting portions showing > similar stuff, but I can get you the whole thing if interested): > > File pos Good Bad Match Good (bin) Bad (bin) Bits same > 0009fff0 d9 d9 Yes 11011001 11011001 8 > 0009fff1 05 05 Yes 00000101 00000101 8 > 0009fff2 c1 c1 Yes 11000001 11000001 8 > 0009fff3 81 81 Yes 10000001 10000001 8 > 0009fff4 5f 5f Yes 01011111 01011111 8 > 0009fff5 66 66 Yes 01100110 01100110 8 > 0009fff6 5e 5e Yes 01011110 01011110 8 > 0009fff7 a1 a1 Yes 10100001 10100001 8 > 0009fff8 ca ca Yes 11001010 11001010 8 > 0009fff9 9d 9d Yes 10011101 10011101 8 > 0009fffa 00 00 Yes 00000000 00000000 8 > 0009fffb 90 90 Yes 10010000 10010000 8 > 0009fffc 32 32 Yes 00110010 00110010 8 > 0009fffd 62 62 Yes 01100010 01100010 8 > 0009fffe a8 a8 Yes 10101000 10101000 8 > 0009ffff b2 b2 Yes 10110010 10110010 8 > --- Start of bad block --- > 000a0000 d1 24 No 11010001 00100100 2 > 000a0001 6b 7b No 01101011 01111011 7 > 000a0002 d1 31 No 11010001 00110001 5 > 000a0003 56 33 No 01010110 00110011 4 > 000a0004 44 38 No 01000100 00111000 3 > 000a0005 c3 41 No 11000011 01000001 6 > 000a0006 df 46 No 11011111 01000110 4 > 000a0007 07 45 No 00000111 01000101 6 > 000a0008 4c 7b No 01001100 01111011 3 > 000a0009 a0 40 No 10100000 01000000 5 > 000a000a 54 0a No 01010100 00001010 3 > 000a000b 35 40 No 00110101 01000000 3 > 000a000c 88 24 No 10001000 00100100 4 > 000a000d 38 24 No 00111000 00100100 5 > 000a000e f5 7d No 11110101 01111101 6 > 000a000f 28 31 No 00101000 00110001 5 > . > . > . > 000af6c1 d3 33 No 11010011 00110011 5 > 000af6c2 97 39 No 10010111 00111001 3 > 000af6c3 a5 32 No 10100101 00110010 3 > 000af6c4 6a 41 No 01101010 01000001 4 > 000af6c5 16 39 No 00010110 00111001 3 > 000af6c6 f2 7d No 11110010 01111101 3 > 000af6c7 21 40 No 00100001 01000000 5 > 000af6c8 52 0a No 01010010 00001010 5 > 000af6c9 00 00 Yes 00000000 00000000 8 > 000af6ca 2c 00 No 00101100 00000000 5 > 000af6cb 42 00 No 01000010 00000000 6 > 000af6cc 31 00 No 00110001 00000000 5 > 000af6cd a1 00 No 10100001 00000000 5 > 000af6ce d1 00 No 11010001 00000000 4 > 000af6cf 90 00 No 10010000 00000000 6 > 000af6d0 9c 00 No 10011100 00000000 4 > . > . > . > 000afff8 26 00 No 00100110 00000000 5 > 000afff9 8c 00 No 10001100 00000000 5 > 000afffa a8 00 No 10101000 00000000 5 > 000afffb 0c 00 No 00001100 00000000 6 > 000afffc f1 00 No 11110001 00000000 3 > 000afffd 93 00 No 10010011 00000000 4 > 000afffe 2c 00 No 00101100 00000000 5 > 000affff 2e 00 No 00101110 00000000 4 > --- End of bad block --- > 000b0000 62 62 Yes 01100010 01100010 8 > 000b0001 56 56 Yes 01010110 01010110 8 > 000b0002 91 91 Yes 10010001 10010001 8 > 000b0003 04 04 Yes 00000100 00000100 8 > 000b0004 01 01 Yes 00000001 00000001 8 > 000b0005 2d 2d Yes 00101101 00101101 8 > 000b0006 0e 0e Yes 00001110 00001110 8 > 000b0007 89 89 Yes 10001001 10001001 8 > 000b0008 8a 8a Yes 10001010 10001010 8 > 000b0009 ad ad Yes 10101101 10101101 8 > 000b000a 4e 4e Yes 01001110 01001110 8 > 000b000b a3 a3 Yes 10100011 10100011 8 > 000b000c 13 13 Yes 00010011 00010011 8 > 000b000d 4d 4d Yes 01001101 01001101 8 > 000b000e 07 07 Yes 00000111 00000111 8 > 000b000f 66 66 Yes 01100110 01100110 8 > . > . > . A wild guess: bytes all with the top bit zero, plus I see occasional 0x0a bytes in there - it looks like that 1/2 block might have gotten overwritten with some ASCII text. Don't ask me how though... (but if it did, I am afraid I would suspect a bug in ZFS :-( ) Gary