Date: Fri, 08 Feb 2008 15:29:40 -0700
From: Joe Peterson <joe@skyrush.com>
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Analysis of disk file block with ZFS checksum error
Message-ID: <47ACD7D4.5050905@skyrush.com>

In my experimentation with the ZFS filesystem, I encountered one case of a file block with a checksum mismatch. Doing a "zpool scrub" revealed it, and trying to read the file yielded an error: only the part of the file before the bad block was read (ZFS aborts reading at this point, which makes sense), resulting in a short file.

The CKSUM error is not fixable because my ZFS pool contains only one device (no mirror or RAIDZ), but I do have the original/good version of the affected file. Here's the output of zpool status (new scrub in progress):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 64.36% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     2
          hda6      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/
        Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3

I was curious about what actually happened: was this a ZFS bug, trouble with its metadata, or truly a bad block? To find out, I temporarily modified ZFS's source code to ignore the checksum mismatch and let the file read fully. What I then got was the full-length file and no errors, showing that there were no disk read errors associated with the read. (I had already assumed this from the fact that zpool status showed only a non-zero CKSUM count; however, I may have seen other error counts previously, since ZFS resets them to zero on, e.g., reboot.) I received no errors when originally copying this file *to* the ZFS pool - only on subsequent reads/scrubs.

(Note that I have posted before about DMA errors in my log for the disk I am using, but I have had nothing but successful SeaTools tests (surface scans) of the drive.
Jeremy Chadwick had similar issues, as did others, so I think it is worth investigating whether there is some OS/software cause rather than a real HW issue. This is one reason I wanted to investigate my ZFS checksum issue more deeply.)

I also have a good backup of the file in question, so I now have two copies of the file: one good, and one with a bad block. The file is 3575936 bytes long, and recordsize (in ZFS) is 128K, making the file a little over 27 blocks long. Curiously, the bad section of the file is exactly 65536 bytes long (1/2 a block). The bad section starts exactly at a 128K block boundary, 5 blocks into the file (byte 655360, or hex a0000).

I wanted to see the characteristics of the bad data. Was just one bit flipped randomly? No. Is it just one bit or set of bits in each byte that is affected? It doesn't seem so. Were there any other strange patterns here? Well, yes, and maybe someone out there with more knowledge/experience in disk modes of failure will recognize something (I have included some data below).

For one thing (as I mentioned), only 65536 bytes are bad (and it's exactly this many; a few "good" bytes are thrown in, but about as many as random chance would produce). Also, all bad bytes have a zero in the high bit - interesting? Also, near the end of the block, the bad bytes all go to zero, strangely coincident with the first "good" zero in that bad block - not sure if that's coincidence or not. Also, I calculated the number of "Bits same" (matching bits) in the good vs. bad bytes, and it appears fairly random, so the bad bytes seem to be random in nature and not correlated much at all with the good bytes.

So except for the fact that the 2nd half (65536 bytes) of the ZFS block is good, the bad block seems to consist of random data, except for the string of zero bytes near the end and the zero high bit. It's not as if one bit on the disk flipped - it affects the whole (1/2) block.

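The block arithmetic and the "Bits same" figure described above can be reproduced with a short sketch along the following lines. This is not the program that produced the data in this message: the function names (bits_same, mismatch_span, record_location, compare_files) and any file paths passed to compare_files are made up for illustration, and the only value taken from the text is the 128K recordsize.

```python
RECORDSIZE = 128 * 1024  # ZFS recordsize quoted in the text (128K)


def bits_same(a, b):
    """Count the bit positions (0 to 8) in which two byte values agree."""
    return 8 - bin((a ^ b) & 0xFF).count("1")


def mismatch_span(good, bad):
    """Return (first, last) offsets at which two byte strings differ, or None."""
    diffs = [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]
    return (diffs[0], diffs[-1]) if diffs else None


def record_location(offset, recordsize=RECORDSIZE):
    """Split a file offset into (record index from zero, offset within record)."""
    return divmod(offset, recordsize)


def compare_files(good_path, bad_path):
    """Print where two copies of a file differ (both paths are hypothetical)."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    span = mismatch_span(good, bad)
    if span is None:
        print("files are identical over their common length")
        return
    first, last = span
    print("first differing byte: %08x, record %d, offset in record %d"
          % ((first,) + record_location(first)))
    print("last differing byte:  %08x" % last)
    # Average number of agreeing bits per byte across the damaged span.
    total = sum(bits_same(g, b)
                for g, b in zip(good[first:last + 1], bad[first:last + 1]))
    print("mean bits same in span: %.2f" % (total / (last - first + 1)))
```

For example, record_location(0xa0000) gives (5, 0), i.e. the offset sits exactly on a record boundary, and bits_same(0xd1, 0x24) gives 2. For two unrelated, uniformly random bytes the expected number of matching bits is 4, which is the baseline to compare the "Bits same" column against.
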
Does this seem like a disk error, a controller error/bug, or a cable problem (I recently put a new cable on, so I doubt this)? It seems to me something more systemic rather than a random bit error - opinions are more than welcome.

Here is some info from a Python program I wrote to look at the data (I've left out spans of essentially uninteresting portions showing similar stuff, but I can get you the whole thing if interested):

File pos  Good  Bad  Match  Good (bin)  Bad (bin)  Bits same
0009fff0  d9    d9   Yes    11011001    11011001   8
0009fff1  05    05   Yes    00000101    00000101   8
0009fff2  c1    c1   Yes    11000001    11000001   8
0009fff3  81    81   Yes    10000001    10000001   8
0009fff4  5f    5f   Yes    01011111    01011111   8
0009fff5  66    66   Yes    01100110    01100110   8
0009fff6  5e    5e   Yes    01011110    01011110   8
0009fff7  a1    a1   Yes    10100001    10100001   8
0009fff8  ca    ca   Yes    11001010    11001010   8
0009fff9  9d    9d   Yes    10011101    10011101   8
0009fffa  00    00   Yes    00000000    00000000   8
0009fffb  90    90   Yes    10010000    10010000   8
0009fffc  32    32   Yes    00110010    00110010   8
0009fffd  62    62   Yes    01100010    01100010   8
0009fffe  a8    a8   Yes    10101000    10101000   8
0009ffff  b2    b2   Yes    10110010    10110010   8
--- Start of bad block ---
000a0000  d1    24   No     11010001    00100100   2
000a0001  6b    7b   No     01101011    01111011   7
000a0002  d1    31   No     11010001    00110001   5
000a0003  56    33   No     01010110    00110011   4
000a0004  44    38   No     01000100    00111000   3
000a0005  c3    41   No     11000011    01000001   6
000a0006  df    46   No     11011111    01000110   4
000a0007  07    45   No     00000111    01000101   6
000a0008  4c    7b   No     01001100    01111011   3
000a0009  a0    40   No     10100000    01000000   5
000a000a  54    0a   No     01010100    00001010   3
000a000b  35    40   No     00110101    01000000   3
000a000c  88    24   No     10001000    00100100   4
000a000d  38    24   No     00111000    00100100   5
000a000e  f5    7d   No     11110101    01111101   6
000a000f  28    31   No     00101000    00110001   5
. . .
000af6c1  d3    33   No     11010011    00110011   5
000af6c2  97    39   No     10010111    00111001   3
000af6c3  a5    32   No     10100101    00110010   3
000af6c4  6a    41   No     01101010    01000001   4
000af6c5  16    39   No     00010110    00111001   3
000af6c6  f2    7d   No     11110010    01111101   3
000af6c7  21    40   No     00100001    01000000   5
000af6c8  52    0a   No     01010010    00001010   5
000af6c9  00    00   Yes    00000000    00000000   8
000af6ca  2c    00   No     00101100    00000000   5
000af6cb  42    00   No     01000010    00000000   6
000af6cc  31    00   No     00110001    00000000   5
000af6cd  a1    00   No     10100001    00000000   5
000af6ce  d1    00   No     11010001    00000000   4
000af6cf  90    00   No     10010000    00000000   6
000af6d0  9c    00   No     10011100    00000000   4
. . .
000afff8  26    00   No     00100110    00000000   5
000afff9  8c    00   No     10001100    00000000   5
000afffa  a8    00   No     10101000    00000000   5
000afffb  0c    00   No     00001100    00000000   6
000afffc  f1    00   No     11110001    00000000   3
000afffd  93    00   No     10010011    00000000   4
000afffe  2c    00   No     00101100    00000000   5
000affff  2e    00   No     00101110    00000000   4
--- End of bad block ---
000b0000  62    62   Yes    01100010    01100010   8
000b0001  56    56   Yes    01010110    01010110   8
000b0002  91    91   Yes    10010001    10010001   8
000b0003  04    04   Yes    00000100    00000100   8
000b0004  01    01   Yes    00000001    00000001   8
000b0005  2d    2d   Yes    00101101    00101101   8
000b0006  0e    0e   Yes    00001110    00001110   8
000b0007  89    89   Yes    10001001    10001001   8
000b0008  8a    8a   Yes    10001010    10001010   8
000b0009  ad    ad   Yes    10101101    10101101   8
000b000a  4e    4e   Yes    01001110    01001110   8
000b000b  a3    a3   Yes    10100011    10100011   8
000b000c  13    13   Yes    00010011    00010011   8
000b000d  4d    4d   Yes    01001101    01001101   8
000b000e  07    07   Yes    00000111    00000111   8
000b000f  66    66   Yes    01100110    01100110   8
. . .

-Joe