Date:       Wed, 24 Sep 2014 06:08:05 -0500
From:       Scott Bennett <bennett@sdf.org>
To:         freebsd-questions@freebsd.org
Subject:    ZFS and 2 TB disk drive technology :-(
Message-ID: <201409241108.s8OB85mY021922@sdf.org>
     I've now tried some testing with ZFS on four of the five drives that I
currently have ready to put into use for a raidz2 cluster.  In the process,
I've found that some of the recommendations for setting various kernel
variables in /boot/loader.conf don't seem to work as represented, at least
not on i386.  To the best of my memory, setting vfs.zfs.arc_max or
vm.kmem_size results in a panic in very short order.  Setting
vm.kmem_size_max does work, but only if the value it is set to does not
exceed 512 MB.  512 MB does, however, seem to be sufficient to eliminate the
ZFS kernel module's initialization warning that says to expect unstable
behavior, so that problem appears to have been resolved.

     I created a four-way mirror vdev from the following four drives.

        da1     WD 2 TB drive (new, in an old "MyBook" case with USB 2.0,
                Firewire 400, and eSATA interfaces, connected via Firewire 400)
        da2     Seagate 2 TB drive (refurbished; seems to work tolerably
                well; in an old Backups Plus case with USB 3.0 interface)
        da5     Seagate 2 TB drive (refurbished; already shown to get between
                1900 and 2000 bytes in error on a 1.08 TB file copy; in an
                old Backups Plus case with USB 3.0 interface)
        da7     Samsung 2 TB drive (Samsung D3 Station, new in June; already
                shown to get between 1900 and 2000 bytes in error on a
                1.08 TB file copy; USB 3.0 interface)

     Then I copied the 1.08 TB file again, from another Seagate 2 TB drive to
the mirror vdev.  No errors were detected during the copy.  Then I began
creating a tar file from large parts of a nearly full 1.2 TB file system
(UFS2) on yet another Seagate 2 TB drive on the Firewire 400 bus, with the
tar output going to a file in the mirror, in order to try to write something
to most of the sectors on the four-drive mirror.  I terminated tar after the
empty space in the mirror got down to about 3% because the process had
slowed to a crawl.  (Apparently, space allocation in ZFS slows down far more
than in UFS2 when available space gets down to the last few percent. :-( )

     Next, I ran a scrub on the mirror and, after the scrub finished, got the
following output from a "zpool status -v".

  pool: testmirror
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 1.38M in 17h59m with 1 errors on Mon Sep 15 19:53:45 2014
config:

        NAME          STATE     READ WRITE CKSUM
        testmirror    ONLINE       0     0     1
          mirror-0    ONLINE       0     0     2
            da1p5     ONLINE       0     0     2
            da2p5     ONLINE       0     0     2
            da5p5     ONLINE       0     0     8
            da7p5     ONLINE       0     0     7

errors: Permanent errors have been detected in the following files:

        /backups/testmirror/backups.s2A

     Note that the choices of recommended action above do *not* include
replacing a bad drive and having ZFS rebuild its content on the replacement.
Why is that so?
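     For comparison, what I had expected the recovery procedure to look like
is roughly the following (just a sketch of what I would try, not something I
have actually run; it assumes da7 turns out to be the bad drive and that its
replacement would carry the same da7p5 partition):

        # Take the suspect mirror member out of service first.
        zpool offline testmirror da7p5
        # ...swap in the replacement drive and recreate the da7p5 partition...
        # With no new device named, "zpool replace" resilvers onto the
        # device now occupying the same path.
        zpool replace testmirror da7p5
        # Watch the resilver and the per-device error counters.
        zpool status -v testmirror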
     Thinking, apparently naively, that the scrub had repaired some or most
of the errors, and wanting to know which drives had ended up with permanent
errors, I did a "zpool clear testmirror" and ran another scrub.  During this
scrub, I got some kernel messages on the console:

(da7:umass-sim5:5:0:0): WRITE(10). CDB: 2a 00 3b 20 4d 36 00 00 05 00
(da7:umass-sim5:5:0:0): CAM status: CCB request completed with an error
(da7:umass-sim5:5:0:0): Retrying command
(da7:umass-sim5:5:0:0): WRITE(10). CDB: 2a 00 3b 20 4d 36 00 00 05 00
(da7:umass-sim5:5:0:0): CAM status: CCB request completed with an error
(da7:umass-sim5:5:0:0): Retrying command

I don't know how to decipher these error messages (i.e., what do the hex
digits after "CDB:" mean?).  When the scrub had finished, another
"zpool status -v" showed these results.

  pool: testmirror
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 1.25M in 18h4m with 1 errors on Tue Sep 16 15:02:56 2014
config:

        NAME          STATE     READ WRITE CKSUM
        testmirror    ONLINE       0     0     1
          mirror-0    ONLINE       0     0     2
            da1p5     ONLINE       0     0     2
            da2p5     ONLINE       0     0     2
            da5p5     ONLINE       0     0     6
            da7p5     ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        /backups/testmirror/backups.s2A

So it is not clear to me that either scrub fixed *any* errors at all.

     I next ran a comparison ("cmp -z -l") of the original file against the
copy now on the mirror.  It found the differences shown below before cmp(1)
was terminated because the vm_pager got an error while trying to read in a
block from the mirror vdev.  (The cpuset stuff was there to keep cmp(1) from
interfering too much with another ongoing, but unrelated, process.)

Script started on Wed Sep 17 01:37:38 2014
[hellas] 101 % time nice +12 cpuset -l 3,0 cmp -z -l /backups/s2C/save/backups.s2A /backups/testmirror/backups.s2A
  8169610513 164 124
 71816953105 344 304
121604893969 273 233
160321633553 170 130
388494183697  42   2
488384007441 266 226
574339165457 141 101
662115138833 145 105
683519290641 157 117
683546029329  60  20
cmp: Input/output error (caught SIGSEGV)
4144.600u 3948.457s 8:08:08.33 27.6%    15+-393k 5257820+0io 10430953pf+0w
[hellas] 104 % time nice +12 cpuset -l 3,0 cmp -z -l /backups/s2C/save/backups.s2A /backups/testmirror/backups.s2A
  6022126866 164 124
 69669469458 344 304
119457410322 273 233
158174149906 170 130
386346700050  42   2
486236523794 266 226
572191681810 141 101
659967655186 145 105
681371806994 157 117
681398545682  60  20
cmp: Input/output error (caught SIGSEGV)
4132.551u 4003.112s 8:13:20.95 27.4%    15+-345k 5241297+0io 10560652pf+0w
[hellas] 105 % time nice +12 cpuset -l 3,0 cmp -z -l /backups/s2C/save/backups.s2A /backups/testmirror/backups.s2A
  8169610513 164 124
 71816953105 344 304
121604893969 273 233
160321633553 170 130
388494183697  42   2
488384007441 266 226
574339165457 141 101
662115138833 145 105
683519290641 157 117
683546029329  60  20
cmp: Input/output error (caught SIGSEGV)
4136.621u 3977.459s 8:07:43.85 27.7%    15+-378k 5257810+0io 10430951pf+0w
[hellas] 106 %

     As you can see, the hard error seems to be pretty consistent.  Also, the
bytes found to differ before termination all differ by a single bit that is
on in the original and off in the copy, and it is always the same bit in the
byte.
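     (Since cmp -l prints the byte values in octal, a quick arithmetic check
in /bin/sh of the pairs above shows which bit that is; the leading 0 makes
the shell treat the constants as octal:)

        # XOR a few of the differing pairs reported by cmp -l above.
        echo $(( 0164 ^ 0124 ))   # prints 32, i.e. bit 0x20
        echo $(( 0344 ^ 0304 ))   # prints 32
        echo $(( 0273 ^ 0233 ))   # prints 32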
     Another issue revealed above is that ZFS, in spite of having *four*
copies of the data and checksums for them, failed to detect any problem
while reading the data back for cmp(1), much less feed cmp(1) the correct
version of the data rather than a corrupted version.  Similarly, the hard
error (not otherwise logged by the kernel) apparently encountered by the
vm_pager resulted in the termination of cmp(1) rather than in ZFS reading
the page from one of the other three drives.

     I don't see how ZFS is of much help here, so I guess I must have
misunderstood the claims for ZFS that I've read on this list and in the
available materials on-line.  I don't know where to turn next.  I will try
again later today to call Seagate/Samsung about the bad Samsung drive and
the bad, refurbished Seagate drive, but they already told me once that
having a couple of kB of errors in a ~1.08 TB file copy does not mean that
the drive is bad.  I don't know whether they will consider a hard write
error to mean that the drive is bad.  The kernel messages shown above are
the first ones I've gotten about any of the drives involved in the copy
operation or the tests described above.

     If anyone reading this has any suggestions for a course of action here,
I'd be most interested in reading them.  Thanks in advance for any ideas,
and also for any corrections if I've misunderstood what a ZFS mirror was
supposed to have done to preserve the data and maintain correct operation at
the application level.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:    bennett at sdf.org   *xor*   bennett at freeshell.org *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************