Date: Sat, 18 Feb 2012 18:55:17 +0100
From: "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To: Martin Simmons <martin@lispworks.com>
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: 9-stable : geli + one-disk ZFS fails
Message-ID: <wpty2orpq2.fsf@heho.snv.jussieu.fr>
In-Reply-To: <201202141820.q1EIK1MP032526@higson.cam.lispworks.com> (Martin Simmons's message of "Tue, 14 Feb 2012 18:20:01 GMT")
References: <wpty2xcqop.fsf@heho.snv.jussieu.fr> <wppqdifjed.fsf@heho.snv.jussieu.fr> <201202141820.q1EIK1MP032526@higson.cam.lispworks.com>
Hello,

Martin Simmons <martin@lispworks.com> writes:

> Some random ideas:
>
> 1) Can you dd the whole of ada0s3.eli without errors?
>
> 2) If you scrub a few more times, does it find the same number of errors
> each time and are they always in that XNAT.tar file?
>
> 3) Can you try zfs without geli?

Yeah, and it seems to rule out geli :

[ split the original /dev/ada0s3 into equally sized /dev/ada0s3 and
  /dev/ada0s4 ]

  geli init /dev/ada0s3
  geli attach /dev/ada0s3
  zpool create zgeli /dev/ada0s3.eli
  zfs create zgeli/home
  zfs create zgeli/home/arno
  zfs create zgeli/home/arno/.priv
  zfs create zgeli/home/arno/.scito
  zfs set copies=2 zgeli/home/arno/.priv
  zfs set atime=off zgeli

[ put some files on it, wait a little : ]

[root@cc ~]# zpool status -v
  pool: zgeli
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Feb 18 17:46:54 2012
        425M scanned out of 2.49G at 85.0M/s, 0h0m to go
        0 repaired, 16.64% done
config:

        NAME          STATE     READ WRITE CKSUM
        zgeli         ONLINE       0     0     1
          ada0s3.eli  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso

[root@cc ~]# zpool scrub -s zgeli
[root@cc ~]#

[ then the same, directly on the next partition, without geli : ]

  zpool create zgpart /dev/ada0s4
  zfs create zgpart/home
  zfs create zgpart/home/arno
  zfs create zgpart/home/arno/.priv
  zfs create zgpart/home/arno/.scito
  zfs set copies=2 zgpart/home/arno/.priv
  zfs set atime=off zgpart

[ put some files on it, wait a little : ]

  pool: zgpart
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zgpart      ONLINE       0     0     1
          ada0s4    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /zgpart/home/arno/.scito/ ....

[root@cc ~]#

I still do not particularly suspect the disk, since I cannot reproduce
similar behaviour on UFS.  That said, this disk is supposed to be a
'hybrid SSD'; maybe there is something special about it that ZFS doesn't
like? :

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST95005620AS SD23> ATA-8 SATA 2.x device
ada0: Serial Number 5YX0J5YD
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
GEOM: new disk ada0

Please let me know what more information I can provide.

Best,

Arno
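PS: regarding your point 1, I have not yet dd'ed the .eli device itself.
Below is the test I intend to run next (a sketch only, with the pool
exported first; the 1m block size is an arbitrary choice of mine):

  # detach the pool so nothing writes while we read
  zpool export zgeli

  # read the decrypted geli layer end-to-end; a disk- or geli-level
  # read problem should show up as an I/O error or a short transfer
  dd if=/dev/ada0s3.eli of=/dev/null bs=1m conv=noerror

  # reading it twice and comparing checksums should also catch
  # non-deterministic reads (cable/controller/RAM on the path)
  dd if=/dev/ada0s3.eli bs=1m | sha256
  dd if=/dev/ada0s3.eli bs=1m | sha256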
> 4) Is the slice/partition layout definitely correct?
>
> __Martin
>
>>>>>> On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
>>
>> hello,
>>
>> to try to raise some interest in this issue :
>>
>> I updated to today's -stable, tested with vfs.zfs.debug=1
>> and vfs.zfs.prefetch_disable=0, no difference.
>>
>> I also tested reading the raw partition :
>>
>> [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096 conv=noerror
>> 103746636+0 records in
>> 103746636+0 records out
>> 424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
>> [root@cc /usr/ports]#
>>
>> The disk is brand new and looks OK; either my setup is not good or there
>> is a bug somewhere.  I can play around with this box for some more time,
>> so please feel free to send me hints on what I can do that would be
>> useful for you.
>>
>> Best,
>>
>> Arno
>>
>>
>> "Arno J. Klaassen" <arno@heho.snv.jussieu.fr> writes:
>>
>> > Hello,
>> >
>> > I finally decided to 'play' a bit with ZFS on a notebook.  The machine
>> > is some years old, but I installed a brand new disk and memtest passes
>> > OK.
>> >
>> > I installed base+ports on partition 2, using 'classical' UFS.
>> >
>> > I encrypted partition 3 and created a single zpool on it containing
>> > 4 Z-"file-systems" :
>> >
>> > [root@cc ~]# zfs list
>> > NAME                      USED  AVAIL  REFER  MOUNTPOINT
>> > zfiles                   10.7G   377G   152K  /zfiles
>> > zfiles/home              10.6G   377G   119M  /zfiles/home
>> > zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
>> > zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
>> > zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
>> >
>> > I export the ZFS's via NFS and, from the other machine, rsynced a
>> > backup of my current notebook (geli + UFS, (almost) the same 9-stable
>> > version, no problem there) to the ZFS's.
>> >
>> > Quite quickly, I see this on the notebook :
>> >
>> > [root@cc /usr/temp]# zpool status -v
>> >   pool: zfiles
>> >  state: ONLINE
>> > status: One or more devices has experienced an error resulting in data
>> >         corruption.  Applications may be affected.
>> > action: Restore the file in question if possible.  Otherwise restore
>> >         the entire pool from backup.
>> >    see: http://www.sun.com/msg/ZFS-8000-8A
>> >   scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34
>> >         2012
>> > config:
>> >
>> >         NAME          STATE     READ WRITE CKSUM
>> >         zfiles        ONLINE       0     0    11
>> >           ada0s3.eli  ONLINE       0     0    23
>> >
>> > errors: Permanent errors have been detected in the following files:
>> >
>> >         /zfiles/home/arno/.scito/contrib/XNAT.tar
>> >
>> > [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
>> > md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
>> > [root@cc /usr/temp]#
>> >
>> > As said, memtest is OK, nothing is logged to the console, UFS on the
>> > same disk works OK (I did some tests copying and comparing random
>> > data), and smartctl as well seems to trust the disk :
>> >
>> > SMART Self-test log structure revision number 1
>> > Num  Test_Description  Status                   Remaining  LifeTime(hours)
>> > # 1  Extended offline  Completed without error        00%              388
>> > # 2  Short offline     Completed without error        00%              387
>> >
>> > Am I doing something wrong?  And/or let me know what I could provide
>> > as extra info to try to solve this (dmesg.boot at the end of this
>> > mail).
>> >
>> > Thanx a lot in advance,
>> >
>> > best, Arno
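PPS: regarding your point 4, I will double-check the slice layout as
below and post the output in a follow-up (again just a sketch; I expect
the sizes and offsets to show whether ada0s3 and ada0s4 overlap):

  # print the slice table with provider names, sizes and offsets
  gpart show -p ada0

  # cross-check media size and sector size per slice
  diskinfo -v /dev/ada0s3
  diskinfo -v /dev/ada0s4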