Date:      Wed, 15 Feb 2012 15:53:37 +0100
From:      "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To:        Martin Simmons <martin@lispworks.com>
Cc:        freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: 9-stable : geli + one-disk ZFS fails
Message-ID:  <wpvcn8m9la.fsf@heho.snv.jussieu.fr>
In-Reply-To: <201202141820.q1EIK1MP032526@higson.cam.lispworks.com> (Martin Simmons's message of "Tue\, 14 Feb 2012 18\:20\:01 GMT")
References:  <wpty2xcqop.fsf@heho.snv.jussieu.fr> <wppqdifjed.fsf@heho.snv.jussieu.fr> <201202141820.q1EIK1MP032526@higson.cam.lispworks.com>


Hello,

Martin Simmons <martin@lispworks.com> writes:

> Some random ideas:
>
> 1) Can you dd the whole of ada0s3.eli without errors?

[root@cc ~]# dd if=/dev/ada0s3.eli of=/dev/null bs=4096 conv=noerror
103746635+0 records in
103746635+0 records out
424946216960 bytes transferred in 18773.796016 secs (22635072 bytes/sec)
[root@cc ~]# 
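
For comparison with the read of the raw partition quoted further down, here is a small sketch of the arithmetic on the two dd totals. The variable names are only illustrative. The one-record difference would be consistent with geli keeping its metadata at the end of the underlying provider, but that reading is an assumption, not something verified on this box.

```shell
# Byte totals copied from the two dd runs in this thread.
raw_bytes=424946221056   # dd if=/dev/ada0s3     (raw partition)
eli_bytes=424946216960   # dd if=/dev/ada0s3.eli (geli provider)
block=4096               # bs= used for both runs

# How much smaller is the .eli device than the raw partition?
echo "difference: $((raw_bytes - eli_bytes)) bytes"

# Cross-check the record counts that dd reported.
echo "raw records: $((raw_bytes / block)), eli records: $((eli_bytes / block))"
```

This prints a difference of 4096 bytes and record counts of 103746636 and 103746635, matching the dd output, so both reads covered their whole device without a short read.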


> 2) If you scrub a few more times, does it find the same number of errors each
> time and are they always in that XNAT.tar file?


It looks like each scrub makes the situation worse:


[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 148K in 0h14m with 26 errors on Mon Feb 13 18:54:33 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    26
          ada0s3.eli  ONLINE       0     0    87

errors: Permanent errors have been detected in the following files:

 [ 11 files ]

[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Feb 15 14:36:52 2012
        17.7G scanned out of 28.7G at 72.1M/s, 0h2m to go
        0 repaired, 61.56% done
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    54
          ada0s3.eli  ONLINE       0     0   143

errors: Permanent errors have been detected in the following files:

  [ 11 files ]

[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 4K in 0h7m with 70 errors on Wed Feb 15 14:43:57 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0    96
          ada0s3.eli  ONLINE       0     0   228

errors: Permanent errors have been detected in the following files:

  [ 25 files (I cannot quickly tell whether they include all of the original 11 files) ]

[root@cc ~]# 

[root@cc ~]# zpool status -v
  pool: zfiles
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h6m with 70 errors on Wed Feb 15 15:19:28 2012
config:

        NAME          STATE     READ WRITE CKSUM
        zfiles        ONLINE       0     0   166
          ada0s3.eli  ONLINE       0     0   368

errors: Permanent errors have been detected in the following files:

  [ 25 files  ] 

[root@cc ~]# 
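
To compare the CKSUM columns across the reports above without reading each one by eye, a small awk filter along these lines could be used. It is only a sketch: `cksum_counts` is a made-up helper name, and it assumes the config table layout shown above (a header row starting with NAME and ending with CKSUM, terminated by a blank line).

```shell
# Print "<name> <cksum-count>" for every row of the config table
# in `zpool status` output read from stdin.
cksum_counts() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }
        # A blank line ends the table.
        in_config && NF == 0 { in_config = 0 }
        # Inside the table: first field is the name, last is CKSUM.
        in_config { print $1, $NF }
    '
}

# Intended use (needs a live pool, so left as a comment here):
#   zpool status -v zfiles | cksum_counts
```

Run after each scrub, this would give one pair of lines per report (for the first report above, `zfiles 26` and `ada0s3.eli 87`), which makes the growth from scrub to scrub easier to see.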


> 3) Can you try zfs without geli?
>
> 4) Is the slice/partition layout definitely correct?
>
> __Martin
>
>
>>>>>> On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
>> 
>> hello,
>> 
>> in the hope of attracting some interest in this issue:
>> 
>>  I updated to today's -stable, tested with vfs.zfs.debug=1
>>  and vfs.zfs.prefetch_disable=0, no difference.
>> 
>>  I also tried reading the raw partition:
>> 
>>   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
>>   103746636+0 records in
>>   103746636+0 records out
>>   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
>>   [root@cc /usr/ports]#
>> 
>>  The disk is brand new and looks OK, so either my setup is wrong or
>>  there is a bug somewhere. I can play around with this box for some
>>  more time, so please feel free to suggest what I could do that would
>>  be useful to you.
>> 
>> Best,
>> 
>> Arno
>> 
>> 
>> "Arno J. Klaassen" <arno@heho.snv.jussieu.fr> writes:
>> 
>> > Hello,
>> >
>> >
>> > I finally decided to 'play' a bit with ZFS on a notebook, some years
>> > old, but I installed a brand new disk and memtest passes OK.
>> >
>> > I installed base+ports on partition 2, using 'classical' UFS.
>> >
>> > I encrypted partition 3 and created a single zpool on it containing
>> > 4 Z-"file-systems":
>> >
>> >  [root@cc ~]# zfs list
>> >  NAME                      USED  AVAIL  REFER  MOUNTPOINT
>> >  zfiles                   10.7G   377G   152K  /zfiles
>> >  zfiles/home              10.6G   377G   119M  /zfiles/home
>> >  zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
>> >  zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
>> >  zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
>> >
>> >
>> > I exported the ZFS file systems via NFS and, from the other machine,
>> > rsynced a backup of my current notebook (geli + UFS, (almost) the same
>> > 9-stable version, no problem) to them.
>> >
>> >
>> > Quite quickly, I saw this on the notebook:
>> >
>> >
>> >  [root@cc /usr/temp]# zpool status -v
>> >    pool: zfiles
>> >   state: ONLINE
>> >  status: One or more devices has experienced an error resulting in data
>> >          corruption.  Applications may be affected.
>> >  action: Restore the file in question if possible.  Otherwise restore the
>> >          entire pool from backup.
>> >     see: http://www.sun.com/msg/ZFS-8000-8A
>> >    scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
>> >  config: 
>> >  
>> >          NAME          STATE     READ WRITE CKSUM
>> >          zfiles        ONLINE       0     0    11
>> >            ada0s3.eli  ONLINE       0     0    23
>> >
>> >  errors: Permanent errors have been detected in the following files:
>> >
>> >          /zfiles/home/arno/.scito/contrib/XNAT.tar
>> >  [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
>> >  md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
>> >  [root@cc /usr/temp]#
>> >
>> >
>> > As said, memtest is OK, nothing is logged to the console, UFS on the
>> > same disk works OK (I did some tests copying and comparing random data)
>> > and smartctl seems to trust the disk as well:
>> >
>> >  SMART Self-test log structure revision number 1
>> >  Num  Test_Description    Status                  Remaining  LifeTime(hours)
>> >  # 1  Extended offline    Completed without error       00%       388
>> >  # 2  Short offline       Completed without error       00%       387 
>> >
>> >
>> > Am I doing something wrong? Please let me know what extra information
>> > I could provide to help solve this (dmesg.boot is at the end of this mail).
>> >
>> > Thanks a lot in advance,
>> >
>> > best, Arno
>> >
>> >
>> >


