From owner-freebsd-fs@FreeBSD.ORG  Sun Feb 19 16:55:46 2012
To: Martin Simmons <martin@lispworks.com>
From: "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
References: <wpty2xcqop.fsf@heho.snv.jussieu.fr>
	<wppqdifjed.fsf@heho.snv.jussieu.fr>
	<201202141820.q1EIK1MP032526@higson.cam.lispworks.com>
	<wpty2orpq2.fsf@heho.snv.jussieu.fr>
Date: Sun, 19 Feb 2012 17:54:50 +0100
In-Reply-To: <wpty2orpq2.fsf@heho.snv.jussieu.fr> (Arno J. Klaassen's message
	of "Sat\, 18 Feb 2012 18\:55\:17 +0100")
Message-ID: <wpzkce2279.fsf_-_@heho.snv.jussieu.fr>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: 9-stable: one-device ZFS fails [was: 9-stable : geli + one-disk ZFS
	fails]


A follow-up to my own message:

> Hello,
>
> Martin Simmons <martin@lispworks.com> writes:
>
>> Some random ideas:
>>
>> 1) Can you dd the whole of ada0s3.eli without errors?
>>
>> 2) If you scrub a few more times, does it find the same number of errors each
>> time and are they always in that XNAT.tar file?
>>
>> 3) Can you try zfs without geli?
>
>
> yeah, and it seems to rule out geli :
>
> [ split the original /dev/ada0s3 into equally sized /dev/ada0s3 and
> /dev/ada0s4 ]
>
>  geli init /dev/ada0s3
>  geli attach /dev/ada0s3
>
>  zpool create zgeli /dev/ada0s3.eli
>
>  zfs create zgeli/home
>  zfs create zgeli/home/arno
>  zfs create zgeli/home/arno/.priv
>  zfs create zgeli/home/arno/.scito
>  zfs set copies=2 zgeli/home/arno/.priv
>  zfs set atime=off zgeli
>
>
> [put some files on it, wait a little : ]
>
>
>    [root@cc ~]# zpool status -v
>    pool: zgeli
>   state: ONLINE
>  status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
>  action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://www.sun.com/msg/ZFS-8000-8A
>    scan: scrub in progress since Sat Feb 18 17:46:54 2012
>          425M scanned out of 2.49G at 85.0M/s, 0h0m to go
>          0 repaired, 16.64% done
>  config: 
>  
>          NAME          STATE     READ WRITE CKSUM
>          zgeli         ONLINE       0     0     1
>            ada0s3.eli  ONLINE       0     0     2
>
>  errors: Permanent errors have been detected in the following files:
>
>         /zgeli/home/arno/8.0-CURRENT-200902-amd64-livefs.iso
>  [root@cc ~]# zpool scrub -s zgeli
>  [root@cc ~]# 
>
>
> [then the same directly on the next partition ]
>
>  zpool create zgpart /dev/ada0s4
>
>  zfs create zgpart/home
>  zfs create zgpart/home/arno
>  zfs create zgpart/home/arno/.priv
>  zfs create zgpart/home/arno/.scito
>  zfs set copies=2 zgpart/home/arno/.priv
>  zfs set atime=off zgpart
>
> [put some files on it, wait a little : ]
>
>    pool: zgpart
>   state: ONLINE
>  status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
>  action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://www.sun.com/msg/ZFS-8000-8A
>    scan: scrub repaired 0 in 0h0m with 1 errors on Sat Feb 18 18:04:45 2012
>  config:
>
>          NAME        STATE     READ WRITE CKSUM
>          zgpart      ONLINE       0     0     1
>            ada0s4    ONLINE       0     0     2
>
>  errors: Permanent errors have been detected in the following files:
>
>          /zgpart/home/arno/.scito/ ....
>  [root@cc ~]# 


I tested a bit more this afternoon:


  - zpool create zgpart /dev/ada0s4d  =>

    KO (fails again)

  - split ada0s4 into two equally sized partitions and then

      zpool create zgpart mirror /dev/ada0s4d /dev/ada0s4e  =>

    works like a charm.

   ( [root@cc /zgpart]# zpool status -v zgpart
       pool: zgpart
      state: ONLINE
       scan: scrub repaired 0 in 0h36m with 0 errors on Sun Feb 19 17:20:34 2012
     config:

        NAME         STATE     READ WRITE CKSUM
        zgpart       ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            ada0s4d  ONLINE       0     0     0
            ada0s4e  ONLINE       0     0     0

     errors: No known data errors )
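
To compare runs without reading each status screen by eye, the CKSUM column
can be summed with a small filter. This is only a sketch: the function name
cksum_total is made up for illustration, and it assumes the table layout
shown above (a "NAME ... CKSUM" header, one line per vdev, then a blank line
or the "errors:" line) with plain integer counters.

```shell
#!/bin/sh
# Sum the CKSUM column of the config table in `zpool status` output (stdin).
# Hypothetical helper; assumes the layout printed above and integer counters.
cksum_total() {
    awk '
        # The header line starts the table.
        $1 == "NAME" && $NF == "CKSUM" { in_table = 1; next }
        # A blank line or the "errors:" line ends it.
        in_table && ($1 == "errors:" || NF == 0) { in_table = 0 }
        # Table rows: NAME STATE READ WRITE CKSUM
        in_table && NF >= 5 { total += $5 }
        END { print total + 0 }
    '
}
```

Usage would be something like `zpool status zgpart | cksum_total`. The pool
line and the device lines are both counted, so the result is only meaningful
as zero versus non-zero.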
  

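For the "put some files on it" step, the copy-and-compare test on random
data can be scripted roughly as follows. This is a sketch, not the exact
commands used: the function name copy_and_compare, the file names and the
1 MiB file size are arbitrary choices. Note that a compare right after the
copy may well be served from cache, so a later scrub is still what actually
reads the data back from disk; the copies are left in place for that.

```shell
#!/bin/sh
# Write COUNT files of random data, copy each into DIR and compare the copy
# byte for byte with the original. Returns non-zero on the first mismatch or
# I/O error. Hypothetical helper; nothing in it is ZFS-specific.
copy_and_compare() {
    dir=$1
    count=${2:-4}
    src=$(mktemp -d "${TMPDIR:-/tmp}/cmpsrc.XXXXXX") || return 1
    i=0
    status=0
    while [ "$i" -lt "$count" ]; do
        # 1024 blocks of 1024 bytes = 1 MiB of random data per file
        dd if=/dev/urandom of="$src/f$i" bs=1024 count=1024 2>/dev/null \
            || { status=1; break; }
        cp "$src/f$i" "$dir/f$i" || { status=1; break; }
        cmp -s "$src/f$i" "$dir/f$i" \
            || { echo "mismatch: $dir/f$i"; status=1; break; }
        i=$((i + 1))
    done
    # Originals are removed; the copies stay in DIR for a later scrub.
    rm -rf "$src"
    return "$status"
}
```

For example: `copy_and_compare /zgpart/home/arno 100 && zpool scrub zgpart`.
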
FYI, best, Arno


>
> I still do not particularly suspect the disk since I cannot reproduce
> similar behaviour on UFS.
>
> That said, this disk is supposed to be 'hybrid-SSD', maybe something
> special ZFS doesn't like ??? :
>
>
>  ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>  ada0: <ST95005620AS SD23> ATA-8 SATA 2.x device
>  ada0: Serial Number 5YX0J5YD
>  ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>  ada0: Command Queueing enabled
>  ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C)
>  ada0: Previously was known as ad4
>  GEOM: new disk ada0
>
>
> Please let me know what further information to provide.
>
> Best,
>
> Arno
>
>
>
>
>> 4) Is the slice/partition layout definitely correct?
>>
>> __Martin
>>
>>
>>>>>>> On Mon, 13 Feb 2012 23:39:06 +0100, Arno J Klaassen said:
>>> 
>>> hello,
>>> 
>>> to eventually gain interest in this issue :
>>> 
>>>  I updated to today's -stable, tested with vfs.zfs.debug=1
>>>  and vfs.zfs.prefetch_disable=0, no difference.
>>> 
>>>  I also tested to read the raw partition :
>>> 
>>>   [root@cc /usr/ports]# dd if=/dev/ada0s3 of=/dev/null bs=4096  conv=noerror
>>>   103746636+0 records in
>>>   103746636+0 records out
>>>   424946221056 bytes transferred in 13226.346738 secs (32128768 bytes/sec)
>>>   [root@cc /usr/ports]#
>>> 
>>>  Disk is brand new, looks ok, either my setup is not good or there is
>>>  a bug somewhere; I can play around with this box for some more time,
>>>  please feel free to provide me with some hints what to do to be useful
>>>  for you.
>>> 
>>> Best,
>>> 
>>> Arno
>>> 
>>> 
>>> "Arno J. Klaassen" <arno@heho.snv.jussieu.fr> writes:
>>> 
>>> > Hello,
>>> >
>>> >
>>> > I finally decided to 'play' a bit with ZFS on a notebook, some years
>>> > old, but I installed a brand new disk and memtest passes OK.
>>> >
>>> > I installed base+ports on partition 2, using 'classical' UFS.
>>> >
>>> > I encrypted partition 3 and created a single zpool on it containing
>>> > 4 Z-"file-systems" :
>>> >
>>> >  [root@cc ~]# zfs list
>>> >  NAME                      USED  AVAIL  REFER  MOUNTPOINT
>>> >  zfiles                   10.7G   377G   152K  /zfiles
>>> >  zfiles/home              10.6G   377G   119M  /zfiles/home
>>> >  zfiles/home/arno         10.5G   377G  2.35G  /zfiles/home/arno
>>> >  zfiles/home/arno/.priv    192K   377G   192K  /zfiles/home/arno/.priv
>>> >  zfiles/home/arno/.scito  8.18G   377G  8.18G  /zfiles/home/arno/.scito
>>> >
>>> >
>>> > I export the ZFS's via nfs and rsynced on the other machine some backup
>>> > of my current note-book (geli + UFS, (almost) same 9-stable version, no
>>> > problem) to the ZFS's.
>>> >
>>> >
>>> > Quite fast, I see on the notebook :
>>> >
>>> >
>>> >  [root@cc /usr/temp]# zpool status -v
>>> >    pool: zfiles
>>> >   state: ONLINE
>>> >  status: One or more devices has experienced an error resulting in data
>>> >          corruption.  Applications may be affected.
>>> >  action: Restore the file in question if possible.  Otherwise restore the
>>> >          entire pool from backup.
>>> >     see: http://www.sun.com/msg/ZFS-8000-8A
>>> >    scan: scrub repaired 0 in 0h1m with 11 errors on Sat Feb 11 14:55:34 2012
>>> >  config: 
>>> >  
>>> >          NAME          STATE     READ WRITE CKSUM
>>> >          zfiles        ONLINE       0     0    11
>>> >            ada0s3.eli  ONLINE       0     0    23
>>> >
>>> >  errors: Permanent errors have been detected in the following files:
>>> >
>>> >          /zfiles/home/arno/.scito/contrib/XNAT.tar
>>> >  [root@cc /usr/temp]# md5 /zfiles/home/arno/.scito/contrib/XNAT.tar
>>> >  md5: /zfiles/home/arno/.scito/contrib/XNAT.tar: Input/output error
>>> >  [root@cc /usr/temp]#
>>> >
>>> >
>>> > As said, memtest is OK, nothing is logged to the console, UFS on the
>>> > same disk works OK (I did some tests copying and comparing random data)
>>> > and smartctl as well seems to trust the disk :
>>> >
>>> >  SMART Self-test log structure revision number 1
>>> >  Num  Test_Description    Status                  Remaining  LifeTime(hours)
>>> >  # 1  Extended offline    Completed without error       00%       388
>>> >  # 2  Short offline       Completed without error       00%       387 
>>> >
>>> >
>>> > Am I doing something wrong? Please let me know what I could provide as
>>> > extra info to help solve this (dmesg.boot at the end of this mail).
>>> >
>>> > Thanx a lot in advance,
>>> >
>>> > best, Arno
>>> >
>>> >
>>> >