Date: Tue, 26 Jan 2010 14:57:20 +0100
From: Gerrit Kühn <gerrit@pmp.uni-hannover.de>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: immense delayed write to file system (ZFS and UFS2), performance issues
Message-ID: <20100126145720.ad9115ff.gerrit@pmp.uni-hannover.de>
In-Reply-To: <20100119112449.GA73052@icarus.home.lan>
References: <4B54C100.9080906@mail.zedat.fu-berlin.de> <4B54C5EE.5070305@pp.dyndns.biz> <201001191250.23625.doconnor@gsoft.com.au> <7346c5c61001181841j3653a7c3m32bc033c8c146a92@mail.gmail.com> <4B557B5A.8040902@pp.dyndns.biz> <20100119095736.GA71824@icarus.home.lan> <20100119110724.ec01a3ed.gerrit@pmp.uni-hannover.de> <20100119112449.GA73052@icarus.home.lan>
On Tue, 19 Jan 2010 03:24:49 -0800 Jeremy Chadwick <freebsd@jdc.parodius.com> wrote about Re: immense delayed write to file system (ZFS and UFS2), performance issues:

JC> So which drive models above are experiencing a continual increase in
JC> SMART attribute 193 (Load Cycle Count)?  My guess is that some of the
JC> WD Caviar Green models, and possibly all of the RE2-GP and RE4-GP
JC> models are experiencing this problem.

Just to add some more info: I contacted WD support about the problem with RE4 drives and today received a firmware update by email which is supposed to fix the problem. I have not tried it yet, though; I am still busy replacing RE2 disks with updated drives.

While doing so, I came across a very strange thing with ZFS. The pool originally had the following layout:

mclane# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
        spares
          ad14      AVAIL

errors: No known data errors

All disks still have the firmware bug, so I want to replace them with disks that I have already fixed. I put in an updated drive as ad18 and wanted to replace ad12 to get the drive with the broken firmware out:

mclane# zpool replace tank /dev/ad12 /dev/ad18
mclane# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.01% done, 52h51m to go
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            ad8        ONLINE       0     0     0  7.21M resilvered
            ad10       ONLINE       0     0     0  7.22M resilvered
            replacing  ONLINE       0     0     0
              ad12     ONLINE       0     0     0
              ad18     ONLINE       0     0     0  10.7M resilvered
        spares
          ad14         AVAIL

errors: No known data errors

However, something must have gone wrong during the resilvering process, and it now looks like this:

mclane# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 2h39m with 0 errors on Tue Jan 26 14:00:00 2010
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            ad8        ONLINE       0     0     0  975M resilvered
            ad10       ONLINE       0     0   142  974M resilvered
            replacing  DEGRADED     0 7.25M     0
              ad12     ONLINE       0     0     0
              ad18     REMOVED      0     1     0  79.4M resilvered
        spares
          ad14         AVAIL

errors: No known data errors

What is going on here? ad18 obviously detached during the process; /var/log/messages just gives me

Jan 26 11:23:33 mclane kernel: ad18: FAILURE - device detached

Additionally, ad10 obviously produced checksum errors. What do I do about the degraded replacing process? Can I terminate it somehow and maybe replace ad10 first? Any other hints?

cu
  Gerrit
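For reference, the zpool(8) manual page of that era documents that an in-progress replace can be cancelled by detaching the new device from the implicit "replacing" vdev. A sketch of that recovery path, using the device names from the pool above (untested here; whether replacing ad10 afterwards is safe depends on the raidz1 still having full redundancy at that point):

```sh
# Cancel the stuck replace: detach the new (REMOVED) half of the
# "replacing" vdev; ad12 remains a full member of the raidz1.
mclane# zpool detach tank ad18

# Reset the accumulated read/write/checksum error counters.
mclane# zpool clear tank

# After confirming ad18 is healthy and attached again, replace the
# disk that showed checksum errors.
mclane# zpool replace tank /dev/ad10 /dev/ad18
```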