Date: Tue, 26 Jan 2010 14:57:20 +0100
From: Gerrit Kühn <gerrit@pmp.uni-hannover.de>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: immense delayed write to file system (ZFS and UFS2), performance issues
Message-ID: <20100126145720.ad9115ff.gerrit@pmp.uni-hannover.de>
In-Reply-To: <20100119112449.GA73052@icarus.home.lan>
References: <4B54C100.9080906@mail.zedat.fu-berlin.de> <4B54C5EE.5070305@pp.dyndns.biz> <201001191250.23625.doconnor@gsoft.com.au> <7346c5c61001181841j3653a7c3m32bc033c8c146a92@mail.gmail.com> <4B557B5A.8040902@pp.dyndns.biz> <20100119095736.GA71824@icarus.home.lan> <20100119110724.ec01a3ed.gerrit@pmp.uni-hannover.de> <20100119112449.GA73052@icarus.home.lan>
On Tue, 19 Jan 2010 03:24:49 -0800 Jeremy Chadwick
<freebsd@jdc.parodius.com> wrote about Re: immense delayed write to file
system (ZFS and UFS2), performance issues:
JC> So which drive models above are experiencing a continual increase in
JC> SMART attribute 193 (Load Cycle Count)? My guess is that some of the
JC> WD Caviar Green models, and possibly all of the RE2-GP and RE4-GP
JC> models are experiencing this problem.
Just to add some more info:
I contacted WD support about the problem with the RE4 drives and received a
firmware update by email today which is supposed to fix the problem. I have
not tried it yet, though.
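For anyone who wants to verify whether their drives are affected before and
after flashing: this is roughly what I use here to watch attribute 193 and the
firmware revision (a rough sketch only, assuming smartmontools from ports; the
device names are of course specific to my box):

mclane# smartctl -i /dev/ad12                        # model and firmware revision
mclane# smartctl -A /dev/ad12 | grep -i load_cycle   # SMART attribute 193 (Load Cycle Count)

If the raw value of 193 keeps climbing between two runs a few minutes apart,
the drive is still parking its heads aggressively.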
I am still busy replacing the RE2 disks with updated drives, and I came across
a very strange thing with ZFS. Originally I had the following pool layout:
mclane# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
        spares
          ad14      AVAIL

errors: No known data errors
All of these disks still have the firmware bug, so I want to replace them with
drives I have already updated. I put in an updated drive as ad18 and wanted to
replace ad12 to get the drive with the broken firmware out:
mclane# zpool replace tank /dev/ad12 /dev/ad18
mclane# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.01% done, 52h51m to go
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            ad8        ONLINE       0     0     0  7.21M resilvered
            ad10       ONLINE       0     0     0  7.22M resilvered
            replacing  ONLINE       0     0     0
              ad12     ONLINE       0     0     0
              ad18     ONLINE       0     0     0  10.7M resilvered
        spares
          ad14         AVAIL

errors: No known data errors
However, something must have gone wrong during the resilvering process and
it now looks like this:
mclane# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 2h39m with 0 errors on Tue Jan 26 14:00:00 2010
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            ad8        ONLINE       0     0     0  975M resilvered
            ad10       ONLINE       0     0   142  974M resilvered
            replacing  DEGRADED     0 7.25M     0
              ad12     ONLINE       0     0     0
              ad18     REMOVED      0     1     0  79.4M resilvered
        spares
          ad14         AVAIL

errors: No known data errors
What is going on here? ad18 obviously detached during the
process. /var/log/messages just gives me
Jan 26 11:23:33 mclane kernel: ad18: FAILURE - device detached
Additionally, ad10 obviously produced checksum errors. What do I do about the
degraded replacing process? Can I terminate it somehow and maybe replace
ad10 first? Any other hints?
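What I would naively try (completely untested, so please correct me if this is
a bad idea) is to cancel the stuck replace by detaching the new drive, clear
the error counters, and then deal with ad10 first, roughly like this:

mclane# zpool detach tank ad18        # cancel the replace (the replacing vdev behaves like a mirror)
mclane# zpool clear tank              # reset the read/write/checksum error counters
mclane# zpool replace tank ad10 ad18  # then swap out ad10 with the updated drive

But I have no idea whether detaching ad18 while the replacing vdev is marked
DEGRADED is actually safe, hence my question.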
cu
Gerrit
