Date: Tue, 26 Jan 2010 14:57:20 +0100
From: Gerrit Kühn <gerrit@pmp.uni-hannover.de>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: immense delayed write to file system (ZFS and UFS2), performance issues
Message-ID: <20100126145720.ad9115ff.gerrit@pmp.uni-hannover.de>
In-Reply-To: <20100119112449.GA73052@icarus.home.lan>
References: <4B54C100.9080906@mail.zedat.fu-berlin.de> <4B54C5EE.5070305@pp.dyndns.biz> <201001191250.23625.doconnor@gsoft.com.au> <7346c5c61001181841j3653a7c3m32bc033c8c146a92@mail.gmail.com> <4B557B5A.8040902@pp.dyndns.biz> <20100119095736.GA71824@icarus.home.lan> <20100119110724.ec01a3ed.gerrit@pmp.uni-hannover.de> <20100119112449.GA73052@icarus.home.lan>
On Tue, 19 Jan 2010 03:24:49 -0800 Jeremy Chadwick <freebsd@jdc.parodius.com> wrote about Re: immense delayed write to file system (ZFS and UFS2), performance issues:

JC> So which drive models above are experiencing a continual increase in
JC> SMART attribute 193 (Load Cycle Count)?  My guess is that some of the
JC> WD Caviar Green models, and possibly all of the RE2-GP and RE4-GP
JC> models are experiencing this problem.

Just to add some more info: I contacted WD support about the problem with RE4 drives and today received a firmware update by email which is supposed to fix the problem. I have not tried it yet, though; I am still busy replacing RE2 disks with updated drives.

While doing so, I came across a very strange thing with ZFS. The pool originally had the following layout:

mclane# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
        spares
          ad14      AVAIL

errors: No known data errors

All disks still have the firmware bug, so I want to replace them with disks that I have already fixed. I put in an updated drive as ad18 and wanted to replace ad12 to get the drive with the broken firmware out:

mclane# zpool replace tank /dev/ad12 /dev/ad18
mclane# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.01% done, 52h51m to go
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            ad8        ONLINE       0     0     0  7.21M resilvered
            ad10       ONLINE       0     0     0  7.22M resilvered
            replacing  ONLINE       0     0     0
              ad12     ONLINE       0     0     0
              ad18     ONLINE       0     0     0  10.7M resilvered
        spares
          ad14         AVAIL

errors: No known data errors

However, something must have gone wrong during the resilvering process, and it now looks like this:

mclane# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 2h39m with 0 errors on Tue Jan 26 14:00:00 2010
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            ad8        ONLINE       0     0     0  975M resilvered
            ad10       ONLINE       0     0   142  974M resilvered
            replacing  DEGRADED     0 7.25M     0
              ad12     ONLINE       0     0     0
              ad18     REMOVED      0     1     0  79.4M resilvered
        spares
          ad14         AVAIL

errors: No known data errors

What is going on here? ad18 obviously detached during the process; /var/log/messages just gives me

Jan 26 11:23:33 mclane kernel: ad18: FAILURE - device detached

Additionally, ad10 obviously produced checksum errors. What do I do about the degraded replacing process? Can I terminate it somehow and maybe replace ad10 first? Any other hints?

cu
  Gerrit
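For reference, the zpool(8) manual page of that era documents that an in-progress replace can be cancelled by detaching the new device from the implicit "replacing" vdev. A sketch of that recovery path, using the device names from the pool above (untested here; whether replacing ad10 afterwards is safe depends on the raidz1 still having full redundancy at that point):

```sh
# Cancel the stuck replace: detach the new (REMOVED) half of the
# "replacing" vdev; ad12 remains a full member of the raidz1.
mclane# zpool detach tank ad18

# Reset the accumulated read/write/checksum error counters.
mclane# zpool clear tank

# After confirming ad18 is healthy and attached again, replace the
# disk that showed checksum errors.
mclane# zpool replace tank /dev/ad10 /dev/ad18
```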