Date: Tue, 03 Jan 2012 16:17:11 +0100
From: Johan Hendriks <joh.hendriks@gmail.com>
To: Matt Burke <mattblists@icritical.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS v28 on -STABLE not using hot spare
Message-ID: <4F031BF7.8000900@gmail.com>
In-Reply-To: <4F031654.1080200@icritical.com>
References: <4F031654.1080200@icritical.com>

Matt Burke wrote:
> Over the holidays one of the disks on a server has failed, but despite
> configuring a hot spare, ZFS hasn't used it for some reason. Can anyone
> shed some light on what I might have mis-configured to break the hot-spare
> functionality?
>
>
> [root@x ~]# uname -a
> FreeBSD x 8.2-STABLE FreeBSD 8.2-STABLE #4: Mon Dec 5 12:43:58 GMT 2011
> root@x:/usr/obj/usr/src/sys/x amd64
>
>
> [root@x ~]# more /usr/src/sys/amd64/conf/x
> include GENERIC
> ident x
>
> options GEOM_STRIPE
> options ROUTETABLES=4
>
>
> [root@x ~]# zpool status -v
>   pool: data
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>   scan: none requested
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         data         DEGRADED     0     0     0
>           mirror-0   ONLINE       0     0     0
>             mfid0    ONLINE       0     0     0
>             mfid14   ONLINE       0     0     0
>           mirror-1   ONLINE       0     0     0
>             mfid1    ONLINE       0     0     0
>             mfid15   ONLINE       0     0     0
>           mirror-2   DEGRADED     0     0     0
>             mfid2    ONLINE       0     0     0
>             mfid16   FAULTED      0   931     0  too many errors
>           mirror-3   ONLINE       0     0     0
>             mfid3    ONLINE       0     0     0
>             mfid17   ONLINE       0     0     0
>           mirror-4   ONLINE       0     0     0
>             mfid4    ONLINE       0     0     0
>             mfid18   ONLINE       0     0     0
>           mirror-5   ONLINE       0     0     0
>             mfid5    ONLINE       0     0     0
>             mfid19   ONLINE       0     0     0
>           mirror-6   ONLINE       0     0     0
>             mfid6    ONLINE       0     0     0
>             mfid20   ONLINE       0     0     0
>           mirror-7   ONLINE       0     0     0
>             mfid7    ONLINE       0     0     0
>             mfid21   ONLINE       0     0     0
>           mirror-8   ONLINE       0     0     0
>             mfid8    ONLINE       0     0     0
>             mfid22   ONLINE       0     0     0
>           mirror-9   ONLINE       0     0     0
>             mfid9    ONLINE       0     0     0
>             mfid23   ONLINE       0     0     0
>           mirror-10  ONLINE       0     0     0
>             mfid10   ONLINE       0     0     0
>             mfid24   ONLINE       0     0     0
>         logs
>           mirror-11  ONLINE       0     0     0
>             mfid13   ONLINE       0     0     0
>             mfid26   ONLINE       0     0     0
>         cache
>           mfid12     ONLINE       0     0     0
>           mfid25     ONLINE       0     0     0
>         spares
>           mfid11     AVAIL
>
> errors: No known data errors
>
> The logs show loads of mfi1 and mfid16 errors for a few minutes, and then
> (presumably when ZFS dropped the disk) nothing relevant after that. ZFS
> hasn't logged anything, not even that it's failed a disk.
>
> I've manually done a 'zpool replace data mfid16 mfid11' which has brought
> the spare in without problems, but I'm eager to learn what I did (or didn't
> do?) to cause the spare to not be used automatically.
>
> Thanks in advance,
>
>

ZFS on FreeBSD does not have 'HOT' spares. They are cold, and human
intervention is needed to replace a disk in a pool.
There are some topics about it on the net.

I would opt for a warning, because a lot of users get a false sense of
security when using spares.
zpool should not accept the spare without warning the user that it is a
cold spare and not a hot one.

It looks like there is some work planned for a ZFS daemon that should
overcome this problem on FreeBSD:
http://svnweb.freebsd.org/base?view=revision&revision=222836

On Solaris there is also a daemon running that does the actual replace.

It should not be too hard to make a script that runs every minute (or at
whatever interval you want) and checks whether a pool is degraded, then
whether autoreplace is set for the pool, then whether a spare is
available, and if so does the actual replace.
Unfortunately I cannot code :(
Maybe someone has a script lying around?

regards
Johan Hendriks
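
The polling script described above could be sketched roughly as follows in Python. This is only an illustration, not a tested tool: the function names (parse_status, autoreplace_enabled, plan_replacements, check_pool) are made up for this example, the parser assumes the 'zpool status' layout shown in the quoted output, and it treats only leaf devices in state FAULTED as failed. The only zpool commands it uses are 'zpool status', 'zpool get autoreplace' and the same 'zpool replace <pool> <bad> <spare>' form Matt ran by hand. By default it only reports what it would do.

```python
import subprocess

# Assumption: names with these prefixes are vdev groups, not leaf disks.
GROUP_PREFIXES = ("mirror", "raidz")


def parse_status(text):
    """Return (faulted_devices, available_spares) from `zpool status` text."""
    faulted, spares = [], []
    in_config = False
    section = "data"
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME" and "STATE" in fields:
            in_config = True          # device table starts after this header
            continue
        if fields[0] == "errors:":
            in_config = False         # device table has ended
            continue
        if not in_config:
            continue
        if len(fields) == 1:
            section = fields[0]       # "logs", "cache" or "spares"
            continue
        name, state = fields[0], fields[1]
        if section == "spares":
            if state == "AVAIL":      # spares already in use are not AVAIL
                spares.append(name)
        elif state == "FAULTED" and not name.startswith(GROUP_PREFIXES):
            faulted.append(name)
    return faulted, spares


def autoreplace_enabled(get_output):
    """Parse the table printed by `zpool get autoreplace <pool>`."""
    for line in get_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "autoreplace":
            return fields[2] == "on"
    return False


def plan_replacements(pool, faulted, spares):
    """Pair each faulted device with one free spare, in listing order."""
    bad = [dev for dev in faulted if dev != pool]
    return [["zpool", "replace", pool, dev, spare]
            for dev, spare in zip(bad, spares)]


def check_pool(pool, dry_run=True):
    """Run one check; returns the replace commands (executed if not dry_run)."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    if not autoreplace_enabled(run(["zpool", "get", "autoreplace", pool])):
        return []
    faulted, spares = parse_status(run(["zpool", "status", pool]))
    commands = plan_replacements(pool, faulted, spares)
    for cmd in commands:
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Something like check_pool("data") could then be run from cron at the interval you want. With the pool shown above it would plan 'zpool replace data mfid16 mfid11', provided autoreplace is on; pass dry_run=False only after checking that the planned commands look right.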