Date: Thu, 30 Jun 2011 20:02:19 -0700
From: Timothy Smith <tts@personalmis.com>
To: freebsd-stable@freebsd.org
Subject: HAST + ZFS: no action on drive failure
Message-ID: <BANLkTi==ctVw1HpGkw-8QG68abCg-1Vp9g@mail.gmail.com>
First posting here, hopefully I'm doing it right =)

I also posted this to the FreeBSD forum, but I know some HAST folks monitor this list regularly and not so much there, so...

Basically, I'm testing failure scenarios with HAST/ZFS. I have two nodes and have scripted up a bunch of checks and failover actions between them. Looking good so far, though more complex than I expected. It would be cool to post it somewhere to get some pointers/critiques, but that's another thing.

Anyway, now I'm just seeing what happens when a drive fails on the primary node. Oddly/sadly, NOTHING! HAST just keeps on ticking and doesn't change the state of the failed drive, so the zpool has no clue the drive is offline. The /dev/hast/<resource> device remains. hastd does log some errors to the system log like this, but nothing more:

messages.0:Jun 30 18:39:59 nas1 hastd[11066]: [ada6] (primary) Unable to flush activemap to disk: Device not configured.
messages.0:Jun 30 18:39:59 nas1 hastd[11066]: [ada6] (primary) Local request failed (Device not configured): WRITE(4736512, 512).

So, I guess the question is, "Do I have to script a cron job to check for these kinds of errors and then change the HAST resource to 'init' or something to handle this?" Or is there some kind of hastd config setting that I need to set? What's the SOP for this?

On a related note, when the zpool in FreeBSD does finally notice that the drive is missing, because I have manually changed the HAST resource to init (so the /dev/hast/<res> device is gone), my zpool (raidz2) hot spare doesn't engage, even with "autoreplace=on". The zpool status output for the degraded pool seems to indicate that I should manually replace the failed drive. If that's the case, it's not really a "hot spare". Does this mean the "FMA Agent" referred to in the ZFS manual is not implemented in FreeBSD?

Thanks!
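To make the cron job idea concrete, here is a minimal sketch of what such a check might look like. It is not a tested or recommended procedure. It assumes the hastd messages always have the shape of the two log lines quoted above, that the log lives at /var/log/messages (an assumption; adjust for rotated files), and that switching the resource with `hastctl role init <resource>` is the desired reaction. The function names `failed_resources` and `demote` are made up for illustration.

```python
import re
import subprocess

# Matches hastd lines shaped like the ones quoted in this message, e.g.
#   hastd[11066]: [ada6] (primary) Local request failed (Device not configured): WRITE(...)
# The pattern is an assumption based only on those two sample lines.
FAILURE_RE = re.compile(
    r"hastd\[\d+\]: \[(?P<resource>[^\]]+)\] \(primary\) "
    r"(?:Local request failed|Unable to flush activemap to disk)"
)


def failed_resources(lines):
    """Return HAST resource names that logged a local I/O failure, in first-seen order."""
    found = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match and match.group("resource") not in found:
            found.append(match.group("resource"))
    return found


def demote(resource, dry_run=True):
    """Build (and optionally run) the command that switches a resource to the init role."""
    cmd = ["hastctl", "role", "init", resource]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical log location; a real cron job would also need to remember
    # which lines it has already processed.
    with open("/var/log/messages", errors="replace") as log:
        for res in failed_resources(log):
            print("would run:", " ".join(demote(res)))
```

Run as is, the script only prints the commands it would issue. Passing `dry_run=False` to `demote` actually runs hastctl, which removes the /dev/hast/<resource> device so that the zpool can notice the missing drive.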
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTi==ctVw1HpGkw-8QG68abCg-1Vp9g>