Date: Sun, 05 Oct 2014 17:58:58 +0200
From: InterNetX - Juergen Gotteswinter <juergen.gotteswinter@internetx.com>
To: Dmitry Morozovsky <marck@rinet.ru>, Mikolaj Golub <to.my.trociny@gmail.com>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, Matt Churchyard <matt.churchyard@userve.net>
Subject: Re: HAST with broken HDD
Message-ID: <54316AC2.5040909@internetx.com>
In-Reply-To: <543168D0.2000705@internetx.com>
References: <542BC135.1070906@Skynet.be>
 <542BDDB3.8080805@internetx.com>
 <CA+dUSypO8xTR3sh_KSL9c9FLxbGH+bTR9-gPdcCVd+t0UgUF-g@mail.gmail.com>
 <542BF853.3040604@internetx.com>
 <CA+dUSyp4vMB_qUeqHgXNz2FiQbWzh8MjOEFYw+URcN4gUq69nw@mail.gmail.com>
 <542C019E.2080702@internetx.com>
 <CA+dUSyoEcPdJ1hdR3k1vNROFG7p1kN0HB5S2a_0gYhiV75OLAw@mail.gmail.com>
 <542C0710.3020402@internetx.com>
 <CA+dUSyr9OK9SvN3wX-O4DeriLBP-EEuAA8TTSYwdGfcR1asdtQ@mail.gmail.com>
 <97aab72e19d640ebb65c754c858043cc@SERVER.ad.usd-group.com>
 <20141003175439.GA7664@gmail.com>
 <alpine.BSF.2.00.1410051846480.72273@woozle.rinet.ru>
 <543168D0.2000705@internetx.com>

On 05.10.2014 at 17:50, InterNetX - Juergen Gotteswinter wrote:
>
> On 05.10.2014 at 16:50, Dmitry Morozovsky wrote:
>> On Fri, 3 Oct 2014, Mikolaj Golub wrote:
>>
>>> Disk errors are recorded to syslog. Also, error counters are displayed
>>> in `hastctl list' output. There is snmp_hast(3) in base, a module
>>> for bsnmp to retrieve these statistics via the SNMP protocol (traps are
>>> not supported though).
>>>
>>> For notifications, hastd can be configured to execute an arbitrary
>>> command on various HAST events (see the description of `exec' in
>>> hast.conf(5)). Unfortunately, it does not currently have hooks for I/O
>>> error events. It might be worth adding them, though. The problem with
>>> this is that it may generate too many events, so some throttling is
>>> needed.
>>
>> And, it should be noted, this needs some kind of error coalescing or
>> similar before going from the "warning" state (there are some read
>> errors, but otherwise the disk is usable, and it would be too much hassle
>> to switch to the remote component completely) to the "error" state (the
>> component is unusable and needs to be replaced ASAP; drop it from the
>> HAST pair, and switch over if needed).
>>
>> An error such as "device lost" is, of course, fatal from the very
>> beginning; but how should we interpret, well, sporadic controller resets
>> with the disk coming back and catching up on syncing again?
>>
>
> Hi Dmitry,
>
> since HAST is not so different from DRBD, why not take their way of
> error handling as a template? DRBD has worked well and been rock solid
> for years; it is a well-established solution. HAST has the potential to
> become this too, with some improvements.
>
> Just my 2 cents :)
>

forgot this one...

http://www.drbd.org/users-guide/s-handling-disk-errors.html
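
The "warning" to "error" escalation and the throttling concern raised in the quoted discussion can be illustrated with a small sketch. This is not HAST or DRBD code: the class name `ErrorCoalescer`, the `window` and `threshold` parameters, and the notification list are all hypothetical, chosen only to show one possible policy. Non-fatal I/O errors put the component in "warning"; reaching `threshold` errors within `window` seconds escalates to "error"; a fatal event such as "device lost" escalates immediately. A notification is recorded only on a state change, which is one simple way to avoid flooding a hook with events.

```python
from collections import deque

OK, WARNING, ERROR = "ok", "warning", "error"


class ErrorCoalescer:
    """Hypothetical policy: coalesce I/O errors into warning/error states."""

    def __init__(self, window=60.0, threshold=5):
        self.window = window          # seconds of history to consider (assumed)
        self.threshold = threshold    # errors in window that mean "error" (assumed)
        self.events = deque()         # timestamps of recent non-fatal errors
        self.state = OK
        self.notifications = []       # (timestamp, new_state), one per transition

    def record(self, now, fatal=False):
        """Record one I/O error seen at time `now`; return the resulting state."""
        if self.state == ERROR:
            # "error" is sticky in this sketch: the component needs replacing.
            return self.state
        if fatal:
            new_state = ERROR         # e.g. "device lost": fatal from the start
        else:
            self.events.append(now)
            # Forget errors that fell out of the sliding window.
            while self.events and now - self.events[0] > self.window:
                self.events.popleft()
            new_state = ERROR if len(self.events) >= self.threshold else WARNING
        if new_state != self.state:
            # Throttling: notify only when the state actually changes.
            self.state = new_state
            self.notifications.append((now, new_state))
        return self.state


if __name__ == "__main__":
    c = ErrorCoalescer(window=60, threshold=3)
    for t in (0, 100, 110, 120):
        print(t, c.record(t))
    print(c.notifications)
```

The sketch never decays from "warning" back to "ok", and it does not answer the open question about sporadic controller resets; both would need a policy decision that the thread leaves unresolved.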