Date:      Sun, 05 Oct 2014 17:50:40 +0200
From:      InterNetX - Juergen Gotteswinter <jg@internetx.com>
To:        Dmitry Morozovsky <marck@rinet.ru>,  Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, Matt Churchyard <matt.churchyard@userve.net>
Subject:   Re: HAST with broken HDD
Message-ID:  <543168D0.2000705@internetx.com>
In-Reply-To: <alpine.BSF.2.00.1410051846480.72273@woozle.rinet.ru>
References:  <542BC135.1070906@Skynet.be> <542BDDB3.8080805@internetx.com> <CA%2BdUSypO8xTR3sh_KSL9c9FLxbGH%2BbTR9-gPdcCVd%2Bt0UgUF-g@mail.gmail.com> <542BF853.3040604@internetx.com> <CA%2BdUSyp4vMB_qUeqHgXNz2FiQbWzh8MjOEFYw%2BURcN4gUq69nw@mail.gmail.com> <542C019E.2080702@internetx.com> <CA%2BdUSyoEcPdJ1hdR3k1vNROFG7p1kN0HB5S2a_0gYhiV75OLAw@mail.gmail.com> <542C0710.3020402@internetx.com> <CA%2BdUSyr9OK9SvN3wX-O4DeriLBP-EEuAA8TTSYwdGfcR1asdtQ@mail.gmail.com> <97aab72e19d640ebb65c754c858043cc@SERVER.ad.usd-group.com> <20141003175439.GA7664@gmail.com> <alpine.BSF.2.00.1410051846480.72273@woozle.rinet.ru>



On 05.10.2014 at 16:50, Dmitry Morozovsky wrote:
> On Fri, 3 Oct 2014, Mikolaj Golub wrote:
> 
>> Disk errors are recorded to syslog. Error counters are also displayed
>> in `hastctl list' output. There is snmp_hast(3) in base: a module
>> for bsnmp to retrieve these statistics via the SNMP protocol (traps are
>> not supported, though).
>>
>> For notifications, hastd can be configured to execute an arbitrary
>> command on various HAST events (see the description of `exec' in
>> hast.conf(5)). Unfortunately, it does not currently have hooks for I/O
>> error events. They might be worth adding, though. The problem is that
>> this may generate too many events, so some throttling is needed.
> 
> And, I think, it should be noted that some kind of error coalescing or similar
> is needed before going from a "warning" state (there are some read errors, but
> otherwise the disk is usable, and it would be too much hassle to switch to the
> remote component completely) to an "error" state (the component is unusable and
> needs to be replaced ASAP; drop it from the HAST pair, and switch over if needed).
> 
> An error such as "device lost" is, of course, fatal from the very beginning; but
> how should we interpret, well, sporadic controller resets, with the disk
> coming back and catching up on syncing again?
> 
> 
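Mikolaj's point about throttling and Dmitry's warning/error distinction above could be combined roughly as follows. This is a minimal Python sketch, not part of hastd: the class `DiskErrorMonitor`, its thresholds (`window_s`, `warn_at`, `fail_at`) and the three states are hypothetical. Errors are counted in a sliding time window, and a notification (for example, via a hypothetical I/O-error hook) would be due only when the state changes, so a burst of I/O errors yields one event instead of thousands.

```python
import time
from collections import deque
from typing import Deque, Optional

# Hypothetical states; HAST itself has no such notion today.
OK, WARNING, ERROR = "ok", "warning", "error"


class DiskErrorMonitor:
    """Coalesce per-component I/O errors into ok/warning/error states."""

    def __init__(self, window_s: float = 300.0, warn_at: int = 1,
                 fail_at: int = 20) -> None:
        self.window_s = window_s   # length of the sliding window, seconds
        self.warn_at = warn_at     # errors in window that mean "warning"
        self.fail_at = fail_at     # errors in window that mean "error"
        self.errors: Deque[float] = deque()
        self.state = OK

    def _expire(self, now: float) -> None:
        # Forget errors that are older than the sliding window.
        while self.errors and now - self.errors[0] > self.window_s:
            self.errors.popleft()

    def _classify(self) -> str:
        n = len(self.errors)
        if n >= self.fail_at:
            return ERROR
        if n >= self.warn_at:
            return WARNING
        return OK

    def _transition(self, new_state: str) -> Optional[str]:
        # ERROR is sticky: a failed component needs operator action.
        # An unchanged state produces no notification (the throttling).
        if self.state == ERROR or new_state == self.state:
            return None
        self.state = new_state
        return new_state

    def record(self, fatal: bool = False,
               now: Optional[float] = None) -> Optional[str]:
        """Record one I/O error; return the new state if it changed."""
        now = time.monotonic() if now is None else now
        self.errors.append(now)
        self._expire(now)
        # Fatal events such as "device lost" escalate immediately.
        return self._transition(ERROR if fatal else self._classify())

    def poll(self, now: Optional[float] = None) -> Optional[str]:
        """Periodic check so sporadic errors can decay back to ok."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        return self._transition(self._classify())
```

With `DiskErrorMonitor(window_s=60, warn_at=1, fail_at=3)`, a single error from a controller reset raises `warning`, and a later `poll()` after the window has passed returns `ok`; three errors within a minute, or one `record(fatal=True)`, move the component to `error`, where it stays until an operator resets it.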

Hi Dmitry,

Since HAST is not so different from DRBD, why not take DRBD's approach to
error handling as a template? DRBD has worked well and been rock solid for
years; it is a well-established solution. HAST has the potential to become
one too, with some improvements.
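DRBD, for example, lets the administrator choose per resource what happens on a lower-level I/O error through its `on-io-error` disk option (`pass_on`, `call-local-io-error`, `detach`). A HAST counterpart might be modeled roughly as in this minimal Python sketch. `OnIoError`, `parse_policy`, `plan_for_local_io_error` and the step descriptions are all hypothetical; this is not an existing hast.conf(5) option.

```python
from enum import Enum
from typing import List


class OnIoError(Enum):
    """Hypothetical per-resource policy, named after DRBD's on-io-error options."""
    PASS_ON = "pass_on"                          # report the error upward
    CALL_LOCAL_IO_ERROR = "call-local-io-error"  # run an admin-supplied handler
    DETACH = "detach"                            # stop using the local component


def parse_policy(value: str) -> OnIoError:
    """Parse a hypothetical hast.conf value such as 'detach'."""
    try:
        return OnIoError(value.strip().lower())
    except ValueError:
        valid = ", ".join(p.value for p in OnIoError)
        raise ValueError(
            f"unknown on-io-error policy {value!r}; expected one of: {valid}"
        ) from None


def plan_for_local_io_error(policy: OnIoError, resource: str,
                            handler: str = "") -> List[str]:
    """Return the (hypothetical) steps a node takes after a local I/O error."""
    if policy is OnIoError.PASS_ON:
        return [f"{resource}: return the error to the user of /dev/hast/{resource}"]
    if policy is OnIoError.CALL_LOCAL_IO_ERROR:
        if not handler:
            raise ValueError("call-local-io-error requires a handler command")
        return [f"{resource}: run handler {handler}"]
    # DETACH: keep the resource available by relying on the remote component.
    return [
        f"{resource}: stop using the local component",
        f"{resource}: serve I/O from the remote component",
    ]
```

Making the policy explicit per resource lets an administrator decide whether sporadic errors should merely be reported or should take the local disk out of service.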

Just my 2 cents :)


