Date: Sat, 16 Jan 2016 20:13:06 +0200
From: Mykola Golub <trociny@FreeBSD.org>
To: Shahin Hasanov <shahinhasanov@hotmail.com>
Cc: FREEBSD_QUESTION <freebsd-questions@freebsd.org>
Subject: Re: the switching time hastd from secondary to primary
Message-ID: <20160116181305.GA2165@gmail.com>
In-Reply-To: <DUB127-W36479628640DB40F39E12BB6CC0@phx.gbl>
References: <DUB127-W2563827245EC96990575DDB6CC0@phx.gbl> <CAA2O=b84TtRyjYgFL9v1e36nERE4QFQoePx9LLFi10bC-cXHSA@mail.gmail.com> <DUB127-W36479628640DB40F39E12BB6CC0@phx.gbl>

On Thu, Jan 14, 2016 at 02:23:46PM +0400, Shahin Hasanov wrote:

> In /usr/local/sbin/ucarp_up.sh(below shown extract of it) script
> ucarp waiting while it became primary. It tooks about 20 sec as
> written
> http://www.freebsd.org/cgi/man.cgi?query=hast.conf&apropos=0&sektion=0&manpath=FreeBSD+10.2-RELEASE&arch=default&format=html
> .
> for i in `jot 30`; do
>     pgrep -f "hastd: ${resource} \(secondary\)" >/dev/null 2>&1 || break
>     sleep 1
> done
> if pgrep -f "hastd: ${resource} \(secondary\)" >/dev/null 2>&1; then
>     logger -p local0.error -t hast "Secondary process for resource ${resource} is still running after 30 seconds."
>     exit 1
> fi

Looking at the logs would help. But I guess you are hitting the
timeout in the thread that waits for incoming data from the primary.
This timeout is 2 * HAST_KEEPALIVE, and HAST_KEEPALIVE is hardcoded to
10 sec, so right now it can be changed only by recompiling hastd.

On the other hand, hitting this timeout means that the connection was
not closed properly, so it is not a case I would expect for a
"planned" failover, when the role is changed using `hastctl role`
commands. It looks rather like a case of disaster recovery after a
network partition, host crash, hang, etc.

In my opinion, waiting 20 sec is not bad compared with the possibility
of a split-brain if the former primary is still alive.

If you observe the 20 sec timeout when doing a "planned" failover, I
guess there is something wrong with the scripts that do the switching.

-- 
Mykola Golub
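
The timing discussed above can be written out as a small sketch. It
only restates the arithmetic from this message (2 * HAST_KEEPALIVE =
20 sec) next to the 30-second loop from the quoted ucarp_up.sh. The
variable names, the MARGIN value and the wait_secondary_exit function
are hypothetical and are not part of hastd or of the original script.

```shell
#!/bin/sh
# Value taken from the message above: HAST_KEEPALIVE is hardcoded to 10 sec.
HAST_KEEPALIVE=10

# Timeout of the thread waiting for data from the primary: 2 * HAST_KEEPALIVE.
RECV_TIMEOUT=$((2 * HAST_KEEPALIVE))

# Hypothetical safety margin, chosen so that the limit matches the
# 30 iterations of the quoted loop.
MARGIN=10
WAIT_LIMIT=$((RECV_TIMEOUT + MARGIN))

# wait_secondary_exit RESOURCE [LIMIT]
# Poll once per second until no "hastd: RESOURCE (secondary)" process is
# left. Returns 0 when it is gone, 1 if it is still there after LIMIT sec.
wait_secondary_exit() {
    resource=$1
    limit=${2:-$WAIT_LIMIT}
    waited=0
    while pgrep -f "hastd: ${resource} \(secondary\)" >/dev/null 2>&1; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

With these numbers the wait limit exceeds the 20 sec timeout by 10 sec,
so a script using it would report a failure only if the secondary
process outlives that timeout by some margin.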