Date: Fri, 10 Jun 2011 20:05:43 +0300
From: Mikolaj Golub <trociny@freebsd.org>
To: Daniel Kalchev <daniel@digsys.bg>
Cc: freebsd-stable@freebsd.org
Subject: Re: HAST instability
Message-ID: <86zklp8vmg.fsf@kopusha.home.net>
In-Reply-To: <4DE90955.9020505@digsys.bg> (Daniel Kalchev's message of "Fri, 03 Jun 2011 19:18:29 +0300")
References: <4DE21C64.8060107@digsys.bg> <4DE3ACF8.4070809@digsys.bg> <86d3j02fox.fsf@kopusha.home.net> <4DE4E43B.7030302@digsys.bg> <86zkm3t11g.fsf@in138.ua3> <4DE5048B.3080206@digsys.bg> <4DE5D535.20804@digsys.bg> <4DE8FE78.6070401@digsys.bg> <4DE90955.9020505@digsys.bg>

On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote:

 DK> Well, apparently my HAST joy was short. On a second run, I got stuck with

 DK> Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive
 DK> reply header: Operation timed out.

 DK> on the primary. No messages on the secondary.

 DK> On primary:

 DK> # netstat -an | grep 8457
 DK> tcp4       0      0 10.2.101.11.42659   10.2.101.12.8457    FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.62058   10.2.101.12.8457    CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.34646   10.2.101.12.8457    FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.11419   10.2.101.12.8457    CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.37773   10.2.101.12.8457    FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.21911   10.2.101.12.8457    FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.40169   10.2.101.12.8457    CLOSE_WAIT
 DK> tcp4       0  97749 10.2.101.11.44360   10.2.101.12.8457    CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.8457    *.*                 LISTEN

 DK> on secondary

 DK> # netstat -an | grep 8457
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.42659   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.62058   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.34646   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.11419   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.37773   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.21911   CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457    10.2.101.11.40169   FIN_WAIT_2
 DK> tcp4   66415      0 10.2.101.12.8457    10.2.101.11.44360   FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457    *.*                 LISTEN

 DK> on primary

 DK> # hastctl status
 DK> data0:
 DK>   role: primary
 DK>   provname: data0
 DK>   localpath: /dev/gpt/data0
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
 DK> data1:
 DK>   role: primary
 DK>   provname: data1
 DK>   localpath: /dev/gpt/data1
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
 DK> data2:
 DK>   role: primary
 DK>   provname: data2
 DK>   localpath: /dev/gpt/data2
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 6291456 (6.0MB)
 DK> data3:
 DK>   role: primary
 DK>   provname: data3
 DK>   localpath: /dev/gpt/data3
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)

 DK> Sits in this state for over 10 minutes.

 DK> Unfortunately, no KDB in kernel. Any ideas what other to look for?

Could you please try this patch?

http://people.freebsd.org/~trociny/hastd.no_shutdown.patch

After patching you need to rebuild hastd and restart it (I expect only on
secondary is enough but it is better to do this on both nodes). No server
restart is needed.

-- 
Mikolaj Golub
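
For anyone comparing socket states on both nodes the way the quoted netstat output does, the check can be scripted. The following is only a minimal sketch and not part of HAST: the helper names `parse_netstat` and `stuck` are hypothetical, and it assumes the FreeBSD `netstat -an` column order (proto, Recv-Q, Send-Q, local address, foreign address, state). It lists connections on the hastd port that still hold queued data, which is where the hung data2 connection shows up in the output above.

```python
from collections import Counter

# Three lines copied from the primary's output quoted in this message.
PRIMARY_SAMPLE = """\
tcp4 0 0 10.2.101.11.42659 10.2.101.12.8457 FIN_WAIT_2
tcp4 0 97749 10.2.101.11.44360 10.2.101.12.8457 CLOSE_WAIT
tcp4 0 0 10.2.101.11.8457 *.* LISTEN
"""


def parse_netstat(text, port="8457"):
    """Parse `netstat -an` TCP lines whose local or foreign port matches."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local foreign state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recv_q, send_q, local, foreign, state = fields
        suffix = "." + port
        if not (local.endswith(suffix) or foreign.endswith(suffix)):
            continue
        rows.append({
            "local": local,
            "foreign": foreign,
            "recv_q": int(recv_q),
            "send_q": int(send_q),
            "state": state,
        })
    return rows


def stuck(rows):
    """Return non-listening connections that still have queued bytes."""
    return [r for r in rows
            if r["state"] != "LISTEN" and (r["recv_q"] or r["send_q"])]


if __name__ == "__main__":
    rows = parse_netstat(PRIMARY_SAMPLE)
    print(Counter(r["state"] for r in rows))
    for r in stuck(rows):
        print(r["local"], "->", r["foreign"], r["state"],
              "recv_q=%d send_q=%d" % (r["recv_q"], r["send_q"]))
```

Running the same helper on the secondary's output should flag the matching connection with a non-zero Recv-Q, i.e. data that was sent but never read by the peer.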