Date:      Fri, 10 Jun 2011 20:05:43 +0300
From:      Mikolaj Golub <trociny@freebsd.org>
To:        Daniel Kalchev <daniel@digsys.bg>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: HAST instability
Message-ID:  <86zklp8vmg.fsf@kopusha.home.net>
In-Reply-To: <4DE90955.9020505@digsys.bg> (Daniel Kalchev's message of "Fri, 03 Jun 2011 19:18:29 +0300")
References:  <4DE21C64.8060107@digsys.bg> <4DE3ACF8.4070809@digsys.bg> <86d3j02fox.fsf@kopusha.home.net> <4DE4E43B.7030302@digsys.bg> <86zkm3t11g.fsf@in138.ua3> <4DE5048B.3080206@digsys.bg> <4DE5D535.20804@digsys.bg> <4DE8FE78.6070401@digsys.bg> <4DE90955.9020505@digsys.bg>


On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote:

 DK> Well, apparently my HAST joy was short. On a second run, I got stuck with

 DK> Jun  3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive
 DK> reply header: Operation timed out.

 DK> on the primary. No messages on the secondary.

 DK> On primary:

 DK> # netstat -an | grep 8457

 DK> tcp4       0      0 10.2.101.11.42659      10.2.101.12.8457       FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.62058      10.2.101.12.8457       CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.34646      10.2.101.12.8457       FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.11419      10.2.101.12.8457       CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.37773      10.2.101.12.8457       FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.21911      10.2.101.12.8457       FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.11.40169      10.2.101.12.8457       CLOSE_WAIT
 DK> tcp4       0  97749 10.2.101.11.44360      10.2.101.12.8457       CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.11.8457       *.*                    LISTEN

 DK> on secondary

 DK> # netstat -an | grep 8457

 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.42659      CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.62058      FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.34646      CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.11419      FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.37773      CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.21911      CLOSE_WAIT
 DK> tcp4       0      0 10.2.101.12.8457       10.2.101.11.40169      FIN_WAIT_2
 DK> tcp4   66415      0 10.2.101.12.8457       10.2.101.11.44360      FIN_WAIT_2
 DK> tcp4       0      0 10.2.101.12.8457       *.*                    LISTEN
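
In output like this, the connection to look at is the one that still has data queued. Local port 44360 has 97749 bytes in the primary's Send-Q and 66415 bytes in the secondary's Recv-Q, which suggests the secondary hastd is not reading from that socket. Below is a small filter that picks such connections out of "netstat -an" on either node. It is only a sketch. It assumes the FreeBSD netstat -an column layout (Proto, Recv-Q, Send-Q, Local, Foreign, State) and port 8457 as above. The function name hast_stuck_conns is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print TCP connections on the HAST port that still
# have bytes in Recv-Q or Send-Q. Reads "netstat -an" output on stdin.
# Assumes the FreeBSD column order: Proto Recv-Q Send-Q Local Foreign State.
hast_stuck_conns() {
    port="${1:-8457}"   # HAST port; 8457 as in the output above
    awk -v port="$port" '
        # TCP lines where either endpoint ends in ".<port>" and a queue is non-empty
        $1 ~ /^tcp/ && ($4 ~ "\\." port "$" || $5 ~ "\\." port "$") &&
        ($2 > 0 || $3 > 0) {
            printf "%s -> %s %s recvq=%s sendq=%s\n", $4, $5, $6, $2, $3
        }'
}
```

Usage: netstat -an | hast_stuck_conns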

 DK> on primary

 DK> # hastctl status
 DK> data0:
 DK>   role: primary
 DK>   provname: data0
 DK>   localpath: /dev/gpt/data0
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
 DK> data1:
 DK>   role: primary
 DK>   provname: data1
 DK>   localpath: /dev/gpt/data1
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
 DK> data2:
 DK>   role: primary
 DK>   provname: data2
 DK>   localpath: /dev/gpt/data2
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 6291456 (6.0MB)
 DK> data3:
 DK>   role: primary
 DK>   provname: data3
 DK>   localpath: /dev/gpt/data3
 DK>   extentsize: 2097152 (2.0MB)
 DK>   keepdirty: 64
 DK>   remoteaddr: 10.2.101.12
 DK>   sourceaddr: 10.2.101.11
 DK>   replication: fullsync
 DK>   status: complete
 DK>   dirty: 0 (0B)
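
With several resources, the one that matters is easy to miss here. Only data2 reports a non-zero dirty count: 6291456 bytes, which is three 2 MB extents. A rough filter for "hastctl status" output is shown below. It assumes the layout quoted above, with the resource name on an unindented "name:" line and an indented "dirty: N (...)" line. The function name hast_dirty_resources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print resources whose "dirty:" value is non-zero.
# Reads "hastctl status" output on stdin; assumes the layout shown above.
hast_dirty_resources() {
    awk '
        # An unindented line ending in ":" starts a new resource block
        /^[^ \t].*:$/ { res = substr($0, 1, length($0) - 1) }
        # Report the resource if its dirty byte count is not "0"
        $1 == "dirty:" && $2 != "0" { print res, $2, $3 }'
}
```

Usage: hastctl status | hast_dirty_resources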

 DK> Sits in this state for over 10 minutes.

 DK> Unfortunately, no KDB in kernel. Any ideas what else to look for?

Could you please try this patch?

http://people.freebsd.org/~trociny/hastd.no_shutdown.patch

After patching, you need to rebuild hastd and restart it (I expect that doing
this only on the secondary is enough, but it is better to do it on both
nodes). No server reboot is needed.
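
Here is a rough sketch of those steps as a shell function. It assumes things the message does not state. The source tree is in $SRC (default /usr/src). The patch applies from there with -p0, so adjust -d and -p to match the paths inside the patch. hastd is controlled through service(8). The function names and the DRY_RUN switch are hypothetical. Only hastd is restarted, not the machine.

```shell
#!/bin/sh
# Hypothetical helper: fetch the patch, apply it, rebuild and reinstall
# hastd, then restart only the daemon. Assumptions: sources in $SRC
# (default /usr/src), patch applies there with -p0, stock rc.d script.
PATCH_URL="http://people.freebsd.org/~trociny/hastd.no_shutdown.patch"
PATCH_FILE="/tmp/hastd.no_shutdown.patch"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

rebuild_hastd() {
    src="${SRC:-/usr/src}"
    run fetch -o "$PATCH_FILE" "$PATCH_URL" || return 1
    run patch -d "$src" -p0 -i "$PATCH_FILE" || return 1
    run make -C "$src/sbin/hastd" clean all install || return 1
    run service hastd restart
}
```

Usage: preview with "DRY_RUN=1 rebuild_hastd", then run "rebuild_hastd" as root on the secondary (and preferably on the primary too).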

-- 
Mikolaj Golub


