From: Mikolaj Golub
To: Daniel Kalchev
Cc: freebsd-stable@freebsd.org
Subject: Re: HAST instability
Date: Fri, 10 Jun 2011 20:05:43 +0300
Message-ID: <86zklp8vmg.fsf@kopusha.home.net>
In-Reply-To: <4DE90955.9020505@digsys.bg> (Daniel Kalchev's message of "Fri, 03 Jun 2011 19:18:29 +0300")
References: <4DE21C64.8060107@digsys.bg> <4DE3ACF8.4070809@digsys.bg>
 <86d3j02fox.fsf@kopusha.home.net> <4DE4E43B.7030302@digsys.bg>
 <86zkm3t11g.fsf@in138.ua3> <4DE5048B.3080206@digsys.bg>
 <4DE5D535.20804@digsys.bg> <4DE8FE78.6070401@digsys.bg>
 <4DE90955.9020505@digsys.bg>

On Fri, 03 Jun 2011 19:18:29 +0300 Daniel Kalchev wrote:

DK> Well, apparently my HAST joy was short. On a second run, I got stuck with

DK> Jun 3 19:08:16 b1a hastd[1900]: [data2] (primary) Unable to receive
DK> reply header: Operation timed out.

DK> on the primary. No messages on the secondary.

DK> On primary:

DK> # netstat -an | grep 8457

DK> tcp4       0      0 10.2.101.11.42659  10.2.101.12.8457   FIN_WAIT_2
DK> tcp4       0      0 10.2.101.11.62058  10.2.101.12.8457   CLOSE_WAIT
DK> tcp4       0      0 10.2.101.11.34646  10.2.101.12.8457   FIN_WAIT_2
DK> tcp4       0      0 10.2.101.11.11419  10.2.101.12.8457   CLOSE_WAIT
DK> tcp4       0      0 10.2.101.11.37773  10.2.101.12.8457   FIN_WAIT_2
DK> tcp4       0      0 10.2.101.11.21911  10.2.101.12.8457   FIN_WAIT_2
DK> tcp4       0      0 10.2.101.11.40169  10.2.101.12.8457   CLOSE_WAIT
DK> tcp4       0  97749 10.2.101.11.44360  10.2.101.12.8457   CLOSE_WAIT
DK> tcp4       0      0 10.2.101.11.8457   *.*                LISTEN

DK> on secondary

DK> # netstat -an | grep 8457

DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.42659  CLOSE_WAIT
DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.62058  FIN_WAIT_2
DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.34646  CLOSE_WAIT
DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.11419  FIN_WAIT_2
DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.37773  CLOSE_WAIT
DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.21911  CLOSE_WAIT
DK> tcp4       0      0 10.2.101.12.8457   10.2.101.11.40169  FIN_WAIT_2
DK> tcp4   66415      0 10.2.101.12.8457   10.2.101.11.44360  FIN_WAIT_2
DK> tcp4       0      0 10.2.101.12.8457   *.*                LISTEN

DK> on primary

DK> # hastctl status
DK> data0:
DK>   role: primary
DK>   provname: data0
DK>   localpath: /dev/gpt/data0
DK>   extentsize: 2097152 (2.0MB)
DK>   keepdirty: 64
DK>   remoteaddr: 10.2.101.12
DK>   sourceaddr: 10.2.101.11
DK>   replication: fullsync
DK>   status: complete
DK>   dirty: 0 (0B)
DK> data1:
DK>   role: primary
DK>   provname: data1
DK>   localpath: /dev/gpt/data1
DK>   extentsize: 2097152 (2.0MB)
DK>   keepdirty: 64
DK>   remoteaddr: 10.2.101.12
DK>   sourceaddr: 10.2.101.11
DK>   replication: fullsync
DK>   status: complete
DK>   dirty: 0 (0B)
DK> data2:
DK>   role: primary
DK>   provname: data2
DK>   localpath: /dev/gpt/data2
DK>   extentsize: 2097152 (2.0MB)
DK>   keepdirty: 64
DK>   remoteaddr: 10.2.101.12
DK>   sourceaddr: 10.2.101.11
DK>   replication: fullsync
DK>   status: complete
DK>   dirty: 6291456 (6.0MB)
DK> data3:
DK>   role: primary
DK>   provname: data3
DK>   localpath: /dev/gpt/data3
DK>   extentsize: 2097152 (2.0MB)
DK>   keepdirty: 64
DK>   remoteaddr: 10.2.101.12
DK>   sourceaddr: 10.2.101.11
DK>   replication: fullsync
DK>   status: complete
DK>   dirty: 0 (0B)

DK> Sits in this state for over 10 minutes.

DK> Unfortunately, no KDB in kernel. Any ideas what other to look for?

Could you please try this patch?

http://people.freebsd.org/~trociny/hastd.no_shutdown.patch

After patching, you need to rebuild hastd and restart it (I expect that doing
this only on the secondary is enough, but it is better to do it on both
nodes). The server itself does not need to be restarted.

-- 
Mikolaj Golub
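
For anyone watching for the same hang, the netstat output quoted in this message can be summarized with a short script. This is only a sketch and is not part of hastd or the patch: the function name `summarize` and the six-column layout it expects from `netstat -an` lines (protocol, Recv-Q, Send-Q, local address, foreign address, state) are assumptions made for illustration.

```python
from collections import Counter


def summarize(netstat_output, port=8457):
    """Tally TCP states for sockets on `port` and list those with queued data.

    Hypothetical helper: assumes each relevant line has six whitespace-separated
    fields, as in the `netstat -an | grep 8457` output quoted in this thread.
    """
    states = Counter()
    queued = []
    suffix = "." + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip anything that is not a six-column tcp line.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recvq, sendq, local, foreign, state = fields
        # Keep only sockets whose local or foreign end uses the given port.
        if not (local.endswith(suffix) or foreign.endswith(suffix)):
            continue
        states[state] += 1
        # A half-closed connection that still holds queued bytes is the
        # pattern of interest here.
        if state != "LISTEN" and (int(recvq) or int(sendq)):
            queued.append((local, foreign, state, int(recvq), int(sendq)))
    return states, queued


if __name__ == "__main__":
    import sys

    # Usage: netstat -an | python this_script.py
    states, queued = summarize(sys.stdin.read())
    for state, count in sorted(states.items()):
        print(state, count)
    for entry in queued:
        print("queued data:", entry)
```

Fed the primary's output from the quoted message, this counts four FIN_WAIT_2 and four CLOSE_WAIT connections plus the listener, and flags the 10.2.101.11.44360 connection, which sits in CLOSE_WAIT with 97749 bytes still in its send queue.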