From: Mikolaj Golub
To: Daniel Kalchev
Cc: freebsd-stable@freebsd.org
Organization: TOA Ukraine
Date: Tue, 31 May 2011 17:08:59 +0300
Subject: Re: HAST instability

On Tue, 31 May 2011 15:51:07 +0300 Daniel Kalchev wrote:

DK> On 30.05.11 21:42, Mikolaj Golub wrote:

>> DK> One strange thing is that there is never established TCP connection
>> DK> between both nodes:
>>
>> DK> tcp4 0 0 10.2.101.11.48939 10.2.101.12.8457 FIN_WAIT_2
>> DK> tcp4 0 1288 10.2.101.11.57008 10.2.101.12.8457 CLOSE_WAIT
>> DK> tcp4 0 0 10.2.101.11.46346 10.2.101.12.8457 FIN_WAIT_2
>> DK> tcp4 0 90648 10.2.101.11.13916 10.2.101.12.8457 CLOSE_WAIT
>> DK> tcp4 0 0 10.2.101.11.8457 *.* LISTEN
>>
>> It is normal. hastd uses the connections only in one direction so it calls
>> shutdown to close unused directions.

DK> So the TCP connections are all too short-lived that I can never see a
DK> single one in ESTABLISHED state? 10Gbit Ethernet is indeed fast, so
DK> this might well be possible...

No, the connections are persistent; only one (unused) direction of
communication is closed. See shutdown(2) for further info.

>> I would like to look at full logs for some rather large period, with several
>> cases, from both primary and secondary (and be sure about synchronized time).

DK> I have made sure clocks are synchronized and am currently running on a
DK> freshly rebooted nodes (with two additional SATA drives at each node) --
DK> so far some interesting findings, like I get hash errors and
DK> disconnects much more frequent now.
DK> Will post when an bonnie++ run on
DK> the ZFS filesystem on top of the HAST resources finishes.

As I wrote privately, it would be nice to see both netstat and hast logs (from
both nodes) for the same, rather long, period in which several such cases
occurred. It would be good to put them somewhere on the web so other people
could access them too, as I will be offline for 7-10 days and will not be able
to help you until I am back.

DK> One additional note: while playing with this setup, I tried to
DK> simulate local disk going away in the hope HAST will switch to using
DK> the remote disk. Instead of asking someone at the site to pull out the
DK> drive, I just issued on the primary

DK> hastctl role init data0

DK> which resulted in kernel panic. Unfortunately, there was no sufficient
DK> dump space for 48GB. I will re-run this again with more drives for the
DK> crash dump. Anything you want me to look for in particular? (kernels
DK> have no KDB compiled in yet)

Well, removing a physical disk (the device /dev/gpt/data0, consumed by hastd,
disappears) and switching a resource to the init role (the device
/dev/hast/data0, consumed by the file system, disappears) are two different
things. Of course, you should not normally change the resource role (which
destroys the hast device) before unmounting (exporting) the file system.

--
Mikolaj Golub