From: Mikolaj Golub
To: Pawel Jakub Dawidek
Cc: freebsd-stable@freebsd.org, Mikolaj Golub, Pete French
Subject: Re: hast vs ggate+gmirror sychrnoisation speed
Date: Thu, 28 Oct 2010 22:08:54 +0300
Message-ID: <86lj5i3zjt.fsf@kopusha.home.net>
In-Reply-To: <20101028163036.GA2347@garage.freebsd.pl>
References: <86wrp3wj67.fsf@kopusha.home.net> <20101028163036.GA2347@garage.freebsd.pl>
List-Id: Production branch of FreeBSD source code

On Thu, 28 Oct 2010 18:30:36 +0200 Pawel Jakub Dawidek wrote:

 PJD> On Wed, Oct 27, 2010 at 10:05:20PM +0300, Mikolaj Golub wrote:

 >> In hast_proto_send() we send header and then data. Couldn't it be that
 >> remote_send and sync threads interfere and their packets are mixed? Maybe some
 >> synchronization is needed here?
 >>
 >> I set sleep(1) in hast_proto_send() between proto_send(header) and
 >> proto_send(data). The error started to occur frequently.

 PJD> Synchronization requests are sent through the remote thread just like
 PJD> regular I/O requests, exactly because of races that can occur.

 PJD> I looked at the code and the keepalive packets are sent from another
 PJD> thread. Could you try turning them off in primary.c and see if that
 PJD> helps?

First I set RETRY_SLEEP to 1 second to generate more keepalive packets. The
errors started to occur frequently:

Oct 28 21:35:53 bolek hastd[1709]: [storage] (secondary) Unable to receive request header: RPC version wrong.
Oct 28 21:35:54 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1709, exitcode=75).
Oct 28 21:36:12 bolek hastd[1722]: [storage] (secondary) Unable to receive request header: RPC version wrong.
Oct 28 21:36:12 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1722, exitcode=75).
...
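To make the suspected failure mode concrete, here is a toy model of it. This is only a sketch and not hastd code: the two-byte header layout, the names (send_header, send_data, parse_stream, run_serialized, run_interleaved) and the header-only keepalive are all assumptions made for the example. It uses no threads; it simply writes the pieces into a buffer in the order two unsynchronized senders might produce them, and shows that the receiver then reads payload bytes where it expects a header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical wire format, NOT the real HAST protocol:
 * a 2-byte header (version, payload size) followed by the payload.
 */
#define SKETCH_VERSION 1
#define STREAM_MAX     64

struct stream {
    uint8_t buf[STREAM_MAX];
    size_t  len;
};

/* Append raw bytes to the simulated connection. */
void stream_write(struct stream *s, const void *p, size_t n)
{
    assert(s->len + n <= STREAM_MAX);
    memcpy(s->buf + s->len, p, n);
    s->len += n;
}

/* First half of a packet: the header announcing the payload size. */
void send_header(struct stream *s, uint8_t payload_size)
{
    uint8_t hdr[2] = { SKETCH_VERSION, payload_size };
    stream_write(s, hdr, sizeof(hdr));
}

/* Second half of a packet: the payload itself. */
void send_data(struct stream *s, const uint8_t *data, uint8_t size)
{
    stream_write(s, data, size);
}

/*
 * Receiver side: returns the number of packets parsed, or -1 as soon as
 * a header is truncated or carries the wrong version byte.
 */
int parse_stream(const struct stream *s)
{
    size_t off = 0;
    int packets = 0;

    while (off < s->len) {
        if (s->len - off < 2 || s->buf[off] != SKETCH_VERSION)
            return -1;
        size_t size = s->buf[off + 1];
        off += 2;
        if (s->len - off < size)
            return -1;
        off += size;
        packets++;
    }
    return packets;
}

/* Header and data of the I/O request stay together; keepalive follows. */
int run_serialized(void)
{
    struct stream s = { .len = 0 };
    const uint8_t payload[4] = { 0xAA, 0xBB, 0xCC, 0xDD };

    send_header(&s, 4);
    send_data(&s, payload, 4);
    send_header(&s, 0);             /* keepalive: header only (assumed) */
    return parse_stream(&s);
}

/* Keepalive header slips in between the request header and its data. */
int run_interleaved(void)
{
    struct stream s = { .len = 0 };
    const uint8_t payload[4] = { 0xAA, 0xBB, 0xCC, 0xDD };

    send_header(&s, 4);
    send_header(&s, 0);             /* keepalive from "another thread" */
    send_data(&s, payload, 4);
    return parse_stream(&s);
}
```

In this model run_serialized() parses two packets, while run_interleaved() fails the version check, because the receiver consumes the keepalive header as part of the payload and then treats leftover payload bytes as the next header. If this is what happens in hastd, the usual remedies would be to hold one lock across both sends or to route every packet through a single sending thread.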
Now I have been running synchronization for more than half an hour with
keepalive_send disabled and have not seen any errors.

-- 
Mikolaj Golub