From owner-freebsd-fs@FreeBSD.ORG Mon Jun 14 07:28:21 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 410811065673; Mon, 14 Jun 2010 07:28:21 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 465958FC08; Mon, 14 Jun 2010 07:28:20 +0000 (UTC) Received: by bwz2 with SMTP id 2so2513059bwz.13 for ; Mon, 14 Jun 2010 00:28:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject:references :x-comment-to:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=r4lW3Dxwyfb4GSQX9fkvZYS9nqRDoBHOPZkGg4w2cCM=; b=wjzDCQ6ksyS8yYm4Vm/Hrq46hayw6hqh7T+tHGjwpQCsIGEqzvgiUMmHWK5jdaB2Qf eL70RqSanutCUKDxG6511XGzGl/npWa3u9zga5fXSOu82snA5xNCN3vNmV53Kx1ujhYv JGwUrGfHiDElZHMN+y9o2XrahgEmAWUAIR0JE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=NIA1Xh6kRERQ76kMWYJ6g+9iEM4EzmAzfCtctnWfJ99JHxaI7+Qg6jpL81OpZ5Sqkp EB6IqJMbBZ8vniX9cIRhaG5JojE6Qgphii/SrK7bVsGhLVELIVSSY8FCnkL6L7MlbFM5 J0jwOQMlXzoYJmImXkZ9wwlQPaHw3t0sF+w3s= Received: by 10.204.160.156 with SMTP id n28mr4039738bkx.1.1276500499239; Mon, 14 Jun 2010 00:28:19 -0700 (PDT) Received: from localhost ([95.69.160.52]) by mx.google.com with ESMTPS id z17sm18299426bkx.18.2010.06.14.00.28.17 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 14 Jun 2010 00:28:17 -0700 (PDT) From: Mikolaj Golub To: Pawel Jakub Dawidek References: <4C10B526.4040908@soupacific.com> <20100612104336.GA2253@garage.freebsd.pl> <4C1372E0.1000903@soupacific.com> <20100612142311.GF2253@garage.freebsd.pl> <4C139F9C.2090305@soupacific.com> <86iq5oc82y.fsf@kopusha.home.net> <4C14215D.9090304@soupacific.com> <20100613003635.GA60012@icarus.home.lan> <20100613074921.GB1320@garage.freebsd.pl> <4C149A5C.3070401@soupacific.com> <20100613102401.GE1320@garage.freebsd.pl> X-Comment-To: Pawel Jakub Dawidek Date: Mon, 14 Jun 2010 10:28:15 +0300 In-Reply-To: <20100613102401.GE1320@garage.freebsd.pl> (Pawel Jakub Dawidek's message of "Sun, 13 Jun 2010 12:24:01 +0200") Message-ID: <86eigavzsg.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.1 and HAST X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jun 2010 07:28:21 -0000 On Sun, 13 Jun 2010 12:24:01 +0200 Pawel Jakub Dawidek wrote: >> Jun 13 16:25:37 sv01A hastd: [zfshast] (primary) Header contains no 'seq' field. PJD> This is the most important bit from the primary node. PJD> The header either does not contain 'seq' field or this field is 0. It PJD> can only be 0 if you have old kernel. With recent kernel geom_gate.ko PJD> was modified to start seq at 1, so this should not happen. I am a bit confused how this seq is supposed to work. For sync thread. I have set up hast on 8-STABLE (before I used it on 9-CURRENT only) and have the same issue as hiroshi@ does. I have added pjdlog_debug(2, "remote_send: seq is %llu.", (uint64_t)ggio->gctl_seq); after nv_add_uint64(nv, (uint64_t)ggio->gctl_seq, "seq"); in primary/remote_send thread and observe the following: Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Unable to receive reply header: Resource temporarily unavailable. Jun 14 00:12:54 zhuzha kernel: Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Unable to receive reply header: Resource temporarily unavailable. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Closing old incoming connection to tcp4://192.168.120.6. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Closing old outgoing connection to tcp4://192.168.120.6. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_recv: No requests, waiting. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_guard: Checking connections. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_guard: Reconnecting to tcp4://192.168.120.6. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Unable to write synchronization data: Invalid argument. Jun 14 00:12:54 zhuzha kernel: Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Unable to write synchronization data: Invalid argument. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0080) Moving request to the free queue. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Connected to tcp4://192.168.120.6. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Successfully reconnected to tcp4://192.168.120.6. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Synchronization started. 10485760 bytes to go. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: Taking free request. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0040) Got free request. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0040) Sending sync request: READ(0, 131072). Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0040) Moving request to the send queue. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) local_send: (0x284f0040) Got request. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) local_send: Taking request. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0040) Sending sync request: WRITE(0, 131072). Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0040) Moving request to the send queue. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) sync: (0x284f0040) Moving request to the send queues. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_send: (0x284f0040) Got request. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_send: seq is 0. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_send: (0x284f0040) Moving request to the recv queue. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) remote_send: Taking request. Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Header contains no 'seq' field. Jun 14 00:12:54 zhuzha kernel: Jun 14 00:12:54 zhuzha hastd: [storage] (primary) Header contains no 'seq' field. This lasts for some time (usually more then a half of a hour) and at some moment seq is set to 1 (but one time I saw when it was set to 19) and after this sychroniztion to secondary starts to work (seq is increasing by one for the next request). Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Unable to receive reply header: Resource tempor arily unavailable. Jun 14 00:29:23 zhuzha kernel: Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Unable to receiv e reply header: Resource temporarily unavailable. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Closing old incoming connection to tcp4://192.1 68.120.6. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Closing old outgoing connection to tcp4://192.1 68.120.6. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_recv: No requests, waiting. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_guard: Checking connections. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_guard: Reconnecting to tcp4://192.168.12 0.6. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Unable to write synchronization data: Invalid a rgument. Jun 14 00:29:23 zhuzha kernel: Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Unable to write synchronization data: Invalid argument. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x28441580) Moving request to the free q ueue. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Connected to tcp4://192.168.120.6. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Successfully reconnected to tcp4://192.168.120. 6. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) Synchronization started. 10485760 bytes to go. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: Taking free request. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Got free request. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Sending sync request: READ(0 , 131072). Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Moving request to the send q ueue. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) local_send: (0x284f1640) Got request. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) local_send: Taking request. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Sending sync request: WRITE( 0, 131072). Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Moving request to the send q ueue. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Moving request to the send q ueues. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_send: (0x284f1640) Got request. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_send: seq is 1. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_send: (0x284f1640) Moving request to the recv queue. Jun 14 00:29:23 zhuzha hastd: [storage] (primary) remote_send: Taking request. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) remote_recv: No requests, waiting. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f1640) Moving request to the free queue. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: Taking free request. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f15c0) Got free request. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f15c0) Sending sync request: READ(131072, 131072). Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f15c0) Moving request to the send queue. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) local_send: (0x284f15c0) Got request. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) local_send: Taking request. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f15c0) Sending sync request: WRITE(131072, 131072). Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f15c0) Moving request to the send queue. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) sync: (0x284f15c0) Moving request to the send queues. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) remote_send: (0x284f15c0) Got request. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) remote_send: seq is 2. Jun 14 00:29:24 zhuzha hastd: [storage] (primary) remote_send: (0x284f15c0) Moving request to the recv queue. ... Adding seq logging on CURRENT I observe seq having the same value 2779096485 (0xa5a5a5a5): Jun 14 03:35:45 lolek hastd: [test] (primary) remote_send: (0x287f44c0) Got request. Jun 14 03:35:45 lolek hastd: [test] (primary) remote_send: seq is 2779096485. Jun 14 03:35:45 lolek hastd: [test] (primary) remote_send: (0x287f44c0) Moving request to the recv queue. So it looks like seq is just contents of unitialized memory and this works under CURRENT only because memory is initialized to 0xa5 by malloc here. This test program returns "*p is 0xa5a5a5a5" under my CURRENT and "*p is 0x0" under 8-STABLE. #include #include #include int main(int argc, char *argv[]) { unsigned long *p; p = malloc(sizeof(*p)); if (p == NULL) err(1,"malloc failed"); printf ("*p is 0x%lx\n", *p); return 0; } -- Mikolaj Golub