From owner-freebsd-stable@FreeBSD.ORG Sat Nov 23 21:59:56 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BC29CA2F; Sat, 23 Nov 2013 21:59:56 +0000 (UTC) Received: from mail-lb0-x22f.google.com (mail-lb0-x22f.google.com [IPv6:2a00:1450:4010:c04::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 1CA1D2D4C; Sat, 23 Nov 2013 21:59:55 +0000 (UTC) Received: by mail-lb0-f175.google.com with SMTP id x18so2057955lbi.20 for ; Sat, 23 Nov 2013 13:59:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=rgufqjt2s91FUOAiEyJn9nL0S2tbXq3ijUTGyj7uHhY=; b=VskTke7BUBixscQmkcMpgAvO1KERoQ6asn0MkLt2G7ZfEwpcqDspDdMR6PxuZL1g2y JxUE0z0xg3b2h3DMJkmnYjfKvOfHkbsMtBTdVYUCXIzDS0CaEqBiHz94M2wg9gNE9odZ c86Eeffxm/UGM62ifFMbKALsO9LV/C7+BCMc6+98znBYLQl/GMFsKIg4Ns4iUciBo2zs OeagccK8oHkdhDabq9lV4odRTotnSvMy4YssY4rAFnGXEq9Q9QZshE0rEXCd9I7OTrve Ta4UUbCzCM7LIOR0NPYyP3SIv+ulT1zSJktamoP1cEskA/yHZgxCDCxx95UHogBeq0ND qNEQ== X-Received: by 10.152.116.7 with SMTP id js7mr15790181lab.11.1385243994164; Sat, 23 Nov 2013 13:59:54 -0800 (PST) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPSA id iy7sm13204926lbc.4.2013.11.23.13.59.52 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 23 Nov 2013 13:59:53 -0800 (PST) Sender: Mikolaj Golub Date: Sat, 23 Nov 2013 23:59:51 +0200 From: Mikolaj Golub To: Pete French Subject: Re: Hast locking up under 9.2 Message-ID: <20131123215950.GA17292@gmail.com> References: <20131121203711.GA3736@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.22 (2013-10-16) Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Nov 2013 21:59:56 -0000 On Fri, Nov 22, 2013 at 11:18:29AM +0000, Pete French wrote: > "Assertion failed: (!hio->hio_done), function write_complete, file > /usr/src/sbin/hastd/primary.c, line 1130." It looks like write_complete usage (which should be called once per write request) for memsync is racy. Consider the following scenario: 1) remote_recv_thread: memsync ack received, refcount -> 2; 2) local_send_thread: local write completed, refcount -> 1, entering write_complete() 3) remote_recv_thread: memsync fin received, refcount -> 0, move hio to done queue, ggate_send_thread gets the hio, checks for !hio->hio_done and (if loca_send_thread is still in write_complete()) entering write_complete() Also, using write_complete in local_send after releasing refcount is not safe: at that time the request may be completed remotely, moved to the free queue and reused. I think using hio_countdown for detecting the current state of memsync request is confusing. Also I have already reported about the case when the detection did not work after the secondary disconnect: http://people.freebsd.org/~trociny/patches/hast.primary.c.memsync_secondary_disconnect.3.patch So I propose: 1) Use hio_countdown only for counting components we waiting to complete, i.e. initially it is always 2 for any replication mode. 2) To distinguish between "memsync ack" or "memsync fin" responses from the secondary, add and use hio_memsyncacked field. 3) Call write_complete() in component threads _before_ releasing hio_countdown (i.e. before the hio may be returned to the done queue). 4) Add and use hio_writecount refcounter to detect when write_complete() should be called in memsync case. 5) As hio_done is used only for async, rename it to hio_asyncdone and check/modify outside of more generic write_complete(), only when it is needed. Now, write_complete(): - for fullsync is called by ggate_send_thread; - for async case -- either by local component thread or by ggate_send_thread; - for memsync case -- by one of the component threads. Here is the patch that implements this, which survived preliminary testing: http://people.freebsd.org/~trociny/patches/hast.primary.c.memsync_write_complete.1.patch Also, Pawel, do you mind if add the following macros? #define ISASYNC(hio) ((hio)->hio_replication == HAST_REPLICATION_ASYNC) #define ISFULLSYNC(hio) ((hio)->hio_replication == HAST_REPLICATION_FULLSYNC) #define ISMEMSYNC(hio) ((hio)->hio_replication == HAST_REPLICATION_MEMSYNC) -- Mikolaj Golub