Date: Mon, 25 Nov 2013 11:41:12 +0200
From: Mikolaj Golub
To: Pawel Jakub Dawidek
Subject: Re: Hast locking up under 9.2
Message-ID: <20131125094111.GA22396@gmail.com>
References: <20131121203711.GA3736@gmail.com> <20131123215950.GA17292@gmail.com> <20131125083223.GE1398@garage.freebsd.pl>
In-Reply-To: <20131125083223.GE1398@garage.freebsd.pl>
Cc: freebsd-stable@freebsd.org, Pete French

On Mon, Nov 25, 2013 at 09:32:23AM +0100, Pawel Jakub Dawidek wrote:
> On Sat, Nov 23, 2013 at 11:59:51PM +0200, Mikolaj Golub wrote:
> > On Fri, Nov 22, 2013 at 11:18:29AM +0000, Pete French wrote:
> > >
> > > "Assertion failed: (!hio->hio_done), function write_complete, file
> > > /usr/src/sbin/hastd/primary.c, line 1130."
> >
> > It looks like write_complete usage (which should be called once per
> > write request) for memsync is racy.
> >
> > Consider the following scenario:
> >
> > 1) remote_recv_thread: memsync ack received, refcount -> 2;
> > 2) local_send_thread: local write completed, refcount -> 1, entering
> >    write_complete()
> > 3) remote_recv_thread: memsync fin received, refcount -> 0, move hio
> >    to done queue, ggate_send_thread gets the hio, checks for
> >    !hio->hio_done and (if local_send_thread is still in
> >    write_complete()) entering write_complete()
>
> I don't see how is that possible. The write_complete() function is
> called only when hio_countdown goes from 2 to 1 and because this is
> atomic operation it can only happen in one thread. Can you elaborate on
> how calling write_complete() concurrently for the same request is
> possible?

Yes, hio_countdown protects against the "component" threads calling
write_complete() concurrently.
But it may also be called by ggate_send_thread():

	if (!hio->hio_done)
		write_complete(res, hio);

So if write_complete() has already started executing in
local_send_thread() and the memsync fin is received at that moment, the
request is moved to ggate_send_thread, and write_complete() can be
reentered while it is still in progress in local_send_thread (hio_done
is set only on exit from write_complete()).

That is the reason for statement (3) in my patch: write_complete() in
component threads is called only before releasing hio_countdown.
Otherwise you are not protected from ggate_send_thread running it
simultaneously, or even from the hio being returned to the free list
before write_complete() has finished in local_send_thread.

And so hio_countdown can't be used for detecting the current memsync
state.

-- 
Mikolaj Golub
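
The ordering argument above can be illustrated with a small sketch. This
is not the hastd code or the patch itself: struct hio_sketch and the
*_sketch functions are hypothetical stand-ins for the hio,
write_complete(), the component threads and ggate_send_thread(), and C11
atomics stand in for hastd's own refcount primitives. It only shows why
completing before releasing the reference keeps the hio_done check in
the ggate send path safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one request; not the real struct hio. */
struct hio_sketch {
	atomic_int  countdown;    /* stands in for hio_countdown */
	atomic_bool done;         /* stands in for hio_done */
	atomic_int  completions;  /* how many times completion ran */
};

static void
hio_sketch_init(struct hio_sketch *hio, int refs)
{
	atomic_init(&hio->countdown, refs);
	atomic_init(&hio->done, false);
	atomic_init(&hio->completions, 0);
}

/* Must run once per request; mirrors the assertion from the report. */
static void
write_complete_sketch(struct hio_sketch *hio)
{
	assert(!atomic_load(&hio->done));
	atomic_fetch_add(&hio->completions, 1);
	/* As in the description above, done is set only on exit. */
	atomic_store(&hio->done, true);
}

/*
 * Component thread: complete (if this thread is the one responsible)
 * BEFORE dropping its reference. Returns true when this was the last
 * reference, i.e. the caller would now hand the hio to the send thread.
 */
static bool
component_release_sketch(struct hio_sketch *hio, bool completes)
{
	if (completes)
		write_complete_sketch(hio);
	/* atomic_fetch_sub returns the previous value. */
	return (atomic_fetch_sub(&hio->countdown, 1) == 1);
}

/* Send path: completes only if no component thread already did. */
static void
ggate_send_sketch(struct hio_sketch *hio)
{
	if (!atomic_load(&hio->done))
		write_complete_sketch(hio);
}
```

With this ordering the send path can only see the request after the last
reference is dropped, and by then done is already set, so the completion
runs exactly once. Swapping the two steps in component_release_sketch()
would reopen the window described in scenario step 3.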