Date: Sat, 23 Nov 2013 23:59:51 +0200 From: Mikolaj Golub <trociny@FreeBSD.org> To: Pete French <petefrench@ingresso.co.uk> Cc: freebsd-stable@freebsd.org Subject: Re: Hast locking up under 9.2 Message-ID: <20131123215950.GA17292@gmail.com> In-Reply-To: <E1Vjokn-000OuU-1Y@dilbert.ingresso.co.uk> References: <20131121203711.GA3736@gmail.com> <E1Vjokn-000OuU-1Y@dilbert.ingresso.co.uk>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Nov 22, 2013 at 11:18:29AM +0000, Pete French wrote: > "Assertion failed: (!hio->hio_done), function write_complete, file > /usr/src/sbin/hastd/primary.c, line 1130." It looks like write_complete usage (which should be called once per write request) for memsync is racy. Consider the following scenario: 1) remote_recv_thread: memsync ack received, refcount -> 2; 2) local_send_thread: local write completed, refcount -> 1, entering write_complete() 3) remote_recv_thread: memsync fin received, refcount -> 0, move hio to done queue, ggate_send_thread gets the hio, checks for !hio->hio_done and (if loca_send_thread is still in write_complete()) entering write_complete() Also, using write_complete in local_send after releasing refcount is not safe: at that time the request may be completed remotely, moved to the free queue and reused. I think using hio_countdown for detecting the current state of memsync request is confusing. Also I have already reported about the case when the detection did not work after the secondary disconnect: http://people.freebsd.org/~trociny/patches/hast.primary.c.memsync_secondary_disconnect.3.patch So I propose: 1) Use hio_countdown only for counting components we waiting to complete, i.e. initially it is always 2 for any replication mode. 2) To distinguish between "memsync ack" or "memsync fin" responses from the secondary, add and use hio_memsyncacked field. 3) Call write_complete() in component threads _before_ releasing hio_countdown (i.e. before the hio may be returned to the done queue). 4) Add and use hio_writecount refcounter to detect when write_complete() should be called in memsync case. 5) As hio_done is used only for async, rename it to hio_asyncdone and check/modify outside of more generic write_complete(), only when it is needed. Now, write_complete(): - for fullsync is called by ggate_send_thread; - for async case -- either by local component thread or by ggate_send_thread; - for memsync case -- by one of the component threads. Here is the patch that implements this, which survived preliminary testing: http://people.freebsd.org/~trociny/patches/hast.primary.c.memsync_write_complete.1.patch Also, Pawel, do you mind if add the following macros? #define ISASYNC(hio) ((hio)->hio_replication == HAST_REPLICATION_ASYNC) #define ISFULLSYNC(hio) ((hio)->hio_replication == HAST_REPLICATION_FULLSYNC) #define ISMEMSYNC(hio) ((hio)->hio_replication == HAST_REPLICATION_MEMSYNC) -- Mikolaj Golub
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131123215950.GA17292>