Date:      Wed, 27 Oct 2010 22:05:20 +0300
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Pete French <petefrench@ticketswitch.com>
Cc:        Pawel Jakub Dawidek <pjd@FreeBSD.org>, freebsd-stable@freebsd.org
Subject:   Re: hast vs ggate+gmirror sychrnoisation speed
Message-ID:  <86wrp3wj67.fsf@kopusha.home.net>
In-Reply-To: <E1PAlxN-000H5x-Eh@dilbert.ticketswitch.com> (Pete French's message of "Tue, 26 Oct 2010 17:01:01 +0100")
References:  <E1PAlxN-000H5x-Eh@dilbert.ticketswitch.com>

On Tue, 26 Oct 2010 17:01:01 +0100 Pete French wrote:

 PF>  Actually, I just looked in dmesg on the secondary - it is full
 PF> of messages thus:

 PF> Oct 26 15:44:59 serpentine-passive hastd[10394]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
 PF> Oct 26 15:45:00 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10394, exitcode=75).
 PF> Oct 26 15:46:59 serpentine-passive hastd[10421]: [serp0] (secondary) Unable to receive request header: RPC version wrong.
 PF> Oct 26 15:47:04 serpentine-passive hastd[782]: [serp0] (secondary) Worker process exited ungracefully (pid=10421, exitcode=75).

I saw this too, but only as sporadic messages, so I forgot about it and did
not investigate at the time :-).

Now, while running synchronization, I see them too (but again only
sporadically). I added an assertion and looked at the received header:

(gdb) list
309                     goto fail;
310
311             if (hdr.version != HAST_PROTO_VERSION) {
312                     assert(0);
313                     errno = ERPCMISMATCH;
314                     goto fail;
315             }
316
317             hdr.size = le32toh(hdr.size);
318
(gdb) p/x hdr
$2 = {version = 0x9, size = 0x65657266}

So it looks like garbage.
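
As a side check, the size field can be decoded as text. The helper below is
a hypothetical illustration, not part of hastd. It assumes the host being
debugged is little-endian and that gdb printed hdr.size before the
le32toh() conversion on line 317. Under those assumptions the four bytes
are printable ASCII, which would fit payload bytes being read where a
header was expected.

```c
#include <stdint.h>

/*
 * Write the four bytes of v, least significant first, as a NUL-terminated
 * string. This is the order in which a little-endian host stores them.
 * le32_as_text() is a made-up name used only for this illustration.
 */
static void
le32_as_text(uint32_t v, char out[5])
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((v >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```

Calling le32_as_text(0x65657266, buf) leaves the string "free" in buf.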

In hast_proto_send() we send the header and then the data. Could it be that
the remote_send and sync threads interfere and their packets get mixed? Maybe
some synchronization is needed here?
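
A minimal sketch of what such synchronization might look like, assuming a
single mutex held across both writes. This is not the hastd code: struct
stream, stream_write(), locked_send(), the PROTO_VERSION value and the
record layout are hypothetical stand-ins, and the "connection" is just an
in-memory buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROTO_VERSION   1	/* hypothetical, not HAST_PROTO_VERSION */
#define PAYLOAD_LEN     8
#define MSGS_PER_THREAD 100

/* Hypothetical stand-in for the connection: an in-memory byte stream. */
struct stream {
	unsigned char buf[4096];
	size_t len;
};

/* Hypothetical stand-in for proto_send(): append bytes to the stream. */
static void
stream_write(struct stream *s, const void *data, size_t size)
{
	assert(s->len + size <= sizeof(s->buf));
	memcpy(s->buf + s->len, data, size);
	s->len += size;
}

static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/* Send header and payload as one unit; the lock keeps other senders out. */
static void
locked_send(struct stream *s, const void *data, uint32_t size)
{
	uint8_t version = PROTO_VERSION;

	pthread_mutex_lock(&send_lock);
	stream_write(s, &version, sizeof(version));
	stream_write(s, &size, sizeof(size));	/* host byte order, for brevity */
	stream_write(s, data, size);
	pthread_mutex_unlock(&send_lock);
}

struct sender_args {
	struct stream *s;
	unsigned char tag;	/* byte that fills this sender's payloads */
};

/* Thread body: send MSGS_PER_THREAD messages filled with the sender's tag. */
static void *
sender_thread(void *arg)
{
	struct sender_args *a = arg;
	unsigned char payload[PAYLOAD_LEN];

	memset(payload, a->tag, sizeof(payload));
	for (int i = 0; i < MSGS_PER_THREAD; i++)
		locked_send(a->s, payload, sizeof(payload));
	return NULL;
}

/* Return 1 if the stream is a clean sequence of header+payload records. */
static int
stream_is_well_formed(const struct stream *s)
{
	size_t off = 0;

	while (off < s->len) {
		uint32_t size;

		if (s->len - off < 1 + sizeof(size) + PAYLOAD_LEN)
			return 0;
		if (s->buf[off] != PROTO_VERSION)
			return 0;
		memcpy(&size, s->buf + off + 1, sizeof(size));
		if (size != PAYLOAD_LEN)
			return 0;
		off += 1 + sizeof(size);
		/* A payload must come from a single sender. */
		for (size_t i = 1; i < size; i++)
			if (s->buf[off + i] != s->buf[off])
				return 0;
		off += size;
	}
	return 1;
}
```

With two threads running sender_thread() on the same stream, the checker
should find every header followed by its own payload. Dropping the lock
around the three writes is what would allow one sender's data to land
between another sender's header and data.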

I put a sleep(1) in hast_proto_send() between proto_send(header) and
proto_send(data). The error then started to occur frequently.

-- 
Mikolaj Golub


