Date: Thu, 28 Oct 2010 22:08:54 +0300
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: freebsd-stable@freebsd.org, Mikolaj Golub <to.my.trociny@gmail.com>, Pete French <petefrench@ticketswitch.com>
Subject: Re: hast vs ggate+gmirror sychrnoisation speed
Message-ID: <86lj5i3zjt.fsf@kopusha.home.net>
In-Reply-To: <20101028163036.GA2347@garage.freebsd.pl> (Pawel Jakub Dawidek's message of "Thu, 28 Oct 2010 18:30:36 +0200")
References: <E1PAlxN-000H5x-Eh@dilbert.ticketswitch.com> <86wrp3wj67.fsf@kopusha.home.net> <20101028163036.GA2347@garage.freebsd.pl>

On Thu, 28 Oct 2010 18:30:36 +0200 Pawel Jakub Dawidek wrote:

 PJD> On Wed, Oct 27, 2010 at 10:05:20PM +0300, Mikolaj Golub wrote:
 >> In hast_proto_send() we send header and then data. Couldn't it be that
 >> remote_send and sync threads interfere and their packets are mixed? Maybe some
 >> synchronization is needed here?
 >>
 >> I set sleep(1) in hast_proto_send() between proto_send(header) and
 >> proto_send(data). The error started to occur frequently.

 PJD> Synchronization requests are sent through the remote thread just like
 PJD> regular I/O requests, exactly because of races that can occur.

 PJD> I looked at the code and the keepalive packets are sent from another
 PJD> thread. Could you try turning them off in primary.c and see if that
 PJD> helps?

First I set RETRY_SLEEP to 1 second to generate more keepalive packets. The
errors started to occur frequently:

Oct 28 21:35:53 bolek hastd[1709]: [storage] (secondary) Unable to receive request header: RPC version wrong.
Oct 28 21:35:54 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1709, exitcode=75).
Oct 28 21:36:12 bolek hastd[1722]: [storage] (secondary) Unable to receive request header: RPC version wrong.
Oct 28 21:36:12 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1722, exitcode=75).
...

Now I have been running synchronization for more than half an hour with
keepalive_send disabled and have not seen any error.

-- 
Mikolaj Golub
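
To illustrate the suspected race, here is a minimal Python sketch. It is not hastd code: the 5-byte header layout, the Channel class and every function name are made up for the illustration, and the real HAST wire format is not shown in this thread. It only models the situation described above, where a sender writes a header and then the data in two separate steps and a keepalive from another thread slips in between, so the receiver ends up reading payload bytes as a header. Holding one lock across both writes is one possible remedy, not necessarily the fix that went into hastd.

```python
import struct
import threading

VERSION = 1                    # hypothetical protocol version
HEADER = struct.Struct("!BI")  # hypothetical header: version byte + payload length


class Channel:
    """Stand-in for a shared connection; 'stream' is what the peer receives."""

    def __init__(self):
        self.stream = bytearray()
        self.lock = threading.Lock()

    def send_unsafe(self, payload, between=None):
        # Header and data are written in two separate steps.
        self.stream += HEADER.pack(VERSION, len(payload))
        if between is not None:
            between()  # models another thread writing at this point
        self.stream += payload

    def send_locked(self, payload):
        # Header and data are written as one unit under the lock.
        with self.lock:
            self.stream += HEADER.pack(VERSION, len(payload))
            self.stream += payload


def parse(stream):
    """Receiver side: return the payloads, or raise on a bad header."""
    payloads, pos = [], 0
    while pos < len(stream):
        version, length = HEADER.unpack_from(stream, pos)
        if version != VERSION:
            raise ValueError("RPC version wrong")
        pos += HEADER.size
        payloads.append(bytes(stream[pos:pos + length]))
        pos += length
    return payloads


def demo_interleaved():
    # A keepalive (empty payload) lands between the header and the data.
    ch = Channel()
    ch.send_unsafe(b"\xaa" * 8, between=lambda: ch.send_unsafe(b""))
    try:
        parse(ch.stream)
    except ValueError as err:
        return str(err)
    return "ok"


def demo_locked(n=100):
    # Two threads share the channel, but each message is sent atomically.
    ch = Channel()
    data = threading.Thread(
        target=lambda: [ch.send_locked(b"\xaa" * 8) for _ in range(n)])
    keep = threading.Thread(
        target=lambda: [ch.send_locked(b"") for _ in range(n)])
    data.start()
    keep.start()
    data.join()
    keep.join()
    return len(parse(ch.stream))
```

In the interleaved case the receiver takes the 8-byte length from the first header, swallows the keepalive header as part of the data, and then tries to read a header from the leftover payload bytes. In this toy model that is what produces the version mismatch. With the locked sender every message arrives whole and the stream parses cleanly.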