From owner-freebsd-stable@FreeBSD.ORG Sat Oct 30 12:26:03 2010
From: Mikolaj Golub
To: Pawel Jakub Dawidek
Cc: freebsd-stable@freebsd.org, Pete French
Subject: Re: hast vs ggate+gmirror sychrnoisation speed
Date: Sat, 30 Oct 2010 15:25:56 +0300
Message-ID: <86d3qr3m0b.fsf@kopusha.home.net>
In-Reply-To: <86lj5i3zjt.fsf@kopusha.home.net> (Mikolaj Golub's message of "Thu, 28 Oct 2010 22:08:54 +0300")
References: <86wrp3wj67.fsf@kopusha.home.net> <20101028163036.GA2347@garage.freebsd.pl> <86lj5i3zjt.fsf@kopusha.home.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: Production branch of FreeBSD source code

--=-=-=

On Thu, 28 Oct 2010 22:08:54 +0300 Mikolaj Golub wrote to Pawel Jakub Dawidek:

 PJD>> I looked at the code and the keepalive packets are sent from another
 PJD>> thread. Could you try turning them off in primary.c and see if that
 PJD>> helps?

 MG> At first I set RETRY_SLEEP to 1 sec to have more keepalive packets. The errors
 MG> started to appear frequently:

 MG> Oct 28 21:35:53 bolek hastd[1709]: [storage] (secondary) Unable to receive request header: RPC version wrong.
 MG> Oct 28 21:35:54 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1709, exitcode=75).
 MG> Oct 28 21:36:12 bolek hastd[1722]: [storage] (secondary) Unable to receive request header: RPC version wrong.
 MG> Oct 28 21:36:12 bolek hastd[1632]: [storage] (secondary) Worker process exited ungracefully (pid=1722, exitcode=75).
 MG> ...

 MG> Now I have been running synchronization for more than half an hour with
 MG> keepalive_send disabled and have not seen any error.

So :-)

What do you think about sending keepalives from remote_send_thread() to avoid
this problem, and sending them only when the connection is idle (there seems
to be not much use in sending them all the time)? Something like in the patch
below (it works for me).

-- 
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hastd.keepalive.patch

Index: sbin/hastd/primary.c
===================================================================
--- sbin/hastd/primary.c	(revision 214550)
+++ sbin/hastd/primary.c	(working copy)
@@ -190,6 +190,19 @@ static pthread_mutex_t metadata_lock;
 		    hio_next[(ncomp)]);					\
 	mtx_unlock(&hio_##name##_list_lock[(ncomp)]);			\
 } while (0)
+#define	QUEUE_TRY1(hio, name, ncomp)	do {				\
+	mtx_lock(&hio_##name##_list_lock[(ncomp)]);			\
+	(hio) = TAILQ_FIRST(&hio_##name##_list[(ncomp)]);		\
+	if (hio == NULL) {						\
+		cv_timedwait(&hio_##name##_list_cond[(ncomp)],		\
+		    &hio_##name##_list_lock[(ncomp)], RETRY_SLEEP);	\
+		hio = TAILQ_FIRST(&hio_##name##_list[(ncomp)]);		\
+	}								\
+	if (hio != NULL)						\
+		TAILQ_REMOVE(&hio_##name##_list[(ncomp)], hio,		\
+		    hio_next[(ncomp)]);					\
+	mtx_unlock(&hio_##name##_list_lock[(ncomp)]);			\
+} while (0)
 #define	QUEUE_TAKE2(hio, name)	do {					\
 	mtx_lock(&hio_##name##_list_lock);				\
 	while (((hio) = TAILQ_FIRST(&hio_##name##_list)) == NULL) {	\
@@ -1176,6 +1189,38 @@ local_send_thread(void *arg)
 	return (NULL);
 }
 
+static void
+keepalive_send(struct hast_resource *res, unsigned int ncomp)
+{
+	struct nv *nv;
+
+	if (!ISCONNECTED(res, ncomp))
+		return;
+
+	assert(res->hr_remotein != NULL);
+	assert(res->hr_remoteout != NULL);
+
+	nv = nv_alloc();
+	nv_add_uint8(nv, HIO_KEEPALIVE, "cmd");
+	if (nv_error(nv) != 0) {
+		nv_free(nv);
+		pjdlog_debug(1,
+		    "keepalive_send: Unable to prepare header to send.");
+		return;
+	}
+	if (hast_proto_send(res, res->hr_remoteout, nv, NULL, 0) < 0) {
+		pjdlog_common(LOG_DEBUG, 1, errno,
+		    "keepalive_send: Unable to send request");
+		nv_free(nv);
+		rw_unlock(&hio_remote_lock[ncomp]);
+		remote_close(res, ncomp);
+		rw_rlock(&hio_remote_lock[ncomp]);
+		return;
+	}
+	nv_free(nv);
+	pjdlog_debug(2, "keepalive_send: Request sent.");
+}
+
 /*
  * Thread sends request to secondary node.
  */
@@ -1184,6 +1229,7 @@ remote_send_thread(void *arg)
 {
 	struct hast_resource *res = arg;
 	struct g_gate_ctl_io *ggio;
+	time_t lastcheck, now;
 	struct hio *hio;
 	struct nv *nv;
 	unsigned int ncomp;
@@ -1194,10 +1240,19 @@ remote_send_thread(void *arg)
 
 	/* Remote component is 1 for now. */
 	ncomp = 1;
+	lastcheck = time(NULL);
 
 	for (;;) {
 		pjdlog_debug(2, "remote_send: Taking request.");
-		QUEUE_TAKE1(hio, send, ncomp);
+		QUEUE_TRY1(hio, send, ncomp);
+		if (hio == NULL) {
+			now = time(NULL);
+			if (lastcheck + RETRY_SLEEP <= now) {
+				keepalive_send(res, ncomp);
+				lastcheck = now;
+			}
+			continue;
+		}
 		pjdlog_debug(2, "remote_send: (%p) Got request.", hio);
 		ggio = &hio->hio_ggio;
 		switch (ggio->gctl_cmd) {
@@ -1883,32 +1938,6 @@ failed:
 }
 
 static void
-keepalive_send(struct hast_resource *res, unsigned int ncomp)
-{
-	struct nv *nv;
-
-	nv = nv_alloc();
-	nv_add_uint8(nv, HIO_KEEPALIVE, "cmd");
-	if (nv_error(nv) != 0) {
-		nv_free(nv);
-		pjdlog_debug(1,
-		    "keepalive_send: Unable to prepare header to send.");
-		return;
-	}
-	if (hast_proto_send(res, res->hr_remoteout, nv, NULL, 0) < 0) {
-		pjdlog_common(LOG_DEBUG, 1, errno,
-		    "keepalive_send: Unable to send request");
-		nv_free(nv);
-		rw_unlock(&hio_remote_lock[ncomp]);
-		remote_close(res, ncomp);
-		rw_rlock(&hio_remote_lock[ncomp]);
-		return;
-	}
-	nv_free(nv);
-	pjdlog_debug(2, "keepalive_send: Request sent.");
-}
-
-static void
 guard_one(struct hast_resource *res, unsigned int ncomp)
 {
 	struct proto_conn *in, *out;
@@ -1926,12 +1955,6 @@ guard_one(struct hast_resource *res, unsigned int
 	if (ISCONNECTED(res, ncomp)) {
 		assert(res->hr_remotein != NULL);
 		assert(res->hr_remoteout != NULL);
-		keepalive_send(res, ncomp);
-	}
-
-	if (ISCONNECTED(res, ncomp)) {
-		assert(res->hr_remotein != NULL);
-		assert(res->hr_remoteout != NULL);
 		rw_unlock(&hio_remote_lock[ncomp]);
 		pjdlog_debug(2, "remote_guard: Connection to %s is ok.",
 		    res->hr_remoteaddr);

--=-=-=--
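
The idea in the patch above (wait on the send queue with a timeout, and send a keepalive only when the wait comes back empty) can be illustrated outside hastd. What follows is a minimal standalone sketch, assuming plain POSIX threads instead of hastd's mtx_*/cv_* wrappers. The names struct queue, queue_try_take, keepalive_due and IDLE_TIMEOUT are hypothetical and do not appear in hastd. Unlike QUEUE_TRY1, the sketch loops until an absolute deadline so that a spurious wakeup does not end the wait early.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define IDLE_TIMEOUT 1	/* seconds; hypothetical stand-in for RETRY_SLEEP */

struct item {
	struct item *next;
	int id;
};

struct queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct item *head;
	struct item *tail;
};

static void
queue_init(struct queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
	q->head = q->tail = NULL;
}

/* Append an item and wake up one waiter. */
static void
queue_put(struct queue *q, struct item *it)
{
	assert(it != NULL);
	it->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail != NULL)
		q->tail->next = it;
	else
		q->head = it;
	q->tail = it;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* Take the first item, or return NULL if none arrives within timeout seconds. */
static struct item *
queue_try_take(struct queue *q, int timeout)
{
	struct timespec deadline;
	struct item *it;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout;

	pthread_mutex_lock(&q->lock);
	while (q->head == NULL) {
		/* Any non-zero return (normally ETIMEDOUT) ends the wait. */
		if (pthread_cond_timedwait(&q->cond, &q->lock, &deadline) != 0)
			break;
	}
	it = q->head;
	if (it != NULL) {
		q->head = it->next;
		if (q->head == NULL)
			q->tail = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return (it);
}

/* Return 1 and advance *lastcheck if at least interval seconds have passed. */
static int
keepalive_due(time_t *lastcheck, time_t now, time_t interval)
{
	if (*lastcheck + interval > now)
		return (0);
	*lastcheck = now;
	return (1);
}
```

A sender loop built on this would call queue_try_take(&q, IDLE_TIMEOUT) and, on a NULL result, send a keepalive only if keepalive_due(&lastcheck, time(NULL), IDLE_TIMEOUT) returns 1. Because the keepalive then goes out from the same thread that sends requests, it cannot be interleaved with a request on the wire.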