From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  2 16:26:06 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D5E14106564A;
	Sat,  2 Oct 2010 16:26:06 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 3872D8FC17;
	Sat,  2 Oct 2010 16:26:05 +0000 (UTC)
Received: by fxm9 with SMTP id 9so3354844fxm.13
	for <multiple recipients>; Sat, 02 Oct 2010 09:26:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:from:to:cc:subject:references
	:x-comment-to:date:in-reply-to:message-id:user-agent:mime-version
	:content-type; bh=xcK9desYQpRHqcDeASxNdUusoVhCZM3HTGqO7r6gcLY=;
	b=q2MR3fEFZeIwGGqzSgMLaObtJkxNMGVrf4r8pv9vxQz17f7sLxS9NYwCq1mWHrxuEK
	0TrzyBVJMpD1M6jR9pu7AeY7YNIarFElYCoLaNsCrc+/pGfiRuFa0mOEd+7Z+mShXo1t
	uKmC26r3OO/KaI0zd5Fqv7kVYoXo1iW4cX0Ew=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:cc:subject:references:x-comment-to:date:in-reply-to
	:message-id:user-agent:mime-version:content-type;
	b=fUwvsbBB0ZEzqYWOIgbeZdCavU6uQd/chjWIXabeFgvwl/LSN9mFfKEL/8HiKvJ6OP
	ePWYHTNreUCYBXpetIruK5j8yH7R61gUQIUDDus0eNci4vQ+OB+y9uiy6vU8+cvf8Zuw
	Z2z/4pmbz/ZAqxEtrfSbYZR0wm6L4MYNzQ+bA=
Received: by 10.223.103.84 with SMTP id j20mr6907188fao.35.1286036765051;
	Sat, 02 Oct 2010 09:26:05 -0700 (PDT)
Received: from localhost ([95.69.162.97])
	by mx.google.com with ESMTPS id h12sm1250305faa.13.2010.10.02.09.26.03
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sat, 02 Oct 2010 09:26:04 -0700 (PDT)
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-fs@freebsd.org
References: <86hbh44wgl.fsf@kopusha.home.net>
X-Comment-To: Mikolaj Golub
Date: Sat, 02 Oct 2010 19:26:05 +0300
In-Reply-To: <86hbh44wgl.fsf@kopusha.home.net> (Mikolaj Golub's message of
	"Sat, 02 Oct 2010 15:20:58 +0300")
Message-ID: <86aamw4l42.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: pjd@freebsd.org
Subject: Re: hastd: assertion (res->hr_event != NULL) fails in secondary on
	split-brain
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2010 16:26:07 -0000

--=-=-=


On Sat, 02 Oct 2010 15:20:58 +0300 Mikolaj Golub wrote:

 MG> After recent changes in hastd (I think r213006: Fix descriptor leaks) if
 MG> split-brain occurs hastd will abort in child_cleanup() on assertion
 MG> (res->hr_event != NULL).
 ...
 MG> So we have double close of res->hr_event. The first time it is closed when
 MG> parent detects that worker exited in main_loop(), and the second time when a
 MG> new connection from primary comes and the parent does cleanup after previously
 MG> terminated child before starting new one.

 MG> The straightforward fix is to check res->hr_event before closing, like in the
 MG> patch below.

 MG> -- 
 MG> Mikolaj Golub

 MG> Index: sbin/hastd/control.c
 MG> ===================================================================
 MG> --- sbin/hastd/control.c        (revision 213357)
 MG> +++ sbin/hastd/control.c        (working copy)
 MG> @@ -58,8 +58,10 @@ child_cleanup(struct hast_resource *res)
 MG>  
 MG>          proto_close(res->hr_ctrl);
 MG>          res->hr_ctrl = NULL;
 MG> -        proto_close(res->hr_event);
 MG> -        res->hr_event = NULL;
 MG> +        if (res->hr_event != NULL) {
 MG> +                proto_close(res->hr_event);
 MG> +                res->hr_event = NULL;
 MG> +        }
 MG>          res->hr_workerpid = 0;
 MG>  }
 MG>  

Running with this fix another issue is observed. On split-brain `hastctl
status' on secondary will return "[ERROR] Error 32 received from hastd" most
of the times. And only for some runs an output will be returned.

lolek# hastctl status storage
[ERROR] Error 32 received from hastd.
lolek# hastctl status storage
[ERROR] Error 32 received from hastd.
lolek# hastctl status storage
storage:
  role: secondary
  provname: storage
  localpath: /dev/ad4
  extentsize: 2097152
  keepdirty: 0
  remoteaddr: tcp4://bolek
  replication: memsync
  status: complete
  dirty: 0 bytes
lolek# hastctl status storage
[ERROR] Error 32 received from hastd.

This is because hastd clears res->hr_workerpid only when a new connection from
the primary comes. Whilst hastd checks res->hr_workerpid in control_status()
and if it is not zero it tries to get info from the worker and returns error
(broken pipe) if the worker is actually not running.

So it looks like it is better not just to close res->hr_ctrl in main_loop()
but to do full child cleanup here -- straight away its exit is detected.

What do you think about the attached patch?

-- 
Mikolaj Golub


--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=hast.child_kill.patch

Index: sbin/hastd/hastd.c
===================================================================
--- sbin/hastd/hastd.c	(revision 213357)
+++ sbin/hastd/hastd.c	(working copy)
@@ -94,22 +94,6 @@ g_gate_load(void)
 }
 
 static void
-child_exit_log(unsigned int pid, int status)
-{
-
-	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
-		pjdlog_debug(1, "Worker process exited gracefully (pid=%u).",
-		    pid);
-	} else if (WIFSIGNALED(status)) {
-		pjdlog_error("Worker process killed (pid=%u, signal=%d).",
-		    pid, WTERMSIG(status));
-	} else {
-		pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).",
-		    pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
-	}
-}
-
-static void
 child_exit(void)
 {
 	struct hast_resource *res;
@@ -388,8 +372,6 @@ listen_accept(void)
 	const unsigned char *token;
 	char laddr[256], raddr[256];
 	size_t size;
-	pid_t pid;
-	int status;
 
 	proto_local_address(cfg->hc_listenconn, laddr, sizeof(laddr));
 	pjdlog_debug(1, "Accepting connection to %s.", laddr);
@@ -504,26 +486,7 @@ listen_accept(void)
 			    "Worker process exists (pid=%u), stopping it.",
 			    (unsigned int)res->hr_workerpid);
 			/* Stop child process. */
-			if (kill(res->hr_workerpid, SIGINT) < 0) {
-				pjdlog_errno(LOG_ERR,
-				    "Unable to stop worker process (pid=%u)",
-				    (unsigned int)res->hr_workerpid);
-				/*
-				 * Other than logging the problem we
-				 * ignore it - nothing smart to do.
-				 */
-			}
-			/* Wait for it to exit. */
-			else if ((pid = waitpid(res->hr_workerpid,
-			    &status, 0)) != res->hr_workerpid) {
-				/* We can only log the problem. */
-				pjdlog_errno(LOG_ERR,
-				    "Waiting for worker process (pid=%u) failed",
-				    (unsigned int)res->hr_workerpid);
-			} else {
-				child_exit_log(res->hr_workerpid, status);
-			}
-			child_cleanup(res);
+			child_kill(res);
 		} else if (res->hr_remotein != NULL) {
 			char oaddr[256];
 
@@ -678,8 +641,8 @@ main_loop(void)
 				if (event_recv(res) == 0)
 					continue;
 				/* The worker process exited? */
-				proto_close(res->hr_event);
-				res->hr_event = NULL;
+				if (res->hr_workerpid != 0)
+					child_kill(res);
 			}
 		}
 	}
Index: sbin/hastd/control.c
===================================================================
--- sbin/hastd/control.c	(revision 213357)
+++ sbin/hastd/control.c	(working copy)
@@ -63,6 +63,51 @@ child_cleanup(struct hast_resource *res)
 	res->hr_workerpid = 0;
 }
 
+void
+child_exit_log(unsigned int pid, int status)
+{
+
+	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
+		pjdlog_debug(1, "Worker process exited gracefully (pid=%u).",
+		    pid);
+	} else if (WIFSIGNALED(status)) {
+		pjdlog_error("Worker process killed (pid=%u, signal=%d).",
+		    pid, WTERMSIG(status));
+	} else {
+		pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).",
+		    pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1);
+	}
+}
+
+void
+child_kill(struct hast_resource *res)
+{
+	pid_t pid;
+	int status;
+
+	assert(res->hr_workerpid != 0);
+	
+	if (kill(res->hr_workerpid, SIGINT) < 0) {
+		pjdlog_errno(LOG_ERR,
+			     "Unable to stop worker process (pid=%u)",
+			     (unsigned int)res->hr_workerpid);
+		/*
+		 * Other than logging the problem we
+		 * ignore it - nothing smart to do.
+		 */
+	}
+	/* Wait for it to exit. */
+	else if ((pid = waitpid(res->hr_workerpid,
+				&status, 0)) != res->hr_workerpid) {
+		/* We can only log the problem. */
+		pjdlog_errno(LOG_ERR,
+			     "Waiting for worker process (pid=%u) failed",
+			     (unsigned int)res->hr_workerpid);
+	}
+	child_exit_log(res->hr_workerpid, status);
+	child_cleanup(res);
+}	
+
 static void
 control_set_role_common(struct hastd_config *cfg, struct nv *nvout,
     uint8_t role, struct hast_resource *res, const char *name, unsigned int no)
@@ -107,22 +152,8 @@ control_set_role_common(struct hastd_config *cfg,
 	 * If previous role was primary or secondary we have to kill process
 	 * doing that work.
 	 */
-	if (res->hr_workerpid != 0) {
-		if (kill(res->hr_workerpid, SIGTERM) < 0) {
-			pjdlog_errno(LOG_WARNING,
-			    "Unable to kill worker process %u",
-			    (unsigned int)res->hr_workerpid);
-		} else if (waitpid(res->hr_workerpid, NULL, 0) !=
-		    res->hr_workerpid) {
-			pjdlog_errno(LOG_WARNING,
-			    "Error while waiting for worker process %u",
-			    (unsigned int)res->hr_workerpid);
-		} else {
-			pjdlog_debug(1, "Worker process %u stopped.",
-			    (unsigned int)res->hr_workerpid);
-		}
-		child_cleanup(res);
-	}
+	if (res->hr_workerpid != 0)
+		child_kill(res);
 
 	/* Start worker process if we are changing to primary. */
 	if (role == HAST_ROLE_PRIMARY)
Index: sbin/hastd/control.h
===================================================================
--- sbin/hastd/control.h	(revision 213357)
+++ sbin/hastd/control.h	(working copy)
@@ -39,6 +39,8 @@ struct hastd_config;
 struct hast_resource;
 
 void child_cleanup(struct hast_resource *res);
+void child_exit_log(unsigned int pid, int status);
+void child_kill(struct hast_resource *res);
 
 void control_set_role(struct hast_resource *res, uint8_t role);
 

--=-=-=--