From owner-svn-src-all@FreeBSD.ORG Thu Apr 29 15:42:24 2010 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F88C106566B; Thu, 29 Apr 2010 15:42:24 +0000 (UTC) (envelope-from pjd@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 7569F8FC14; Thu, 29 Apr 2010 15:42:24 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id o3TFgOLP086616; Thu, 29 Apr 2010 15:42:24 GMT (envelope-from pjd@svn.freebsd.org) Received: (from pjd@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id o3TFgOiR086614; Thu, 29 Apr 2010 15:42:24 GMT (envelope-from pjd@svn.freebsd.org) Message-Id: <201004291542.o3TFgOiR086614@svn.freebsd.org> From: Pawel Jakub Dawidek Date: Thu, 29 Apr 2010 15:42:24 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r207372 - head/sbin/hastd X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Apr 2010 15:42:24 -0000 Author: pjd Date: Thu Apr 29 15:42:24 2010 New Revision: 207372 URL: http://svn.freebsd.org/changeset/base/207372 Log: - Check if the worker process was killed by signal and restart it. - Improve logging. Pointed out by: Garrett Cooper MFC after: 3 days Modified: head/sbin/hastd/hastd.c Modified: head/sbin/hastd/hastd.c ============================================================================== --- head/sbin/hastd/hastd.c Thu Apr 29 15:36:32 2010 (r207371) +++ head/sbin/hastd/hastd.c Thu Apr 29 15:42:24 2010 (r207372) @@ -108,6 +108,22 @@ g_gate_load(void) } static void +child_exit_log(unsigned int pid, int status) +{ + + if (WIFEXITED(status) && WEXITSTATUS(status) == 0) { + pjdlog_debug(1, "Worker process exited gracefully (pid=%u).", + pid); + } else if (WIFSIGNALED(status)) { + pjdlog_error("Worker process killed (pid=%u, signal=%d).", + pid, WTERMSIG(status)); + } else { + pjdlog_error("Worker process exited ungracefully (pid=%u, exitcode=%d).", + pid, WIFEXITED(status) ? WEXITSTATUS(status) : -1); + } +} + +static void child_exit(void) { struct hast_resource *res; @@ -129,18 +145,17 @@ child_exit(void) } pjdlog_prefix_set("[%s] (%s) ", res->hr_name, role2str(res->hr_role)); - if (WEXITSTATUS(status) == 0) { - pjdlog_debug(1, - "Worker process exited gracefully (pid=%u).", - (unsigned int)pid); - } else { - pjdlog_error("Worker process failed (pid=%u, status=%d).", - (unsigned int)pid, WEXITSTATUS(status)); - } + child_exit_log(pid, status); proto_close(res->hr_ctrl); res->hr_workerpid = 0; if (res->hr_role == HAST_ROLE_PRIMARY) { - if (WEXITSTATUS(status) == EX_TEMPFAIL) { + /* + * Restart child process if it was killed by signal + * or exited because of temporary problem. + */ + if (WIFSIGNALED(status) || + (WIFEXITED(status) && + WEXITSTATUS(status) == EX_TEMPFAIL)) { sleep(1); pjdlog_info("Restarting worker process."); hastd_primary(res); @@ -300,19 +315,12 @@ listen_accept(void) /* Wait for it to exit. */ else if ((pid = waitpid(res->hr_workerpid, &status, 0)) != res->hr_workerpid) { + /* We can only log the problem. */ pjdlog_errno(LOG_ERR, "Waiting for worker process (pid=%u) failed", (unsigned int)res->hr_workerpid); - /* See above. */ - } else if (WEXITSTATUS(status) != 0) { - pjdlog_error("Worker process (pid=%u) exited ungracefully: status=%d.", - (unsigned int)res->hr_workerpid, - WEXITSTATUS(status)); - /* See above. */ } else { - pjdlog_debug(1, - "Worker process (pid=%u) exited gracefully.", - (unsigned int)res->hr_workerpid); + child_exit_log(res->hr_workerpid, status); } res->hr_workerpid = 0; } else if (res->hr_remotein != NULL) {