From owner-freebsd-hackers@FreeBSD.ORG Wed Apr 11 14:12:05 2012
Message-ID: <4F859112.5070005@acsalaska.net>
Date: Wed, 11 Apr 2012 16:11:30 +0200
From: Mel Flynn
To: FreeBSD Hackers
Subject: Debugging zombies: pthread_sigmask and sigwait
List-Id: Technical Discussions relating to FreeBSD
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------050201080106010708050901"

This is a multi-part message in MIME format.
--------------050201080106010708050901
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi,

I'm currently stuck on a bug in Zarafa-spooler that creates zombies, and
I'm working around it by claiming that our pthread library isn't "normal",
which makes it use standard signals rather than a signal thread.
My limited understanding of these facilities is, however, not enough to see
the actual problem here, and reading the related manpages did not lead me
to a solution either.

A test case reproducing the problem is attached.

What happens is that SIGCHLD is never received by the signal thread and
the child processes turn into zombies. The signal counters never go up, not
even for SIGINFO, which I added specifically to see if anything gets
through at all. The signal thread shows up as stuck in sigwait.

It's reproducible on 8.3-PRERELEASE of a few days ago (r233768). I'm not
able to test it on anything newer, unfortunately, but I suspect this is a
bug/linuxism in the code, not in FreeBSD.

Thanks in advance for any insights.
-- 
Mel

--------------050201080106010708050901
Content-Type: text/plain; charset=windows-1252; name="BSDmakefile.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="BSDmakefile.txt"

PROG=spoolerbug
NO_MAN=yes
DEBUG_FLAGS=-g3
WARNS=6
WITH_DEBUG=yes
LDFLAGS+=-pthread

.include "../mk/core.mk"
.include <bsd.prog.mk>

--------------050201080106010708050901
Content-Type: text/plain; charset=windows-1252; name="spoolerbug.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="spoolerbug.c"

/*
 * vim: ts=4 sw=4 tw=78 noet ai fdm=marker
 */
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/types.h>
#include <sys/wait.h>
#include <pthread.h>
#include <signal.h> /* signal related */
#include <unistd.h> /* vfork */
#include <stdlib.h> /* arc4random() */
#include <stdbool.h>
#include <stdio.h> /* printing */
#include <err.h>

#define SERVER_ITERATIONS 3

/* declarations */
void *signal_handler(void *);
int running_server(void);
void process_signal(int);

/* globals */
pthread_t signal_thread;
sigset_t signal_mask;
bool bQuit = false;
pid_t lastPid = 0;
char *szCommand;
size_t n_sigs_handled = 0;
size_t n_sigs_child = 0;
size_t n_sigs_info = 0;

/* Dedicated signal thread: collects the blocked signals via sigwait(). */
void *
signal_handler(void *args __unused)
{
	int sig;

	while( !bQuit && sigwait(&signal_mask, &sig) == 0 )
	{
		n_sigs_handled++;
		process_signal(sig);
	}

	return NULL;
}

/* Master loop: spawns a child per iteration and reports the counters. */
int
running_server(void)
{
	u_int32_t r, max = 10;
	pid_t pid, me;
	int i = 0;

	me = getpid();
	warnx("[master]: Send SIGINFO to %u", (unsigned)me);
	do
	{
		warnx("[master]: lastPid = %u, n_sigs_handled=%zu, n_sigs_child=%zu, "
			"n_sigs_info=%zu", (unsigned)lastPid,
			n_sigs_handled, n_sigs_child, n_sigs_info);
		pid = vfork();
		if( pid < 0 )
			break;

		if( pid == 0 )
		{
			/* Child: re-execute ourselves with -F. */
			execl(szCommand, getprogname(), "-F", NULL);
			_exit(EXIT_FAILURE);
		}
		else
		{
			if( bQuit )
				break;
			warnx("[master]: Child spawned with pid %u", (unsigned)pid);
			r = arc4random() % max;
			sleep((unsigned int)r);
		}
	} while( !bQuit && i++ < SERVER_ITERATIONS );

	return (0);
}

/* Called from the signal thread for every signal sigwait() returns. */
void
process_signal(int sig)
{
	int stat;
	pid_t pid;

	switch(sig)
	{
		case SIGTERM:
		case SIGINT:
			bQuit = true;
			break;
		case SIGCHLD:
			n_sigs_child++;
			/* Reap every child that has exited so far. */
			while( (pid = waitpid(-1, &stat, WNOHANG)) > 0)
			{
				lastPid = pid;
			}
			break;
		case SIGINFO:
			n_sigs_info++;
			break;
		default:
			signal(sig, SIG_IGN);
			break;
	}
}

int
main(int argc, char *argv[])
{
	bool bForked = false;
	const char *opts = "F";
	int ch, hr, rc;

	szCommand = argv[0];
	while( (ch = getopt(argc, argv, opts)) != -1 )
	{
		if( ch == 'F' )
			bForked = true;
	}
	argc -= optind;
	argv += optind;

	if( !bForked )
	{
		sigemptyset(&signal_mask);
		sigaddset(&signal_mask, SIGTERM);
		sigaddset(&signal_mask, SIGINT);
		sigaddset(&signal_mask, SIGCHLD);
		sigaddset(&signal_mask, SIGINFO);
	}
	daemon(1, 1);

	if( !bForked )
	{
		/* Block the signals before creating the thread that waits for them. */
		rc = pthread_sigmask(SIG_BLOCK, &signal_mask, NULL);
		if( rc != 0 )
			err(EXIT_FAILURE, "pthread_sigmask()");
		pthread_create(&signal_thread, NULL, signal_handler, NULL);
		hr = running_server();
		warnx("[master]: Joining signal thread");
		pthread_join(signal_thread, NULL);
	}
	else
	{
		printf("Child says hello\n");
		sleep(1);
		hr = 0;
	}

	return (hr);
}

--------------050201080106010708050901--