From owner-freebsd-threads@FreeBSD.ORG Mon Oct 18 11:02:11 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 217A616A4EC
    for ; Mon, 18 Oct 2004 11:02:11 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id ECA7643D39
    for ; Mon, 18 Oct 2004 11:02:10 +0000 (GMT)
    (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1])
    by freefall.freebsd.org (8.12.11/8.12.11) with ESMTP id i9IB2A22048213
    for ; Mon, 18 Oct 2004 11:02:10 GMT
    (envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost)
    by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i9IB2APx048207
    for freebsd-threads@freebsd.org; Mon, 18 Oct 2004 11:02:10 GMT
    (envelope-from owner-bugmaster@freebsd.org)
Date: Mon, 18 Oct 2004 11:02:10 GMT
Message-Id: <200410181102.i9IB2APx048207@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: freebsd-threads@FreeBSD.org
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 18 Oct 2004 11:02:11 -0000

Current FreeBSD problem reports

Critical problems

S Submitted    Tracker       Resp.    Description
-------------------------------------------------------------------------------
o [2004/04/22] threads/65883 threads  libkse's sigwait does not work after fork

1 problem total.

Serious problems

S Submitted    Tracker       Resp.    Description
-------------------------------------------------------------------------------
o [2000/07/18] kern/20016    threads  pthreads: Cannot set scheduling timer/Can
o [2000/08/26] kern/20861    threads  libc_r does not honor socket timeouts
o [2001/01/20] threads/24472 threads  libc_r does not honor SO_SNDTIMEO/SO_RCVT
o [2001/01/25] threads/24632 threads  libc_r delicate deviation from libc in ha
o [2001/01/25] kern/24641    threads  pthread_rwlock_rdlock can deadlock
o [2001/11/26] bin/32295     threads  pthread dont dequeue signals
o [2002/02/01] threads/34536 threads  accept() blocks other threads
o [2002/05/25] kern/38549    threads  the procces compiled whith pthread stoppe
o [2002/06/27] threads/39922 threads  [PATCH?] Threaded applications executed w
o [2002/08/04] kern/41331    threads  Pthread library open sets O_NONBLOCK flag
o [2003/03/02] threads/48856 threads  Setting SIGCHLD to SIG_IGN still leaves z
o [2003/03/10] threads/49087 threads  Signals lost in programs linked with libc
o [2003/05/08] threads/51949 threads  thread in accept cannot be cancelled
s [2004/03/15] kern/64313    threads  FreeBSD (OpenBSD) pthread implicit set/un
o [2004/08/26] threads/70975 threads  unexpected and unreliable behaviour when
o [2004/09/14] threads/71725 threads  Mysql Crashes frequently giving Sock Erro
o [2004/10/05] threads/72353 threads  Assertion fails in /usr/src/lib/libpthrea
o [2004/10/07] threads/72429 threads  threads blocked in stdio (fgets, etc) are

18 problems total.

Non-critical problems

S Submitted    Tracker       Resp.    Description
-------------------------------------------------------------------------------
o [2000/05/26] kern/18824    threads  gethostbyname is not thread safe
o [2000/06/13] kern/19247    threads  uthread_sigaction.c does not do anything
o [2000/10/21] kern/22190    threads  A threaded read(2) from a socketpair(2) f
o [2001/09/09] threads/30464 threads  pthread mutex attributes -- pshared
o [2002/05/02] threads/37676 threads  libc_r: msgsnd(), msgrcv(), pread(), pwri
s [2002/07/16] threads/40671 threads  pthread_cancel doesn't remove thread from
o [2004/07/13] threads/69020 threads  pthreads library leaks _gc_mutex
o [2004/09/21] threads/71966 threads  Mlnet Core Dumped : Fatal error '_pq_inse

8 problems total.

From owner-freebsd-threads@FreeBSD.ORG Wed Oct 20 21:17:25 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 893BC16A4CE
    for ; Wed, 20 Oct 2004 21:17:25 +0000 (GMT)
Received: from mail1.speakeasy.net (mail1.speakeasy.net [216.254.0.201])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 4562743D46
    for ; Wed, 20 Oct 2004 21:17:25 +0000 (GMT)
    (envelope-from jhb@FreeBSD.org)
Received: (qmail 28117 invoked from network); 20 Oct 2004 21:17:24 -0000
Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63])
    (envelope-sender ) encrypted SMTP
    for ; 20 Oct 2004 21:17:23 -0000
Received: from [10.50.41.228] (gw1.twc.weather.com [216.133.140.1])
    (authenticated bits=0)
    by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9KLHIlL060395;
    Wed, 20 Oct 2004 17:17:18 -0400 (EDT)
    (envelope-from jhb@FreeBSD.org)
From: John Baldwin
To: Daniel Eischen
Date: Wed, 20 Oct 2004 17:18:23 -0400
User-Agent: KMail/1.6.2
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <200410201718.23862.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on
    server.baldwin.cx
cc: threads@FreeBSD.org
Subject: Infinite loop bug in libc_r on 4.x with condition variables and signals
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 20 Oct 2004 21:17:25 -0000

We are trying to run mono on 4.x and are having problems with the process
getting stuck spinning in an infinite loop. After some debugging, we
determined that the problem is that the condition variable thread queues are
getting corrupted due to threads being added to a queue while they are
already queued on another queue. For example, if a thread is somehow on c1's
queue but runs and blocks on c2, later when c1 tries to do a broadcast, it
tries to remove all the waiters to wake them up doing something like:

    while ((head = TAILQ_FIRST(&c1->c_queue)) != NULL) {
    }

The problem is that since the thread was last added to c2's queue, his
tqe_prev pointer in his sqe TAILQ_ENTRY points to an item on c2's list, and
thus the c_queue.tqe_next pointer doesn't get updated by TAILQ_REMOVE, so the
thread just "sticks" on c1's head pointer and it spins forever.

We seemed to have tracked this down to some sort of bug related to signals and
condition variables. It seems that we try to go handle a signal while we are
on a condition variable queue, but not in PS_COND_WAIT, so
_cond_wait_backout() is not called to remove the thread from the queue. I
tried deferring signals around the cond queue manipulations in cond_wait()
and cond_timedwait() but we are still seeing the problem. The patches we
currently are using (including debug cruft) are below. Right now we see the
assertion in _thread_sig_wrapper() firing, but if I remove that, one of the
assertions in the condition variable code that check for threads not being on
the right condition variable queue trigger instead.
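
The failure mode described above can be reproduced outside libc_r with a few
lines of C. This is only a minimal sketch, assuming the plain <sys/queue.h>
macros with no debugging checks compiled in; struct waiter, struct waitq and
stale_head_after_double_enqueue() are hypothetical stand-ins, not libc_r code.
It returns instead of looping, so the stuck head can be observed safely:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct pthread: one link shared by all queues. */
struct waiter {
	TAILQ_ENTRY(waiter) sqe;
};
TAILQ_HEAD(waitq, waiter);

/*
 * Queue one waiter on c1, then (the bug being modelled) on c2 through the
 * same link field, then remove it from c1.
 * Returns 1 if c1's head still points at the waiter after the removal.
 */
int
stale_head_after_double_enqueue(void)
{
	struct waitq c1, c2;
	struct waiter t;

	TAILQ_INIT(&c1);
	TAILQ_INIT(&c2);

	TAILQ_INSERT_TAIL(&c1, &t, sqe);
	/* t is still linked on c1; this overwrites its tqe_prev. */
	TAILQ_INSERT_TAIL(&c2, &t, sqe);
	assert(TAILQ_FIRST(&c1) == &t && TAILQ_FIRST(&c2) == &t);

	/* tqe_prev now points into c2, so the removal patches c2, not c1. */
	TAILQ_REMOVE(&c1, &t, sqe);

	return (TAILQ_FIRST(&c1) == &t);
}
```

In this state every further TAILQ_REMOVE on c1 leaves its tqh_first
untouched, so a loop of the form shown earlier would never see an empty queue.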
Does anyone have any other ideas of how a thread could catch a signal while
PS_RUNNING and on a condition variable queue?  (I'm also worried that the
wait() functions assume that if the thread is interrupted, its always not on
the queue, but that doesn't seem to be the case for pthread_cancel() for
example.)

Index: Makefile
===================================================================
RCS file: /usr/cvs/src/lib/libc_r/Makefile,v
retrieving revision 1.24.2.7
diff -u -r1.24.2.7 Makefile
--- Makefile    22 Oct 2002 14:44:02 -0000    1.24.2.7
+++ Makefile    14 Oct 2004 23:33:42 -0000
@@ -14,7 +14,7 @@
 
 # Uncomment this if you want libc_r to contain debug information for
 # thread locking.
-CFLAGS+=-D_LOCK_DEBUG
+CFLAGS+=-D_LOCK_DEBUG -ggdb
 
 # enable extra internal consistancy checks
 CFLAGS+=-D_PTHREADS_INVARIANTS
Index: uthread/pthread_private.h
===================================================================
RCS file: /usr/cvs/src/lib/libc_r/uthread/pthread_private.h,v
retrieving revision 1.36.2.21
diff -u -r1.36.2.21 pthread_private.h
--- uthread/pthread_private.h    22 Oct 2002 14:44:02 -0000    1.36.2.21
+++ uthread/pthread_private.h    14 Oct 2004 22:24:00 -0000
@@ -744,6 +744,7 @@
      */
     TAILQ_ENTRY(pthread)    pqe;    /* priority queue link */
     TAILQ_ENTRY(pthread)    sqe;    /* synchronization queue link */
+    TAILQ_ENTRY(pthread)    cqe;    /* condition variable queue link */
     TAILQ_ENTRY(pthread)    qe;     /* all other queues link */
 
     /* Wait data. */
Index: uthread/uthread_cond.c
===================================================================
RCS file: /usr/cvs/src/lib/libc_r/uthread/uthread_cond.c,v
retrieving revision 1.22.2.8
diff -u -r1.22.2.8 uthread_cond.c
--- uthread/uthread_cond.c    22 Oct 2002 14:44:02 -0000    1.22.2.8
+++ uthread/uthread_cond.c    15 Oct 2004 21:55:48 -0000
@@ -37,6 +37,8 @@
 #include
 #include "pthread_private.h"
 
+#define DEFER
+
 /*
  * Prototypes
  */
@@ -195,6 +197,14 @@
      * signal handler.
      */
     do {
+#ifdef DEFER
+        /*
+         * Defer signals to protect the scheduling queues
+         * from access by the signal handler:
+         */
+        _thread_kern_sig_defer();
+#endif
+
         /* Lock the condition variable structure: */
         _SPINLOCK(&(*cond)->lock);
 
@@ -270,6 +280,7 @@
                  * after handling a signal.
                  */
                 if (interrupted != 0) {
+                    PTHREAD_ASSERT_NOT_IN_SYNCQ(curthread);
                     /*
                      * Lock the mutex and ignore any
                      * errors.  Note that even
@@ -314,6 +325,13 @@
 
         if ((interrupted != 0) && (curthread->continuation != NULL))
             curthread->continuation((void *) curthread);
+#ifdef DEFER
+        /*
+         * Undefer and handle pending signals, yielding if
+         * necessary:
+         */
+        _thread_kern_sig_undefer();
+#endif
     } while ((done == 0) && (rval == 0));
 
     _thread_leave_cancellation_point();
@@ -354,6 +372,13 @@
      * signal handler.
      */
     do {
+#ifdef DEFER
+        /*
+         * Defer signals to protect the scheduling queues
+         * from access by the signal handler:
+         */
+        _thread_kern_sig_defer();
+#endif
         /* Lock the condition variable structure: */
         _SPINLOCK(&(*cond)->lock);
 
@@ -431,6 +456,7 @@
                  * after handling a signal.
                  */
                 if (interrupted != 0) {
+                    PTHREAD_ASSERT_NOT_IN_SYNCQ(curthread);
                     /*
                      * Lock the mutex and ignore any
                      * errors.  Note that even
@@ -484,6 +510,13 @@
 
         if ((interrupted != 0) && (curthread->continuation != NULL))
             curthread->continuation((void *) curthread);
+#ifdef DEFER
+        /*
+         * Undefer and handle pending signals, yielding if
+         * necessary:
+         */
+        _thread_kern_sig_undefer();
+#endif
     } while ((done == 0) && (rval == 0));
 
     _thread_leave_cancellation_point();
@@ -671,8 +704,12 @@
     pthread_t pthread;
 
     while ((pthread = TAILQ_FIRST(&cond->c_queue)) != NULL) {
-        TAILQ_REMOVE(&cond->c_queue, pthread, sqe);
+        PTHREAD_ASSERT(pthread->data.cond == cond, "cond_queue_deq: mismatched condition variables");
+        PTHREAD_ASSERT(pthread->cqe.tqe_prev == &TAILQ_FIRST(&cond->c_queue), "cond_queue_deq: elem doesn't match");
+        PTHREAD_ASSERT(pthread->flags & PTHREAD_FLAGS_IN_CONDQ, "cond_queue_deq: condq flag not set");
+        TAILQ_REMOVE(&cond->c_queue, pthread, cqe);
         pthread->flags &= ~PTHREAD_FLAGS_IN_CONDQ;
+        pthread->data.cond = NULL;
         if ((pthread->timeout == 0) && (pthread->interrupted == 0))
             /*
              * Only exit the loop when we find a thread
@@ -693,6 +730,9 @@
 static inline void
 cond_queue_remove(pthread_cond_t cond, pthread_t pthread)
 {
+    pthread_t foo;
+    int found;
+
     /*
      * Because pthread_cond_timedwait() can timeout as well
      * as be signaled by another thread, it is necessary to
@@ -700,8 +740,28 @@
      * it isn't in the queue.
      */
     if (pthread->flags & PTHREAD_FLAGS_IN_CONDQ) {
-        TAILQ_REMOVE(&cond->c_queue, pthread, sqe);
+        PTHREAD_ASSERT(pthread->data.cond == cond,
+            "cond_queue_remove: mismatched condition variables");
+        found = 0;
+        TAILQ_FOREACH(foo, &cond->c_queue, cqe)
+            if (foo == pthread)
+                found++;
+        PTHREAD_ASSERT(found != 0, "thread not on queue");
+        PTHREAD_ASSERT(found <= 1, "thread on queue more than once");
+
+        if (TAILQ_FIRST(&cond->c_queue) == pthread)
+            PTHREAD_ASSERT(pthread->cqe.tqe_prev ==
+                &TAILQ_FIRST(&cond->c_queue),
+                "cond_queue_remove: elem doesn't match");
+        else
+            TAILQ_FOREACH(foo, &cond->c_queue, cqe)
+                if (TAILQ_NEXT(foo, cqe) == pthread)
+                    PTHREAD_ASSERT(pthread->cqe.tqe_prev ==
+                        &foo->cqe.tqe_next,
+                        "cond_queue_remove: elem doesn't match");
+        TAILQ_REMOVE(&cond->c_queue, pthread, cqe);
         pthread->flags &= ~PTHREAD_FLAGS_IN_CONDQ;
+        pthread->data.cond = NULL;
     }
 }
 
@@ -713,21 +773,25 @@
 cond_queue_enq(pthread_cond_t cond, pthread_t pthread)
 {
     pthread_t tid = TAILQ_LAST(&cond->c_queue, cond_head);
+    pthread_t foo;
 
     PTHREAD_ASSERT_NOT_IN_SYNCQ(pthread);
 
+    TAILQ_FOREACH(foo, &cond->c_queue, cqe)
+        PTHREAD_ASSERT(pthread != foo, "thread already on queue");
+
     /*
      * For the common case of all threads having equal priority,
      * we perform a quick check against the priority of the thread
      * at the tail of the queue.
      */
     if ((tid == NULL) || (pthread->active_priority <= tid->active_priority))
-        TAILQ_INSERT_TAIL(&cond->c_queue, pthread, sqe);
+        TAILQ_INSERT_TAIL(&cond->c_queue, pthread, cqe);
     else {
         tid = TAILQ_FIRST(&cond->c_queue);
         while (pthread->active_priority <= tid->active_priority)
-            tid = TAILQ_NEXT(tid, sqe);
-        TAILQ_INSERT_BEFORE(tid, pthread, sqe);
+            tid = TAILQ_NEXT(tid, cqe);
+        TAILQ_INSERT_BEFORE(tid, pthread, cqe);
     }
     pthread->flags |= PTHREAD_FLAGS_IN_CONDQ;
     pthread->data.cond = cond;
Index: uthread/uthread_mutex.c
===================================================================
RCS file: /usr/cvs/src/lib/libc_r/uthread/uthread_mutex.c,v
retrieving revision 1.20.2.8
diff -u -r1.20.2.8 uthread_mutex.c
--- uthread/uthread_mutex.c    22 Oct 2002 14:44:03 -0000    1.20.2.8
+++ uthread/uthread_mutex.c    20 Oct 2004 20:18:41 -0000
@@ -59,6 +59,8 @@
 #define _MUTEX_ASSERT_NOT_OWNED(m)
 #endif
 
+#define DEFER
+
 /*
  * Prototypes
  */
@@ -748,6 +750,11 @@
     struct pthread *curthread = _get_curthread();
     int ret = 0;
 
+#ifdef DEFER
+    if (add_reference)
+        PTHREAD_ASSERT(curthread->sig_defer_count == 1,
+            "lost defer count start");
+#endif
     if (mutex == NULL || *mutex == NULL) {
         ret = EINVAL;
     } else {
@@ -755,6 +762,9 @@
          * Defer signals to protect the scheduling queues from
          * access by the signal handler:
          */
+#ifdef DEFER
+        if (!add_reference)
+#endif
         _thread_kern_sig_defer();
 
         /* Lock the mutex structure: */
@@ -1064,8 +1074,16 @@
          * Undefer and handle pending signals, yielding if
          * necessary:
          */
+#ifdef DEFER
+        if (!add_reference)
+#endif
         _thread_kern_sig_undefer();
     }
+#ifdef DEFER
+    if (add_reference)
+        PTHREAD_ASSERT(curthread->sig_defer_count == 1,
+            "lost defer count finish");
+#endif
 
     /* Return the completion status: */
     return (ret);
Index: uthread/uthread_sig.c
===================================================================
RCS file: /usr/cvs/src/lib/libc_r/uthread/uthread_sig.c,v
retrieving revision 1.25.2.13
diff -u -r1.25.2.13 uthread_sig.c
--- uthread/uthread_sig.c    22 Oct 2002 14:44:03 -0000    1.25.2.13
+++ uthread/uthread_sig.c    15 Oct 2004 22:29:18 -0000
@@ -1007,6 +1007,10 @@
             break;
         }
     }
+#if 1
+    PTHREAD_ASSERT((thread->flags & PTHREAD_FLAGS_IN_CONDQ) == 0,
+        "still on cond queue");
+#endif
 
     /* Unblock the signal in case we don't return from the handler: */
     _thread_sigq[psf->signo - 1].blocked = 0;

--
John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org

From owner-freebsd-threads@FreeBSD.ORG Wed Oct 20 21:39:41 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP
    id 4B00116A4CE; Wed, 20 Oct 2004 21:39:41 +0000 (GMT)
Received: from mail.ntplx.net (mail.ntplx.net [204.213.176.10])
    by mx1.FreeBSD.org (Postfix) with ESMTP
    id EAC2643D31; Wed, 20 Oct 2004 21:39:40 +0000 (GMT)
    (envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11])
    i9KLderT013576; Wed, 20 Oct 2004 17:39:40 -0400 (EDT)
Date: Wed, 20 Oct 2004 17:39:40 -0400 (EDT)
From: Daniel Eischen
X-X-Sender: eischen@sea.ntplx.net
To: John Baldwin
In-Reply-To: <200410201718.23862.jhb@FreeBSD.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.ntplx.net)
cc: threads@freebsd.org
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Daniel Eischen
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 20 Oct 2004 21:39:41 -0000

On Wed, 20 Oct 2004, John Baldwin wrote:

> We are trying to run mono on 4.x and are having problems with the process
> getting stuck spinning in an infinite loop.
> After some debugging, we
> determined that the problem is that the condition variable thread queues are
> getting corrupted due to threads being added to a queue while they are
> already queued on another queue. For example, if a thread is somehow on c1's
> queue but runs and blocks on c2, later when c1 tries to do a broadcast, it
> tries to remove all the waiters to wake them up doing something like:
>
>     while ((head = TAILQ_FIRST(&c1->c_queue)) != NULL) {
>     }
>
> The problem is that since the thread was last added to c2's queue, his
> tqe_prev pointer in his sqe TAILQ_ENTRY points to an item on c2's list, and
> thus the c_queue.tqe_next pointer doesn't get updated by TAILQ_REMOVE, so the
> thread just "sticks" on c1's head pointer and it spins forever.
>
> We seemed to have tracked this down to some sort of bug related to signals and
> condition variables. It seems that we try to go handle a signal while we are
> on a condition variable queue, but not in PS_COND_WAIT, so
> _cond_wait_backout() is not called to remove the thread from the queue. I
> tried deferring signals around the cond queue manipulations in cond_wait()
> and cond_timedwait() but we are still seeing the problem. The patches we
> currently are using (including debug cruft) are below. Right now we see the
> assertion in _thread_sig_wrapper() firing, but if I remove that, one of the
> assertions in the condition variable code that check for threads not being on
> the right condition variable queue trigger instead. Does anyone have any
> other ideas of how a thread could catch a signal while PS_RUNNING and on a
> condition variable queue? (I'm also worried that the wait() functions assume
> that if the thread is interrupted, its always not on the queue, but that
> doesn't seem to be the case for pthread_cancel() for example.)

I'm not sure what's going on, but I do know that you can't call
pthread_cond_wait() from a signal handler. If a thread is blocked on
(taking your example) condition variable c1, then a signal interrupts
it and it again blocks on condition variable c2, that behavior is
undefined (by POSIX).

Another thing to watch out for is longjmps out of signal handlers
after being interrupted while waiting on a condition variable. I
think libc_r should handle this, but there could be a bug lurking
in that respect.

I'll take a look at libc_r and see if I can spot anything obvious.

--
Dan Eischen

From owner-freebsd-threads@FreeBSD.ORG Wed Oct 20 23:29:39 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from green.homeunix.org (freefall.freebsd.org [216.136.204.21])
    by hub.freebsd.org (Postfix) with ESMTP
    id 903D216A4CE; Wed, 20 Oct 2004 23:29:39 +0000 (GMT)
Received: from green.homeunix.org (green@localhost [127.0.0.1])
    by green.homeunix.org (8.13.1/8.13.1) with ESMTP id i9KNTdkf048455;
    Wed, 20 Oct 2004 19:29:39 -0400 (EDT)
    (envelope-from green@green.homeunix.org)
Received: (from green@localhost)
    by green.homeunix.org (8.13.1/8.13.1/Submit) id i9KNTcen048454;
    Wed, 20 Oct 2004 19:29:38 -0400 (EDT)
    (envelope-from green)
Date: Wed, 20 Oct 2004 19:29:38 -0400
From: Brian Fundakowski Feldman
To: threads@FreeBSD.org, standards@FreeBSD.org
Message-ID: <20041020232938.GJ1072@green.homeunix.org>
References: <200410202322.i9KNMuE3092472@repoman.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200410202322.i9KNMuE3092472@repoman.freebsd.org>
User-Agent: Mutt/1.5.6i
Subject: Re: cvs commit: ports/java/jdk14 Makefile ports/java/jdk14/files patch-vm::os_bsd.hpp
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 20 Oct 2004 23:29:40 -0000

On Wed, Oct 20, 2004 at 11:22:56PM +0000, Brian Feldman wrote:
> green       2004-10-20 23:22:56 UTC
>
>   FreeBSD ports repository
>
>   Modified files:
>     java/jdk14           Makefile
>   Added files:
>     java/jdk14/files     patch-vm::os_bsd.hpp
>   Log:
>   The BSD patchset for the Sun JDK modeled its thread behavior mostly after
>   existing the Solaris base, and similarly to what happened with NSPR, made
>   a bad assumption on undefined behavior.  This broke locking in various
>   places in Java, for example, causing the the debugging support to be
>   totally broken.  It is worth someone who knows the Java codebase taking
>   a look to see what other things could have been broken by this on
>   FreeBSD 5.x+.
>
>   The assumption is that pthread_mutex_trylock(3) on a default-type
>   mutex will fail with EBUSY.  This assumption is wrong for our
>   libpthread, which returns EDEADLK if the owner thread is trying to
>   acquire the mutex again with trylock.  The behavior of performing a
>   locking operation on a self-locked default-type mutex is explicitly
>   undefined for pthread_mutex_lock(3).
>
>   The POSIX specification is still not very clear.  It defines
>   pthread_mutex_trylock(3) in terms of pthread_mutex_lock(3) yet
>   does not say what the defined behavior should be for a self-locked
>   pthread_mutex_trylock(3) for any of the various mutex types, so it is
>   ambiguous whether the result is clearly undefined or clearly to return
>   EBUSY.
>
>   It is a one line change whether or not to make libpthread return
>   EDEADLK in this case, where it seems that most implementations do not.
>
>   Reference: http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html
>
>   Revision  Changes    Path
>   1.81      +1 -1      ports/java/jdk14/Makefile
>   1.1       +13 -0     ports/java/jdk14/files/patch-vm::os_bsd.hpp (new)

It would be reeeeeeally nice if we could decide whether to change this
behavior before 5.3-RELEASE, since it's a single-liner and is continually
biting people.  My opinion is that POSIX "wants" you to return EBUSY if
you're going to do any error checking at all in pthread_mutex_trylock(3).
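
Until the return value is settled, a caller that only needs to know whether
the mutex is already held can treat both answers alike. A minimal sketch,
assuming nothing beyond the standard pthread_mutex_trylock(3) call;
try_lock_portable() is a hypothetical helper name, not part of libpthread
or of the JDK patch:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical helper: try to take a mutex and report "already held"
 * the same way whether the library answers EBUSY or EDEADLK.
 *
 * Returns 1 if the lock was acquired, 0 if the mutex is already held
 * (by this thread or another), and -1 for any other error.
 */
int
try_lock_portable(pthread_mutex_t *m)
{
	int rc;

	assert(m != NULL);
	rc = pthread_mutex_trylock(m);
	if (rc == 0)
		return (1);
	if (rc == EBUSY || rc == EDEADLK)
		return (0);
	return (-1);
}
```

Code that previously compared the result against EBUSY alone could go through
a wrapper of this kind and behave the same under either interpretation.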

--
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 07:00:48 2004
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id E08A216A4CF
    for ; Thu, 21 Oct 2004 07:00:48 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 0709143D2D
    for ; Thu, 21 Oct 2004 07:00:45 +0000 (GMT)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    i9L70iDq004727
    for ; Thu, 21 Oct 2004 07:00:44 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i9L70i4x004725;
    Thu, 21 Oct 2004 07:00:44 GMT
    (envelope-from gnats)
Resent-Date: Thu, 21 Oct 2004 07:00:44 GMT
Resent-Message-Id: <200410210700.i9L70i4x004725@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-threads@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Mark Andrews
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id BE85416A4CE
    for ; Thu, 21 Oct 2004 06:54:13 +0000 (GMT)
Received: from daemon.lab.isc.org (daemon.lab.isc.org [204.152.187.133])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 78CA543D58
    for ; Thu, 21 Oct 2004 06:54:13 +0000 (GMT)
    (envelope-from marka@daemon.lab.isc.org)
Received: from daemon.lab.isc.org (localhost [127.0.0.1])
    by daemon.lab.isc.org (8.12.10/8.12.10) with ESMTP id i9L6sDT9029638
    for ; Thu, 21 Oct 2004 06:54:13 GMT
    (envelope-from marka@daemon.lab.isc.org)
Received: (from marka@localhost)
    by daemon.lab.isc.org (8.12.10/8.12.10/Submit) id i9L6sCjk029637;
    Thu, 21 Oct 2004 06:54:12 GMT
    (envelope-from marka)
Message-Id: <200410210654.i9L6sCjk029637@daemon.lab.isc.org>
Date: Thu, 21 Oct 2004 06:54:12 GMT
From: Mark Andrews
To: FreeBSD-gnats-submit@FreeBSD.org
X-Send-Pr-Version: 3.113
Subject: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Mark Andrews
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 21 Oct 2004 07:00:49 -0000

>Number:         72953
>Category:       threads
>Synopsis:       fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-threads
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Oct 21 07:00:44 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Mark Andrews
>Release:        FreeBSD 5.3-BETA6 i386
>Organization:
    ISC
>Environment:
System: FreeBSD daemon.lab.isc.org 5.3-BETA6 FreeBSD 5.3-BETA6 #16: Tue Oct 5 23:37:23 UTC 2004 jinmei@daemon.lab.isc.org:/hog0/users/jinmei/src/sys/i386/compile/DAEMON i386

>Description:

    fork() clears blocked signals unless PTHREAD_SCOPE_SYSTEM is set.

>How-To-Repeat:

    Run the following program with and without the argument "system".

    Without the argument, the program will incorrectly be killed by
    the SIGTERM.  With it, the signal will be blocked and sigwait
    returns.

/*
 * The header names were stripped from the archived copy; the ones
 * below are those required by the calls used in this program.
 */
#include <sys/types.h>
#include <signal.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

sigset_t signal_mask;

/* Daemonize: the parent exits and the child carries on. */
void
dofork() {
    switch (fork()) {
    case 0:
        break;
    case -1:
        _exit(1);
    default:
        _exit(0);
    }
    fprintf(stderr, "fork\n");
    fflush(stderr);
}

/* Thread body: wait for one of the signals blocked in main(). */
static void *
waiter(void *arg) {
    int result;
    int sig;

    result = sigwait(&signal_mask, &sig);
    if (result != 0)
        fprintf(stderr, "sigwait: %s\n", strerror(result));
    else
        fprintf(stderr, "signal %d\n", sig);
    fflush(stderr);
    return (NULL);
}

int
main(int argc, char **argv) {
    pthread_t id;
    pthread_attr_t attr;
    int result;
    int scope = 0;

    if (argc > 1) {
        if (strcmp(argv[1], "system") == 0)
            scope = 1;
    }

    /* Block SIGTERM before forking; the mask should survive fork(). */
    sigemptyset (&signal_mask);
    sigaddset (&signal_mask, SIGTERM);
    result = pthread_sigmask(SIG_BLOCK, &signal_mask, NULL);
    if (result != 0)
        fprintf(stderr, "pthread_sigmask: %s\n", strerror(result));
    else
        fprintf(stderr, "pthread_sigmask: OK\n");
    fflush(stderr);

    dofork();

    result = pthread_attr_init(&attr);
    if (result != 0)
        fprintf(stderr, "pthread_attr_init: %s\n", strerror(result));
    else
        fprintf(stderr, "pthread_attr_init: OK\n");
    fflush(stderr);

    if (scope) {
        result = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        if (result != 0)
            fprintf(stderr, "pthread_attr_setscope: %s\n", strerror(result));
        else
            fprintf(stderr, "pthread_attr_setscope: OK\n");
    } else
        fprintf(stderr, "default scope\n");
    fflush(stderr);

    result = pthread_create (&id, &attr, waiter, NULL);
    if (result != 0)
        fprintf(stderr, "pthread_create: %s\n", strerror(result));
    else
        fprintf(stderr, "pthread_create: OK\n");
    fflush(stderr);

    /* SIGTERM is blocked, so it should be picked up by sigwait(). */
    if (kill(getpid(), SIGTERM) == -1)
        perror("kill");
    else
        fprintf(stderr, "kill: OK\n");
    fflush(stderr);

    result = pthread_join(id, NULL);
    if (result != 0)
        fprintf(stderr, "pthread_join: %s\n", strerror(result));
    else
        fprintf(stderr, "pthread_join: OK\n");
    fflush(stderr);
    return(0);
}

>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 07:10:27 2004
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from
    mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 7D0A416A4CE
    for ; Thu, 21 Oct 2004 07:10:27 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 5810F43D45
    for ; Thu, 21 Oct 2004 07:10:27 +0000 (GMT)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    i9L7ARSI009135
    for ; Thu, 21 Oct 2004 07:10:27 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i9L7ARGU009134;
    Thu, 21 Oct 2004 07:10:27 GMT
    (envelope-from gnats)
Date: Thu, 21 Oct 2004 07:10:27 GMT
Message-Id: <200410210710.i9L7ARGU009134@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: Mark Andrews
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: Mark Andrews
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 21 Oct 2004 07:10:27 -0000

The following reply was made to PR threads/72953; it has been noted by GNATS.

From: Mark Andrews
To: freebsd-gnats-submit@FreeBSD.org, marka@isc.org
Cc:
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
Date: Thu, 21 Oct 2004 17:07:59 +1000

 see also threads/65883

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 12:30:23 2004
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 7D90416A4CE
    for ; Thu, 21 Oct 2004 12:30:23 +0000 (GMT)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 6CCE543D1F
    for ; Thu, 21 Oct 2004 12:30:23 +0000 (GMT)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    i9LCUNJe043485
    for ; Thu, 21 Oct 2004 12:30:23 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.12.11/8.12.11/Submit) id i9LCUNEI043484;
    Thu, 21 Oct 2004 12:30:23 GMT
    (envelope-from gnats)
Date: Thu, 21 Oct 2004 12:30:23 GMT
Message-Id: <200410211230.i9LCUNEI043484@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: David Xu
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: David Xu
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 21 Oct 2004 12:30:23 -0000

The following reply was made to PR threads/72953; it has been noted by GNATS.

From: David Xu
To: Mark Andrews
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
Date: Thu, 21 Oct 2004 20:33:10 +0800

 Doing any thread-related operations in a child process forked from a
 threaded process is not supported, because the thread library may be
 in a corrupted state. The safe operation in the child process is to
 call execve() immediately after fork().

 Mark Andrews wrote:

 >>Number:         72953
 >>Category:       threads
 >>Synopsis:       fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
 >>Confidential:   no
 >>Severity:       serious
 >>Priority:       medium
 >>Responsible:    freebsd-threads
 >>State:          open
 >>Quarter:
 >>Keywords:
 >>Date-Required:
 >>Class:          sw-bug
 >>Submitter-Id:   current-users
 >>Arrival-Date:   Thu Oct 21 07:00:44 GMT 2004
 >>Closed-Date:
 >>Last-Modified:
 >>Originator:     Mark Andrews
 >>Release:        FreeBSD 5.3-BETA6 i386
 >>Organization:
 >>
 >>

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 19:24:01 2004
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP
    id 55ECB16A4CF; Thu, 21 Oct 2004 19:24:01 +0000 (GMT)
Received: from corbulon.video-collage.com (aldan.algebra.com [216.254.65.224])
    by mx1.FreeBSD.org (Postfix) with ESMTP
    id E2D4343D46; Thu, 21 Oct 2004 19:24:00 +0000 (GMT)
    (envelope-from mi+mx@aldan.algebra.com)
Received: from 250-217.customer.cloud9.net (195-11.customer.cloud9.net [168.100.195.11])
    i9LJNwEq029214
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
    Thu, 21 Oct 2004 15:23:59 -0400 (EDT)
    (envelope-from mi+mx@aldan.algebra.com)
Received: from localhost (mteterin@localhost [127.0.0.1])
    i9LJNrgw057511; Thu, 21 Oct 2004 15:23:53 -0400 (EDT)
    (envelope-from mi+mx@aldan.algebra.com)
From: Mikhail Teterin
Organization: Virtual Estates, Inc.
To: current@FreeBSD.org Date: Thu, 21 Oct 2004 15:23:52 -0400 User-Agent: KMail/1.7 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200410211523.52663@misha-mx.virtual-estates.net> X-Virus-Scanned: clamd / ClamAV version devel-20040615, clamav-milter version 0.73a on corbulon.video-collage.com X-Virus-Status: Clean X-Scanned-By: MIMEDefang 2.43 cc: threads@FreeBSD.org cc: kde@FreeBSD.org Subject: unkillable multithreaded processes stuck in `STOP' state X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 19:24:01 -0000 Hello! This happened twice already -- first with KMail and now with Kontact. A process crashes as usual (KDE's 3.3.0 release was of unusually low quality), and seems to go away, except it does not. It stays in the `STOP' (according to top(1)) or in the `T' (as per ps(1)) state and can not be killed -- neither with -CONT, nor with -KILL. There is also a zombie-child of it: UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 1042 1096 1 57 8 0 68044 50524 - T ?? 0:27,88 kontact 1042 4903 1096 0 -84 0 0 0 - Z ?? 0:03,07 This is all, probably, due to something in KDE's attempts to capture crashes and collect backtraces for better bug reports. But whatever bugs they may have there, having an unkillable process -- of any kind -- worries me greatly. Is this a known issue, or is a PR warranted? I'm running a fresh -current with libpthread.so by default, using a P4 with HTT enabled -- two "logical processes", one of which is occupied by setiathome -- my way of testing out a new motherboard. Thanks! 
-mi From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 20:27:23 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 26BC316A4CF for ; Thu, 21 Oct 2004 20:27:23 +0000 (GMT) Received: from mail.gmx.net (imap.gmx.net [213.165.64.20]) by mx1.FreeBSD.org (Postfix) with SMTP id 4D5EF43D4C for ; Thu, 21 Oct 2004 20:27:22 +0000 (GMT) (envelope-from michaelnottebrock@gmx.net) Received: (qmail 22901 invoked by uid 65534); 21 Oct 2004 20:27:19 -0000 Received: from pD9E2466C.dip.t-dialin.net (EHLO lofi.dyndns.org) (217.226.70.108) by mail.gmx.net (mp010) with SMTP; 21 Oct 2004 22:27:19 +0200 X-Authenticated: #443188 Received: from kiste.my.domain (lofi@kiste.my.domain [192.168.8.4]) (authenticated bits=0) by lofi.dyndns.org (8.12.10/8.12.10) with ESMTP id i9LKR4Pk005253 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Thu, 21 Oct 2004 22:27:08 +0200 (CEST) (envelope-from michaelnottebrock@gmx.net) From: Michael Nottebrock To: kde-freebsd@freebsd.kde.org Date: Thu, 21 Oct 2004 22:26:50 +0200 User-Agent: KMail/1.7 References: <200410211523.52663@misha-mx.virtual-estates.net> In-Reply-To: <200410211523.52663@misha-mx.virtual-estates.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart183147978.YZqN6tsars"; protocol="application/pgp-signature"; micalg=pgp-sha1 Content-Transfer-Encoding: 7bit Message-Id: <200410212227.02663.michaelnottebrock@gmx.net> X-Virus-Scanned: by amavisd-new cc: threads@freebsd.org cc: Mikhail Teterin cc: kde@freebsd.org cc: current@freebsd.org Subject: Re: [kde-freebsd] unkillable multithreaded processes stuck in `STOP' state X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 20:27:23 -0000 --nextPart183147978.YZqN6tsars Content-Type: 
text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Thursday, 21. October 2004 21:23, Mikhail Teterin wrote: > Hello! > > This happened twice already -- first with KMail and now with Kontact. > A process crashes as usual (KDE's 3.3.0 release was of unusually low > quality), and seems to go away, except it does not. It stays in the > `STOP' (according to top(1)) or in the `T' (as per ps(1)) state and > can not be killed -- neither with -CONT, nor with -KILL. [...] > This is all, probably, due to something in KDE's attempts to capture > crashes and collect backtraces for better bug reports. But whatever bugs > they may have there, having an unkillable process -- of any kind -- worries > me greatly. Is this a known issue, or is a PR warranted? There have been no similar reports (to my knowledge) and I haven't seen anything similar on either 4.x or 5.x (I don't run 6-CURRENT). -- ,_, | Michael Nottebrock | lofi@freebsd.org (/^ ^\) | FreeBSD - The Power to Serve | http://www.freebsd.org \u/ | K Desktop Environment on FreeBSD | http://freebsd.kde.org --nextPart183147978.YZqN6tsars Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.9.11 (FreeBSD) iD8DBQBBeBuWXhc68WspdLARAoGTAJ40IURpMr3D9pXwGgbXSCFlQJ0O8ACgnNPy jacA440V6NWbJiJTuBSO1Zc= =39yt -----END PGP SIGNATURE----- --nextPart183147978.YZqN6tsars-- From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 20:32:17 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 03F1E16A4CE; Thu, 21 Oct 2004 20:32:17 +0000 (GMT) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 796A043D41; Thu, 21 Oct 2004 20:32:16 +0000 (GMT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (localhost [127.0.0.1]) by fledge.watson.org (8.13.1/8.13.1) with
ESMTP id i9LKVt0f032028; Thu, 21 Oct 2004 16:31:55 -0400 (EDT) (envelope-from robert@fledge.watson.org) Received: from localhost (robert@localhost)i9LKVt3c032025; Thu, 21 Oct 2004 16:31:55 -0400 (EDT) (envelope-from robert@fledge.watson.org) Date: Thu, 21 Oct 2004 16:31:55 -0400 (EDT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Michael Nottebrock In-Reply-To: <200410212227.02663.michaelnottebrock@gmx.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: threads@freebsd.org cc: Mikhail Teterin cc: kde@freebsd.org cc: current@freebsd.org cc: kde-freebsd@freebsd.kde.org Subject: Re: [kde-freebsd] unkillable multithreaded processes stuck in `STOP' state X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 20:32:17 -0000 On Thu, 21 Oct 2004, Michael Nottebrock wrote: > On Thursday, 21. October 2004 21:23, Mikhail Teterin wrote: > > Hello! > > > > This happened twice already -- first with KMail and now with Kontact. > > A process crashes as usual (KDE's 3.3.0 release was of unusually low > > quality), and seems to go away, except it does not. It stays in the > > `STOP' (according to top(1)) or in the `T' (as per ps(1)) state and > > can not be killed -- neither with -CONT, nor with -KILL. > > [...] > > > This is all, probably, due to something in KDE's attempts to capture > > crashes and collect backtraces for better bug reports. But whatever bugs > > they may have there, having an unkillable process -- of any kind -- worries > > me greatly. Is this a known issue, or is a PR warranted? > > There have been no similar reports (to my knowledge) and I haven't seen > anything similar on either 4.x or 5.x (I don't run 6-CURRENT). Actually, I recall seeing a similar problem about 14 months ago on 5-CURRENT. 
I believe that when a program crashed, its SIGSEGV handler would fork and attach gdb to its parent in order to generate a stack trace. I didn't have the opportunity to try and track it down, but I also don't remember seeing it in the last six months. It could be because KDE programs crash less for me as opposed to that the bug leading to the wedge has been fixed. Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Principal Research Scientist, McAfee Research From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 20:58:40 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7E3A816A4CE for ; Thu, 21 Oct 2004 20:58:40 +0000 (GMT) Received: from mail6.speakeasy.net (mail6.speakeasy.net [216.254.0.206]) by mx1.FreeBSD.org (Postfix) with ESMTP id 34B2843D41 for ; Thu, 21 Oct 2004 20:58:40 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 25735 invoked from network); 21 Oct 2004 20:58:39 -0000 Received: from dsl027-160-063.atl1.dsl.speakeasy.net (HELO server.baldwin.cx) ([216.27.160.63]) (envelope-sender ) encrypted SMTP for ; 21 Oct 2004 20:58:39 -0000 Received: from [10.50.41.228] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.12.11/8.12.11) with ESMTP id i9LKwQnJ068720; Thu, 21 Oct 2004 16:58:35 -0400 (EDT) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: Daniel Eischen Date: Thu, 21 Oct 2004 12:54:22 -0400 User-Agent: KMail/1.6.2 References: In-Reply-To: MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200410211254.22805.jhb@FreeBSD.org> X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx cc: threads@FreeBSD.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 
Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 20:58:40 -0000 On Wednesday 20 October 2004 05:39 pm, Daniel Eischen wrote: > On Wed, 20 Oct 2004, John Baldwin wrote: > > We are trying to run mono on 4.x and are having problems with the process > > getting stuck spinning in an infinite loop. After some debugging, we > > determined that the problem is that the condition variable thread queues > > are getting corrupted due to threads being added to a queue while they > > are already queued on another queue. For example, if a thread is somehow > > on c1's queue but runs and blocks on c2, later when c1 tries to do a > > broadcast, it tries to remove all the waiters to wake them up doing > > something like: > > > > while ((head = TAILQ_FIRST(&c1->c_queue)) != NULL) { > > } > > > > The problem is that since the thread was last added to c2's queue, his > > tqe_prev pointer in his sqe TAILQ_ENTRY points to an item on c2's list, > > and thus the c_queue.tqe_next pointer doesn't get updated by > > TAILQ_REMOVE, so the thread just "sticks" on c1's head pointer and it > > spins forever. > > > > We seemed to have tracked this down to some sort of bug related to > > signals and condition variables. It seems that we try to go handle a > > signal while we are on a condition variable queue, but not in > > PS_COND_WAIT, so > > _cond_wait_backout() is not called to remove the thread from the queue. > > I tried deferring signals around the cond queue manipulations in > > cond_wait() and cond_timedwait() but we are still seeing the problem. > > The patches we currently are using (including debug cruft) are below. > > Right now we see the assertion in _thread_sig_wrapper() firing, but if I > > remove that, one of the assertions in the condition variable code that > > check for threads not being on the right condition variable queue trigger > > instead. 
Does anyone have any other ideas of how a thread could catch a
> > signal while PS_RUNNING and on a condition variable queue? (I'm also
> > worried that the wait() functions assume that if the thread is
> > interrupted, its always not on the queue, but that doesn't seem to be the
> > case for pthread_cancel() for example.)
>
> I'm not sure what's going on, but I do know that you can't call
> pthread_cond_wait() from a signal handler. If a thread is blocked
> on (taking your example) condition variable c1, then a signal
> interrupts it and it again blocks on condition variable c2, that
> behavior is undefined (by POSIX).

The behavior seems to be more like this:

- thread does pthread_cond_wait*(c1)
- thread enqueued on c1
- thread interrupted by a signal while on c1 but still in PS_RUNNING
- thread saves state, which excludes the PTHREAD_FLAGS_IN_CONDQ flag (among
  others)
- thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's not in
  this case; PS_COND_WAIT is the normal case, though, which is why it's ok to
  not save the CONDQ flag in the saved state above)
- thread executes signal handler
- thread restores state
- pthread_cond_wait*() sees that interrupted is 0, so it doesn't try to remove
  the thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ isn't
  set either, so we can't detect this case that way)
- thread returns from pthread_cond_wait() (maybe due to timeout, etc.)
- thread calls pthread_cond_wait*(c2)
- thread enqueued on c2
- another thread does pthread_cond_broadcast(c2), and bewm

My question is: is it possible for the thread to get interrupted and chosen to
run a signal handler while it is on c1 somehow, given my patch to defer signals
around the wait loops (and is that patch correct, btw, given the above
scenario)?

> Another thing to watch out for is longjmps out of signal handlers
> after being interrupted while waiting on a condition variable.
> I think libc_r should handle this, but there could be a bug
> lurking in that respect.
The thing to note is that my assertion in _thread_sig_wrapper() about being on a condition variable queue and executing a handler is that it is placed after _cond_wait_backout() could be called (but won't be for PS_RUNNING), and before the signal handler itself is called. > I'll take a look at libc_r and see if I can spot anything obvious. Ok, thanks. FWIW, it seems that on 5.3 with KSE, mono does much better, but we still see rare hangs, so it maybe that if this bug is fixed it might be present in libpthread on 5 as well. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 21:06:32 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 887BB16A4CE; Thu, 21 Oct 2004 21:06:32 +0000 (GMT) Received: from corbulon.video-collage.com (aldan.algebra.com [216.254.65.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2993A43D48; Thu, 21 Oct 2004 21:06:32 +0000 (GMT) (envelope-from mi+mx@aldan.algebra.com) Received: from 250-217.customer.cloud9.net (195-11.customer.cloud9.net [168.100.195.11])i9LL6UAq030028 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 21 Oct 2004 17:06:31 -0400 (EDT) (envelope-from mi+mx@aldan.algebra.com) Received: from localhost (mteterin@localhost [127.0.0.1]) i9LL6PJL058265; Thu, 21 Oct 2004 17:06:25 -0400 (EDT) (envelope-from mi+mx@aldan.algebra.com) From: Mikhail Teterin Organization: Virtual Estates, Inc. 
Date: Thu, 21 Oct 2004 17:06:24 -0400 User-Agent: KMail/1.7 To: Michael Nottebrock , Robert Watson MIME-Version: 1.0 Content-Type: text/plain; charset="koi8-u" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200410211706.24723@misha-mx.virtual-estates.net> X-Virus-Scanned: clamd / ClamAV version devel-20040615, clamav-milter version 0.73a on corbulon.video-collage.com X-Virus-Status: Clean X-Scanned-By: MIMEDefang 2.43 cc: threads@freebsd.org cc: kde@freebsd.org Subject: Fwd: Re: kern/72979: unkillable process(es) stuck in `STOP' state X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 21:06:32 -0000 FYI... -mi ---------- ---------- [...] You can access the state of your problem report at any time via this link: http://www.freebsd.org/cgi/query-pr.cgi?pr=72979 ------------------------------------------------------- From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 21:15:04 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5FE5C16A4CE; Thu, 21 Oct 2004 21:15:04 +0000 (GMT) Received: from lakermmtao12.cox.net (lakermmtao12.cox.net [68.230.240.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id B377F43D1F; Thu, 21 Oct 2004 21:15:03 +0000 (GMT) (envelope-from mezz7@cox.net) Received: from mezz.mezzweb.com ([68.103.32.140]) by lakermmtao12.cox.net (InterMail vM.6.01.03.04 201-2131-111-106-20040729) with ESMTP id <20041021211502.HSRZ13338.lakermmtao12.cox.net@mezz.mezzweb.com>; Thu, 21 Oct 2004 17:15:02 -0400 Date: Thu, 21 Oct 2004 16:15:08 -0500 To: "John Baldwin" References: <200410211254.22805.jhb@FreeBSD.org> From: "Jeremy Messenger" Content-Type: text/plain; format=flowed; delsp=yes; charset=us-ascii MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 
Message-ID: In-Reply-To: <200410211254.22805.jhb@FreeBSD.org> User-Agent: Opera M2/7.54 (Linux, build 751) cc: Daniel Eischen cc: threads@freebsd.org Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 21:15:04 -0000 On Thu, 21 Oct 2004 12:54:22 -0400, John Baldwin wrote: > On Wednesday 20 October 2004 05:39 pm, Daniel Eischen wrote: >> On Wed, 20 Oct 2004, John Baldwin wrote: >> > We are trying to run mono on 4.x and are having problems with the >> process >> > getting stuck spinning in an infinite loop. After some debugging, we >> > determined that the problem is that the condition variable thread >> queues >> > are getting corrupted due to threads being added to a queue while they >> > are already queued on another queue. For example, if a thread is >> somehow >> > on c1's queue but runs and blocks on c2, later when c1 tries to do a >> > broadcast, it tries to remove all the waiters to wake them up doing >> > something like: >> > >> > while ((head = TAILQ_FIRST(&c1->c_queue)) != NULL) { >> > } >> > >> > The problem is that since the thread was last added to c2's queue, his >> > tqe_prev pointer in his sqe TAILQ_ENTRY points to an item on c2's >> list, >> > and thus the c_queue.tqe_next pointer doesn't get updated by >> > TAILQ_REMOVE, so the thread just "sticks" on c1's head pointer and it >> > spins forever. >> > >> > We seemed to have tracked this down to some sort of bug related to >> > signals and condition variables. It seems that we try to go handle a >> > signal while we are on a condition variable queue, but not in >> > PS_COND_WAIT, so >> > _cond_wait_backout() is not called to remove the thread from the >> queue. 
>> > I tried deferring signals around the cond queue manipulations in >> > cond_wait() and cond_timedwait() but we are still seeing the problem. >> > The patches we currently are using (including debug cruft) are below. >> > Right now we see the assertion in _thread_sig_wrapper() firing, but >> if I >> > remove that, one of the assertions in the condition variable code that >> > check for threads not being on the right condition variable queue >> trigger >> > instead. Does anyone have any other ideas of how a thread could >> catch a >> > signal while PS_RUNNING and on a condition variable queue? (I'm also >> > worried that the wait() functions assume that if the thread is >> > interrupted, its always not on the queue, but that doesn't seem to be >> the >> > case for pthread_cancel() for example.) >> >> I'm not sure what's going on, but I do know that you can't call >> pthread_cond_wait() from a signal handler. If a thread is blocked >> on (taking your example) condition variable c1, then a signal >> interrupts it and it again blocks on condition variable c2, that >> behavior is undefined (by POSIX). > > The behavior seems more to be this: > > - thread does pthread_cond_wait*(c1) > - thread enqueued on c1 > - thread interrupted by a signal while on c1 but still in PS_RUNNING > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag > (among > others) > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's > not in > - this case, this is the normal case though, which is why it's ok to not > save > the CONDQ flag in the saved state above) > - thread executes signal handler > - thread restores state > - pthread_condwait*() see that interrupted is 0, so don't try to remove > the > thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ isn't > set > either, so we can't detect this case that way) > - thread returns from pthread_cond_wait() (maybe due to timeout, etc.) 
> - thread calls pthread_cond_wait*(c2)
> - thread enqueued on c2
> - another thread does pthread_cond_broadcast(c2), and bewm
>
> My question is is it possible for the thread to get interrupted and chosen to
> run a signal while it is on c1 somehow given my patch to defer signals around
> the wait loops (and is that patch correct btw given the above scenario?)
>
>> Another thing to watch out for is longjmps out of signal handlers
>> after being interrupted while waiting on a condition variable.
>> I think libc_r should handle this, but there could be a bug
>> lurking in that respect.
>
> The thing to note is that my assertion in _thread_sig_wrapper() about being on
> a condition variable queue and executing a handler is that it is placed after
> _cond_wait_backout() could be called (but won't be for PS_RUNNING), and
> before the signal handler itself is called.
>
>> I'll take a look at libc_r and see if I can spot anything obvious.
>
> Ok, thanks. FWIW, it seems that on 5.3 with KSE, mono does much better, but
> we still see rare hangs, so it maybe that if this bug is fixed it might be
> present in libpthread on 5 as well.

You can check this thread if you are interested. It's not about libc_r, but
about Mono running on FreeBSD 5.3, where the threads get corrupted if you run
'mono -pkg:foopkg foo.cs'.

http://lists.freebsd.org/pipermail/freebsd-threads/2004-October/thread.html#2540

If you know of other fixes, tricks, etc., it would be nice if you could pass
them on to the bsd-sharp project[1]. Tom has more or less taken it over for now
while the maintainer of lang/mono is busy or has disappeared. Mono works better
in bsd-sharp's lang/mono than in FreeBSD's lang/mono.
[1] http://forge.novell.com/modules/xfmod/project/?bsd-sharp Cheers, Mezz -- mezz7@cox.net - mezz@FreeBSD.org FreeBSD GNOME Team http://www.FreeBSD.org/gnome/ - gnome@FreeBSD.org From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 21:21:31 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2151616A4CE; Thu, 21 Oct 2004 21:21:31 +0000 (GMT) Received: from core.zp.ua (core.zp.ua [193.108.112.7]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9B31043D39; Thu, 21 Oct 2004 21:21:29 +0000 (GMT) (envelope-from oleg@core.zp.ua) Received: from core.zp.ua (oleg@localhost [127.0.0.1]) by core.zp.ua (8.13.1/8.13.1) with ESMTP id i9LLL5Hl001271; Fri, 22 Oct 2004 00:21:05 +0300 (EEST) (envelope-from oleg@core.zp.ua) Received: (from oleg@localhost) by core.zp.ua (8.13.1/8.13.1/Submit) id i9LLL56I001270; Fri, 22 Oct 2004 00:21:05 +0300 (EEST) (envelope-from oleg) Date: Fri, 22 Oct 2004 00:21:05 +0300 From: "Oleg V. 
Nauman" To: Michael Nottebrock Message-ID: <20041021212105.GX12192@core.zp.ua> Mail-Followup-To: Michael Nottebrock , kde-freebsd@freebsd.kde.org, threads@freebsd.org, Mikhail Teterin , kde@freebsd.org, current@freebsd.org References: <200410211523.52663@misha-mx.virtual-estates.net> <200410212227.02663.michaelnottebrock@gmx.net> Mime-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Disposition: inline In-Reply-To: <200410212227.02663.michaelnottebrock@gmx.net> User-Agent: Mutt/1.5.6i cc: threads@freebsd.org cc: Mikhail Teterin cc: kde@freebsd.org cc: current@freebsd.org cc: kde-freebsd@freebsd.kde.org Subject: Re: [kde-freebsd] unkillable multithreaded processes stuck in `STOP' state X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 21:21:31 -0000 On Thu, Oct 21, 2004 at 10:26:50PM +0200, Michael Nottebrock wrote: > On Thursday, 21. October 2004 21:23, Mikhail Teterin wrote: > > Hello! > > > > This happened twice already -- first with KMail and now with Kontact. > > A process crashes as usual (KDE's 3.3.0 release was of unusually low > > quality), and seems to go away, except it does not. It stays in the > > `STOP' (according to top(1)) or in the `T' (as per ps(1)) state and > > can not be killed -- neither with -CONT, nor with -KILL. > > [...] > > > This is all, probably, due to something in KDE's attempts to capture > > crashes and collect backtraces for better bug reports. But whatever bugs > > they may have there, having an unkillable process -- of any kind -- worries > > me greatly. Is this a known issue, or is a PR warranted? > > There have been no similar reports (to my knowledge) and I haven't seen > anything similar on either 4.x or 5.x (I don't run 6-CURRENT). This problem looks like not specific to KDE, but for OpenOffice and clamav-milter at least. 
Unkillable by -9 OpenOffice (5.3-BETA7,information from local mailing list): ps -O flags,lim,lockname,mwchan,nwchan,sigcatch,sigignore,sigmask,state,wchan,xstat 2194 PID F LIM LOCK MWCHAN NWCHAN CAUGHT IGNORED BLOCKED STAT WCHAN XSTAT TT STAT TIME COMMAND 2194 8c081 - - - - 1b80eeb7 781000 fffefeff TL - 0 ?? TL 0:05,47 /usr/local/OpenO clamav-milter (doesn't respond to SIGCONT), but killable by -9 only, without any library mappings (5.3-STABLE): #ps -O flags,lim,lockname,mwchan,nwchan,sig,sigcatch,sigignore,sigmask,state,wchan,xstat 418 PID F LIM LOCK MWCHAN NWCHAN PENDING CAUGHT IGNORED BLOCKED STAT WCHAN XSTAT TT STAT TIME COMMAND 418 88180 - - - - 0 10040000 87a9001 fffefeff TLs - 0 ?? TLs 0:02.78 /usr/local/sbin/ # kill -CONT 418 #ps -O flags,lim,lockname,mwchan,nwchan,sig,sigcatch,sigignore,sigmask,state,wchan,xstat 418 PID F LIM LOCK MWCHAN NWCHAN PENDING CAUGHT IGNORED BLOCKED STAT WCHAN XSTAT TT STAT TIME COMMAND 418 288180 - - - - 40000 10040000 87a9001 fffefeff TLs - 0 ?? TLs 0:02.78 /usr/local/sbin/ > > -- > ,_, | Michael Nottebrock | lofi@freebsd.org > (/^ ^\) | FreeBSD - The Power to Serve | http://www.freebsd.org > \u/ | K Desktop Environment on FreeBSD | http://freebsd.kde.org -- NO37-RIPE From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 21:22:59 2004 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 68C2D16A4CE; Thu, 21 Oct 2004 21:22:59 +0000 (GMT) Received: from gundel.de.clara.net (gundel.de.clara.net [212.82.225.86]) by mx1.FreeBSD.org (Postfix) with ESMTP id EF58D43D45; Thu, 21 Oct 2004 21:22:58 +0000 (GMT) (envelope-from jesk@killall.org) Received: from port-212-202-52-250.dynamic.qsc.de ([212.202.52.250] helo=turbofresse) by gundel.de.clara.net with smtp (Exim 4.30; FreeBSD) id 1CKkYX-000GQa-7M; Thu, 21 Oct 2004 23:33:09 +0200 Message-ID: <000901c4b7b4$2113ab70$45fea8c0@turbofresse> From: "jesk" To: Date: Thu, 21 Oct 2004 
23:19:51 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 cc: threads@freebsd.org Subject: FreeBSD5.3-RC1 MySQL Performance X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Oct 2004 21:22:59 -0000 Hello, i found some time to make some performance tests with mysql under FreeBSD5.3-RC1. Hardware is a HP DL360 with 2x2,8GHz Xeon CPU´s, 2GB, deactivated HTT and u160/10krpm scsi drive. For reference values i took a RedHat Fedora with native threads (NPTL) on 2.6 kernel and the same hardware. for benchmarks i used super-smack with the default smack files. the MySQL backend was MyISAM. with both setups the mysql was always under high load which seemed to me for a good sign to recognize expressive values on thread execution and mysql performance without loosing to much time in i/o. 
the benchmark executes 1000 sql-select queries * 10 concurrent clients on
a 90k row table, with a random, not very cacheable where-clause on
the index:
----
15985 queries per second
(pthreads without process scope threads, sched_4bsd and preemption)
6139 queries per second
(pthreads with process scope threads, sched_4bsd and preemption)
10779 queries per second
(linuxthreads, sched_4bsd and preemption)
fedora result:
11900 queries per second
----

in the same test (same parameters), but with an update query first and then a
select query on the same key, i measured worse values for freebsd:
----
2027.52 queries per second
(pthreads without process scope threads, sched_4bsd and preemption)
1146.66 queries per second
(pthreads with process scope threads, sched_4bsd and preemption)
3040.78 queries per second
(linuxthreads, sched_4bsd and preemption)
fedora result:
3920.21 queries per second
----

i checked whether i could speed up the update queries by writing to a
ramdisk, but that didn't gain much.
if i could combine linuxthreads for updates with pthreads (without process
scope) for select queries, it would be a good answer to linux, but fedora's
update performance would still be out of reach.

here are the relevant mysql settings used in this test:
----
query_cache_size=64000000
key_buffer_size=1024M
table_cache=128
thread_cache_size=128
max_connections=1000
----

maybe someone has some hints for improving this situation...
regards,
jesk

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 21:36:39 2004
Message-ID: <41782BDF.8040301@he.iki.fi>
Date: Fri, 22 Oct 2004 00:36:31 +0300
From: Petri Helenius
To: jesk
References: <000901c4b7b4$2113ab70$45fea8c0@turbofresse>
In-Reply-To: <000901c4b7b4$2113ab70$45fea8c0@turbofresse>
cc: threads@freebsd.org
Subject: Re: FreeBSD5.3-RC1 MySQL Performance

jesk wrote:

>the benchmark is executing 1000 sql-select queries*10 concurrent clients on
>a 90k row table with a random not really high cacheable where-statement on
>the index:
>----
>15985 queries per second
>(pthreads without process scope threads, sched_4bsd and preemption)
>6139 queries per second
>(pthreads with process scope threads, sched_4bsd and preemption)
>10779 queries per second
>(linuxthreads, sched_4bsd and preemption)
>fedora result:
>11900 queries per second
>----
>
>
>maybe someone got some hints for improvement of this situation...
>
>
Do you have any idea why process scope threads are faster than system scope threads? My gut feeling is that it should be exactly opposite.

Pete

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 21 23:04:39 2004
Date: Thu, 21 Oct 2004 19:04:38 -0400 (EDT)
From: Daniel Eischen
To: John Baldwin
In-Reply-To: <200410211254.22805.jhb@FreeBSD.org>
cc: threads@freebsd.org
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
Reply-To: Daniel Eischen

On Thu, 21 Oct 2004, John Baldwin wrote:
>
> The behavior seems more to be this:
>
> - thread does pthread_cond_wait*(c1)
> - thread enqueued on c1
> - thread interrupted by a signal while on c1 but still in PS_RUNNING

This shouldn't happen when signals are deferred. It should only happen when the state is PS_COND_WAIT after we've context switched to the scheduler.
> - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag (among
>   others)

Right, because it assumes that the thread will be backed out of any mutex or CV queues prior to invoking the signal handler.

> - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's not in
>   this case, this is the normal case though, which is why it's ok to not save
>   the CONDQ flag in the saved state above)

Right. The problem is, how is the thread getting set up for a signal while signals are deferred and the state has not yet been changed from PS_RUNNING to PS_COND_WAIT?

> - thread executes signal handler
> - thread restores state
> - pthread_condwait*() see that interrupted is 0, so don't try to remove the
>   thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ isn't set
>   either, so we can't detect this case that way)
> - thread returns from pthread_cond_wait() (maybe due to timeout, etc.)
> - thread calls pthread_cond_wait*(c2)
> - thread enqueued on c2
> - another thread does pthread_cond_broadcast(c2), and bewm
>
> My question is is it possible for the thread to get interrupted and chosen to
> run a signal while it is on c1 somehow given my patch to defer signals around
> the wait loops (and is that patch correct btw given the above scenario?)

Yes (and yes I think). Deferring signals just means that the signal handler won't try to install a signal frame on the current thread; instead it just queues the signal and the scheduler will pick it up and send it to the correct thread. I do think signals should be deferred for condition variables so that setting the thread state (to PS_COND_WAIT) is atomic.

It's not obvious to me where the bug is. If you had a simple test case to reproduce it that would help.
--
Dan Eischen

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 22 01:47:23 2004
Message-ID: <417866BF.1000200@errno.com>
Date: Thu, 21 Oct 2004 18:47:43 -0700
From: Sam Leffler
Organization: Errno Consulting
To: Robert Watson
cc: kde@freebsd.org
cc: current@freebsd.org
cc: Mikhail Teterin
cc: kde-freebsd@freebsd.kde.org
cc: threads@freebsd.org
cc: Michael Nottebrock
Subject: Re: [kde-freebsd] unkillable multithreaded processes stuck in `STOP' state

Robert Watson wrote:
> On Thu, 21 Oct 2004, Michael Nottebrock wrote:
>
>
>>On Thursday, 21. October 2004 21:23, Mikhail Teterin wrote:
>>
>>>Hello!
>>>
>>>This happened twice already -- first with KMail and now with Kontact.
>>>A process crashes as usual (KDE's 3.3.0 release was of unusually low
>>>quality), and seems to go away, except it does not. It stays in the
>>>`STOP' (according to top(1)) or in the `T' (as per ps(1)) state and
>>>can not be killed -- neither with -CONT, nor with -KILL.
>>
>>[...]
>>
>>
>>>This is all, probably, due to something in KDE's attempts to capture
>>>crashes and collect backtraces for better bug reports. But whatever bugs
>>>they may have there, having an unkillable process -- of any kind -- worries
>>>me greatly. Is this a known issue, or is a PR warranted?
>>
>>There have been no similar reports (to my knowledge) and I haven't seen
>>anything similar on either 4.x or 5.x (I don't run 6-CURRENT).
>
>
> Actually, I recall seeing a similar problem about 14 months ago on
> 5-CURRENT. I believe that when a program crashed, its SIGSEGV handler
> would fork and attach gdb to its parent in order to generate a stack
> trace. I didn't have the opportunity to try and track it down, but I also
> don't remember seeing it in the last six months. It could be because KDE
> programs crash less for me as opposed to that the bug leading to the wedge
> has been fixed.

On my recent -current laptop (updated last week) I can reliably run gdb on a program, break main, and quit. The process being debugged is left in STOP state and is unkillable.
Sam

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 22 02:30:31 2004
Date: Fri, 22 Oct 2004 02:30:31 GMT
Message-Id: <200410220230.i9M2UVxo039281@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: Mark Andrews
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
Reply-To: Mark Andrews

The following reply was made to PR threads/72953; it has been noted by GNATS.

From: Mark Andrews
To: freebsd-gnats-submit@FreeBSD.org, marka@isc.org
Cc:
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
Date: Fri, 22 Oct 2004 12:29:34 +1000

 No. The restrictions are for multi-threaded processes, not single-threaded
 processes. The test case is still single-threaded when fork() is called.

 Also, pthread_sigmask() and sigprocmask() are supposed to be identical in
 single-threaded applications. Replacing pthread_sigmask() with
 sigprocmask() changes the resulting behaviour.
 http://www.opengroup.org/onlinepubs/007908799/xsh/sigprocmask.html
 http://www.opengroup.org/onlinepubs/007908799/xsh/fork.html

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 22 15:42:37 2004
Date: Fri, 22 Oct 2004 18:21:03 +0300
From: Mike Makonnen
To: Petri Helenius
Message-ID: <20041022152103.GA4743@rogue.acs.lan>
References: <000901c4b7b4$2113ab70$45fea8c0@turbofresse> <41782BDF.8040301@he.iki.fi>
In-Reply-To: <41782BDF.8040301@he.iki.fi>
cc: jesk
cc: threads@freebsd.org
Subject: Re: FreeBSD5.3-RC1 MySQL Performance

On Fri, Oct 22, 2004 at 12:36:31AM +0300, Petri Helenius wrote:
> jesk wrote:
>
> >the benchmark is executing 1000 sql-select queries*10 concurrent clients on
> >a 90k row table with a random not really high cacheable where-statement on
> >the index:
> >----
> >15985 queries per second
> >(pthreads without process scope threads, sched_4bsd and preemption)
> >6139 queries per second
> >(pthreads with process scope threads, sched_4bsd and preemption)
> >10779 queries per second
> >(linuxthreads, sched_4bsd and preemption)
> >fedora result:
> >11900 queries per second
> >----
> >
> >
> >maybe someone got some hints for improvement of this situation...
> >
> >
> Do you have any idea why process scope threads are faster than system
> scope threads? My gut feeling is that it should be exactly opposite.

I think you're reading it wrong: 'pthreads without process scope threads'
                                           ^^^^^^^
gets 15985 qps whereas 'with process scope threads' it only gets 6139 qps.

Cheers.
--
Mike Makonnen | GPG-KEY: http://www.identd.net/~mtm/mtm.asc
mtm@identd.net | Fingerprint: AC7B 5672 2D11 F4D0 EBF8 5279 5359 2B82 7CD4 1F55
mtm@FreeBSD.Org| FreeBSD - Unleash the Daemon !

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 22 15:48:22 2004
Message-ID: <41792BC6.3020408@freebsd.org>
Date: Fri, 22 Oct 2004 23:48:22 +0800
From: David Xu
To: Mark Andrews
References: <200410220230.i9M2UVxo039281@freefall.freebsd.org>
In-Reply-To: <200410220230.i9M2UVxo039281@freefall.freebsd.org>
cc: freebsd-threads@freebsd.org
Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM

Interesting. How can you treat the program as single-threaded while calling pthread_xxx functions, which are obviously defined for multithreaded programs? A process that has only one thread is not necessarily single-threaded: when you link against the pthread library, the program should be treated as multi-threaded. Otherwise, don't link with it.

Mark Andrews wrote:

>The following reply was made to PR threads/72953; it has been noted by GNATS.
>
>From: Mark Andrews
>To: freebsd-gnats-submit@FreeBSD.org, marka@isc.org
>Cc:
>Subject: Re: threads/72953: fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYSTEM
>Date: Fri, 22 Oct 2004 12:29:34 +1000
>
> No. The restriction are for multi-threaded processes not single
> threaded processes. The test case is still single threaded when fork()
> is called.
>
> Also pthread_sigmask() / sigprocmask() are supposed to be identical in
> single threaded applications. Replacing pthread_sigmask() with
> sigprocmask() changes the resulting behaviour.
>
> http://www.opengroup.org/onlinepubs/007908799/xsh/sigprocmask.html
> http://www.opengroup.org/onlinepubs/007908799/xsh/fork.html
>_______________________________________________
>freebsd-threads@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-threads
>To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org"
>
>

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 22 19:24:14 2004
Message-ID: <41795E56.60603@he.iki.fi>
Date: Fri, 22 Oct 2004 22:24:06 +0300
From: Petri Helenius
To: Mike Makonnen
References: <000901c4b7b4$2113ab70$45fea8c0@turbofresse> <41782BDF.8040301@he.iki.fi> <20041022152103.GA4743@rogue.acs.lan>
In-Reply-To: <20041022152103.GA4743@rogue.acs.lan>
cc: jesk
cc: threads@freebsd.org
Subject: Re: FreeBSD5.3-RC1 MySQL Performance

Mike Makonnen wrote:

>On Fri, Oct 22, 2004 at 12:36:31AM +0300, Petri Helenius wrote:
>
>
>>jesk wrote:
>>
>>
>>
>>>the benchmark is executing 1000 sql-select queries*10 concurrent clients on
>>>a 90k row table with a random not really high cacheable where-statement on
>>>the index:
>>>----
>>>15985 queries per second
>>>(pthreads without process scope threads, sched_4bsd and preemption)
>>>6139 queries per second
>>>(pthreads with process scope threads, sched_4bsd and preemption)
>>>10779 queries per second
>>>(linuxthreads, sched_4bsd and preemption)
>>>fedora result:
>>>11900 queries per second
>>>----
>>>
>>>
>>>maybe someone got some hints for improvement of this situation...
>>>
>>>
>>>
>>>
>>Do you have any idea why process scope threads are faster than system
>>scope threads? My gut feeling is that it should be exactly opposite.
>>
>>
>
>I think you're reading it wrong: 'pthreads without process scope threads'
>                                           ^^^^^^^
>gets 15985 qps whereas 'with process scope threads' it only gets 6139 qps.
>
>
Yes. I meant to ask why system scope threads are faster than process scope threads. They should be the other way around.
Pete

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 22 19:57:40 2004
Message-ID: <41796634.1040706@elischer.org>
Date: Fri, 22 Oct 2004 12:57:40 -0700
From: Julian Elischer
To: Petri Helenius
References: <000901c4b7b4$2113ab70$45fea8c0@turbofresse> <41782BDF.8040301@he.iki.fi> <20041022152103.GA4743@rogue.acs.lan> <41795E56.60603@he.iki.fi>
In-Reply-To: <41795E56.60603@he.iki.fi>
cc: jesk
cc: threads@freebsd.org
Subject: Re: FreeBSD5.3-RC1 MySQL Performance

Petri Helenius wrote:

> Mike Makonnen wrote:
>
>> On Fri, Oct 22, 2004 at 12:36:31AM +0300, Petri Helenius wrote:
>>
>>
>>> jesk wrote:
>>>
>>>
>>>
>>>> the benchmark is executing 1000 sql-select queries*10 concurrent
>>>> clients on
>>>> a 90k row table with a random not really high cacheable
>>>> where-statement on
>>>> the index:
>>>> ----
>>>> 15985 queries per second
>>>> (pthreads without process scope threads, sched_4bsd and preemption)
>>>> 6139 queries per second
>>>> (pthreads with process scope threads, sched_4bsd and preemption)
>>>> 10779 queries per second
>>>> (linuxthreads, sched_4bsd and preemption)
>>>> fedora result:
>>>> 11900 queries per second
>>>> ----
>>>>
>>>>
>>>> maybe someone got some hints for improvement of this situation...
>>>>
>>>>
>>>>
>>>
>>> Do you have any idea why process scope threads are faster than
>>> system scope threads? My gut feeling is that it should be exactly
>>> opposite.
>>>
>>
>>
>> I think you're reading it wrong: 'pthreads without process scope
>> threads'
>>                                            ^^^^^^^
>> gets 15985 qps whereas 'with process scope threads' it only gets 6139
>> qps.
>>
>>
> Yes. I meant to ask why system scope threads are faster than process
> scope threads. They should be the other way around.

It's one of these things where the two schemes are so different that they have WAY different performance characteristics for any given task. Peter had a test where process scope threads ran 50(!) times faster (than system scope) yet in some tests it runs slower.. We need to tune a lot but we've been concentrating on just making it work..

>
>
> Pete
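
[Illustration, not part of the original thread.] The "process scope" and "system scope" being compared correspond to the POSIX contention scopes PTHREAD_SCOPE_PROCESS and PTHREAD_SCOPE_SYSTEM, which an application can request per thread through pthread_attr_setscope(). The thread does not say how the MySQL build in the benchmark selected its scope, so the sketch below is only an assumption about one way to do it; run_with_scope and worker are names invented for the example.

```c
#include <pthread.h>

static void *worker(void *arg)
{
    return arg;                 /* placeholder for real work */
}

/*
 * Create and join one thread with the requested contention scope.
 * Returns 0 on success, otherwise the error number of the first pthread
 * call that failed (some platforms reject one of the two scopes).
 * On success *actual holds the scope read back from the attribute object.
 */
int run_with_scope(int scope, int *actual)
{
    pthread_attr_t attr;
    pthread_t t;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setscope(&attr, scope);
    if (rc == 0)
        rc = pthread_attr_getscope(&attr, actual);
    if (rc == 0)
        rc = pthread_create(&t, &attr, worker, NULL);
    if (rc == 0)
        rc = pthread_join(t, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

Calling run_with_scope(PTHREAD_SCOPE_SYSTEM, &actual) and run_with_scope(PTHREAD_SCOPE_PROCESS, &actual) around the same workload is one way to compare the two schemes on a given system; the return value should be checked, because not every platform accepts both scopes.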