From: David Schultz
Date: Thu, 20 Dec 2007 02:19:24 -0500
To: current@FreeBSD.ORG
Subject: libthr mutex race?
Message-ID: <20071220071924.GA84778@VARK.MIT.EDU>

There seems to be a bug in libthr that results in programs
occasionally printing:

  Fatal error 'mutex is on list' at line 450 in file /q/8.x/src/lib/libthr/thread/thr_mutex.c (errno = 1)

Or, on occasion, it decides to be more emphatic: :)

  FFaattaall eerrrroorr ''mutex is on listmutex is on list'' aatt lliinnee 445500 iinn ffiillee /q/8.x/src/lib/libthr/thread/thr_mutex.c/q/8.x/src/lib/libthr/thread/thr_mutex.c ((eerrrrnnoo == 11))

I can reproduce this on ia64 (pluto2) but not i386.
I'm posting here because I sort of suspect that it applies to all
architectures with a weak memory model, but someone with a sparc64
or powerpc or something will have to confirm that for me.

I'm leaving town for the holidays in a few hours and gdb on pluto2
seems borked, but if someone wants to look at this, that would be
great. I've provided a program that reproduces the bug about 50% of
the time (and hangs on pthread_join() the other 50%, which I think
is an unrelated issue). I compiled with -O -pthread on

  FreeBSD pluto2.freebsd.org 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sat Dec 15 20:23:39 UTC 2007 marcel@pluto2.freebsd.org:/q/obj/q/8.x/src/sys/PLUTO2 ia64

** Note that you probably need at least 2 CPUs to repro this.

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *
thread_main(void *arg)
{
	int i, j;
	volatile int k;

	for (i = 0; i < 100000; i++) {
		pthread_mutex_lock(&m);
		pthread_mutex_unlock(&m);
		/* Short busy-wait between the two lock/unlock pairs. */
		for (j = 0; j < 500; j++)
			k = 0;
		pthread_mutex_lock(&m);
		pthread_mutex_unlock(&m);
	}

	return (NULL);
}

int
main(int argc, char **argv)
{
	pthread_t td1, td2;

	// Three threads in total
	pthread_create(&td1, NULL, thread_main, NULL);
	pthread_create(&td2, NULL, thread_main, NULL);
	thread_main(NULL);
	pthread_join(td1, NULL);
	pthread_join(td2, NULL);

	return (0);
}