Date: Wed, 10 Mar 1999 10:59:44 -0600
From: "Richard Seaman, Jr."
To: John Plevyak
Cc: hackers@freebsd.org
Subject: Re: bug in linuxthreads for FreeBSD
Message-ID: <19990310105944.F4440@tar.com>
In-Reply-To: <19990309172626.A7182@proxydev.inktomi.com>; from John Plevyak on Tue, Mar 09, 1999 at 05:26:26PM -0800

On Tue, Mar 09, 1999 at 05:26:26PM -0800, John Plevyak wrote:

> I have been using the linuxthreads for FreeBSD port, and running
> it under high stress.

Glad to see it.  Also glad you can only find *one* bug :)

> It seems that there is a bug/race which triggers the ASSERT in queue.h.

After looking at the code and your description, I would agree that
there is a race bug, though I'm not at all convinced your description
is completely accurate.  I've attached a patch, and I'd be interested
to hear whether it solves the wait queue corruption problem.

[snip]

> The patch I am using unconditionally removes the thread from the
> queue.  This does not prevent the thread from being woken up
> extraneously, but it does prevent the ASSERT and corruption of
> the p_nextwaiting lists.

It might be possible to eliminate, or at least narrow the window for,
the "extraneous" wakeup.  That would require reworking the linuxthreads
code further.  OTOH, it could just be left as is.
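The reason leaving it as is can be acceptable: a caller that waits in
a loop around its predicate is not hurt by an extra wakeup.  Here is a
minimal sketch of that pattern.  The names (ready, lock, cond, waiter,
poke_without_change, make_ready, run_demo) are made up for illustration
and are not taken from the linuxthreads source.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state, for illustration only. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;          /* the predicate, protected by lock */

/* Waiter: test the predicate under the mutex, re-test after every wakeup. */
static void *waiter(void *arg)
{
    int *seen = arg;

    pthread_mutex_lock(&lock);
    while (!ready)                  /* "while", not "if": a wakeup alone proves nothing */
        pthread_cond_wait(&cond, &lock);
    assert(ready);                  /* holds here no matter why we woke up */
    *seen = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Wake any waiters without changing the predicate (stands in for an extraneous wakeup). */
static void poke_without_change(void)
{
    pthread_mutex_lock(&lock);
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

/* Signaler: change the predicate under the mutex, then signal. */
static void make_ready(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/* Returns 1 if the waiter finished with the predicate true, 0 if the thread could not be created. */
int run_demo(void)
{
    pthread_t t;
    int seen = 0;

    ready = false;                  /* no other thread exists yet, so no lock needed */
    if (pthread_create(&t, NULL, waiter, &seen) != 0)
        return 0;
    poke_without_change();          /* waiter, if already asleep, just goes back to waiting */
    make_ready();
    pthread_join(t, NULL);
    return seen;
}
```

Calling run_demo() from a small main() (linked against the threads
library) should give 1 whichever order the threads happen to run in,
since the waiter only proceeds once it has seen ready set under the
mutex.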
Here's what Butenhof's "Programming with POSIX Threads" says about
the pthread_cond_* wait functions:

"It is important that you test the predicate after locking the mutex
and before waiting on the condition variable.  If a thread signals a
condition variable while no threads are waiting, nothing happens.  If
some other thread calls pthread_cond_wait right after that, it will
just keep waiting...."

and,

"It is equally important that you test the predicate again when the
thread wakes up.  You should always wait for a condition variable in a
loop, to protect against program errors, multiprocessor races, and
spurious wakeups."

> This looks like it may be a general problem with linuxthreads,
> and I am wondering if anyone else has seen it.

You're the first to report it to me.

> Also, there is no 'lock prefix' before the xchg in _atomic_lock.S which
> will prevent the library from working correctly with SMP.

I'd like to know more about this.  According to the Intel documents
I have, the xchg opcode implies a 'lock prefix' (at least when one of
its operands is in memory).  You can add the 'lock prefix', but it's
redundant.

-- 
Richard Seaman, Jr.           email: dick@tar.com
5182 N. Maple Lane            phone: 414-367-5450
Chenequa WI 53058             fax:   414-367-5852

[Attachment: condvar.diff]

*** condvar.c.orig	Wed Mar 10 10:12:49 1999
--- condvar.c	Wed Mar 10 10:13:37 1999
***************
*** 152,157 ****
--- 152,158 ----
  #endif
  /* Otherwise, return normally */
+ remove_from_queue(&cond->c_waiting, self);
  release(&cond->c_spinlock);
  pthread_mutex_lock(mutex);
  return 0;