From owner-freebsd-hackers  Wed Mar 10  9: 0: 8 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from ns.tar.com (ns.tar.com [204.95.187.2])
	by hub.freebsd.org (Postfix) with ESMTP id 262A81549A
	for <hackers@freebsd.org>; Wed, 10 Mar 1999 09:00:05 -0800 (PST)
	(envelope-from dick@ns.tar.com)
Received: (from dick@localhost)
	by ns.tar.com (8.9.3/8.9.3) id KAA11118;
	Wed, 10 Mar 1999 10:59:44 -0600 (CST)
	(envelope-from dick)
Date: Wed, 10 Mar 1999 10:59:44 -0600
From: "Richard Seaman, Jr." <dick@tar.com>
To: John Plevyak <jplevyak@inktomi.com>
Cc: hackers@freebsd.org
Subject: Re: bug in linuxthreads for FreeBSD
Message-ID: <19990310105944.F4440@tar.com>
References: <19990309172626.A7182@proxydev.inktomi.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=0ntfKIWw70PvrIHh
X-Mailer: Mutt 0.95.1i
In-Reply-To: <19990309172626.A7182@proxydev.inktomi.com>; from John Plevyak on Tue, Mar 09, 1999 at 05:26:26PM -0800
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


--0ntfKIWw70PvrIHh
Content-Type: text/plain; charset=us-ascii

On Tue, Mar 09, 1999 at 05:26:26PM -0800, John Plevyak wrote:

> I have been using the linuxthreads for FreeBSD port, and running
> it under high stress.

Glad to see it.  Also glad you can only find *one* bug :)

> It seems that there is a bug/race which triggers the ASSERT in queue.h.

After looking at the code and your description, I would agree that
there is a race bug, though I'm not at all convinced your description
is completely accurate.  I've attached a patch, and I'd be interested
if it solves the wait queue corruption problem.

[snip]

> The patch I am using unconditionally removes the thread from the
> queue.  This does not prevent the thread from being woken up
> extraneously, but it does prevent the ASSERT and corruption of
> the p_nextwaiting lists.

It might be possible to eliminate, or at least narrow the window
for the "extraneous" wakeup.  It would require reworking the linux
threads code more.  OTOH, it could just be left as is.  Here's
what Butenhof's "Programming with POSIX Threads" says about 
the pthread_cond_* wait functions:

"It is important that you test the predicate after locking the mutex
and before waiting on the condition variable.  If a thread signals
a condition variable while no threads are waiting, nothing happens.
If some other thread calls pthread_cond_wait right after that,
it will just keep waiting...."

and,

"It is equally important that you test the predicate again when the
thread wakes up.  You should always wait for a condition variable
in a loop, to protect against program errors, multiprocessor races,
and spurious wakeups."
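
To make that concrete, here is a minimal sketch of the wait loop Butenhof
describes. It is not taken from linuxthreads or from your code; the names
(ready, value, producer, wait_for_value) are made up for illustration. The
point is only that the waiter tests the predicate under the mutex before
waiting and again after every wakeup, so an extra wakeup is harmless:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state, for illustration only. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;      /* the predicate */
static int value = 0;           /* data guarded by the predicate */

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    value = 42;
    ready = true;               /* change the predicate under the mutex */
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the producer, waits for the predicate, returns the value
 * (or -1 if the thread could not be created). */
int wait_for_value(void)
{
    pthread_t t;
    int result;

    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    /* Test before waiting (the signal may already have happened) and
     * re-test after each wakeup (it may have been spurious). */
    while (!ready)
        pthread_cond_wait(&ready_cv, &lock);
    assert(ready);              /* holds no matter why we woke up */
    result = value;
    pthread_mutex_unlock(&lock);

    pthread_join(t, NULL);
    return result;
}
```

A caller written this way does not care whether the library ever delivers
an extraneous wakeup; it simply goes around the loop again.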

> This looks like it may be a general problem with linuxthreads,
> and I am wondering if anyone else has seen it.

You're the first to report it to me.

> Also, there is no 'lock prefix' before the xchg in _atomic_lock.S which
> will prevent the library from working correctly with SMP.

I'd like to know more about this.  According to the Intel documents
I have, the xchg opcode implies a 'lock prefix' whenever one of its
operands is in memory.  You can add the 'lock prefix', but it's redundant.
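
For illustration, here is a sketch of an exchange-based spinlock written
with C11 atomics. This is NOT the code in _atomic_lock.S, and the names
(spin_lock, spin_unlock, run_spin_demo) are hypothetical. On x86, compilers
typically implement atomic_exchange with an xchg against memory, which is
the instruction under discussion, so two threads hammering on the lock
should not lose any increments:

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int spin = 0;     /* 0 = free, 1 = held */
static long counter = 0;        /* protected by spin */

static void spin_lock(atomic_int *l)
{
    /* Atomically store 1 and fetch the old value; an old value of 0
     * means we were the one who took the lock. */
    while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0)
        ;                       /* busy-wait */
}

static void spin_unlock(atomic_int *l)
{
    atomic_store_explicit(l, 0, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock(&spin);
        counter++;              /* plain increment, safe only under the lock */
        spin_unlock(&spin);
    }
    return NULL;
}

/* Runs two workers and returns the final count (-1 on thread failure). */
long run_spin_demo(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

If the exchange were not atomic across processors, the final count on an
SMP machine would come up short of 200000.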

-- 
Richard Seaman, Jr.           email: dick@tar.com
5182 N. Maple Lane            phone: 414-367-5450
Chenequa WI 53058             fax:   414-367-5852

--0ntfKIWw70PvrIHh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="condvar.diff"

*** condvar.c.orig	Wed Mar 10 10:12:49 1999
--- condvar.c	Wed Mar 10 10:13:37 1999
***************
*** 152,157 ****
--- 152,158 ----
  
  #endif
    /* Otherwise, return normally */
+   remove_from_queue(&cond->c_waiting, self);
    release(&cond->c_spinlock);
    pthread_mutex_lock(mutex);
    return 0;

--0ntfKIWw70PvrIHh--