Date: Thu, 28 Oct 2004 18:43:44 -0400 (EDT)
From: Daniel Eischen
To: John Baldwin
cc: threads@freebsd.org
Subject: Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
List-Id: Threading on FreeBSD

On Thu, 28 Oct 2004, Daniel Eischen wrote:

> On Thu, 28 Oct 2004, John Baldwin wrote:
>
> > On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote:
> > > On Wed, 27 Oct 2004, John Baldwin wrote:
> > > >
> > > > FWIW, we are having (I think) the same problem on 5.3 with libpthread.
> > > > The panic there is in the mutex code about an assertion failing because a
> > > > thread is on a syncq when it is not supposed to be.
> > >
> > > David and I recently fixed some races in pthread_join() and
> > > pthread_exit() in -current libpthread. Don't know if those
> > > were responsible...
> > >
> > > Here's a test program that shows correct behavior with both
> > > libc_r and libpthread in -current.
> >
> > We've started testing on -current and are seeing several problems with
> > libpthread. Using a UP kernel (machines have single processor with HTT)
> > seems to make it better, but we seem to be getting SIG 11's in
> > pthread_testcancel() as well as the failed lock assertions that were
> > mentioned earlier on the list in the PR. Just running monodevelop from the
> > bsd-sharp stuff mentioned earlier can break in that one of the processes dies
> > with the assertion failure. If you let the other processes run, then you can
> > run it again and get the window to pop up, but then clicking on any of the
> > controls results in the pthread_testcancel() crash. FWIW, I think the reason
> > that the stack traces look weird in the PR's thread may be due to catching a
> > signal. When we were looking at the problems with libc_r on 4.x we would get
> > some weird looking backtraces sometimes when the assertion in uthread_sig.c
> > that I added failed. Seems that gdb doesn't handle the signal frames very
> > well.
>
> You also want to make sure you're not running out of stack space
> for your threads.
>
> Is the code trying to install signal frames on threads itself?
> That could cause the problems you are seeing.

I went back to the monodoc test case in the PR.
Running under the debugger gives this:

(gdb) run /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe ./browser.cs ./list.cs ./elabel.cs ./history.cs ./Contributions.cs ./XmlNodeWriter.cs -resource:./../monodoc.png,monodoc.png -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp -pkg:glade-sharp -r:System.Web.Services -r:./monodoc.dll
Starting program: /usr/local/bin/mono /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe ./browser.cs ./list.cs ./elabel.cs ./history.cs ./Contributions.cs ./XmlNodeWriter.cs -resource:./../monodoc.png,monodoc.png -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp -pkg:glade-sharp -r:System.Web.Services -r:./monodoc.dll
[Switching to Thread 1 (LWP 100074)]

Breakpoint 1, 0x0804862e in main ()
(gdb) cont
Continuing.
[Switching to Thread 4 (LWP 100128)]

Breakpoint 2, 0x2842c801 in __assert () from /lib/libc.so.5
(gdb) bt
#0  0x2842c801 in __assert () from /lib/libc.so.5
#1  0x2837ce4e in _lock_acquire (lck=0x8062f00, lu=0x8110e48, prio=674751930)
    at /opt/FreeBSD/src/lib/libpthread/sys/lock.c:171
#2  0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434, abstime=0x0)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495
#3  0x28371677 in __pthread_mutex_lock (m=0x28482434)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:796
#4  0x28171cc6 in WaitForSingleObjectEx (handle=0xe, timeout=500, alertable=0)
    at handles-private.h:97
#5  0x2816b116 in CreateProcess (appname=0xd, cmdline=0x8092ac4, process_attrs=0x0,
    thread_attrs=0x0, inherit_handles=1, create_flags=1024, new_environ=0x0, cwd=0x0,
    startup=0xbf8ec78c, process_info=0xbf8ec77c) at processes.c:427
#6  0x2813ef4f in ves_icall_System_Diagnostics_Process_Start_internal (appname=0x80f89d8,
    cmd=0x8092ab8, dirname=0x808ff30, stdin_handle=0x2837e5ba, stdout_handle=0x2837e5ba,
    stderr_handle=0x2837e5ba, process_info=0xbf8ec964) at process.c:870
#7  0x28f548ff in ?? ()
#8  0x080f89d8 in ?? ()
#9  0x08092ab8 in ?? ()
#10 0x0808ff30 in ?? ()
#11 0x00000009 in ?? ()
#12 0x0000000d in ?? ()
#13 0x0000000b in ?? ()
#14 0xbf8ec964 in ?? ()
#15 0x0812d420 in ?? ()
#16 0x0812d408 in ?? ()
#17 0x0820d300 in ?? ()
#18 0x0808ff30 in ?? ()
#19 0x08092ab8 in ?? ()
#20 0x080f89d8 in ?? ()
#21 0xbf8ec838 in ?? ()
#22 0x28f548cc in ?? ()
#23 0xbf8ec98c in ?? ()
#24 0x28f542aa in ?? ()
#25 0x080f89d8 in ?? ()
#26 0x08092ab8 in ?? ()
#27 0x0808ff30 in ?? ()
#28 0x00000009 in ?? ()
#29 0x0000000d in ?? ()
#30 0x0000000b in ?? ()
#31 0xbf8ec964 in ?? ()
#32 0x28371bfe in mutex_unlock_common (m=0xb, add_reference=134818488)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:984
Previous frame inner to this frame (corrupt stack?)
(gdb) info threads
  5 Thread 2 (LWP 100137)  0x2837bfd3 in kse_release () at kse_release.S:2
  4 Thread 3 (sleeping)  0x28373d0f in _thr_sched_switch_unlocked (curthread=0x8110000)
    at pthread_md.h:225
* 3 Thread 4 (LWP 100128)  0x2842c801 in __assert () from /lib/libc.so.5
  2 Thread 1 (sleeping)  0x28373d0f in _thr_sched_switch_unlocked (curthread=0x8053000)
    at pthread_md.h:225
(gdb) thread 3
[Switching to thread 3 (Thread 4 (LWP 100128))]#0  0x2842c801 in __assert () from /lib/libc.so.5
(gdb) frame 2
#2  0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434, abstime=0x0)
    at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495
495             THR_LOCK_ACQUIRE(curthread, &(*m)->m_lock);
(gdb) print curthread->uniqueid
$36 = 3
(gdb) print/x curthread->magic
$37 = 0xd09ba115
(gdb) print/x **m
$39 = {m_lock = {l_head = 0x7273752f, l_tail = 0x636f6c2f, l_type = 0x6c2f6c61,
    l_wait = 0x6d2f6269, l_wakeup = 0x726f6373}, m_type = 0x2e62696c,
  m_protocol = 0x7c6c6c64, m_queue = {tqh_first = 0x74737953, tqh_last = 0x522e6d65},
  m_owner = 0x69746e75, m_flags = 0x532e656d, m_count = 0x61697265,
  m_refcount = 0x617a696c, m_prio = 0x6e6f6974, m_saved_prio = 0x6553492e,
  m_qe = {tqe_next = 0x6c616972, tqe_prev = 0x62617a69}}

The thread seems to be correct, but the mutex is trashed.
It's not a valid mutex and the lock type (l_type) does indeed have
LCK_PRIORITY set. Note that libpthread doesn't create any locks of
this type, so this trips the assertion failure.

--
Dan Eischen
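
A side note on the trashed mutex: every field value in the `print/x **m` output is made of bytes in the printable ASCII range, which suggests the memory holds text instead of a mutex. The sketch below decodes those words to check that. It rests on assumptions not stated in the message: that the target is little-endian 32-bit (i386), that the fields are contiguous 32-bit words in the order gdb printed them, and that LCK_PRIORITY is the 0x0001 bit (the constant name `ASSUMED_LCK_PRIORITY` is hypothetical).

```python
import struct

# Field values copied from the gdb `print/x **m` output, in printed order.
# Assumption: contiguous 32-bit words with no padding between fields.
MUTEX_WORDS = [
    0x7273752f, 0x636f6c2f, 0x6c2f6c61, 0x6d2f6269, 0x726f6373,  # m_lock: l_head..l_wakeup
    0x2e62696c, 0x7c6c6c64,                                      # m_type, m_protocol
    0x74737953, 0x522e6d65,                                      # m_queue: tqh_first, tqh_last
    0x69746e75, 0x532e656d, 0x61697265, 0x617a696c,              # m_owner, m_flags, m_count, m_refcount
    0x6e6f6974, 0x6553492e,                                      # m_prio, m_saved_prio
    0x6c616972, 0x62617a69,                                      # m_qe: tqe_next, tqe_prev
]

# Assumption: LCK_PRIORITY is bit 0x0001 in libpthread's lock type flags.
ASSUMED_LCK_PRIORITY = 0x0001


def words_to_ascii(words):
    """Pack 32-bit words as little-endian bytes and decode them as ASCII."""
    raw = struct.pack("<%dI" % len(words), *words)
    return raw.decode("ascii", errors="replace")


decoded = words_to_ascii(MUTEX_WORDS)
l_type = MUTEX_WORDS[2]
priority_bit_set = bool(l_type & ASSUMED_LCK_PRIORITY)

print(decoded)
# -> /usr/local/lib/mscorlib.dll|System.Runtime.Serialization.ISerializab
print("l_type = %#x, assumed LCK_PRIORITY bit set: %s" % (l_type, priority_bit_set))
```

Under those assumptions the bytes read as a file path followed by a type name, so the storage that `m` points to appears to contain string data. In that reading, l_type is just the characters "al/l", and its low bit happens to be set. This does not show which code wrote the string there, or whether the pointer itself is wrong.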