From: "David Xu"
To: "Daniel Eischen"
cc: freebsd-threads@freebsd.org
Date: Mon, 14 Apr 2003 12:00:33 +0800
Subject: libpthread patch

After testing Daniel's pthread patch, I found that 50% of the ACE test
programs dump core on my machine. So I studied the libpthread source
code with the patch applied, and I found that the main problem is that
thread state transitions are not atomic. For example, in thr_mutex:

    mutex_queue_enq(*m, curthread);
    curthread->data.mutex = *m;
    /*
     * This thread is active and is in a critical
     * region (holding the mutex lock); we should
     * be able to safely set the state.
     */
    THR_SET_STATE(curthread, PS_MUTEX_WAIT);

    /* Unlock the mutex structure: */
    THR_LOCK_RELEASE(curthread, &(*m)->m_lock);

    /* Schedule the next thread: */
    _thr_sched_switch(curthread);

The thread sets its state to PS_MUTEX_WAIT and then calls
_thr_sched_switch, but it does so without holding the scheduler lock,
so there is a race between THR_SET_STATE and _thr_sched_switch.
I have inserted _kse_critical_enter() before THR_SET_STATE, so the code
now looks like this:

    mutex_queue_enq(*m, curthread);
    curthread->data.mutex = *m;
    _kse_critical_enter();
    /*
     * This thread is active and is in a critical
     * region (holding the mutex lock); we should
     * be able to safely set the state.
     */
    THR_SET_STATE(curthread, PS_MUTEX_WAIT);

    /* Unlock the mutex structure: */
    THR_LOCK_RELEASE(curthread, &(*m)->m_lock);

    /* Schedule the next thread: */
    _thr_sched_switch(curthread);

I also commented out most of the code in thr_lock_wait() and
thr_lock_wakeup(). I think that without a better scheduler lock this
code has a race condition, which in most cases causes a thread to be
reinserted into the runq while it is already on that queue.

Now I can run the ACE test programs without any core dumps, and only
the following programs fail:

    Cached_Conn_Test
    Conn_Test
    MT_Reactor_Timer_Test
    Malloc_Test
    Process_Strategy_Test
    Thread_Pool_Test

A complete log file is at:
http://people.freebsd.org/~davidxu/run_test.log

The libpthread package I modified is at:
http://people.freebsd.org/~davidxu/libpthread.tgz

Also, I can run the crew test program without any problem.

I think the whole scheduler lock should be reworked so that state
transitions are atomic. My change is not SMP safe and only works on UP,
because _kse_critical_enter() only works on UP systems. If we fix this
scheduler lock problem, I think libpthread will be stable enough.

David Xu
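
P.S. To make the race concrete, here is a small toy model in C. It is
only a sketch under stated assumptions: every toy_* name is hypothetical
and none of this is libpthread code. It replays one fixed interleaving
in which a wakeup arrives after the state has been set to the waiting
state but before the switch, and it counts how many times the thread
ends up on the run queue. The in_critical flag stands in for the effect
of _kse_critical_enter() on a UP system, where the waker cannot run
until the switch has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in libpthread. */
enum toy_state { TOY_RUNNING, TOY_MUTEX_WAIT };

struct toy_thread {
    enum toy_state state;
    int runq_entries;            /* how many times the thread sits on the run queue */
};

struct toy_sched {
    bool in_critical;            /* assumed stand-in for a UP critical region */
    struct toy_thread *deferred; /* wakeup that arrived inside the critical region */
};

/* The mutex owner hands the mutex over and makes the waiter runnable. */
static void toy_wakeup(struct toy_sched *s, struct toy_thread *t)
{
    if (s->in_critical) {        /* the waker cannot run yet: remember the request */
        s->deferred = t;
        return;
    }
    if (t->state == TOY_MUTEX_WAIT) {
        t->state = TOY_RUNNING;
        t->runq_entries++;
    }
}

/* Switch away from t: a thread that still looks runnable is requeued. */
static void toy_sched_switch(struct toy_sched *s, struct toy_thread *t)
{
    if (t->state == TOY_RUNNING)
        t->runq_entries++;
    s->in_critical = false;      /* the critical region ends once the switch is done */
    if (s->deferred != NULL) {
        struct toy_thread *w = s->deferred;
        s->deferred = NULL;
        toy_wakeup(s, w);        /* deliver the wakeup that was held back */
    }
}

/*
 * Replay the interleaving "set state, wakeup arrives, switch".
 * Returns the number of run-queue entries for the thread afterwards.
 */
int toy_run_scenario(bool atomic)
{
    struct toy_thread t = { TOY_RUNNING, 0 };
    struct toy_sched s = { false, NULL };

    s.in_critical = atomic;      /* analogue of entering the critical region first */
    t.state = TOY_MUTEX_WAIT;    /* analogue of THR_SET_STATE */
    toy_wakeup(&s, &t);          /* the racing wakeup */
    toy_sched_switch(&s, &t);    /* analogue of _thr_sched_switch */
    return t.runq_entries;
}
```

In this model toy_run_scenario(false) returns 2, meaning the thread was
queued twice, while toy_run_scenario(true) returns 1. The model says
nothing about SMP, where another CPU could still run the waker inside
the window.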