Date:      Tue, 15 Apr 2003 22:58:38 -0400 (EDT)
From:      Daniel Eischen <eischen@pcnet1.pcnet.com>
To:        David Xu <davidxu@freebsd.org>
Cc:        freebsd-threads@freebsd.org
Subject:   Re: libpthread patch
Message-ID:  <Pine.GSO.4.10.10304152253300.25176-100000@pcnet1.pcnet.com>
In-Reply-To: <005501c303b8$cda19390$f001a8c0@davidw2k>

On Wed, 16 Apr 2003, David Xu wrote:

> ----- Original Message ----- 
> From: "Daniel Eischen" <eischen@pcnet1.pcnet.com>
> To: "David Xu" <davidxu@freebsd.org>
> Cc: <freebsd-threads@freebsd.org>; "Craig Rodrigues" <rodrigc@attbi.com>
> Sent: Wednesday, April 16, 2003 5:26 AM
> Subject: Re: libpthread patch
> 
> 
> > There's an updated patch file available at (a slightly different place):
> > 
> >     http://people.freebsd.org/~deischen/kse/libpthread.diffs
> > 
> 
> Will test it.

I found another problem with one of my other tests.  It doesn't seem
to affect any of the ACE tests, though.  I'll continue debugging it.

> > There's also an html'ized log of the ACE tests:
> > 
> >     http://people.freebsd.org/~deischen/kse/ace_build_logs/index.html
> > 
> > The only real problems seem to be with the ACE tests:
> > 
> >     Cached_Conn_Test
> >     Process_Manager_Test
> > 
> > And I think these have something to do with wait() or waitpid()
> > not working correctly.  David, do you know of any problems in
> > this area?  It seems that sometimes waitpid() is returning 0
> > and the next time it is called it returns the process id.
> > I wonder if it is being interrupted by a signal (either the
> > kernel doing it or the UTS by use of kse_thr_interrupt)?
> > 
> 
> Remember that signal handling for threaded programs is currently
> broken in the kernel: any signal can be lost in the kernel when a
> thread exits. For our M:N threaded processes the situation is
> worse than for 1:1, because we exit threads more often than 1:1
> threading does, so any signal-related test will fail frequently.
> Here is some code I found in ACE:
>     for (;;)
>     {
>       int result = ACE_OS::waitpid (this->getpid (),
>                                     status,
>                                     WNOHANG);
>       if (result != 0)
>         return result;
>       
>       ACE_Sig_Set alarm_or_child;
>       
>       alarm_or_child.sig_add (SIGALRM);
>       alarm_or_child.sig_add (SIGCHLD);
>       ACE_Time_Value time_left = wait_until - ACE_OS::gettimeofday ();
>       
>       // If ACE_OS::ualarm doesn't have sub-second resolution:
>       time_left += ACE_Time_Value (0, 500000);
>       time_left.usec (0);
>       
>       if (time_left <= ACE_Time_Value::zero)
>         return 0; // timeout
> 
>       ACE_OS::ualarm (time_left);
>       if (ACE_OS::sigwait (alarm_or_child) == -1)
>         return ACE_INVALID_PID;
>     }
> ...
> So you see, the code expects SIGCHLD and SIGALRM; if
> SIGCHLD is lost, it times out and returns 0.
> I did not find a bug in waitpid.

I thought it might also be the UTS trying to interrupt
the thread (kse_thr_interrupt) while it was in the kernel
(assuming the UTS did get the signal).
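
For anyone without the ACE sources at hand, the poll-then-sigwait
pattern quoted above can be sketched in plain POSIX C. This is only a
rough sketch under stated assumptions, not ACE's implementation:
wait_child_timeout and on_sigchld are hypothetical names, it uses
whole-second alarm() instead of ualarm(), and it assumes a
single-threaded caller with no alarm of its own pending. It shows why
a lost SIGCHLD matters: the sigwait() is then woken only by SIGALRM,
so the caller waits out the whole timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* No-op handler. Giving SIGCHLD a handler keeps a blocked SIGCHLD pending
 * for sigwait(); with the default (ignore) disposition it may be dropped. */
static void on_sigchld(int sig) { (void)sig; }

/* Hypothetical helper: wait up to timeout_sec seconds for child `pid`.
 * Returns pid once the child is reaped, 0 on timeout, -1 on error. */
pid_t wait_child_timeout(pid_t pid, int *status, unsigned timeout_sec)
{
    sigset_t set, old_mask, pending;
    struct sigaction sa, old_sa;
    pid_t result;
    int sig;

    assert(timeout_sec > 0);

    /* Block both signals so they stay pending until sigwait() takes them.
     * A threaded program would use pthread_sigmask() here instead. */
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigaddset(&set, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &set, &old_mask) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, &old_sa);

    alarm(timeout_sec);
    for (;;) {
        /* Non-blocking poll: 0 means the child is still running. */
        result = waitpid(pid, status, WNOHANG);
        if (result != 0)
            break;                      /* reaped (pid) or error (-1) */
        if (sigwait(&set, &sig) != 0) { /* sigwait returns an errno value */
            result = -1;
            break;
        }
        if (sig == SIGALRM) {
            /* Deadline reached: poll one last time, then give up. */
            result = waitpid(pid, status, WNOHANG);
            break;
        }
        /* SIGCHLD: loop around and poll again. */
    }

    /* Cancel the alarm and consume a SIGALRM that fired but was not taken,
     * so it is not delivered when the old mask is restored. */
    alarm(0);
    if (sigpending(&pending) == 0 && sigismember(&pending, SIGALRM) == 1) {
        sigemptyset(&set);
        sigaddset(&set, SIGALRM);
        sigwait(&set, &sig);
    }
    sigaction(SIGCHLD, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return result;
}
```

A caller would fork a child and then call something like
wait_child_timeout(child, &status, 5). A return of 0 only says the
deadline passed without the child being reaped, so a later waitpid()
can still return the process id.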

> BTW, I have a patch for kse_release that lets it return
> directly to userland without scheduling an upcall.
> Bit 0 of km_flags in kse_mailbox is used as a hint
> to tell the kernel not to schedule an upcall for the KSE.
> http://people.freebsd.org/~davidxu/kse_release.diff

I haven't tested that yet; that's on my list of things to
do :-)

> If nobody objects, I will commit it.

-- 
Dan Eischen


