Date:      Wed, 16 Apr 2003 09:37:55 +0800
From:      "David Xu" <davidxu@freebsd.org>
To:        "Daniel Eischen" <eischen@pcnet1.pcnet.com>
Cc:        freebsd-threads@freebsd.org
Subject:   Re: libpthread patch
Message-ID:  <005501c303b8$cda19390$f001a8c0@davidw2k>
References:  <000701c303a9$0cdd9370$0701a8c0@tiger>

----- Original Message -----
From: "Daniel Eischen" <eischen@pcnet1.pcnet.com>
To: "David Xu" <davidxu@freebsd.org>
Cc: <freebsd-threads@freebsd.org>; "Craig Rodrigues" <rodrigc@attbi.com>
Sent: Wednesday, April 16, 2003 5:26 AM
Subject: Re: libpthread patch


> There's an updated patch file available at (a slightly different
> place):
>
>     http://people.freebsd.org/~deischen/kse/libpthread.diffs
>

Will test it.

> There's also an html'ized log of the ACE tests:
>
>     http://people.freebsd.org/~deischen/kse/ace_build_logs/index.html
>
> The only real problems seem to be with the ACE tests:
>
>     Cached_Conn_Test
>     Process_Manager_Test
>
> And I think these have something to do with wait() or waitpid()
> not working correctly.  David, do you know of any problems in
> this area?  It seems that sometimes waitpid() is returning 0
> and the next time it is called it returns the process id.
> I wonder if it is being interrupted by a signal (either the
> kernel doing it or the UTS by use of kse_thr_interrupt)?
>

Remember that signal handling for threaded programs is currently
broken in the kernel: any signal can be lost when a thread exits.
For our M:N threaded processes the situation is worse than with
1:1 threading, because we exit threads more often, so any
signal-related test will fail frequently. Here is some code I
found in ACE:
    for (;;)
    {
      int result = ACE_OS::waitpid (this->getpid (),
                                    status,
                                    WNOHANG);
      if (result != 0)
        return result;

      ACE_Sig_Set alarm_or_child;

      alarm_or_child.sig_add (SIGALRM);
      alarm_or_child.sig_add (SIGCHLD);
      ACE_Time_Value time_left = wait_until - ACE_OS::gettimeofday ();

      // If ACE_OS::ualarm doesn't have sub-second resolution:
      time_left += ACE_Time_Value (0, 500000);
      time_left.usec (0);

      if (time_left <= ACE_Time_Value::zero)
        return 0; // timeout

      ACE_OS::ualarm (time_left);
      if (ACE_OS::sigwait (alarm_or_child) == -1)
        return ACE_INVALID_PID;
    }
...
As you can see, the code expects SIGCHLD and SIGALRM; if
SIGCHLD is lost, it will time out and return 0.
I did not find a bug in waitpid.
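
For anyone who wants to reproduce the pattern outside of ACE, here is a minimal sketch of the same loop using plain POSIX calls. It is not the ACE implementation: the names wait_with_timeout and block_alarm_and_child are hypothetical, it uses whole-second alarm() instead of ualarm(), and it assumes a single-threaded caller (a threaded program would use pthread_sigmask). It shows why the loop depends on signal delivery: when waitpid() with WNOHANG returns 0, the only way out of sigwait() is SIGCHLD or SIGALRM.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

// Hypothetical helper: block the two signals the loop waits for.
// sigwait() requires them to be blocked in the calling thread.
void block_alarm_and_child()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, nullptr);
}

// Hypothetical helper (not ACE code): wait for `child` for about
// `timeout_sec` seconds. Returns the pid once the child is reaped,
// 0 if the deadline passes first, and -1 on error.
// Assumes block_alarm_and_child() was called beforehand.
pid_t wait_with_timeout(pid_t child, int *status, unsigned timeout_sec)
{
    sigset_t alarm_or_child;
    sigemptyset(&alarm_or_child);
    sigaddset(&alarm_or_child, SIGALRM);
    sigaddset(&alarm_or_child, SIGCHLD);

    const time_t deadline = time(nullptr) + timeout_sec;
    for (;;) {
        pid_t result = waitpid(child, status, WNOHANG);
        if (result != 0)
            return result;          // reaped (pid) or failed (-1)

        const time_t now = time(nullptr);
        if (now >= deadline)
            return 0;               // timeout: child not reaped in time

        // Sleep until either signal arrives. If SIGCHLD is never
        // delivered, only SIGALRM can wake us up again.
        alarm(static_cast<unsigned>(deadline - now));
        int signo = 0;
        if (sigwait(&alarm_or_child, &signo) != 0)
            return -1;
        alarm(0);                   // cancel the alarm if SIGCHLD won
    }
}
```

In this sketch a caller would call block_alarm_and_child() once, fork() the child, and then call wait_with_timeout(pid, &status, seconds). A return value of 0 only means the child had not been reaped by the deadline, so a later call can still return the pid.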

BTW, I have a patch for kse_release that lets it return directly
to userland without scheduling an upcall.
Bit 0 of km_flags in kse_mailbox is used as a hint
to tell the kernel not to schedule an upcall for the KSE.
http://people.freebsd.org/~davidxu/kse_release.diff
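
To illustrate the idea only (this is not taken from the patch), the check could look roughly like the sketch below. The struct is a stand-in that models just the km_flags field, and the names KM_HINT_NO_UPCALL and should_schedule_upcall are hypothetical; the diff above is the authoritative version.

```cpp
#include <cstdint>

// Stand-in for the real kse_mailbox: only the field discussed here is
// modeled, and its type is an assumption.
struct kse_mailbox_sketch {
    std::uint32_t km_flags;
};

// Hypothetical name for bit 0 of km_flags ("do not schedule an upcall").
constexpr std::uint32_t KM_HINT_NO_UPCALL = 1u << 0;

// Hypothetical decision made on the kse_release path: schedule an
// upcall unless the userland scheduler set the hint bit.
bool should_schedule_upcall(const kse_mailbox_sketch &mbx)
{
    return (mbx.km_flags & KM_HINT_NO_UPCALL) == 0;
}
```

Userland would set the bit in its mailbox before calling kse_release when it just wants control back, and leave it clear to keep the current behaviour.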

If nobody objects, I will commit it.

David Xu



