Date:      Thu, 17 Jul 2008 21:54:55 -0400
From:      "Michael B Allen" <ioplex@gmail.com>
To:        "John Baldwin" <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Pls sanity check my semtimedop(2) implementation
Message-ID:  <78c6bd860807171854o6e566b2h6ee3b77008dc541f@mail.gmail.com>
In-Reply-To: <200807172015.11460.jhb@freebsd.org>
References:  <78c6bd860807121611w4f6ab44brbebfffea9929682a@mail.gmail.com> <200807171005.53148.jhb@freebsd.org> <78c6bd860807171042o54627c78nfcc0c19717b75f1e@mail.gmail.com> <200807172015.11460.jhb@freebsd.org>

On Thu, Jul 17, 2008 at 8:15 PM, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday 17 July 2008 01:42:31 pm Michael B Allen wrote:
>> On Thu, Jul 17, 2008 at 10:05 AM, John Baldwin <jhb@freebsd.org> wrote:
>> > On Saturday 12 July 2008 07:11:26 pm Michael B Allen wrote:
>> >> Hi,
>> >>
>> >> Below is a semtimedop(2) implementation that I'm using for FreeBSD. I
>> >> was hoping someone could look it over and tell me if they think the
>> >> implementation is sound.
>> >>
>> >> The code seems to work ok but when stressing the FreeBSD build of my app
>> >> I have managed to provoke errors related to concurrency (usually when a
>> >> SIGALRM goes off). The Linux build works flawlessly so I'm wondering
>> >> about this one critical function that is different.
>> >>
>> >> Do you think it would make any difference if I used
>> >> ITIMER_VIRTUAL / SIGVTALRM instead of ITIMER_REAL / SIGALRM?
>> >>
>> >> Or perhaps I should be using a different implementation entirely?
>> >
>> > What specific races are you seeing?  The timer is firing too early, too
>> > late?
>>
>> It's very difficult to tell. I can only trigger the issue very
>> occasionally running my torture test such that any diagnostic logging
>> changes the results.
>>
>> And at this point I'm not sure my semtimedop implementation is
>> responsible. I have not seen the issue since fixing the race pointed
>> out by Mikko (although I have not tried very hard to provoke it).
>>
>> For now, I'm satisfied since I do not think the issue will be
>> triggered in the wild. I hate to use signals for anything but as much
>> as I try, there's just no other way to implement semtimedop within a
>> single largely self-contained function. In the future I will likely
>> use another process in the application that uses select(2) as an
>> "event service" to post on semaphores after a certain time period.
>> Unfortunately, right now, that service ultimately calls semtimedop so
>> I'll save it for a rainy day.
>>
>> Although if you implemented semtimedop(2) into the FreeBSD API that
>> would work too :-)
>
> POSIX semaphores (sem_open(3), sem_create(3), etc.) do have a
> sem_timedwait(3).  However, POSIX semaphores have several bugs in 6.x and 7.x
> (they should work a lot better in 8).  If you want I can give you a patch for
> 6.x or 7.x that backports the 8.x POSIX semaphores.

I can't ask my customers to patch their systems.

But I'll keep it in mind for the future. I don't recall why I chose
System V semaphores originally. I think process-shared semantics in
the POSIX implementations were not mature at the time. I would love
to move away from System V semaphores. It's all too easy to leak them
and trying to clean up on restart is dangerous.

Mike


