Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 4 Feb 2018 16:15:16 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Conrad Meyer <cem@freebsd.org>
Cc:        Bruce Evans <brde@optusnet.com.au>,  src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r327495 - head/usr.sbin/rpcbind
Message-ID:  <20180204150751.V909@besplex.bde.org>
In-Reply-To: <CAG6CVpXodjf7HfagpaY6K1t69a1jx37FRqvQq2acg5eT0r1Z-A@mail.gmail.com>
References:  <201801021725.w02HPDaj068477@repo.freebsd.org> <20180203232725.U1389@besplex.bde.org> <CAG6CVpXodjf7HfagpaY6K1t69a1jx37FRqvQq2acg5eT0r1Z-A@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 3 Feb 2018, Conrad Meyer wrote:

> On Sat, Feb 3, 2018 at 6:46 AM, Bruce Evans <brde@optusnet.com.au> wrote:
>> On Tue, 2 Jan 2018, Conrad Meyer wrote:
>>> ...
>> Today I learned from the POSIX list that the pselect() function is designed
>> to avoid this bug.  It takes a signal mask arg (like msleep() in the
>> kernel).
>>
>> I haven't noticed any function related to poll() or kevent that takes a
>> signal mask.
>
> There is the similar function ppoll(), although it complies only with
> the Linux standard, not POSIX.

ppoll() is relatively new in FreeBSD (2014-11-13).  Its man page says
that it is first appeared in 11.0, but it is now in 10.x, and according
to bsd-family-tree it actually appeared in the 10.2 release more than a
year before it appeared in the 11.0 release:
- 2014-11-13  pre-11.0 source tree commit
- 2014-12-21  10.x source tree commit
- 2015-08-13  10.2 release
- 2016-10-10  11.0 release.

> With kevent, you can simply mask all (or most) signals and watch on a
> signal event.  Conversion to kevent is more complicated, though.
>
>> Programs broken by buggy conversion:
>> ...
>> - rpcbind(8).  This is not interactive and normally doesn't use ttys
>>   which might block.  However, the -d flag makes it do fprintf() to
>>   stderr.  This may block forever (e.g., for flow control), and you
>>   would like to be able to kill it using a signal.  But write() will
>>   restart.  rpcbind also uses plain signal() and doesn't know anything
>>   about SA_RESTART.
>
> This was not broken by conversion -- it was already broken in this
> case.  If the signal delivery raced with an fprintf, we ended up
> invoking the stdio atexit handlers via exit(3) call in terminate(),
> which of course encountered corrupted state.  Now it is broken in a
> slightly different way, yes, if signal delivery races fprintf *and*
> fprintf is blocked in flow control.  This might happen with a slow
> serial for stderr but seems extraordinarily unlikely otherwise.

Unsafe signal handlers are broken by their existence, but sometimes work.

It is unlikely for ttys too.  I forgot that stdio is clueless about ttys
so it doesn't wait for output to drain even for stderr.  So the output is
normally buffered in the kernel and write() returns.

> ...
>> perror()
>> is broken as designed since it uses stdio, so it is unsafe in signal
>> handlers.  The err() family copies this bug.  Even *s*printf() is not
>> required to be safe in signal handlers.  I would fix the latter first.
>
> It does seem like the printf family of routines should be
> signal-handler safe.  Unfortunately, they are implemented in terms of
> the unsafe stdio streams :-(.

Even sprintf() uses the generic function __vfprintf() on a fake FILE.  I
think it has problems with locales.  No ctype functions are in the list
of async-signal-safe functions in at least the 2001 version of POSIX.  It
is easy to see why they might not be safe.  They access lots of static
storage, and setlocale() or a simpler function that changes state might
run in another or just a nested signal handler, so locking is required...

So the safe sprintf() would probably have to use the C locale.  The locale
is already passed to __vfprintf(), as needed for sprintf_l().

>...
>>> Modified: head/usr.sbin/rpcbind/rpcbind.c
>>> ...
>>> terminate(int signum)
>>> ...
>>> +       wr = write(terminate_wfd, &c, 1);
>>
>>
>> Not async-signal-safe.  Acccesses in signal handlers to objects with
>> static storage duration give undefined behaviour except for assignment to
>> objects of type volatile sig_atomic_t, but the access to terminate_wfd is
>> a read and the type is plain int.
>
> The type can be changed to volatile sig_atomic_t if you believe plain
> int will trigger nonsensical compiler behavior.  The value is
> initialized once, before the signal handler is registered, so unless
> the compiler does something totally insane it should be fine on all
> architectures FreeBSD runs on.

sig_atomic_t is no better than plain int.  This behaviour now makes complete
sense.  It is just like the undefined behaviour with the ctype functions,
except since we own terminate_wfd we can guarantee that it doesn't change
while the handler is active (and is valid when the handler is entered).
We could also use atomic ops.  However, the C standard doesn't require
anything that we do to work (except maybe in C11, atomic ops might be
explicitly or implicitly specifed to work for things like this).

>>> +       if (wr < 1)
>>> +               _exit(2);
>>
>>
>> Best to not check for write errors, since the error handling of using
>> _exit()
>> is worse than none.  It loses stdio flushing to handle an almost-harmless
>> error.  The main problem with keeping everything in a safe handler is that
>> it
>> is impossible to keep stdio flushing there and we would prefer to not lost
>> the stdio flushing.
>
> I don't necessarily agree.  If the write fails, we missed the signal
> telling us to terminate the program and will never exit.  That said,
> how would the write ever fail?

We still set the flag, so can only fail to exit() in the race case,
but the write() failure affects all cases.

I can't see how the write() can fail.  That was part of the excuse for
removing the check.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180204150751.V909>