Date: Sun, 4 Feb 2018 16:15:16 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Conrad Meyer <cem@freebsd.org> Cc: Bruce Evans <brde@optusnet.com.au>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r327495 - head/usr.sbin/rpcbind Message-ID: <20180204150751.V909@besplex.bde.org> In-Reply-To: <CAG6CVpXodjf7HfagpaY6K1t69a1jx37FRqvQq2acg5eT0r1Z-A@mail.gmail.com> References: <201801021725.w02HPDaj068477@repo.freebsd.org> <20180203232725.U1389@besplex.bde.org> <CAG6CVpXodjf7HfagpaY6K1t69a1jx37FRqvQq2acg5eT0r1Z-A@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 3 Feb 2018, Conrad Meyer wrote: > On Sat, Feb 3, 2018 at 6:46 AM, Bruce Evans <brde@optusnet.com.au> wrote: >> On Tue, 2 Jan 2018, Conrad Meyer wrote: >>> ... >> Today I learned from the POSIX list that the pselect() function is designed >> to avoid this bug. It takes a signal mask arg (like msleep() in the >> kernel). >> >> I haven't noticed any function related to poll() or kevent that takes a >> signal mask. > > There is the similar function ppoll(), although it complies only with > the Linux standard, not POSIX. ppoll() is relatively new in FreeBSD (2014-11-13). Its man page says that it is first appeared in 11.0, but it is now in 10.x, and according to bsd-family-tree it actually appeared in the 10.2 release more than a year before it appeared in the 11.0 release: - 2014-11-13 pre-11.0 source tree commit - 2014-12-21 10.x source tree commit - 2015-08-13 10.2 release - 2016-10-10 11.0 release. > With kevent, you can simply mask all (or most) signals and watch on a > signal event. Conversion to kevent is more complicated, though. > >> Programs broken by buggy conversion: >> ... >> - rpcbind(8). This is not interactive and normally doesn't use ttys >> which might block. However, the -d flag makes it do fprintf() to >> stderr. This may block forever (e.g., for flow control), and you >> would like to be able to kill it using a signal. But write() will >> restart. rpcbind also uses plain signal() and doesn't know anything >> about SA_RESTART. > > This was not broken by conversion -- it was already broken in this > case. If the signal delivery raced with an fprintf, we ended up > invoking the stdio atexit handlers via exit(3) call in terminate(), > which of course encountered corrupted state. Now it is broken in a > slightly different way, yes, if signal delivery races fprintf *and* > fprintf is blocked in flow control. This might happen with a slow > serial for stderr but seems extraordinarily unlikely otherwise. Unsafe signal handlers are broken by their existence, but sometimes work. It is unlikely for ttys too. I forgot that stdio is clueless about ttys so it doesn't wait for output to drain even for stderr. So the output is normally buffered in the kernel and write() returns. > ... >> perror() >> is broken as designed since it uses stdio, so it is unsafe in signal >> handlers. The err() family copies this bug. Even *s*printf() is not >> required to be safe in signal handlers. I would fix the latter first. > > It does seem like the printf family of routines should be > signal-handler safe. Unfortunately, they are implemented in terms of > the unsafe stdio streams :-(. Even sprintf() uses the generic function __vfprintf() on a fake FILE. I think it has problems with locales. No ctype functions are in the list of async-signal-safe functions in at least the 2001 version of POSIX. It is easy to see why they might not be safe. They access lots of static storage, and setlocale() or a simpler function that changes state might run in another or just a nested signal handler, so locking is required... So the safe sprintf() would probably have to use the C locale. The locale is already passed to __vfprintf(), as needed for sprintf_l(). >... >>> Modified: head/usr.sbin/rpcbind/rpcbind.c >>> ... >>> terminate(int signum) >>> ... >>> + wr = write(terminate_wfd, &c, 1); >> >> >> Not async-signal-safe. Acccesses in signal handlers to objects with >> static storage duration give undefined behaviour except for assignment to >> objects of type volatile sig_atomic_t, but the access to terminate_wfd is >> a read and the type is plain int. > > The type can be changed to volatile sig_atomic_t if you believe plain > int will trigger nonsensical compiler behavior. The value is > initialized once, before the signal handler is registered, so unless > the compiler does something totally insane it should be fine on all > architectures FreeBSD runs on. sig_atomic_t is no better than plain int. This behaviour now makes complete sense. It is just like the undefined behaviour with the ctype functions, except since we own terminate_wfd we can guarantee that it doesn't change while the handler is active (and is valid when the handler is entered). We could also use atomic ops. However, the C standard doesn't require anything that we do to work (except maybe in C11, atomic ops might be explicitly or implicitly specifed to work for things like this). >>> + if (wr < 1) >>> + _exit(2); >> >> >> Best to not check for write errors, since the error handling of using >> _exit() >> is worse than none. It loses stdio flushing to handle an almost-harmless >> error. The main problem with keeping everything in a safe handler is that >> it >> is impossible to keep stdio flushing there and we would prefer to not lost >> the stdio flushing. > > I don't necessarily agree. If the write fails, we missed the signal > telling us to terminate the program and will never exit. That said, > how would the write ever fail? We still set the flag, so can only fail to exit() in the race case, but the write() failure affects all cases. I can't see how the write() can fail. That was part of the excuse for removing the check. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180204150751.V909>