Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 31 Mar 2019 22:27:54 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r345696 - head/lib/libvgl
Message-ID:  <20190331214235.K961@besplex.bde.org>
In-Reply-To: <20190330094558.GA1923@kib.kiev.ua>
References:  <201903291557.x2TFv9AW097226@repo.freebsd.org> <20190329182100.GZ1923@kib.kiev.ua> <20190330142319.I1011@besplex.bde.org> <20190330094558.GA1923@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 30 Mar 2019, Konstantin Belousov wrote:

> On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
>> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Fri, Mar 29, 2019 at 03:57:09PM +0000, Bruce Evans wrote:
>>>> Author: bde
>>>> Date: Fri Mar 29 15:57:08 2019
>>>> New Revision: 345696
>>>> URL: https://svnweb.freebsd.org/changeset/base/345696
>>>>
>>>> Log:
>>>>   Fix endless loops for handling SIGBUS and SIGSEGV.
>>>>
>>>>   r80270 has the usual wrong fix for unsafe signal handling -- just set
>>>>   a flag and return to let an event loop check the flag and do safe
>>>>   handling.  This never works for signals like SIGBUS and SIGSEGV that
>>>>   repeat and works poorly for others unless the application has an event
>>>>   loop designed to support this.
>>>>
>>>>   For these signals, clean up unsafely as before, except for arranging that
>>>>   nested signals are fatal and forcing a nested signal if the cleanup doesn't
>>>>   cause one.
>>>>
>>>> Modified:
>>>>   head/lib/libvgl/main.c
>>>>
>>>> Modified: head/lib/libvgl/main.c
>>>> ==============================================================================
>>>> --- head/lib/libvgl/main.c	Fri Mar 29 15:20:48 2019	(r345695)
>>>> +++ head/lib/libvgl/main.c	Fri Mar 29 15:57:08 2019	(r345696)
>>>> ...
>>>> @@ -107,14 +107,22 @@ struct vt_mode smode;
>>>>  }
>>>>
>>>>  static void
>>>> -VGLAbort(int arg __unused)
>>>> +VGLAbort(int arg)
>>>>  {
>>>> +  sigset_t mask;
>>>> +
>>>>    VGLAbortPending = 1;
>>>>    signal(SIGINT, SIG_IGN);
>>>>    signal(SIGTERM, SIG_IGN);
>>>> -  signal(SIGSEGV, SIG_IGN);
>>>> -  signal(SIGBUS, SIG_IGN);
>>>>    signal(SIGUSR2, SIG_IGN);
>>>> +  if (arg == SIGBUS || arg == SIGSEGV) {
>>>> +    signal(arg, SIG_DFL);
>>>> +    sigemptyset(&mask);
>>>> +    sigaddset(&mask, arg);
>>>> +    sigprocmask(SIG_UNBLOCK, &mask, NULL);
>>>> +    VGLEnd();
>>>> +    kill(getpid(), arg);
>>> This of course misses the siginfo information from the real fault.
>>
>> It is in the nested signal frame.
>>
>>> Why SIGBUS/SIGSEGV are caught at all ?
>>
>> Otherwise, the screen is left in an application mode that the kernel
>> doesn't support (except to not write to the screen, so that the
>> application has full control).

Also, the keyboard may be left in a strange state.  Usually this is no
worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
switching to another vty without going through a libvgl handler anyway.
The combination is usually enough to break switching to vty0 to run ddb,
though that should work for at least breakpoints in the kernel.  IIRC,
sc_cngrab() switches the keyboard mode, and my version of it restores
ignoring of the anti-switch flag.

> ...
> I am more about not doing kill(2) from the handler, instead do plain
> return to the context which caused the original signal.

That won't work well for SIGBUS's and SIGSEGV's related to the library.
If the above cleanup is not done, then it is too hard to debug the problem,
and if the above cleanup is done then it is impossible to debug the original
problem if it was for and invalid access in libvgl code.

The latter is a problem for the above cleanup too -- the cleanup clobbers
the libvgl state.

However, I could do a more limited cleanup of just restoring the screen and
keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
most ioctls are async-signal safe in practice provided they don't use
global variables that were set too recently or too non-atomically.
VGLEnd() uses main globals established by VGLInit().  It depends on program
order which is not guaranteed in signal handlers for testing the flags set
by initialization.  The cleanup now also does:
- screen clearing.  This breaks debugging and is dangerous if the bug is
   in screen clearing.  Screen clearing is also very slow, and belongs
   in the kernel.  It is now done 4 times per libvgl use, but its
   slowness is limited by the kernel not clearing properly and the
   kernel not allowing the application to mmap() the whole physical
   frame buffer in some versions of FreeBSD (this breaks panning).
- munmap().  This breaks debugging.
- free() of backing store buffer and the main VGLDisplay data.  This breaks
   debugging and is unnecessary if we are going to exit.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190331214235.K961>