Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 31 Mar 2019 15:10:15 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r345696 - head/lib/libvgl
Message-ID:  <20190331121015.GK1923@kib.kiev.ua>
In-Reply-To: <20190331214235.K961@besplex.bde.org>
References:  <201903291557.x2TFv9AW097226@repo.freebsd.org> <20190329182100.GZ1923@kib.kiev.ua> <20190330142319.I1011@besplex.bde.org> <20190330094558.GA1923@kib.kiev.ua> <20190331214235.K961@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Mar 31, 2019 at 10:27:54PM +1100, Bruce Evans wrote:
> On Sat, 30 Mar 2019, Konstantin Belousov wrote:
> 
> > On Sat, Mar 30, 2019 at 03:24:40PM +1100, Bruce Evans wrote:
> >> On Fri, 29 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Fri, Mar 29, 2019 at 03:57:09PM +0000, Bruce Evans wrote:
> >>>> Author: bde
> >>>> Date: Fri Mar 29 15:57:08 2019
> >>>> New Revision: 345696
> >>>> URL: https://svnweb.freebsd.org/changeset/base/345696
> >>>>
> >>>> Log:
> >>>>   Fix endless loops for handling SIGBUS and SIGSEGV.
> >>>>
> >>>>   r80270 has the usual wrong fix for unsafe signal handling -- just set
> >>>>   a flag and return to let an event loop check the flag and do safe
> >>>>   handling.  This never works for signals like SIGBUS and SIGSEGV that
> >>>>   repeat and works poorly for others unless the application has an event
> >>>>   loop designed to support this.
> >>>>
> >>>>   For these signals, clean up unsafely as before, except for arranging that
> >>>>   nested signals are fatal and forcing a nested signal if the cleanup doesn't
> >>>>   cause one.
> >>>>
> >>>> Modified:
> >>>>   head/lib/libvgl/main.c
> >>>>
> >>>> Modified: head/lib/libvgl/main.c
> >>>> ==============================================================================
> >>>> --- head/lib/libvgl/main.c	Fri Mar 29 15:20:48 2019	(r345695)
> >>>> +++ head/lib/libvgl/main.c	Fri Mar 29 15:57:08 2019	(r345696)
> >>>> ...
> >>>> @@ -107,14 +107,22 @@ struct vt_mode smode;
> >>>>  }
> >>>>
> >>>>  static void
> >>>> -VGLAbort(int arg __unused)
> >>>> +VGLAbort(int arg)
> >>>>  {
> >>>> +  sigset_t mask;
> >>>> +
> >>>>    VGLAbortPending = 1;
> >>>>    signal(SIGINT, SIG_IGN);
> >>>>    signal(SIGTERM, SIG_IGN);
> >>>> -  signal(SIGSEGV, SIG_IGN);
> >>>> -  signal(SIGBUS, SIG_IGN);
> >>>>    signal(SIGUSR2, SIG_IGN);
> >>>> +  if (arg == SIGBUS || arg == SIGSEGV) {
> >>>> +    signal(arg, SIG_DFL);
> >>>> +    sigemptyset(&mask);
> >>>> +    sigaddset(&mask, arg);
> >>>> +    sigprocmask(SIG_UNBLOCK, &mask, NULL);
> >>>> +    VGLEnd();
> >>>> +    kill(getpid(), arg);
> >>> This of course misses the siginfo information from the real fault.
> >>
> >> It is in the nested signal frame.
> >>
> >>> Why SIGBUS/SIGSEGV are caught at all ?
> >>
> >> Otherwise, the screen is left in an application mode that the kernel
> >> doesn't support (except to not write to the screen, so that the
> >> application has full control).
> 
> Also, the keyboard may be left in a strange state.  Usually this is no
> worse than termios raw mode, but it may be raw scancodes.  In raw scancodes
> mode, Alt-Fn to switch to another vty to fix the problem.  libvgl disables
> switching to another vty without going through a libvgl handler anyway.
> The combination is usually enough to break switching to vty0 to run ddb,
> though that should work for at least breakpoints in the kernel.  IIRC,
> sc_cngrab() switches the keyboard mode, and my version of it restores
> ignoring of the anti-switch flag.
> 
> > ...
> > I am more about not doing kill(2) from the handler, instead do plain
> > return to the context which caused the original signal.
> 
> That won't work well for SIGBUS's and SIGSEGV's related to the library.
> If the above cleanup is not done, then it is too hard to debug the problem,
> and if the above cleanup is done then it is impossible to debug the original
> problem if it was for and invalid access in libvgl code.
> 
> The latter is a problem for the above cleanup too -- the cleanup clobbers
> the libvgl state.
> 
> However, I could do a more limited cleanup of just restoring the screen and
> keyboard state using ioctls.  ioctl() isn't async-signal safe in POSIX, but
> most ioctls are async-signal safe in practice provided they don't use
> global variables that were set too recently or too non-atomically.
> VGLEnd() uses main globals established by VGLInit().  It depends on program
> order which is not guaranteed in signal handlers for testing the flags set
> by initialization.  The cleanup now also does:
> - screen clearing.  This breaks debugging and is dangerous if the bug is
>    in screen clearing.  Screen clearing is also very slow, and belongs
>    in the kernel.  It is now done 4 times per libvgl use, but its
>    slowness is limited by the kernel not clearing properly and the
>    kernel not allowing the application to mmap() the whole physical
>    frame buffer in some versions of FreeBSD (this breaks panning).
> - munmap().  This breaks debugging.
> - free() of backing store buffer and the main VGLDisplay data.  This breaks
>    debugging and is unnecessary if we are going to exit.

You can only try to do async-safe calls from SIGSEGV handler, to not
break a limited chances to success. Also you should not modify in-memory
state for the library, for the same reason, and then it fits your goal
of keeping the state intact for debugging. Then you should return to
re-create the original situation causing and not kill(2).

I find it weird to try to debug mode-switched code without serial console.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190331121015.GK1923>