Date:      Fri, 4 Jan 2019 07:56:42 +0100
From:      Michal Meloun <melounmichal@gmail.com>
To:        Dennis Clarke <dclarke@blastwave.org>, freebsd-arm@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Mark Millard <marklmi@yahoo.com>
Subject:   Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
Message-ID:  <ddae0990-f68b-549e-18da-1388e772a3ff@freebsd.org>
In-Reply-To: <d99f28fd-5c6c-db6f-2d78-9ea6a697af2e@blastwave.org>
References:  <FF9B4284-4E6B-4D36-86A0-18861B527AC0@yahoo.com> <865A13C8-9749-486E-9F79-5EEDDECBE621@yahoo.com> <0154C3AC-D85B-4FCF-BA63-454BC26BC1A2@yahoo.com> <A6A58CE3-062B-4B79-A8C2-ADFDAA04C6AF@yahoo.com> <13f5e4dd-33fb-2170-e31a-1b5d5f155869@freebsd.org> <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com> <2E3F6196-4652-40D2-937F-8860B6005A35@yahoo.com> <d99f28fd-5c6c-db6f-2d78-9ea6a697af2e@blastwave.org>

On 29.12.2018 18:47, Dennis Clarke wrote:
> On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
>>
>> On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com>
>>> wrote:
>>>
>>>> Mark,
>>>> this is known problem with qemu-user-static.
>>>> Emulation of every single interruptible syscall is broken by design (it
>>>> have signal related races). Theses races cannot be solved without major
>>>> rewrite of syscall emulation code.
>>>> Unfortunately, nobody actively works on this, I think.
>>>>
> 
> Following along here quietly and I had to blink at this a few times.
> Is there a bug report somewhere within the qemu world related to this
>  'broken by design' qemu feature?

First, I apologize for the late answer. Writing a technically accurate
but still comprehensible report is extremely difficult for me.

The major design issue with qemu-user is that guest (blocking /
interruptible) syscalls must be emulated atomically, including the
delivery of asynchronous signals (including signals originating from
other threads).
This is something that cannot be emulated precisely by a user-mode
program without specific kernel support. Let me explain this in a
little more detail.

Assume that we have the following trivial code:
void sig_alarm_handler(…)
{
  if (!done) {
    do some work;
    alarm(10);   /* re-arm the alarm for the next period */
  }
}

void foo(void)
{
  install_signal_handler(SIGALRM, sig_alarm_handler);
  alarm(10);
  do some work;
  while (true) {
    rv = select(…, NULL);   /* NULL timeout: wait indefinitely */
    if (rv == 0)
      do some work;
    else if (rv == -1 && errno != EINTR)
      report error and exit;
  }
}

In a native environment, this code works well. The alarm signal handler
is called every 10s, irrespective of whether the signal fires in the
program code, in the libc implementation of select(), or while the
program is waiting in the kernel part of the select() syscall.
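
For reference, the native behaviour can be reproduced with a small
program. This is only a sketch under my own assumptions: it uses a
periodic 10 ms setitimer() instead of alarm(10) so that it finishes
quickly, and the names on_alarm and wait_for_ticks are made up for the
illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static volatile sig_atomic_t alarm_ticks;

/* Stands in for "do some work" in the handler. */
static void on_alarm(int signo)
{
    (void)signo;
    alarm_ticks++;
}

/*
 * Blocks in select() with no timeout until the handler has run at
 * least `wanted` times. Returns the number of handler runs seen,
 * or -1 on an unexpected error.
 */
int wait_for_ticks(int wanted)
{
    struct sigaction sa, old_sa;
    struct itimerval tick, off;
    int failed = 0;

    assert(wanted > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    /* Periodic 10 ms timer (assumption: shortened from alarm(10)). */
    memset(&tick, 0, sizeof(tick));
    tick.it_interval.tv_usec = 10000;
    tick.it_value.tv_usec = 10000;
    memset(&off, 0, sizeof(off));

    alarm_ticks = 0;
    if (setitimer(ITIMER_REAL, &tick, NULL) == -1) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    while (alarm_ticks < wanted) {
        /* No descriptors, no timeout: only a signal can wake us. */
        int rv = select(0, NULL, NULL, NULL, NULL);
        if (rv == -1 && errno != EINTR) {
            failed = 1;
            break;
        }
    }

    setitimer(ITIMER_REAL, &off, NULL);   /* disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);    /* restore old handler */
    return failed ? -1 : (int)alarm_ticks;
}
```

Calling wait_for_ticks(3) natively returns after roughly 30 ms; the
kernel runs the handler no matter where the program happens to be when
the timer fires.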

In the qemu-user environment, things get significantly harder. Qemu can
deliver signals to the guest only on an instruction boundary, because
the guest signal handler must see the emulated CPU context in a
consistent state. But the kernel can deliver a signal to qemu at any
time. Because of this, qemu must store delivered signals in a queue and
emit them later, when the emulator steps over the next instruction
boundary.
Assume that qemu is just emulating the 'syscall' instruction from the
guest select() call. Also assume that no other signals (but SIGALRM)
are generated, and that the socket used in select() never receives or
transmits any data.
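
The queue-then-deliver scheme can be modelled with a toy loop. This is
not qemu code, only a sketch assuming a single pending flag in place of
a real signal queue; host_handler and boundary_of_delivery are
hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the host handler; a one-slot stand-in for the signal queue. */
static volatile sig_atomic_t queued;

static void host_handler(int signo)
{
    (void)signo;
    queued = 1;   /* only record the signal, do not touch guest state */
}

/*
 * Runs a toy "guest" of total_insns instructions. A host signal
 * arrives in the middle of instruction raise_during. Returns the
 * index of the instruction boundary at which the guest handler would
 * be invoked, or -1 on error.
 */
int boundary_of_delivery(int total_insns, int raise_during)
{
    struct sigaction sa, old_sa;
    int delivered_at = -1;

    assert(raise_during >= 0 && raise_during < total_insns);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = host_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    queued = 0;
    for (int pc = 0; pc < total_insns; pc++) {
        /* Instruction boundary: guest CPU state is consistent here. */
        if (queued) {
            queued = 0;
            delivered_at = pc;
            break;
        }
        /* "Emulate" instruction pc; the host signal lands mid-way. */
        if (pc == raise_during)
            raise(SIGALRM);
    }
    /* Boundary after the last instruction. */
    if (delivered_at == -1 && queued) {
        queued = 0;
        delivered_at = total_insns;
    }

    sigaction(SIGALRM, &old_sa, NULL);
    return delivered_at;
}
```

With boundary_of_delivery(5, 2) the signal arrives during instruction 2
but the guest sees it only at the boundary before instruction 3. If
"instruction 2" is a blocking syscall, that boundary may never be
reached.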

The first version of qemu-user code emulating select() was:
abi_long do_freebsd_select(..)
{
 convert input guest arguments to host;
 rv = select(…);
 convert output host arguments to guest;
 return(rv);
}

But this is very racy. If the alarm signal fires before select(…)
enters the kernel, qemu queues it (but does not deliver it to the guest,
because it isn't on an instruction boundary) and continues with the
emulation. And because (in our case) select() waits indefinitely, the
alarm signal is never delivered to the guest and the whole program
hangs.
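
The lost wakeup can be shown without hanging the demonstration. In the
sketch below (my own illustration, not qemu code) the signal is raised
just before the host select(), and a finite timeout is used in place of
the indefinite wait; naive_select_misses_signal is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static volatile sig_atomic_t signal_pending;

static void host_handler(int signo)
{
    (void)signo;
    signal_pending = 1;   /* queued for the guest, nothing more */
}

/*
 * Returns 1 if select() slept through its whole timeout although a
 * signal was already queued, 0 if select() was interrupted, and -1
 * on any other outcome.
 */
int naive_select_misses_signal(int timeout_ms)
{
    struct sigaction sa, old_sa;
    struct timeval tv;
    int rv, saved_errno;

    assert(timeout_ms > 0 && timeout_ms < 1000);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = host_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    signal_pending = 0;
    /* The signal fires just before the host syscall is entered. */
    raise(SIGALRM);

    /* Finite timeout only so that this sketch terminates. */
    tv.tv_sec = 0;
    tv.tv_usec = timeout_ms * 1000;
    rv = select(0, NULL, NULL, NULL, &tv);
    saved_errno = errno;

    sigaction(SIGALRM, &old_sa, NULL);

    if (rv == 0 && signal_pending)
        return 1;   /* slept the full timeout; signal went unnoticed */
    if (rv == -1 && saved_errno == EINTR)
        return 0;
    return -1;
}
```

Here select() runs to its timeout because the host handler has already
returned by the time the syscall starts. With a NULL timeout, as in the
guest program above, it would never return.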

The actual qemu code emulating select() looks like:
abi_long do_freebsd_select(..)
{
  convert input guest arguments to host;
  sigfillset(&mask);
  sigprocmask(SIG_BLOCK, &mask, &omask);
  if (ts->signal_pending) {
    sigprocmask(SIG_SETMASK, &omask, NULL);
    /* We have a signal pending so just poll select() and return. */
    tv2.tv_sec = tv2.tv_usec = 0;
    ret = select(…, &tv2);
    if (ret == 0)
      ret = TARGET_EINTR;
  } else {
    ret = pselect(…, &omask);
    sigprocmask(SIG_SETMASK, &omask, NULL);
  }
  convert output host arguments to guest;
  return(ret);
}

This looks much better. The code first blocks all signals, then checks
whether any signal is pending. If so, it does a non-blocking select()
(because the timeout is zero) and correctly returns EINTR immediately.
Otherwise, it uses another variant of select(), pselect(), which
installs the right signal mask itself.
That means the syscall is entered with signal delivery blocked, but the
kernel installs the right sigmask before it waits for an event. This
looks like a perfect solution that closes all races from the first
version, but it is not. pselect() has different semantics than
select(): it doesn't update the timeout argument. So this solution is
also inappropriate.
Moreover, I think we don't have p<foo> equivalents for all blocking
syscalls.
Mark, I hope that this also answers your question posted to hackers@
and explains why you see the hang.
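
The part that pselect() does get right can be checked in isolation.
The sketch below is my own reduction, not the qemu code: it blocks only
SIGALRM rather than all signals, and pselect_sees_pending_signal is a
hypothetical name. A signal that becomes pending while blocked
interrupts pselect() as soon as the kernel swaps in the wait mask.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t handler_ran;

static void host_handler(int signo)
{
    (void)signo;
    handler_ran = 1;
}

/*
 * Returns 1 if pselect() was interrupted by the signal that became
 * pending while it was blocked, 0 if pselect() timed out instead,
 * and -1 on any other outcome.
 */
int pselect_sees_pending_signal(int timeout_ms)
{
    struct sigaction sa, old_sa;
    sigset_t block, omask, wait_mask;
    struct timespec ts;
    int rv, saved_errno;

    assert(timeout_ms > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = host_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1)
        return -1;

    /* Block the signal before the "is anything pending?" check. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &omask) == -1)
        return -1;

    handler_ran = 0;
    /* Arrives in the window before the syscall; stays pending. */
    raise(SIGALRM);

    /* Mask to wait with: the old mask, with SIGALRM unblocked. */
    wait_mask = omask;
    sigdelset(&wait_mask, SIGALRM);

    /* The timeout is only a safety net for this sketch. */
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (long)(timeout_ms % 1000) * 1000000L;
    rv = pselect(0, NULL, NULL, NULL, &ts, &wait_mask);
    saved_errno = errno;

    sigprocmask(SIG_SETMASK, &omask, NULL);
    sigaction(SIGALRM, &old_sa, NULL);

    if (rv == -1 && saved_errno == EINTR && handler_ran)
        return 1;
    return rv == 0 ? 0 : -1;
}
```

This closes the window from the first version, but only for syscalls
that have such a mask-taking variant, which is the limitation described
above.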

Linux uses a different approach to overcome this issue, safe_syscall ->
https://gitlab.collabora.com/tomeu/qemu/commit/4d330cee37a21aabfc619a1948953559e66951a4
It looks like a workable workaround, but I'm not sure about ERESTART
versus EINTR return values. IMHO, this can be a problem.

I have a list of other qemu-user problems (I mainly mean the bsd-user
part of the qemu code here), not counting normal coding bugs:
- the code is not thread safe but is used in a threaded environment (rw
locks, for example),
- emulating some sysctls and resource limits / usage behavior is very
hard (mainly if we emulate a 32-bit guest on a 64-bit host),
- if a host syscall returns ERESTART, we should do a full unroll and
pass it to the guest,
- the syscall emulation should not use the libc functions, but the
syscall instruction directly. Libc shims can have side effects, so we
should not execute them twice: once in the guest, a second time in the
host,
- and the last major one: at this time, all guest structures are
maintained by hand. Due to the huge number of these structures, this is
an extremely error-prone approach. We should convert this to
script-generated code, including the guest syscall definitions.

Again, my apologies for this slightly (or very) chaotic report, but it
is the best I am capable of.

Michal


