Date: Fri, 4 Jan 2019 07:56:42 +0100
From: Michal Meloun <melounmichal@gmail.com>
To: Dennis Clarke <dclarke@blastwave.org>, freebsd-arm@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Mark Millard <marklmi@yahoo.com>
Subject: Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
Message-ID: <ddae0990-f68b-549e-18da-1388e772a3ff@freebsd.org>
In-Reply-To: <d99f28fd-5c6c-db6f-2d78-9ea6a697af2e@blastwave.org>
References: <FF9B4284-4E6B-4D36-86A0-18861B527AC0@yahoo.com>
 <865A13C8-9749-486E-9F79-5EEDDECBE621@yahoo.com>
 <0154C3AC-D85B-4FCF-BA63-454BC26BC1A2@yahoo.com>
 <A6A58CE3-062B-4B79-A8C2-ADFDAA04C6AF@yahoo.com>
 <13f5e4dd-33fb-2170-e31a-1b5d5f155869@freebsd.org>
 <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>
 <2E3F6196-4652-40D2-937F-8860B6005A35@yahoo.com>
 <d99f28fd-5c6c-db6f-2d78-9ea6a697af2e@blastwave.org>

On 29.12.2018 18:47, Dennis Clarke wrote:
> On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
>>
>> On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com>
>>> wrote:
>>>
>>>> Mark,
>>>> this is known problem with qemu-user-static.
>>>> Emulation of every single interruptible syscall is broken by design (it
>>>> have signal related races). Theses races cannot be solved without major
>>>> rewrite of syscall emulation code.
>>>> Unfortunately, nobody actively works on this, I think.
>>>>
>
> Following along here quietly and I had to blink at this a few times.
> Is there a bug report somewhere within the qemu world related to this
> 'broken by design' qemu feature?

Firstly, I apologize for the late answer. Writing a technically accurate
but still comprehensible report is extremely difficult for me.

The major design issue with qemu-user is that guest (blocking /
interruptible) syscalls must be emulated atomically, including the
delivery of asynchronous signals (including signals originating from
other threads). This cannot be emulated precisely by a user-mode
program without specific kernel support.

Let me explain this in a little more detail. Assume that we have the
following trivial code:

    void sig_alarm_handler(…)
    {
        if (!done) {
            do some work;
            alarm(10);
        }
    }

    void foo(void)
    {
        install_signal_handler(SIGALRM, sig_alarm_handler);
        alarm(10);
        do some work;
        while (true) {
            rv = select(…, NULL);
            if (rv >= 0)
                do some work;
            else if (errno != EINTR)    /* select() returned -1 */
                report error and exit;
        }
    }

In a native environment, this code works well. It calls the alarm signal
handler every 10 s, irrespective of whether the signal fires in the
program code, in the libc implementation of select(), or while the
program is waiting in the kernel part of the select() syscall.

In the qemu-user environment, things get significantly harder.
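
For anyone who wants to try the native case, here is a minimal runnable sketch of the trivial program above. It is only an illustration: the names run_alarm_loop, alarms_seen and period are hypothetical, "do some work" is reduced to counting, and the loop stops after a requested number of wakeups so that it can be run as a quick check.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* All names below are hypothetical; they do not come from the original mail. */
static volatile sig_atomic_t alarms_seen;  /* incremented by the handler */
static volatile sig_atomic_t done;         /* set when the loop is finished */
static volatile sig_atomic_t period;       /* alarm period in seconds */

static void sig_alarm_handler(int signo)
{
    (void)signo;
    alarms_seen++;                  /* stands in for "do some work" */
    if (!done)
        alarm((unsigned)period);    /* alarm() is async-signal-safe */
}

/*
 * Arms a periodic alarm and sleeps in select() until it has been
 * interrupted `wanted` times. Returns the number of EINTR wakeups,
 * or -1 on any other outcome.
 */
int run_alarm_loop(unsigned period_sec, int wanted)
{
    struct sigaction sa;
    int interrupted = 0;

    assert(period_sec > 0 && wanted > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sig_alarm_handler;
    sigemptyset(&sa.sa_mask);       /* sa_flags stays 0: no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    alarms_seen = 0;
    done = 0;
    period = (sig_atomic_t)period_sec;
    alarm(period_sec);

    while (interrupted < wanted) {
        /* No descriptors and no timeout: only a signal can end the wait. */
        int rv = select(0, NULL, NULL, NULL, NULL);

        if (rv == -1 && errno == EINTR) {
            interrupted++;
        } else {
            /* Not expected here: treat as "report error and exit". */
            done = 1;
            alarm(0);
            return -1;
        }
    }

    done = 1;
    alarm(0);                       /* cancel the alarm re-armed by the handler */
    return interrupted;
}
```

Calling run_alarm_loop(1, 3) from a small main() should return 3 after roughly three seconds on a native host. The handler may run at any point, including just before select() enters the kernel, and the loop still makes progress because the re-armed alarm interrupts the next wait.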
Qemu can deliver signals to the guest only on an instruction boundary,
because the guest signal handler should see the emulated CPU context in
a consistent state. But the kernel can deliver a signal to qemu at any
time. Because of this, qemu must store delivered signals in a queue and
emit them later, when the emulator steps over the next instruction
boundary.

Assume that qemu is just emulating the 'syscall' instruction from the
guest select() call. Also assume that no other signals (but SIGALRM)
are generated, and that the socket used in select() never receives or
transmits any data.

The first version of the qemu-user code emulating select() was:

    abi_long do_freebsd_select(..)
    {
        convert input guest arguments to host;
        rv = select(…);
        convert output host arguments to guest;
        return (rv);
    }

But this is very racy. If the alarm signal fires before select(…) enters
the kernel, qemu queues it (but does not deliver it to the guest,
because it isn't on an instruction boundary) and continues the
emulation. And because (in our case) select() waits indefinitely, the
alarm signal is never delivered to the guest and the whole program
hangs.

The actual qemu code emulating select() looks like:

    abi_long do_freebsd_select(..)
    {
        convert input guest arguments to host;

        sigfillset(&mask);
        sigprocmask(SIG_BLOCK, &mask, &omask);
        if (ts->signal_pending) {
            sigprocmask(SIG_SETMASK, &omask, NULL);
            /* We have a signal pending so just poll select() and return. */
            tv2.tv_sec = tv2.tv_usec = 0;
            ret = select(…, &tv2);
            if (ret == 0)
                ret = TARGET_EINTR;
        } else {
            ret = pselect(…, &omask);
            sigprocmask(SIG_SETMASK, &omask, NULL);
        }
        convert output host arguments to guest;
        return (ret);
    }

This looks much better. The code blocks all signals first, then checks
whether any signal is pending. If so, it does a non-blocking select()
(because the timeout is zero) and correctly returns EINTR immediately.
Otherwise, it uses another variant of select(), pselect(), which
installs the right signal mask itself.
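
The "block signals, then wait with pselect()" idea described above can be tried natively with a small sketch like the following. This is not qemu code. The function wait_interruptible, the got_signal flag and the use of SIGUSR1 are assumptions made for illustration, and where qemu checks its own ts->signal_pending flag, this sketch simply relies on the kernel keeping the blocked signal pending.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical names; nothing here is taken from the qemu sources. */
static volatile sig_atomic_t got_signal;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Blocks SIGUSR1, optionally raises it inside the "race window" before
 * the wait, then waits in pselect() with SIGUSR1 unblocked atomically.
 * Returns 1 if the wait was interrupted by the signal, 0 on timeout,
 * and -1 on any other error.
 */
int wait_interruptible(int raise_before_wait, long timeout_ms)
{
    struct sigaction sa;
    struct timespec ts;
    sigset_t block, omask, waitmask;
    int rv, saved_errno;

    assert(timeout_ms >= 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Close the window: from here on SIGUSR1 can only become pending. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &omask) != 0)
        return -1;
    got_signal = 0;

    if (raise_before_wait)
        raise(SIGUSR1);             /* stays pending, handler not run yet */

    /* Mask used during the wait: the old mask with SIGUSR1 unblocked. */
    waitmask = omask;
    sigdelset(&waitmask, SIGUSR1);

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    /* The kernel swaps the mask and starts waiting as one atomic step. */
    rv = pselect(0, NULL, NULL, NULL, &ts, &waitmask);
    saved_errno = errno;

    sigprocmask(SIG_SETMASK, &omask, NULL);

    if (rv == -1 && saved_errno == EINTR && got_signal)
        return 1;
    if (rv == 0)
        return 0;
    return -1;
}
```

With raise_before_wait set, the signal arrives before the wait starts, which is exactly the case that hangs the first version, yet pselect() still returns with EINTR as soon as it unblocks the signal. With raise_before_wait clear, the call simply times out.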
In other words, pselect() is entered with signal delivery blocked, but
the kernel installs the right signal mask before it waits for an event.

Although this code closes all the races of the first version and looks
like a perfect solution, it is not one. pselect() has different
semantics than select(): it doesn't update the timeout argument. So
this solution is also inappropriate. Moreover, I think we don't have
p<foo> equivalents for all blocking syscalls.

Mark, I hope that this also answers your question posted to hackers@
and explains why you see the hang.

Linux uses a different approach to overcome this issue, safe_syscall ->
https://gitlab.collabora.com/tomeu/qemu/commit/4d330cee37a21aabfc619a1948953559e66951a4
It looks like a workable workaround, but I'm not sure about ERESTART
versus EINTR return values. Imho, this can be a problem.

I have a list of other qemu-user problems (I mean mainly the bsd-user
part of the qemu code here), not counting normal coding bugs:
- the code is not thread safe but is used in a threaded environment
  (rw locks, for example),
- emulating some sysctls and resource limit / usage behavior is very
  hard (mainly if we emulate a 32-bit guest on a 64-bit host),
- if a host syscall returns ERESTART, we should do a full unroll and
  pass it to the guest,
- the syscall emulation should not use the libc functions, but the
  syscall instruction directly. Libc shims can have side effects, so we
  should not execute them twice: once in the guest and a second time in
  the host,
- and the last major one: at this time, all guest structures are
  maintained by hand. Due to the huge number of these structures, this
  is an extremely error-prone approach. We should convert this to
  script-generated code, including the guest syscall definitions.

Again, my apologies for the slightly (or very) chaotic report, but this
is the best I am capable of.

Michal