Date: Fri, 28 Dec 2018 18:56:43 -0800
From: Mark Millard <marklmi@yahoo.com>
To: mmel@freebsd.org
Cc: freebsd-emulation@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, ports-list freebsd <freebsd-ports@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject: Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
Message-ID: <2E3F6196-4652-40D2-937F-8860B6005A35@yahoo.com>
In-Reply-To: <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>
References: <FF9B4284-4E6B-4D36-86A0-18861B527AC0@yahoo.com> <865A13C8-9749-486E-9F79-5EEDDECBE621@yahoo.com> <0154C3AC-D85B-4FCF-BA63-454BC26BC1A2@yahoo.com> <A6A58CE3-062B-4B79-A8C2-ADFDAA04C6AF@yahoo.com> <13f5e4dd-33fb-2170-e31a-1b5d5f155869@freebsd.org> <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>

On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:
> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com> wrote:
>
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody is actively working on this, I think.
>>
>
> Thanks for the note setting some expectations.
>
> On the evidence that I have I expect that more is going on than that:
>
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
>
> B) (A) holds even when varying the number of builders running in parallel
> (so other builds are also happening) and the number of jobs allowed per
> builder. It also fails with only one builder that is allowed only one
> process. (I get my traces from that last kind of context.)
>
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
>
> D) The problem is only observed for targeting armv7 and armv6 as
> far as I can tell. I've never seen it for aarch64, neither my
> own builds nor when I looked at the package-building server
> history.
>
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
>
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
>
> 93172 qemu-arm-static CALL kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
> 93172 qemu-arm-static STRU struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 }
> { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 } }
>
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
>
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
>
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)
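
The oddity in the quoted kevent trace can be checked mechanically. The
following is only a minimal sketch in Python, assuming the
"struct kevent[]" text layout exactly as kdump printed it above. The names
parse_kevents and suspicious_kevents and the regular expression are
hypothetical helpers for illustration, not part of kdump or any FreeBSD
tool. It reports which changelist entries kdump marked as having an
invalid filter:

```python
import re

# Matches one "{ ident=..., filter=..., ... }" entry as printed by kdump.
# Assumption: entries never contain nested braces.
ENTRY_RE = re.compile(r"\{\s*(ident=[^{}]*?)\s*\}")

def parse_kevents(stru_text):
    """Return a list of dicts, one per struct kevent entry in the text."""
    entries = []
    for match in ENTRY_RE.finditer(stru_text):
        fields = {}
        for part in match.group(1).split(", "):
            # Split on the first "=" only, so "<invalid=0>" stays intact.
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
        entries.append(fields)
    return entries

def suspicious_kevents(stru_text):
    """Return the indexes of entries whose filter kdump marked invalid."""
    return [i for i, entry in enumerate(parse_kevents(stru_text))
            if entry.get("filter", "").startswith("<invalid")]

# The STRU record from the trace above, joined onto one logical line.
TRACE = (
    "struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, "
    "fflags=0, data=0, udata=0x0 } "
    "{ ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, "
    "udata=0x0 } }"
)

if __name__ == "__main__":
    print(suspicious_kevents(TRACE))
```

For the trace above, only the second entry (index 1) is flagged, which
matches the mismatch between the 0x2 count argument and the single
well-formed EV_ADD entry.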

The details of the signal usage leading up to the hang-up, starting
from just before I pressed return for the "make FLAVOR=qt5" command
that I had entered:

The only "Interrupted system call" prior to my killing the hung cmake
process was (kdump -H -r -S output):

93172 100717 qemu-arm-static CALL execve[59](0x10392,0x8605051a0,0x860cf5400)
93172 101706 qemu-arm-static RET nanosleep[240] -1 errno 4 Interrupted system call
93172 100717 qemu-arm-static NAMI "/bin/sh"
93172 100717 sh RET execve[59] JUSTRETURN
93172 100717 sh CALL readlink[58](0x207a65,0x7fffffffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh
(which in turn later runs cmake via qemu-arm-static). This was after the fork
that qemu-arm-static did for the requested vfork. So the interrupted nanosleep
comes from the close-down of the other thread (101706), which was in nanosleep
when the execve happened.

There were no PSIGs and no sigreturns prior to the kill according to the
kdump output.
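
For anyone wanting to repeat that check on their own trace, here is a
minimal sketch in Python. It assumes "kdump -H -r -S" style lines of the
form "pid tid command RECORD rest...", as in the excerpt above, and command
names without spaces. The function name signal_related and the kind labels
are hypothetical, chosen only for illustration. It collects PSIG records,
sigreturn calls, and returns with errno 4 (Interrupted system call):

```python
def signal_related(lines):
    """Return (tid, kind, line) tuples for signal-related kdump records."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a "pid tid command RECORD ..." line
        tid, record, rest = fields[1], fields[3], fields[4:]
        if record == "PSIG":
            hits.append((tid, "PSIG", line))
        elif record in ("CALL", "RET") and rest and rest[0].startswith("sigreturn"):
            hits.append((tid, "sigreturn", line))
        elif record == "RET" and rest[1:4] == ["-1", "errno", "4"]:
            hits.append((tid, "EINTR", line))
    return hits

# The excerpt quoted above, copied from the kdump output.
EXCERPT = [
    "93172 100717 qemu-arm-static CALL execve[59](0x10392,0x8605051a0,0x860cf5400)",
    "93172 101706 qemu-arm-static RET nanosleep[240] -1 errno 4 Interrupted system call",
    '93172 100717 qemu-arm-static NAMI "/bin/sh"',
    "93172 100717 sh RET execve[59] JUSTRETURN",
    "93172 100717 sh CALL readlink[58](0x207a65,0x7fffffffccc0,0x400)",
]

if __name__ == "__main__":
    for tid, kind, line in signal_related(EXCERPT):
        print(tid, kind, line)
```

On the excerpt this reports just the one interrupted nanosleep in thread
101706 and nothing of kind PSIG or sigreturn.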
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
