Date: Fri, 28 Dec 2018 18:56:43 -0800
From: Mark Millard <marklmi@yahoo.com>
To: mmel@freebsd.org
Cc: freebsd-emulation@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, ports-list freebsd <freebsd-ports@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject: Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
Message-ID: <2E3F6196-4652-40D2-937F-8860B6005A35@yahoo.com>
In-Reply-To: <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>
References: <FF9B4284-4E6B-4D36-86A0-18861B527AC0@yahoo.com> <865A13C8-9749-486E-9F79-5EEDDECBE621@yahoo.com> <0154C3AC-D85B-4FCF-BA63-454BC26BC1A2@yahoo.com> <A6A58CE3-062B-4B79-A8C2-ADFDAA04C6AF@yahoo.com> <13f5e4dd-33fb-2170-e31a-1b5d5f155869@freebsd.org> <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>

On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com> wrote:
>
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody actively works on this, I think.
>>
>
> Thanks for the note setting some expectations.
>
> On the evidence that I have I expect that more is going on than that:
>
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
>
> B) (A) is true even for varying the number of builders in parallel
> (so other builds also happening) and the number of jobs allowed per
> builder. It also fails for only one builder allowed only one process.
> (I get traces from that last kind of context.)
>
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
>
> D) The problem is only observed for targeting armv7 and armv6 as
> far as I can tell. I've never seen it for aarch64, neither in my
> own builds nor when I looked at the package-building server
> history.
>
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
>
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
>
> 93172 qemu-arm-static CALL  kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
> 93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 }
>              { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 } }
>
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
>
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
>
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)

The detail of the signal usage involved leading up to the hang-up,
starting from just before the "press return" for the
"make FLAVOR=qt5" command that I had entered:

The only "Interrupted system call" prior to my killing the hung
cmake process was (kdump -H -r -S output):

 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)
 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call
 93172 100717 qemu-arm-static NAMI  "/bin/sh"
 93172 100717 sh       RET   execve[59] JUSTRETURN
 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh (to
in turn later run cmake via qemu-arm-static). (This was after the fork [for the
requested vfork].)

So it is for the close-down of the thread that was in nanosleep.

There were no PSIG's and no sigreturn's prior to the kill according to the
kdump output.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
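
For anyone wanting to repeat the check described above (looking for
"Interrupted system call" returns and PSIG records in the trace), the
following is only a rough sketch of how the scan could be automated. It
assumes the "kdump -H -r -S" line shape seen in the quoted lines (pid,
thread id, command, record type, then the rest). The name scan_kdump and
the SAMPLE text are made up for illustration; SAMPLE holds just the five
lines quoted in this message, not the full trace.

```python
import re

# Assumed line shape (from the quoted kdump -H -r -S lines):
#   <pid> <tid> <command> <RECORD-TYPE> <rest of record>
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+([A-Z]+)\s+(.*)$")


def scan_kdump(lines):
    """Collect EINTR returns and PSIG records from kdump-style text lines.

    Returns (interrupted, signals):
      interrupted: list of (pid, tid, syscall) for RET records with errno 4
      signals:     list of (pid, tid, detail) for PSIG records
    """
    interrupted = []
    signals = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip continuation lines or anything unrecognized
        pid, tid, _cmd, kind, rest = m.groups()
        if kind == "RET" and re.search(r"\berrno 4\b", rest):
            # First token of a RET record is the syscall, e.g. nanosleep[240]
            interrupted.append((int(pid), int(tid), rest.split()[0]))
        elif kind == "PSIG":
            signals.append((int(pid), int(tid), rest))
    return interrupted, signals


# Illustrative input: only the lines quoted in the message above.
SAMPLE = """\
 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)
 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call
 93172 100717 qemu-arm-static NAMI  "/bin/sh"
 93172 100717 sh       RET   execve[59] JUSTRETURN
 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)
"""

if __name__ == "__main__":
    hits, sigs = scan_kdump(SAMPLE.splitlines())
    print("EINTR returns:", hits)
    print("PSIG records:", sigs)
```

On a real trace one would feed it the saved kdump text instead of SAMPLE,
for example scan_kdump(open(path)) with path naming a file holding the
kdump output. An empty PSIG list would match the "no PSIG's" observation
above, though the sketch does not look for sigreturn calls.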