Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Dec 2018 13:02:06 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        freebsd-emulation@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, ports-list freebsd <freebsd-ports@freebsd.org>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject:   Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
Message-ID:  <050F532F-85A8-4FF8-A7B0-178598B06BE1@yahoo.com>
In-Reply-To: <3946C94A-FCA6-49FE-ADDB-B042BBE50913@yahoo.com>
References:  <FF9B4284-4E6B-4D36-86A0-18861B527AC0@yahoo.com> <865A13C8-9749-486E-9F79-5EEDDECBE621@yahoo.com> <0154C3AC-D85B-4FCF-BA63-454BC26BC1A2@yahoo.com> <5C3F09FE-EA50-452D-98EE-364B7BF3ECD0@yahoo.com> <3946C94A-FCA6-49FE-ADDB-B042BBE50913@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[Using ktrace/kdump shows an apperent oddity in the kevent use that
hang-up in cmake, not that I know it causes the hang-up.]

On 2018-Dec-28, at 00:16, Mark Millard <marklmi at yahoo.com> wrote:

> [The historical notes are removed and replaced by partial trace
> information from example hang-ups, not that I've figured out
> what contributes yet.]
>=20
> I ran into the following while trying to get evidence
> about the hang-up for an amd64->armv7 cross-build of
> multimedia/gstreamer1-qt@qt5 .
>=20
> The following from trying to get evidence for the hang-up
> via a manual run of "make multimedia/gstreamer1-qt FLAVOR=3Dqt5=E2=80=9D=

> in a poudriere bulk -i=E2=80=99s interactive mode for the context
> that has the hang-up in normal poudriere-devel runs.
>=20
>=20
> =46rom top after the hang-up (to identify some context):
>=20
> 14528 root          2  52    0   100M    24M    0 kqread  11   0:00   =
0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E =
cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
> 14527 root          2  52    0    88M    13M    0 select  22   0:00   =
0.00% /usr/local/bin/qemu-arm-static ninja -j1 -v all
>=20
> from ps -auxd as well (to identify more context):
>=20
> root       10114    0.0  0.0  10328  1756  1  I+J  13:47       0:00.01 =
|                 `-- make FLAVOR=3Dqt5
> root       14526    0.0  0.0  10204  1792  1  I+J  13:50       0:00.00 =
|                   `-- /bin/sh -e -c (cd =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! =
/usr/bin/env QT_SELE
> root       14527    0.0  0.0  90304 13084  1  I+J  13:50       0:00.09 =
|                     `-- /usr/local/bin/qemu-arm-static ninja -j1 -v =
all
> root       14528    0.0  0.0 102876 25060  1  IJ   13:50       0:00.12 =
|                       `-- /usr/local/bin/qemu-arm-static =
/usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>=20
> I had made a qemu-user-static that enabled do_strace when
> it is used to run cmake or ninja.
>=20
> The only do_strace lines from qemu-arm-static running cmake
> or ninja mentioning process 14528 are included in the sequence:
>=20
> (Before the below was a long list of "14527 fstatat=E2=80=9D lines.
> I=E2=80=99ll note that "'Unknown syscall 545=E2=80=9D is from ppoll =
use.)
>=20
> 82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) =3D 0
> 82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) =3D 0
> 82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) =3D 0
> 82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) =3D 0
> 82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) =
=3D 0
> 82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) =3D 0
> 82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) =3D 0
> 82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, =
0) =3D 0
> 82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", =
0x9fffe200, 0) =3D 0
> 82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) =3D 0
> 82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) =3D 0
> 82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) =3D 0
> 82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) =3D 82401
> 82400 close(6) =3D 0
> =3D 0
> 82400 Unknown syscall 545
> 82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) =3D 0
> 82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) =3D 0
> 82401 close(5) =3D 0
> 82401 open("/dev/null",0,0) =3D 5
> 82401 dup2(5,0,0,-1610620184,-1610620240,0) =3D 0
> 82401 close(5) =3D 0
> 82401 fcntl(0,2,0,-1610620184,-1610620240,0) =3D 0
> 82401 dup2(6,1,0,-1610620184,-1610620240,0) =3D 1
> 82401 fcntl(1,2,0,-1610620184,-1610620240,0) =3D 0
> 82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 =
sigpending(-1610620072,1,0,-191968524,0,0) =3D 0
>=20
> The vfork then close(6) sequence for 82400 vs. the later
> use of 6 in dup2 in 82401 may be rather odd. But it looks
> like qemu-*-static uses do_freebsd_fork to implement
> do_freebsd_vfork, despite reporting vfork before
> calling do_freebsd_vfork. (Does the close(6) appear to
> indicate a race for native operation of ninja for the
> period when the address space is shared?)
>=20
> Ninja has Subprocess::Start code that has:
>=20
> #ifdef POSIX_SPAWN_USEVFORK
>  flags |=3D POSIX_SPAWN_USEVFORK;
> #endif
>=20
>=20
>  if (posix_spawnattr_setflags(&attr, flags) !=3D 0)
>    Fatal("posix_spawnattr_setflags: %s", strerror(errno));
>=20
>  const char* spawned_args[] =3D { "/bin/sh", "-c", command.c_str(), =
NULL };
>  if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
>                  const_cast<char**>(spawned_args), environ) !=3D 0)
>    Fatal("posix_spawn: %s", strerror(errno));
>=20
> that is in use here. I think that this explains the vfork use.
>=20
>=20
> It turns out that putting the hung-up build in the background
> and then killing 82401 with the likes of kill -6 leads to more
> output that had apparently been buffered. It shows the use of
> the (amd64 native) /bin/sh that in turn leads to
> /usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
> native, gets no do_strace output from qemu-arm-static.
>=20
> 82400 sigpending(-1610620072,1,0,-191968524,0,0) =3D 0
> 82400 read(5,0x9fffd368,4096) =3D 58
> 82400 Unknown syscall 545
> 82400 sigpending(-1610620072,1,0,-191968524,0,0) =3D 0
> 82400 read(5,0x9fffd368,4096) =3D 0
> 82400 close(5) =3D 0
> 82400 wait4(82401,-1610620004,0,0,-191968640,0) =3D 82401
> 82400 mmap(0,86016,3,201330690,-1,-1610620169) =3D 0xf4777000
> 82400 gettimeofday(-1610620224,0,4,-1610619944,31,16777216) =3D 0
> 82400 write(1,0xf4950000,283)[1/129] cd =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqt=
videosink && /usr/local/bin/cmake -E cmake_autogen =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqt=
videosink/CMakeFiles/gstqt5videosink_autogen.dir/AutogenInfo.cmake Debug
> =3D 283
> 82400 write(1,0xf4950000,137)FAILED: =
elements/gstqtvideosink/CMakeFiles/gstqt5videosink_autogen =
elements/gstqtvideosink/gstqt5videosink_autogen/mocs_compilation.cpp=20
> =3D 137
> 82400 write(1,0xf4950000,275)cd =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqt=
videosink && /usr/local/bin/cmake -E cmake_autogen =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqt=
videosink/CMakeFiles/gstqt5videosink_autogen.dir/AutogenInfo.cmake Debug
> =3D 275
> 82400 write(1,0xf4950000,5) =3D 2
> =3D 5
>=20
> (Note that some 82400 writes are reporting 82401 information:)
>=20
> 82400 write(1,0xf4950000,49)82401 =
fcntl(2,2,0,-1610620184,-1610620240,0) =3D 0
> =3D 49
> 82400 write(1,0xf4950000,19)82401 close(6) =3D 0
> =3D 19
> 82400 write(1,0xf4950000,401)82401 =
execve("/bin/sh",{"/bin/sh","-c","cd =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqt=
videosink && /usr/local/bin/cmake -E cmake_autogen =
/wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqt=
videosink/CMakeFiles/gstqt5videosink_autogen.dir/AutogenInfo.cmake =
Debug",NULL})82401 __sysctl({ 0 3 }, 2, 0x9fffda80, 0x9fffdf64, =
0xf5002097, 0x0000000c) =3D 0
> =3D 401
>=20
> (The /bin/sh activity is not logged: /bin/sh is native amd64 code =
here. The
> below is from the later /usr/local/bin/cmake via qemu-arm-static.
>=20
> . . . (much omitted) . . .
>=20
> 82400 write(1,0xf4950000,60)82401 =
mmap(0,28672,3,201330690,-1,-1610621989) =3D 0xf41a0000
> =3D 60
> 82400 write(1,0xf4950000,74)82401 =
clock_gettime(4,-1610621832,4,-199622492,-199622492,-199622656) =3D 0
> =3D 74
> 82400 write(1,0xf4950000,62)82401 =
kqueue(-199622656,0,53102,0,-199622656,-1610621444) =3D 3
> =3D 62
> 82400 write(1,0xf4950000,81)82401 ioctl(3, 0x20006601 { IO =
GRP:0x66('f') CMD:1 LEN:0 }, 0x0000cf6e, ...) =3D 0
> =3D 81
>=20
> . . . (some omitted) . . .
>=20
> (Then there is a fairly long sequence of access's and then a sequence =
of
> fstatat's just before:)
>=20
>=20
> 82400 write(1,0xf4950000,32)82401 write(9,0xf4e1a945,1) =3D 1
> =3D 32
> 82400 write(1,0xf4950000,61)82401 =
clock_gettime(4,-1610622624,4,100863,1,-199483392) =3D 0
> =3D 61
> 82400 write(1,0xf4950000,106)82401 =
kevent(3,-1610688200,2,-1610688200,1024,0)qemu: uncaught target signal 6 =
(Abort trap) - core dumped
> =3D 106

ktrace/kdump shows an oddity for the kevent that hangs-up in cmake
(from a different run so a different process ID):

 93172 qemu-arm-static CALL  =
kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
 93172 qemu-arm-static STRU  struct kevent[] =3D { { ident=3D6, =
filter=3DEVFILT_READ, flags=3D0x1<EV_ADD>, fflags=3D0, data=3D0, =
udata=3D0x0 }
             { ident=3D0x0, filter=3D<invalid=3D0>, flags=3D0, =
fflags=3D0x8, data=3D0x1ffff, udata=3D0x0 } }

Note the 0x2 kevent argument and the apparently-odd 2nd entry in the =
struct
kevent[] .

> 82400 write(1,0xf4950000,41)ninja: build stopped: subcommand failed.
> =3D 41
>=20
> So it was hung at the kevent until the kill -6 .
>=20
>=20
> Via another experiment ninja was at the time waiting
> in ppoll:
>=20
> Reading symbols from ninja...done.
> [New LWP 73023]
> Core was generated by `ninja'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0xf4e5e0dc in _ppoll () from /lib/libc.so.7
> (gdb) bt
> #0  0xf4e5e0dc in _ppoll () from /lib/libc.so.7
> #1  0x00033bf0 in SubprocessSet::DoWork (this=3D<optimized out>) at =
src/subprocess-posix.cc:237
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>=20



=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?050F532F-85A8-4FF8-A7B0-178598B06BE1>