Date:      Sun, 28 Oct 2018 06:58:13 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Mikaël Urankar <mikael.urankar@gmail.com>, Sean Bruno <sbruno@freebsd.org>
Cc:        FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, freeBSD <freebsd-hackers@freebsd.org>, FreeBSD Ports ML <freebsd-ports@freebsd.org>
Subject:   Re: head -r339076 amd64 -> armv7 port cross build attempt with native tools involved: hangs between a cc (wait) and its child ld (uwait)
Message-ID:  <324BD0F0-4017-4395-9B59-B7A8558EA6FD@yahoo.com>
In-Reply-To: <D333D3B5-C7B3-4A48-92E2-673C0FFAA96F@yahoo.com>
References:  <33C58480-1E76-4748-83B4-CB39FAD8584A@yahoo.com> <CAJwjRmS0u6ONZTOX+-aFuOjm2FFDR-vkSO8h4j47d5OODPsDjA@mail.gmail.com> <D3CCBEF4-BCEF-4D6F-A503-AAE512D3D875@yahoo.com> <CBB0AC55-9EFE-4B58-8139-CE7CC265BF21@yahoo.com> <E0E27A7F-D4F5-450B-B6FE-03664E48D3BB@yahoo.com> <220332B7-0B5E-4378-AD48-FDFB8F135A50@yahoo.com> <D333D3B5-C7B3-4A48-92E2-673C0FFAA96F@yahoo.com>

[I have a workaround for the specific activity that avoids
the hang.]

On 2018-Oct-27, at 6:00 PM, Mark Millard <marklmi at yahoo.com> wrote:

> [The bigger test still hung up.]
>
> On 2018-Oct-27, at 5:30 PM, Mark Millard <marklmi at yahoo.com> wrote:
>=20
>> [Just the __packed removal patch was sufficient to no longer
>> have the hang problem that I originally reported for the
>> print/texinfo build in poudriere.]
>>
>> On 2018-Oct-27, at 4:33 PM, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> [Some of this discussion occurred off list. The point here
>>> is not specific to the hang that I originally reported.]
>>>
>>> On 2018-Oct-27, at 3:03 PM, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>
>> Mikaël Urankar is being quoted below:
>>
>>>>> . . .
>>>>>
>>>>>> There are bugs in qemu that can cause such deadlocks; you can try these
>>>>>> 2 patches:
>>>>>> https://github.com/MikaelUrankar/qemu-bsd-user/commit/9424a5ffde4de2768ab6baa45fdbe0dbb56a7371
>>>>>> https://github.com/MikaelUrankar/qemu-bsd-user/commit/d6f65a7f07d280b6906d499d8e465d4d2026c52b
>>
>> Back to me:
>>
>>>>> I'll try those later. Thanks. (I need to get back to sleep.)
>>>>>
>>>>> It was interesting that attach/detach to the ld process
>>>>> caused it to progress. The rest of the build completed
>>>>> just fine. But that one spot consistently hung up before
>>>>> trying gdb to look at the backtrace.
>>>>>
>>>>=20
>>>> Looking at the qemu code related to the 2nd patch: the
>>>> structure of the field copies (via __get_user) seems
>>>> very sensitive to the ABI rules for the target and
>>>> how things align and such, given that the structure
>>>> description and code are host code. __packed vs. not
>>>> is possibly not sufficient control to always make things
>>>> match right across all the potential combinations of
>>>> host and target from what I can see.
>>>>
>>>> Lack of __packed may prove sufficient for my specific
>>>> context (amd64 host and armv7 target) but it seems
>>>> non-obvious what to do in general.
>>>>
>>>> There would also seem to be big endian vs. little endian
>>>> issues on the individual __get_user styles of copies
>>>> when the host and target do not match for a multi-byte
>>>> numeric encoding.
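
A minimal sketch of the byte-order concern above, assuming a raw guest buffer that holds a 32-bit field in the target's byte order. The helper names (load_le32, load_be32, load_target_u32) are hypothetical and are not qemu's __get_user machinery; the idea shown is only that assembling the value from individual bytes keeps the host's own byte order out of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a qemu API): decode a 32-bit value stored
 * little-endian in guest memory. Works the same on any host. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: same, for a big-endian target. */
static uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] << 24
         | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8
         | (uint32_t)p[3];
}

/* Choose the decoder from the TARGET's byte order, never the host's. */
static uint32_t load_target_u32(const unsigned char *p, int target_big_endian)
{
    return target_big_endian ? load_be32(p) : load_le32(p);
}
```

amd64 and FreeBSD's armv7 are both little-endian, so this particular issue does not arise in the amd64 -> armv7 case tested here; it would for a big-endian target on a little-endian host.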
>>>
>>> Well, I get the following for:
>>>
>>> #include "/usr/include/sys/event.h" // kevent
>>> #include <stddef.h> // offsetof
>>> #include <stdio.h>  // printf
>>>
>>> int
>>> main(void)
>>> {
>>>      printf("%lu\n", (unsigned long) sizeof(struct kevent));
>>>      printf("ident %lu\n", (unsigned long) offsetof(struct kevent, ident));
>>>      printf("filter %lu\n", (unsigned long) offsetof(struct kevent, filter));
>>>      printf("flags %lu\n", (unsigned long) offsetof(struct kevent, flags));
>>>      printf("fflags %lu\n", (unsigned long) offsetof(struct kevent, fflags));
>>>      printf("data %lu\n", (unsigned long) offsetof(struct kevent, data));
>>>      printf("udata %lu\n", (unsigned long) offsetof(struct kevent, udata));
>>>      printf("ext %lu\n", (unsigned long) offsetof(struct kevent, ext));
>>>      return 0;
>>> }
>>>
>>> (This code avoided warnings for type mismatches with the
>>> printf strings and such.)
>>>=20
>>> amd64 native [host of qemu use] (comments hand added):
>>>
>>> # ./a.out
>>> 64
>>> ident 0
>>> filter 8  // NOTE!
>>> flags 10  // NOTE!
>>> fflags 12 // NOTE!
>>> data 16
>>> udata 24
>>> ext 32
>>>
>>> (The above is not particularly important but I
>>> include it for completeness.)
>>>
>>> armv7 native [target in qemu use] (comments hand added):
>>>
>>> # ./a.out
>>> 64       // NOTE vs. below!
>>> ident 0
>>> filter 4 // NOTE vs. above!
>>> flags 6  // NOTE vs. above!
>>> fflags 8 // NOTE vs. above!
>>> data 16  // NOTE vs. below!
>>> udata 24 // NOTE vs. below!
>>> ext 32   // NOTE vs. below!
>>>
>>> /usr/include/sys/event.h lacks __packed in both cases.
>>>
>>> With __packed in qemu-arm-static's source code
>>> for target_freebsd_kevent I confirm that via
>>> gdb for the qemu-arm-static:
>>>
>>> p/d sizeof(struct target_freebsd_kevent)
>>> p/d &((struct target_freebsd_kevent *)0)->ident
>>> p/d &((struct target_freebsd_kevent *)0)->filter
>>> p/d &((struct target_freebsd_kevent *)0)->flags
>>> p/d &((struct target_freebsd_kevent *)0)->fflags
>>> p/d &((struct target_freebsd_kevent *)0)->data
>>> p/d &((struct target_freebsd_kevent *)0)->udata
>>> p/d &((struct target_freebsd_kevent *)0)->ext
>>>
>>> report the same values that the 2nd patch's problem-report
>>> material shows (56,0,4,6,8,12,20,24): not even the
>>> right size.
>>>
>>> I also confirm that removing __packed in qemu's
>>> code and rebuilding and then checking with gdb
>>> reported a match to the above armv7 native report
>>> (64,0,4,6,8,16,24,32).
>>>
>>> I have not verified __packed used vs. not for any
>>> other combination of host and target platforms.
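
A minimal sketch of why __packed yields 56 bytes and what pins the 64-byte layout instead. The struct names are hypothetical stand-ins, not qemu's target_freebsd_kevent; they assume the FreeBSD 12 armv7 layout measured above (32-bit ident and udata, 64-bit data, four 64-bit ext words, with 64-bit members 8-byte aligned as on armv7). Giving the 64-bit members explicit 8-byte alignment, rather than relying on the host compiler's defaults, is one way to get the target layout independent of the host: on an i386 host, for example, int64_t members are only 4-byte aligned inside structs, so an unannotated copy would put data at offset 12 there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side mirror of the armv7 struct kevent (FreeBSD 12).
 * Pointer-sized fields become uint32_t because armv7 pointers are 32 bits. */

/* Packed (what __packed expands to): no padding anywhere, so data lands at
 * offset 12 and the size is 56, unlike the armv7 native layout. */
struct kevent_armv7_packed {
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    int64_t  data;
    uint32_t udata;
    uint64_t ext[4];
} __attribute__((packed));

/* Explicit alignment: the 64-bit members get the 8-byte alignment the
 * armv7 ABI gives them, whatever the host's default would be. */
struct kevent_armv7_aligned {
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    _Alignas(8) int64_t data;
    uint32_t udata;
    _Alignas(8) uint64_t ext[4];
};

/* Compile-time checks against the armv7 native numbers reported above. */
_Static_assert(sizeof(struct kevent_armv7_aligned) == 64, "size");
_Static_assert(offsetof(struct kevent_armv7_aligned, data) == 16, "data");
_Static_assert(offsetof(struct kevent_armv7_aligned, udata) == 24, "udata");
_Static_assert(offsetof(struct kevent_armv7_aligned, ext) == 32, "ext");
```

FreeBSD's <sys/cdefs.h> defines __packed as __attribute__((__packed__)), so the first struct models the problem case. Whether qemu's target structs should be written this way is a separate design question; the sketch only shows that per-member alignment, not __packed, reproduces the armv7 numbers.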
>>
>> Removing the 2 examples of __packed, including the
>> 1 for target_freebsd_kevent, as in Mikaël Urankar's
>> 2nd listed patch, was sufficient to avoid the hang
>> that I originally reported. (Technically FreeBSD 11
>> is not involved and so one of the __packed removals
>> is not relevant to my example.)
>>
>> I have not applied Mikaël Urankar's first listed
>> patch at all. It did not prove necessary for my
>> context.
>>
>> Again: the only tested context is amd64 -> armv7
>> (host -> target) under a head -r339076 based
>> build. (So still 12.)
>>
>> I'm doing a larger amd64 -> armv7 rebuild (around
>> 210 ports overall) that originally included the
>> problematical hang and a full-bootstrap build
>> of lang/gcc8 (so extensive emulation use after
>> the clang-based stages). Prior to the patch,
>> all smaller attempts also hung at the same
>> place for print/texinfo.
>>
>> But I'll only report if this larger test has
>> a problem.
>
>
> The bigger test still hung up in the same old place.
> A gdb attach/detach sequence against the qemu-arm-static
> for the ld again let it continue from there.
>
> Drat. But good to know.


Passing --no-threads to lld (via -Wl,--no-threads) avoids the problem.

Without the option, lld for N "cpus" creates N
or so extra worker threads (besides the thread
for main) plus one more that does something
different. Having only the thread for main (and
possibly one more) avoids the hangups.

In my context, N==28 (Hyper-V) or N==32 (native
FreeBSD boot) was in use.

Also: The hangups when there were around N+2 threads
total only happened when lld was executed as
emulated code instead of as host-native code. Some
autoconfig activity does not use ${CC} or the like
and so some lld use ends up emulated even when most
of the clang/llvm activity in the poudriere bulk
run is host-native.


Side note:

The ports infrastructure does not have LINKER_TYPE
in use like buildworld buildkernel does, so I did
not use LDFLAGS.lld+=-Wl,--no-threads like I do
for buildworld buildkernel. For now I'm using
LDFLAGS.clang+=-Wl,--no-threads with
LDFLAGS+=${LDFLAGS.${CHOSEN_COMPILER_TYPE}} in
order to select the option when lld is more likely
to be in use. I also avoid the LDFLAGS.clang
assignment for powerpc* families, because lld is
not used in that context (so far).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



