Date:      Wed, 15 Feb 2023 20:29:35 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        "kd@freebsd.org" <kd@FreeBSD.org>, "wma@freebsd.org" <wma@FreeBSD.org>, dev-commits-src-main@freebsd.org
Cc:        Warner Losh <imp@bsdimp.com>
Subject:   Re: git: 6926e2699ae5 - main - arm: Add support for using VFP in kernel [td == curthread failed form of panic for bt in gdb]
Message-ID:  <782B252E-60AC-4036-BD74-46B95A31B337@yahoo.com>
In-Reply-To: <3A143148-895F-472B-9AFB-5F1AA0FD1FA0@yahoo.com>
References:  <3A143148-895F-472B-9AFB-5F1AA0FD1FA0@yahoo.com>

On Feb 15, 2023, at 16:08, Mark Millard <marklmi@yahoo.com> wrote:

> Kornel Dulęba <kd_at_FreeBSD.org> wrote on
> Date: Sat, 04 Feb 2023 19:22:23 UTC :
>
>> The branch main has been updated by kd:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=6926e2699ae55080f860488895a2a9aa6e6d9b4d
>>
>> commit 6926e2699ae55080f860488895a2a9aa6e6d9b4d
>> Author: Kornel Dulęba <kd@FreeBSD.org>
>> AuthorDate: 2023-02-04 12:59:30 +0000
>> Commit: Kornel Dulęba <kd@FreeBSD.org>
>> CommitDate: 2023-02-04 19:21:43 +0000
>>
>> arm: Add support for using VFP in kernel
>>
>> Add missing logic to allow in-kernel VFP usage for ARMv7 NEON.
>> The implementation is strongly based on arm64 code.
>> It introduces a family of fpu_kern_* functions to enable the usage
>> of VFP instructions in kernel.
>> Apart from that the existing armv7 VFP logic was modified,
>> taking into account that the state of the VFP registers can now
>> be modified in the kernel.
>>
>> Co-developed by: Wojciech Macek <wma@FreeBSD.org>
>> Sponsored by: Stormshield
>> Obtained from: Semihalf
>> Reviewed by: andrew
>> Differential Revision: https://reviews.freebsd.org/D37419
>> ---
>> lib/libthread_db/arch/arm/libpthread_md.c | 21 ++--
>> sys/arm/arm/exec_machdep.c | 49 ++++----
>> sys/arm/arm/machdep.c | 1 +
>> sys/arm/arm/machdep_kdb.c | 31 ++++-
>> sys/arm/arm/swtch-v6.S | 8 +-
>> sys/arm/arm/swtch.S | 8 +-
>> sys/arm/arm/vfp.c | 182 +++++++++++++++++++++++++++++-
>> sys/arm/arm/vm_machdep.c | 6 +-
>> sys/arm/include/fpu.h | 7 ++
>> sys/arm/include/pcb.h | 5 +
>> sys/arm/include/reg.h | 12 +-
>> sys/arm/include/vfp.h | 17 +++
>> 12 files changed, 293 insertions(+), 54 deletions(-)
>
> [This is a somewhat adjusted version of a note replying
> to a Warner note about a panic someone got during a
> process coredump.]
>
> Just a possible point, given recent kernel floating
> point work:
>
> I tried a typical build and test of some benchmark
> programs that I sometimes use. Some of the programs
> involve floating point, and some use multithreading.
> (As FreeBSD and g++ progress I tend to do this once
> in a while, not as often on armv7 as on aarch64.)
>
> On armv7, I now usually get a message about a failure
> of an internal cross-check, which also leads to the
> program being stopped early. The reported failure
> varies from run to run, but the runs should not vary
> and should not fail the cross-checks, and previously
> did not, including when I last tried armv7 (not
> recently).
>
> For the specific example failures, the initial serial
> (single thread) test with float involved works but the
> following multi-thread test in the same program fails
> and causes the program to stop when it notices there
> is a problem. (On occasion the cross-check does
> not detect a problem.)
>
> The programs that do not test floating point do not
> fail. (Same algorithm on integral types.) These can
> involve floating point outside the algorithm
> benchmarked, but that use involves no multi-threading
> and no floating point based cross-checks.
>
> At this point it is far from obvious to me how I
> would track down the specifics of what leads to the
> failed cross-checks. But the above is suggestive of
> there being problems for armv7 handling of saving
> and restoring floating point context for
> multi-threading in a process, at least. I've no clue
> if such are strictly limited to the floating point
> values that show up vs. if there might be wider
> memory handling problems that result in the process.
>

Further runs of the benchmark program show that I also
get cross-check failures for the single-threaded test (the
first way it tests).

But it turns out that, even for single-threaded execution
of the algorithm benchmarked, it is not run on the
process's initial thread but instead on a created thread.

It also turns out that, for a debug armv7 kernel (debug is
not what I normally run), attempting a bt in gdb can lead to
a kernel panic (td == curthread failed) related to
floating point handling:

. . .
(gdb) br serial_kernel_runner
Breakpoint 1 at 0x1db34: serial_kernel_runner. (6 locations)
(gdb) br parallel_kernel_runner
Breakpoint 2 at 0x1b43c: parallel_kernel_runner. (6 locations)
(gdb) run
Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++-cpulockdown
. . .

Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...) at acpphint_kernelrunners.cpp:69
69      static auto serial_kernel_runner
(gdb) bt
#0  serial_panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103
cpuid = 3
time = 1676519530
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
         sp = 0xe28ea960  fp = 0xe28eaa78
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
         sp = 0xe28eaa80  fp = 0xe28eaaa0
         r4 = 0x00000100  r5 = 0x00000000
         r6 = 0xc0790bb4  r7 = 0xc0b1b930
vpanic() at vpanic+0x140
         pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
         sp = 0xe28eaaa8  fp = 0xe28eaaac
         r4 = 0xe28eaad0  r5 = 0xbfbfe150
         r6 = 0xe28eaad0  r7 = 0xc076a096
         r8 = 0xdb8a47f4  r9 = 0x00000016
        r10 = 0x00000040
dump_savectx() at dump_savectx
         pc = 0xc02dda28  lr = 0xc05f3354 (get_vfpcontext+0xb8)
         sp = 0xe28eaab4  fp = 0xe28eaac8
get_vfpcontext() at get_vfpcontext+0xb8
         pc = 0xc05f3354  lr = 0xc0611148 (cpu_ptrace+0x38)
         sp = 0xe28eaad0  fp = 0xe28eabe8
         r4 = 0xdb75cba0  r5 = 0xbfbfe150
cpu_ptrace() at cpu_ptrace+0x38
         pc = 0xc0611148  lr = 0xc0360f4c (kern_ptrace+0x810)
         sp = 0xe28eabf0  fp = 0xe28eac70
         r4 = 0xe583dba0  r5 = 0x00000000
         r6 = 0xdb8a47a8 r10 = 0x00000040
kern_ptrace() at kern_ptrace+0x810
         pc = 0xc0360f4c  lr = 0xc0360550 (sys_ptrace+0x1cc)
         sp = 0xe28eac78  fp = 0xe28eadc0
         r4 = 0xe583de5c  r5 = 0xe583dba0
         r6 = 0xbfbfe150  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xe583de50
        r10 = 0xdb756730
sys_ptrace() at sys_ptrace+0x1cc
         pc = 0xc0360550  lr = 0xc0613b48 (swi_handler+0x170)
         sp = 0xe28eadc8  fp = 0xe28eae38
         r4 = 0xe583dba0  r5 = 0x00000001
         r6 = 0xc090b220  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xe583de50
swi_handler() at swi_handler+0x170
         pc = 0xc0613b48  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe28eae40  fp = 0xbfbfe128
         r4 = 0x00000042  r5 = 0x22e61c20
         r6 = 0xbfbfe150  r7 = 0x0000001a
         r8 = 0x00424124  r9 = 0x00000108
        r10 = 0x00000040
swi_exit() at swi_exit
         pc = 0xc05f2d90  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe28eae40  fp = 0xbfbfe128
KDB: enter: panic
[ thread pid 5438 tid 106943 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!

Note: the code was built via g++12 but using libc++,
not libstdc++.

So I tried the program variant that does not try to
lock down which CPUs are used by the threads (a completely
C++20 standard program variant, with no FreeBSD-specific
source code). Failure again . . .

(gdb) br serial_kernel_runner
Breakpoint 1 at 0x1c1bc: serial_kernel_runner. (6 locations)
(gdb) br parallel_kernel_runner
Breakpoint 2 at 0x19ac8: parallel_kernel_runner. (6 locations)
(gdb) run
Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++
. . .
Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...) at acpphint_kernelrunners.cpp:69
69      static auto serial_kernel_runner
(gdb) bt
#0  serial_kernel_runner<float, unsigned short> (clock_info=...,panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103
cpuid = 0
time = 1676520400
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
         sp = 0xe2964960  fp = 0xe2964a78
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
         sp = 0xe2964a80  fp = 0xe2964aa0
         r4 = 0x00000100  r5 = 0x00000000
         r6 = 0xc0790bb4  r7 = 0xc0b1b930
vpanic() at vpanic+0x140
         pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
         sp = 0xe2964aa8  fp = 0xe2964aac
         r4 = 0xe2964ad0  r5 = 0xbfbfe158
         r6 = 0xe2964ad0  r7 = 0xc076a096
         r8 = 0xdb7a511c  r9 = 0x00000016
        r10 = 0x00000040
dump_savectx() at dump_savectx
         pc = 0xc02dda28  lr = 0xc05f3354 (get_vfpcontext+0xb8)
         sp = 0xe2964ab4  fp = 0xe2964ac8
get_vfpcontext() at get_vfpcontext+0xb8
         pc = 0xc05f3354  lr = 0xc0611148 (cpu_ptrace+0x38)
         sp = 0xe2964ad0  fp = 0xe2964be8
         r4 = 0xdb7ca3e0  r5 = 0xbfbfe158
cpu_ptrace() at cpu_ptrace+0x38
         pc = 0xc0611148  lr = 0xc0360f4c (kern_ptrace+0x810)
         sp = 0xe2964bf0  fp = 0xe2964c70
         r4 = 0xdb76fba0  r5 = 0x00000000
         r6 = 0xdb7a50d0 r10 = 0x00000040
kern_ptrace() at kern_ptrace+0x810
         pc = 0xc0360f4c  lr = 0xc0360550 (sys_ptrace+0x1cc)
         sp = 0xe2964c78  fp = 0xe2964dc0
         r4 = 0xdb76fe5c  r5 = 0xdb76fba0
         r6 = 0xbfbfe158  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xdb76fe50
        r10 = 0xdb754000
sys_ptrace() at sys_ptrace+0x1cc
         pc = 0xc0360550  lr = 0xc0613b48 (swi_handler+0x170)
         sp = 0xe2964dc8  fp = 0xe2964e38
         r4 = 0xdb76fba0  r5 = 0x00000001
         r6 = 0xc090b220  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xdb76fe50
swi_handler() at swi_handler+0x170
         pc = 0xc0613b48  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe2964e40  fp = 0xbfbfe130
         r4 = 0x00000042  r5 = 0x22e61c20
         r6 = 0xbfbfe158  r7 = 0x0000001a
         r8 = 0x00424124  r9 = 0x00000108
        r10 = 0x00000040
swi_exit() at swi_exit
         pc = 0xc05f2d90  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe2964e40  fp = 0xbfbfe130
KDB: enter: panic
[ thread pid 1107 tid 100140 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!

For reference (whitespace may not have
been preserved):

void
get_vfpcontext(struct thread *td, mcontext_vfp_t *vfp)
{
        struct pcb *pcb;

        MPASS(td == curthread);

        pcb = td->td_pcb;
        if ((pcb->pcb_fpflags & PCB_FP_STARTED) != 0) {
                critical_enter();
                vfp_store(&pcb->pcb_vfpstate, false);
                critical_exit();
        }
        KASSERT(pcb->pcb_vfpsaved == &pcb->pcb_vfpstate,
                ("Called get_vfpcontext while the kernel is using the VFP"));
        memcpy(vfp->mcv_reg, pcb->pcb_vfpstate.reg,
                sizeof(vfp->mcv_reg));
        vfp->mcv_fpscr = pcb->pcb_vfpstate.fpscr;
}

Unfortunately the benchmark program is far from being a
minimalist/simple example.

I'm not sure what FreeBSD might already provide that uses
floating point but is simple, and ideally available by
default, so that analogous testing could be done in a
simpler context.

===
Mark Millard
marklmi at yahoo.com