Date: Thu, 16 Feb 2023 10:42:38 -0800
From: Mark Millard <marklmi@yahoo.com>
To: "kd@freebsd.org" <kd@FreeBSD.org>, "wma@freebsd.org" <wma@FreeBSD.org>, dev-commits-src-main@freebsd.org
Cc: Warner Losh <imp@bsdimp.com>
Subject: Re: git: 6926e2699ae5 - main - arm: Add support for using VFP in kernel [added new: Called fill_fpregs while the kernel is using the VFP]
Message-ID: <402AEA29-B895-4031-99A0-876A39C02157@yahoo.com>
In-Reply-To: <4F9A3687-9577-4419-AE1B-D02A4C9212ED@yahoo.com>
References: <3A143148-895F-472B-9AFB-5F1AA0FD1FA0@yahoo.com> <782B252E-60AC-4036-BD74-46B95A31B337@yahoo.com> <4F9A3687-9577-4419-AE1B-D02A4C9212ED@yahoo.com>

[Adding: Small C++ program that leads to a FreeBSD crash
when I force a core dump (using control-\) during runs.]

On Feb 16, 2023, at 00:09, Mark Millard <marklmi@yahoo.com> wrote:

> [A very simple program gets the failure under gdb
> or lldb with example breakpoints set.]
>
> On Feb 15, 2023, at 20:29, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Feb 15, 2023, at 16:08, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> Kornel Dulęba <kd_at_FreeBSD.org> wrote on
>>> Date: Sat, 04 Feb 2023 19:22:23 UTC :
>>>
>>>> The branch main has been updated by kd:
>>>>
>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=6926e2699ae55080f860488895a2a9aa6e6d9b4d
>>>>
>>>> commit 6926e2699ae55080f860488895a2a9aa6e6d9b4d
>>>> Author:     Kornel Dulęba <kd@FreeBSD.org>
>>>> AuthorDate: 2023-02-04 12:59:30 +0000
>>>> Commit:     Kornel Dulęba <kd@FreeBSD.org>
>>>> CommitDate: 2023-02-04 19:21:43 +0000
>>>>
>>>>     arm: Add support for using VFP in kernel
>>>>
>>>>     Add missing logic to allow in-kernel VFP usage for ARMv7 NEON.
>>>>     The implementation is strongly based on arm64 code.
>>>>     It introduces a family of fpu_kern_* functions to enable the usage
>>>>     of VFP instructions in kernel.
>>>>     Apart from that the existing armv7 VFP logic was modified,
>>>>     taking into account that the state of the VFP registers can now
>>>>     be modified in the kernel.
>>>>
>>>>     Co-developed by:        Wojciech Macek <wma@FreeBSD.org>
>>>>     Sponsored by:           Stormshield
>>>>     Obtained from:          Semihalf
>>>>     Reviewed by:            andrew
>>>>     Differential Revision:  https://reviews.freebsd.org/D37419
>>>> ---
>>>>  lib/libthread_db/arch/arm/libpthread_md.c |  21 ++--
>>>>  sys/arm/arm/exec_machdep.c                |  49 ++++----
>>>>  sys/arm/arm/machdep.c                     |   1 +
>>>>  sys/arm/arm/machdep_kdb.c                 |  31 ++++-
>>>>  sys/arm/arm/swtch-v6.S                    |   8 +-
>>>>  sys/arm/arm/swtch.S                       |   8 +-
>>>>  sys/arm/arm/vfp.c                         | 182 +++++++++++++++++++++++++++++-
>>>>  sys/arm/arm/vm_machdep.c                  |   6 +-
>>>>  sys/arm/include/fpu.h                     |   7 ++
>>>>  sys/arm/include/pcb.h                     |   5 +
>>>>  sys/arm/include/reg.h                     |  12 +-
>>>>  sys/arm/include/vfp.h                     |  17 +++
>>>>  12 files changed, 293 insertions(+), 54 deletions(-)
>>>
>>> [This is a somewhat adjusted version of a note replying
>>> to a Warner note about a panic someone got while a
>>> process coredump was happening.]
>>>
>>> Just a possible point, given recent kernel floating
>>> point work:
>>>
>>> I tried to do a typical build and test of some
>>> benchmark programs that I sometimes use. Some of the
>>> programs involve floating point, some with
>>> multithreading involved. (As FreeBSD and g++ progress
>>> I tend to do this once in a while, not as often on
>>> armv7 as on aarch64.)
>>>
>>> On armv7, I now usually get a message about a failure
>>> of an internal cross-check, which also leads to the
>>> program being stopped early. The reported failure
>>> varies from run to run, but the runs should not vary
>>> and should not fail the cross-checks, and previously
>>> did not, including when I last tried armv7.
>>> (Not recently.)
>>>
>>> For the specific example failures, the initial serial
>>> (single thread) test with float involved works but the
>>> following multi-thread test in the same program fails
>>> and causes the program to stop when it notices there
>>> is a problem. (On occasion the cross-check does
>>> not detect a problem.)
>>>
>>> The programs that do not test floating point do not
>>> fail. (Same algorithm on integral types.) These can
>>> involve floating point outside the algorithm
>>> benchmarked, but with no multi-threading involved for
>>> such and no floating point based cross-checks involved.
>>>
>>> At this point it is far from obvious to me how I
>>> would track down the specifics of what leads to the
>>> failed cross-checks. But the above is suggestive of
>>> there being problems for armv7 handling of saving
>>> and restoring floating point context for
>>> multi-threading in a process, at least. I've no clue
>>> if such are strictly limited to the floating point
>>> values that show up vs. if there might be wider
>>> memory handling problems that result in the process.
>>>
>>
>> Further runs of the benchmark program show that I also
>> get cross-check failures for single-threaded (the first
>> way it tests).
>>
>> But it turns out that, even for single threaded execution
>> of the algorithm benchmarked, it is not run on the
>> process's initial thread but instead on a created thread.
>>
>> Turns out that for a debug armv7 kernel (debug is not
>> what I normally run) attempting a bt in gdb can lead to
>> a kernel panic (td == curthread failed) related to
>> floating point handling:
>>
>> . . .
>> (gdb) br serial_kernel_runner
>> Breakpoint 1 at 0x1db34: serial_kernel_runner. (6 locations)
>> (gdb) br parallel_kernel_runner
>> Breakpoint 2 at 0x1b43c: parallel_kernel_runner. (6 locations)
>> (gdb) run
>> Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++-cpulockdown
>> . . .
>>
>> Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...)
>>     at acpphint_kernelrunners.cpp:69
>> 69 static auto serial_kernel_runner
>> (gdb) bt
>> #0 serial_panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103
>> cpuid = 3
>> time = 1676519530
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>>     pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
>>     sp = 0xe28ea960  fp = 0xe28eaa78
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>     pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
>>     sp = 0xe28eaa80  fp = 0xe28eaaa0
>>     r4 = 0x00000100  r5 = 0x00000000
>>     r6 = 0xc0790bb4  r7 = 0xc0b1b930
>> vpanic() at vpanic+0x140
>>     pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
>>     sp = 0xe28eaaa8  fp = 0xe28eaaac
>>     r4 = 0xe28eaad0  r5 = 0xbfbfe150
>>     r6 = 0xe28eaad0  r7 = 0xc076a096
>>     r8 = 0xdb8a47f4  r9 = 0x00000016
>>     r10 = 0x00000040
>> dump_savectx() at dump_savectx
>>     pc = 0xc02dda28  lr = 0xc05f3354 (get_vfpcontext+0xb8)
>>     sp = 0xe28eaab4  fp = 0xe28eaac8
>> get_vfpcontext() at get_vfpcontext+0xb8
>>     pc = 0xc05f3354  lr = 0xc0611148 (cpu_ptrace+0x38)
>>     sp = 0xe28eaad0  fp = 0xe28eabe8
>>     r4 = 0xdb75cba0  r5 = 0xbfbfe150
>> cpu_ptrace() at cpu_ptrace+0x38
>>     pc = 0xc0611148  lr = 0xc0360f4c (kern_ptrace+0x810)
>>     sp = 0xe28eabf0  fp = 0xe28eac70
>>     r4 = 0xe583dba0  r5 = 0x00000000
>>     r6 = 0xdb8a47a8  r10 = 0x00000040
>> kern_ptrace() at kern_ptrace+0x810
>>     pc = 0xc0360f4c  lr = 0xc0360550 (sys_ptrace+0x1cc)
>>     sp = 0xe28eac78  fp = 0xe28eadc0
>>     r4 = 0xe583de5c  r5 = 0xe583dba0
>>     r6 = 0xbfbfe150  r7 = 0x00000000
>>     r8 = 0x00000000  r9 = 0xe583de50
>>     r10 = 0xdb756730
>> sys_ptrace() at sys_ptrace+0x1cc
>>     pc = 0xc0360550  lr = 0xc0613b48 (swi_handler+0x170)
>>     sp = 0xe28eadc8  fp = 0xe28eae38
>>     r4 = 0xe583dba0  r5 = 0x00000001
>>     r6 = 0xc090b220  r7 = 0x00000000
>>     r8 = 0x00000000  r9 = 0xe583de50
>> swi_handler() at swi_handler+0x170
>>     pc = 0xc0613b48  lr = 0xc05f2d90 (swi_exit)
>>     sp = 0xe28eae40  fp = 0xbfbfe128
>>     r4 = 0x00000042  r5 = 0x22e61c20
>>     r6 = 0xbfbfe150  r7 = 0x0000001a
>>     r8 = 0x00424124  r9 = 0x00000108
>>     r10 = 0x00000040
>> swi_exit() at swi_exit
>>     pc = 0xc05f2d90  lr = 0xc05f2d90 (swi_exit)
>>     sp = 0xe28eae40  fp = 0xbfbfe128
>> KDB: enter: panic
>> [ thread pid 5438 tid 106943 ]
>> Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]!
>>
>> Note: the code was built via g++12 but using libc++,
>> not libstdc++.
>>
>> So I tried the program variant that does not try to
>> lock down which CPUs are used by the threads (a completely
>> C++20 standard program variant, not FreeBSD specific for
>> its used source code). Failure again . . .
>>
>> (gdb) br serial_kernel_runner
>> Breakpoint 1 at 0x1c1bc: serial_kernel_runner. (6 locations)
>> (gdb) br parallel_kernel_runner
>> Breakpoint 2 at 0x19ac8: parallel_kernel_runner. (6 locations)
>> (gdb) run
>> Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++
>> . . .
>> Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...)
>>     at acpphint_kernelrunners.cpp:69
>> 69 static auto serial_kernel_runner
>> (gdb) bt
>> #0 serial_kernel_runner<float, unsigned short> (clock_info=...,panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103
>> cpuid = 0
>> time = 1676520400
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>>     pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
>>     sp = 0xe2964960  fp = 0xe2964a78
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>     pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
>>     sp = 0xe2964a80  fp = 0xe2964aa0
>>     r4 = 0x00000100  r5 = 0x00000000
>>     r6 = 0xc0790bb4  r7 = 0xc0b1b930
>> vpanic() at vpanic+0x140
>>     pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
>>     sp = 0xe2964aa8  fp = 0xe2964aac
>>     r4 = 0xe2964ad0  r5 = 0xbfbfe158
>>     r6 = 0xe2964ad0  r7 = 0xc076a096
>>     r8 = 0xdb7a511c  r9 = 0x00000016
>>     r10 = 0x00000040
>> dump_savectx() at dump_savectx
>>     pc = 0xc02dda28  lr = 0xc05f3354 (get_vfpcontext+0xb8)
>>     sp = 0xe2964ab4  fp = 0xe2964ac8
>> get_vfpcontext() at get_vfpcontext+0xb8
>>     pc = 0xc05f3354  lr = 0xc0611148 (cpu_ptrace+0x38)
>>     sp = 0xe2964ad0  fp = 0xe2964be8
>>     r4 = 0xdb7ca3e0  r5 = 0xbfbfe158
>> cpu_ptrace() at cpu_ptrace+0x38
>>     pc = 0xc0611148  lr = 0xc0360f4c (kern_ptrace+0x810)
>>     sp = 0xe2964bf0  fp = 0xe2964c70
>>     r4 = 0xdb76fba0  r5 = 0x00000000
>>     r6 = 0xdb7a50d0  r10 = 0x00000040
>> kern_ptrace() at kern_ptrace+0x810
>>     pc = 0xc0360f4c  lr = 0xc0360550 (sys_ptrace+0x1cc)
>>     sp = 0xe2964c78  fp = 0xe2964dc0
>>     r4 = 0xdb76fe5c  r5 = 0xdb76fba0
>>     r6 = 0xbfbfe158  r7 = 0x00000000
>>     r8 = 0x00000000  r9 = 0xdb76fe50
>>     r10 = 0xdb754000
>> sys_ptrace() at sys_ptrace+0x1cc
>>     pc = 0xc0360550  lr = 0xc0613b48 (swi_handler+0x170)
>>     sp = 0xe2964dc8  fp = 0xe2964e38
>>     r4 = 0xdb76fba0  r5 = 0x00000001
>>     r6 = 0xc090b220  r7 = 0x00000000
>>     r8 = 0x00000000  r9 = 0xdb76fe50
>> swi_handler() at swi_handler+0x170
>>     pc = 0xc0613b48  lr = 0xc05f2d90 (swi_exit)
>>     sp = 0xe2964e40  fp = 0xbfbfe130
>>     r4 = 0x00000042  r5 = 0x22e61c20
>>     r6 = 0xbfbfe158  r7 = 0x0000001a
>>     r8 = 0x00424124  r9 = 0x00000108
>>     r10 = 0x00000040
>> swi_exit() at swi_exit
>>     pc = 0xc05f2d90  lr = 0xc05f2d90 (swi_exit)
>>     sp = 0xe2964e40  fp = 0xbfbfe130
>> KDB: enter: panic
>> [ thread pid 1107 tid 100140 ]
>> Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]!
>>
>> For reference (whitespace may not have
>> been preserved):
>>
>> void
>> get_vfpcontext(struct thread *td, mcontext_vfp_t *vfp)
>> {
>>         struct pcb *pcb;
>>
>>         MPASS(td == curthread);
>>
>>         pcb = td->td_pcb;
>>         if ((pcb->pcb_fpflags & PCB_FP_STARTED) != 0) {
>>                 critical_enter();
>>                 vfp_store(&pcb->pcb_vfpstate, false);
>>                 critical_exit();
>>         }
>>         KASSERT(pcb->pcb_vfpsaved == &pcb->pcb_vfpstate,
>>             ("Called get_vfpcontext while the kernel is using the VFP"));
>>         memcpy(vfp->mcv_reg, pcb->pcb_vfpstate.reg,
>>             sizeof(vfp->mcv_reg));
>>         vfp->mcv_fpscr = pcb->pcb_vfpstate.fpscr;
>> }
>>
>> Unfortunately the benchmark program is far from being a
>> minimalist/simple example.
>>
>> I'm not sure what FreeBSD might have around that would
>> have floating point in use but be simple, and possibly
>> standardly available, to see if a simpler context is
>> available for analogous testing.
>>
>
> The program, an example way to build it such that
> it can lead to crashes, and 2 ways to get the
> FreeBSD crash with it (native armv7 context):
>
> // # cc -std=c17 -pedantic -g -O3 simple_dbl.c
> //
> // # gdb a.out
> // (gdb) br test
> // (gdb) run
> // FreeBSD CRASHES
> //
> // # lldb a.out
> // (lldb) br set -F test
> // FreeBSD CRASHES
>
> #include <stdlib.h>
>
> _Bool test(double v) {
>     return v<0.5;
> }
>
> int main(void) {
>     return test(drand48());
> }

Generating a FreeBSD crash during a core dump for this
program looks like:

# ./a.out
^\panic: Called fill_fpregs while the kernel is using the VFP
cpuid = 3
time = 1676570748
KDB: stack backtrace:
db_trace_self() at db_trace_self
    pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
    sp = 0xe340b790  fp = 0xe340b8a8
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
    pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
    sp = 0xe340b8b0  fp = 0xe340b8d0
    r4 = 0x00000100  r5 = 0x00000000
    r6 = 0xc078f79c  r7 = 0xc0b1b930
vpanic() at vpanic+0x140
    pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
    sp = 0xe340b8d8  fp = 0xe340b8dc
    r4 = 0xe5a68a80  r5 = 0xe29fbe90
    r6 = 0xe2a132f0  r7 = 0xc5763140
    r8 = 0xe2a132e0  r9 = 0xe5a68a80
    r10 = 0xe340b960
dump_savectx() at dump_savectx
    pc = 0xc02dda28  lr = 0xc05fd5a4 (set_regs)
    sp = 0xe340b8e4  fp = 0xe340b8f8
set_regs() at set_regs
    pc = 0xc05fd5a4  lr = 0xc02617a4 (elf32_get_fpregset+0x2c)
    sp = 0xe340b900  fp = 0xe340b908
    r4 = 0xe2a132f0  r5 = 0xc0261778
elf32_get_fpregset() at elf32_get_fpregset+0x2c
    pc = 0xc02617a4  lr = 0xc025f6b0 (elf32_coredump+0x308)
    sp = 0xe340b910  fp = 0xe340b988
    r4 = 0xc090a65c  r10 = 0xe340b960
elf32_coredump() at elf32_coredump+0x308
    pc = 0xc025f6b0  lr = 0xc02e2af8 (sigexit+0xd18)
    sp = 0xe340b990  fp = 0xe340bcf8
    r4 = 0x0000004e  r5 = 0xe3e37300
    r6 = 0x00000000  r7 = 0xc025f3a8
    r8 = 0xd79c79ec  r9 = 0xe3e37274
    r10 = 0x00000000
sigexit() at sigexit+0xd18
    pc = 0xc02e2af8  lr = 0xc02e3420 (postsig+0x12c)
    sp = 0xe340bd00  fp = 0xe340bd88
    r4 = 0x00000003  r5 = 0xe32c23e0
    r6 = 0xe340bd20  r7 = 0xe340bd18
    r8 = 0xd79c7928  r9 = 0xe32cfab8
    r10 = 0x00000002
postsig() at postsig+0x12c
    pc = 0xc02e3420  lr = 0xc02e73ec (ast_sig+0x11c)
    sp = 0xe340bd90  fp = 0xe340be08
    r4 = 0xe32c23e0  r5 = 0xd79c79ec
    r6 = 0xc077b982  r7 = 0x00000000
    r8 = 0xd79c7928  r9 = 0x00000ab8
    r10 = 0x00000000
ast_sig() at ast_sig+0x11c
    pc = 0xc02e73ec  lr = 0xc0348950 (ast_handler+0xe0)
    sp = 0xe340be10  fp = 0xe340be28
    r4 = 0xe340be40  r5 = 0x0000000e
    r6 = 0x00004000  r7 = 0xc0973e3c
    r8 = 0xe32c23e0  r9 = 0x00000001
ast_handler() at ast_handler+0xe0
    pc = 0xc0348950  lr = 0xc0348860 (ast+0x20)
    sp = 0xe340be30  fp = 0xe340be38
    r4 = 0xe340be40  r5 = 0xe32c23e0
    r6 = 0x00000000  r7 = 0x000001c6
    r8 = 0x20032030  r9 = 0x204ae084
ast() at ast+0x20
    pc = 0xc0348860  lr = 0xc05f2dcc (swi_exit+0x3c)
    sp = 0xe340be40  fp = 0xbfbfec60
    r4 = 0x60000013  r5 = 0xe32c23e0
swi_exit() at swi_exit+0x3c
    pc = 0xc05f2dcc  lr = 0xc05f2dcc (swi_exit+0x3c)
    sp = 0xe340be40  fp = 0xbfbfec60
KDB: enter: panic
[ thread pid 3640 tid 100207 ]
Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]!

Note: Having done a dump before the reboot after this
led to the following boot failing while processing
the dump:

. . .
Writing crash summary to /var/crash/core.txt.7.
panic: Called fill_fpregs while the kernel is using the VFP
. . .

(Rebooting again worked.)

The C++ program, with comments for the usage, is:

// # c++ -std=c++20 -pedantic -g -O3 -pthread dbl_and_ull_via_async.cpp
// or:
// # g++12 -std=c++20 -stdlib=libc++ -pedantic -g -O3 -pthread -Wl,-rpath=/usr/local/lib/gcc12 dbl_and_ull_via_async.cpp
// then:
// ./a.out
// control-\ # to force the process core dump.
// FreeBSD CRASHES

#include <limits>  // std::numeric_limits
#include <future>  // std::future, std::async, std::launch::async
#include <cstdlib> // std::abort

int main(void)
{
    static_assert(std::numeric_limits<double>::radix==2,"double's radix is not 2 and is unhandled");
    constexpr unsigned int ull_width = std::numeric_limits<unsigned long long>::digits;
    constexpr unsigned int dbl_width = std::numeric_limits<double>::digits;
    constexpr unsigned int use_width = (dbl_width<ull_width) ? dbl_width : ull_width;
    constexpr unsigned long long bound = (1ull<<use_width)-1ull;

    unsigned long long n = 0ull;
    double n_as_dbl = n;
    auto sequential= std::async
        ( std::launch::async
        , [&n,&n_as_dbl]()
          {
              while (n < bound) {
                  if (n_as_dbl != (double)n) break;
                  n++;
                  n_as_dbl+= 1.0;
              }
          }
        );
    sequential.wait();
    if (n_as_dbl != (double)n) std::abort(); // testing same n_as_dbl value?
    return 0;
}

Note: The program has never generated an example of
n_as_dbl != (double)n so far.

===
Mark Millard
marklmi at yahoo.com