Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 26 Apr 2023 12:02:19 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Kyle Evans <kevans@freebsd.org>
Cc:        src-committers@freebsd.org, dev-commits-src-all@freebsd.org,  dev-commits-src-main@freebsd.org
Subject:   Re: git: ce5a210997da - main - openzfs: arm64: implement kfpu_begin/kfpu_end
Message-ID:  <CANCZdfoY-ZnHp%2BH3J=iB7VS%2B1tO0ViZKWeexr0bicgTeckQCZg@mail.gmail.com>
In-Reply-To: <202304261724.33QHOl6x001199@gitrepo.freebsd.org>
References:  <202304261724.33QHOl6x001199@gitrepo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--000000000000082ee305fa410b01
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, Apr 26, 2023 at 11:24=E2=80=AFAM Kyle Evans <kevans@freebsd.org> wr=
ote:

> The branch main has been updated by kevans:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=3Dce5a210997da3c4064cfe162e760379=
f1fa8b587
>
> commit ce5a210997da3c4064cfe162e760379f1fa8b587
> Author:     Kyle Evans <kevans@FreeBSD.org>
> AuthorDate: 2023-04-26 17:23:48 +0000
> Commit:     Kyle Evans <kevans@FreeBSD.org>
> CommitDate: 2023-04-26 17:24:00 +0000
>
>     openzfs: arm64: implement kfpu_begin/kfpu_end
>
>     This is part one of a fix for booting with ZFS on arm64 using
>     accelerated checksum implementations.  Checksum benchmarking will
>     attempt to use the FPU, so we currently panic quickly on boot.  BLAKE=
3
>     is still broken, as it clobbers x18 and we promptly discover that fac=
t
>     as soon as we attempt to fetch curthread in kfpu_end().
>
>     Note that _STANDALONE is special-cased here, but ideally we wouldn't =
be
>     building the code that uses kfpu_begin()/kfpu_end() at all in the
> loader
>     environment.
>

Yes. As noted elsewhere, the upstream is a mess. There's no way to say "I
only want
the generic implementation" for these functions. So we went through some
crazy hoops
to try to disable that... only to run into a buzz-saw of upstream changes
that broke
the crazy hoops so now we're (bogusly in my opinion) forced to include the
acceleration
routines... which we effectively disable here... but since the boot loader
is space constrained
and the link time doesn't know that these routines will never be called,
they bloat
the boot loader... all because there's no effective way to turn off all the
accelerated
versions for arm like this happens to be for x86.

I've not yet had a chance to implement the obvious solution of having an
option just to
turn every acceleration off and not even try to compile it in because the
currount cross
coupling makes that insanely hard (which is how we wind up with needing the
_STANDALONE
stuff here: if we could just disable all smd stuff, this file would never
be included and we'd
not need this silly implementation).

<rant off>

Warner


>     Discussed with: imp (a bit)
>     Differential Revision:  https://reviews.freebsd.org/D39448
> ---
>  .../include/os/freebsd/spl/sys/simd_aarch64.h      | 28
> +++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.=
h
> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h
> index 7d2e2db28017..9edbc5f40455 100644
> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h
> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h
> @@ -44,13 +44,39 @@
>  #define        _FREEBSD_SIMD_AARCH64_H
>
>  #include <sys/types.h>
> +#include <sys/ucontext.h>
>  #include <machine/elf.h>
> +#include <machine/fpu.h>
>  #include <machine/md_var.h>
> +#include <machine/pcb.h>
> +
> +#ifdef _STANDALONE
>
>  #define        kfpu_allowed()          0
> -#define        kfpu_initialize(tsk)    do {} while (0)
>  #define        kfpu_begin()            do {} while (0)
>  #define        kfpu_end()              do {} while (0)
> +
> +#else
> +
> +/*
> + * XXX kfpu_allowed() should be 1, but this is pending a fix to the BLAK=
E3
> + * generated assembly to avoid clobbering x18.  Turn it back on after th=
at
> + * lands.
> + */
> +#define        kfpu_allowed()          0
> +#define        kfpu_begin() do {
>      \
> +       if (__predict_false(!is_fpu_kern_thread(0)))                    \
> +               fpu_kern_enter(curthread, NULL, FPU_KERN_NOCTX);        \
> +} while(0)
> +
> +#define        kfpu_end() do {
>      \
> +       if (__predict_false(curthread->td_pcb->pcb_fpflags &
> PCB_FP_NOSAVE))    \
> +               fpu_kern_leave(curthread, NULL);                        \
> +} while(0)
> +
> +#endif
> +
> +#define        kfpu_initialize(tsk)    do {} while (0)
>  #define        kfpu_init()             (0)
>  #define        kfpu_fini()             do {} while (0)
>
>

--000000000000082ee305fa410b01
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Wed, Apr 26, 2023 at 11:24=E2=80=
=AFAM Kyle Evans &lt;<a href=3D"mailto:kevans@freebsd.org">kevans@freebsd.o=
rg</a>&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margi=
n:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex=
">The branch main has been updated by kevans:<br>
<br>
URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3Dce5a210997da3c406=
4cfe162e760379f1fa8b587" rel=3D"noreferrer" target=3D"_blank">https://cgit.=
FreeBSD.org/src/commit/?id=3Dce5a210997da3c4064cfe162e760379f1fa8b587</a><b=
r>
<br>
commit ce5a210997da3c4064cfe162e760379f1fa8b587<br>
Author:=C2=A0 =C2=A0 =C2=A0Kyle Evans &lt;kevans@FreeBSD.org&gt;<br>
AuthorDate: 2023-04-26 17:23:48 +0000<br>
Commit:=C2=A0 =C2=A0 =C2=A0Kyle Evans &lt;kevans@FreeBSD.org&gt;<br>
CommitDate: 2023-04-26 17:24:00 +0000<br>
<br>
=C2=A0 =C2=A0 openzfs: arm64: implement kfpu_begin/kfpu_end<br>
<br>
=C2=A0 =C2=A0 This is part one of a fix for booting with ZFS on arm64 using=
<br>
=C2=A0 =C2=A0 accelerated checksum implementations.=C2=A0 Checksum benchmar=
king will<br>
=C2=A0 =C2=A0 attempt to use the FPU, so we currently panic quickly on boot=
.=C2=A0 BLAKE3<br>
=C2=A0 =C2=A0 is still broken, as it clobbers x18 and we promptly discover =
that fact<br>
=C2=A0 =C2=A0 as soon as we attempt to fetch curthread in kfpu_end().<br>
<br>
=C2=A0 =C2=A0 Note that _STANDALONE is special-cased here, but ideally we w=
ouldn&#39;t be<br>
=C2=A0 =C2=A0 building the code that uses kfpu_begin()/kfpu_end() at all in=
 the loader<br>
=C2=A0 =C2=A0 environment.<br></blockquote><div><br></div><div>Yes. As note=
d elsewhere, the upstream is a mess. There&#39;s no way to say &quot;I only=
 want</div><div>the generic implementation&quot; for these functions. So we=
 went through some crazy hoops</div><div>to try to disable that... only to =
run into a buzz-saw of upstream changes that broke</div><div>the crazy hoop=
s so now we&#39;re (bogusly in my opinion) forced to include the accelerati=
on</div><div>routines... which we effectively disable here... but since the=
 boot loader is space constrained</div><div>and the link time doesn&#39;t k=
now that these routines will never be called, they bloat</div><div>the boot=
 loader... all because there&#39;s no effective way to turn off all the acc=
elerated</div><div>versions for arm like this happens to be for x86.</div><=
div><br></div><div>I&#39;ve not yet had a chance to implement the obvious s=
olution of having an option just to</div><div>turn every acceleration off a=
nd not even try to compile it in because the currount cross</div><div>coupl=
ing makes that insanely hard (which is how we wind up with needing the _STA=
NDALONE</div><div>stuff here: if we could just disable all smd stuff, this =
file would never be included and we&#39;d</div><div>not need this silly imp=
lementation).</div><div><br></div><div>&lt;rant off&gt;<br></div><div><br><=
/div><div>Warner<br></div><div>=C2=A0</div><blockquote class=3D"gmail_quote=
" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);=
padding-left:1ex">
=C2=A0 =C2=A0 Discussed with: imp (a bit)<br>
=C2=A0 =C2=A0 Differential Revision:=C2=A0 <a href=3D"https://reviews.freeb=
sd.org/D39448" rel=3D"noreferrer" target=3D"_blank">https://reviews.freebsd=
.org/D39448</a><br>
---<br>
=C2=A0.../include/os/freebsd/spl/sys/simd_aarch64.h=C2=A0 =C2=A0 =C2=A0 | 2=
8 +++++++++++++++++++++-<br>
=C2=A01 file changed, 27 insertions(+), 1 deletion(-)<br>
<br>
diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h =
b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h<br>
index 7d2e2db28017..9edbc5f40455 100644<br>
--- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h<br>
+++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_aarch64.h<br>
@@ -44,13 +44,39 @@<br>
=C2=A0#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 _FREEBSD_SIMD_AARCH64_H<br>
<br>
=C2=A0#include &lt;sys/types.h&gt;<br>
+#include &lt;sys/ucontext.h&gt;<br>
=C2=A0#include &lt;machine/elf.h&gt;<br>
+#include &lt;machine/fpu.h&gt;<br>
=C2=A0#include &lt;machine/md_var.h&gt;<br>
+#include &lt;machine/pcb.h&gt;<br>
+<br>
+#ifdef _STANDALONE<br>
<br>
=C2=A0#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_allowed()=C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 0<br>
-#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_initialize(tsk)=C2=A0 =C2=A0 do {}=
 while (0)<br>
=C2=A0#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_begin()=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 do {} while (0)<br>
=C2=A0#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_end()=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 do {} while (0)<br>
+<br>
+#else<br>
+<br>
+/*<br>
+ * XXX kfpu_allowed() should be 1, but this is pending a fix to the BLAKE3=
<br>
+ * generated assembly to avoid clobbering x18.=C2=A0 Turn it back on after=
 that<br>
+ * lands.<br>
+ */<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_allowed()=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 0<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_begin() do {=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0\<=
br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (__predict_false(!is_fpu_kern_thread(0)))=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0fpu_kern_enter(curt=
hread, NULL, FPU_KERN_NOCTX);=C2=A0 =C2=A0 =C2=A0 =C2=A0 \<br>
+} while(0)<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_end() do {=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0\<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (__predict_false(curthread-&gt;td_pcb-&gt;pc=
b_fpflags &amp; PCB_FP_NOSAVE))=C2=A0 =C2=A0 \<br>
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0fpu_kern_leave(curt=
hread, NULL);=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 \<br>
+} while(0)<br>
+<br>
+#endif<br>
+<br>
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_initialize(tsk)=C2=A0 =C2=A0 do {}=
 while (0)<br>
=C2=A0#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_init()=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0(0)<br>
=C2=A0#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 kfpu_fini()=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0do {} while (0)<br>
<br>
</blockquote></div></div>

--000000000000082ee305fa410b01--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfoY-ZnHp%2BH3J=iB7VS%2B1tO0ViZKWeexr0bicgTeckQCZg>