Date: Sat, 20 Oct 2012 23:07:29 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-arch@freebsd.org
Subject: Re: using SSE2 in kernel C code (improving AES-NI module)
Message-ID: <20121020210729.GA84086@stack.nl>
In-Reply-To: <20121020181826.GE35915@deviant.kiev.zoral.com.ua>
References: <20121019233833.GS1967@funkthat.com>
 <20121020054847.GB35915@deviant.kiev.zoral.com.ua>
 <20121020171124.GU1967@funkthat.com>
 <CAGE5yCoM92rU7Ca7C7_x=3vXW%2BqO9Zc0uQhPURuMbstPDvq9yg@mail.gmail.com>
 <20121020181826.GE35915@deviant.kiev.zoral.com.ua>

On Sat, Oct 20, 2012 at 09:18:26PM +0300, Konstantin Belousov wrote:
> On Sat, Oct 20, 2012 at 11:10:37AM -0700, Peter Wemm wrote:
> > On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > > Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
> > >> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
> > >> > So, the AES-NI module already uses SSE2 instructions, but it does so
> > >> > only in assembly.  I have improved the performance of the AES-NI
> > >> > module's implementation, but this involves me using additional SSE2
> > >> > instructions.

> > >> > In order to keep my sanity, I did part of the new code in C using
> > >> > gcc native types and xmmintrin.h, but we do not support this header in
> > >> > the kernel..  This means we cannot simply add the new code to the
> > >> > kernel...

> > >> > Any good ideas on how to integrate this code into the kernel build?

> > > [...]

> > >> The current structure of the aes-ni driver is partly enforced by the
> > >> issue you noted. We cannot use sse intrinsics in the kernel, and
> > >> huge inline assembler fragments are hard to write.

> > >> I prefer to have the separate .S files with the optimized code,
> > >> hand-written. If needed, I offer you a help with transition. I would
> > >> need a full patch to rewrite the code.

> > > Are you sure you want to do this?  It'll involve writing around 500
> > > lines of assembly besides the constants...  And it isn't simple like
> > > the aesni_enc where we have a single loop for the rounds...  I've
> > > posted a tar.gz to overlay onto sys/crypto/aesni at:
> > > https://www.funkthat.com/~jmg/aesni.repfile.tar.gz

> > Rather than go straight to assembler, why not use the __builtins?

> > static inline __m128i
> > xts_crank_lfsr(__m128i inp)
> > {
> > 	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
> > 	__m128i xtweak, ret;
> >
> > 	/* set up xor mask */
> > 	xtweak = _mm_shuffle_epi32(inp, 0x93);
> > 	xtweak = _mm_srai_epi32(xtweak, 31);
> > 	xtweak &= alphamask;
> >
> > 	/* next term */
> > 	ret = _mm_slli_epi32(inp, 1);
> > 	ret ^= xtweak;
> >
> > 	return ret;
> > }

> > -->

> > static inline __m128i
> > xts_crank_lfsr(__m128i inp)
> > {
> > 	const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA };
> > 	__m128i xtweak, ret;
> >
> > 	/* set up xor mask */
> > 	xtweak = __builtin_ia32_pshufd (inp, 0x93);
> > 	xtweak = __builtin_ia32_psradi128(xtweak, 31);
> > 	xtweak &= alphamask;
> >
> > 	/* next term */
> > 	ret = __builtin_ia32_pslldi128(inp, 1);
> > 	ret ^= xtweak;
> >
> > 	return ret;
> > }

> > I know I skipped the details like data types, but most of the meat of
> > those functions collapses to a simple wrapper around a __builtin.

As far as I understand, the __builtins are mostly a compiler
implementation detail. They are not as standardized as the intrinsics
from *mmintrin.h.

> Are builtins available for -mno-sse compilation ?

They are not. I did notice that Clang will compile
__builtin_ia32_movnti down to a regular MOV if SSE2 is not enabled, but
this seems rarely useful.

> I think we can try to reimplement the builtins needed with inline
> assembly.

This should be possible but slightly ugly.

> > Or, another option.. do something like genassym or the many other
> > kernel build tools.  aicasm builds and runs a userland tool to
> > generate something to build into the kernel.  With sufficient
> > cross-contamination safeguards I wonder if something similar might be
> > able to be done here.

Is the C compiler with additional flags -mmmx -msse2 also a possible
build tool? If *mmintrin.h are made available, that should work, right?

One detail is that GCC and Clang have their own versions of these
header files. GCC also needs a dummy mm_malloc.h; Clang's xmmintrin.h
refrains from including this in a free-standing environment.

Of course, all code compiled in such a way must only be run with a valid
FPU context, since the compiler may use SSE instructions anywhere.

-- 
Jilles Tjoelker
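
As a reading aid for the quoted xts_crank_lfsr(), the lane arithmetic
that both the intrinsic and the __builtin versions express can be
sketched in plain C without any SSE types. This is only a scalar
reference under stated assumptions: the names tweak128,
xts_crank_lfsr_ref and xts_crank_selfcheck are hypothetical, and
XTS_ALPHA is assumed to be 0x87 (the usual XTS reduction constant); the
thread itself does not give the value of AES_XTS_ALPHA.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of AES_XTS_ALPHA; not stated in the thread. */
#define XTS_ALPHA 0x87u

/* Hypothetical stand-in for __m128i: four 32-bit lanes, lane[0] lowest. */
struct tweak128 {
	uint32_t lane[4];
};

/*
 * Scalar reference for the quoted xts_crank_lfsr(): shift the 128-bit
 * value left by one bit and fold the bit shifted out of the top back
 * in as XTS_ALPHA.
 */
static struct tweak128
xts_crank_lfsr_ref(struct tweak128 in)
{
	/* Same lane order as _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA). */
	static const uint32_t alphamask[4] = { XTS_ALPHA, 1, 1, 1 };
	struct tweak128 out;
	int i;

	for (i = 0; i < 4; i++) {
		/* pshufd 0x93: lane i receives lane i-1 (lane 0 gets lane 3). */
		uint32_t prev = in.lane[(i + 3) & 3];
		/* psrad 31, then AND: mask value if prev had its top bit set. */
		uint32_t xtweak = (prev >> 31) ? alphamask[i] : 0;

		/* pslld 1, then XOR in the carry or reduction term. */
		out.lane[i] = (in.lane[i] << 1) ^ xtweak;
	}
	return out;
}

/* Small self-check on a value with the top bits of lanes 0 and 3 set. */
static void
xts_crank_selfcheck(void)
{
	struct tweak128 t = { { 0x80000000u, 0, 0, 0x80000000u } };
	struct tweak128 r = xts_crank_lfsr_ref(t);

	assert(r.lane[0] == XTS_ALPHA);	/* bit 127 wrapped around */
	assert(r.lane[1] == 1);		/* carry out of lane 0 */
	assert(r.lane[2] == 0 && r.lane[3] == 0);
}
```

A scalar version like this needs no FPU context, but it also gives up
the single-register SSE form that the thread is trying to keep, so it
is more useful as a cross-check for an SSE or inline-assembly version
than as a replacement.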