From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 18:10:39 2012
Date: Sat, 20 Oct 2012 11:10:37 -0700
From: Peter Wemm
To: Konstantin Belousov, freebsd-arch@freebsd.org
Subject: Re: using SSE2 in kernel C code (improving AES-NI module)
In-Reply-To: <20121020171124.GU1967@funkthat.com>

On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
>> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
>> > So, the AES-NI module already uses SSE2 instructions, but it does so
>> > only in assembly.  I have improved the performance of the AES-NI
>> > module's implementation, but this involves using additional SSE2
>> > instructions.
>> >
>> > In order to keep my sanity, I did part of the new code in C using
>> > gcc native types and xmmintrin.h, but we do not support this header in
>> > the kernel.  This means we cannot simply add the new code to the
>> > kernel...
>> >
>> > Any good ideas on how to integrate this code into the kernel build?
>
> [...]
>
>>
>> The current structure of the aes-ni driver is partly enforced by the
>> issue you noted.  We cannot use SSE intrinsics in the kernel, and
>> huge inline assembler fragments are hard to write.
>>
>> I prefer to have the separate .S files with the optimized code,
>> hand-written.  If needed, I can offer you help with the transition.  I
>> would need a full patch to rewrite the code.
>
> Are you sure you want to do this?  It'll involve writing around 500
> lines of assembly besides the constants...  And it isn't simple like
> the aesni_enc where we have a single loop for the rounds...  I've
I've > posted a tar.gz to overlay onto sys/crypto/aesni at: > https://www.funkthat.com/~jmg/aesni.repfile.tar.gz Rather than go straight to assembler, why not use the __builtins? static inline __m128i xts_crank_lfsr(__m128i inp) { const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA); __m128i xtweak, ret; /* set up xor mask */ xtweak = _mm_shuffle_epi32(inp, 0x93); xtweak = _mm_srai_epi32(xtweak, 31); xtweak &= alphamask; /* next term */ ret = _mm_slli_epi32(inp, 1); ret ^= xtweak; return ret; } --> static inline __m128i xts_crank_lfsr(__m128i inp) { const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA }; __m128i xtweak, ret; /* set up xor mask */ xtweak = __builtin_ia32_pshufd (inp, 0x93); xtweak = __builtin_ia32_psradi128(xtweak, 31); xtweak &= alphamask; /* next term */ ret = __builtin_ia32_pslldi128(inp, 1); ret ^= xtweak; return ret; } I know I skipped the details like data types, but most of the meat of those functions collapses to a simple wrapper around a __builtin. Or, another option.. do something like genassym or the many other kernel build tools. aicasm builds and runs a userland tool to generate something to build into the kernel. With sufficient cross-contamination safeguards I wonder if something similar might be able to be done here. -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV "All of this is for nothing if we don't go to the stars" - JMS/B5 "If Java had true garbage collection, most programs would delete themselves upon execution." -- Robert Sewell