From: John-Mark Gurney
Date: Fri, 19 Oct 2012 16:38:33 -0700
To: freebsd-arch@FreeBSD.org
Subject: using SSE2 in kernel C code (improving AES-NI module)
Message-ID: <20121019233833.GS1967@funkthat.com>

So, the AES-NI module already uses SSE2 instructions, but only in
assembly. I have improved the performance of the AES-NI module's
implementation, but doing so requires additional SSE2 instructions. In
order to keep my sanity, I wrote part of the new code in C using gcc
native types and xmmintrin.h, but we do not support this header in the
kernel, so we cannot simply add the new code to the kernel.

Any good ideas on how to integrate this code into the kernel build? I
have used the trick of producing assembly from the C file with gcc -S
and then compiling that assembly into the kernel, but I'm not sure
that's the best way, and even if it is, I don't know how I'd do the
generation as part of the kernel build. Or would it be OK to commit
both, and require a regeneration each time the C file is updated?

In my testing in userland, without the opencrypto framework overhead,
the old code would only get about 250MB/sec. With the new code I get
about 2200MB/sec.

Sample code:

static inline __m128i
xts_crank_lfsr(__m128i inp)
{
	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
	__m128i xtweak, ret;

	/* set up xor mask */
	xtweak = _mm_shuffle_epi32(inp, 0x93);
	xtweak = _mm_srai_epi32(xtweak, 31);
	xtweak &= alphamask;

	/* next term */
	ret = _mm_slli_epi32(inp, 1);
	ret ^= xtweak;

	return ret;
}

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
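
For anyone wanting to try the sample in userland, the following is a minimal sketch, not the module's actual code. It assumes AES_XTS_ALPHA is 0x87 (the usual XTS reduction constant) and a little-endian x86 target, and it uses _mm_and_si128/_mm_xor_si128 in place of the gcc vector operators. The helpers xts_crank_lfsr_ref() and xts_check() are hypothetical names added here only to compare the SSE2 routine against a plain byte-wise version.

```c
#include <emmintrin.h>	/* SSE2 intrinsics: __m128i, _mm_*_epi32 */
#include <stdint.h>
#include <string.h>

#define AES_XTS_ALPHA	0x87	/* assumption: XTS reduction constant */

/* SSE2 version: same steps as the sample, written with intrinsics only. */
static inline __m128i
xts_crank_lfsr(__m128i inp)
{
	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
	__m128i xtweak, ret;

	/* rotate lanes up by one, so each lane sees the lane below it */
	xtweak = _mm_shuffle_epi32(inp, 0x93);
	/* all-ones where the source lane's top bit was set */
	xtweak = _mm_srai_epi32(xtweak, 31);
	/* keep a carry bit for lanes 1-3, the alpha constant for lane 0 */
	xtweak = _mm_and_si128(xtweak, alphamask);

	/* shift every lane left by one and fold the carries back in */
	ret = _mm_slli_epi32(inp, 1);
	return (_mm_xor_si128(ret, xtweak));
}

/* Hypothetical scalar reference: shift a 128-bit little-endian value. */
static void
xts_crank_lfsr_ref(uint8_t t[16])
{
	unsigned carry = 0, next;
	int i;

	for (i = 0; i < 16; i++) {
		next = t[i] >> 7;
		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)
		t[0] ^= AES_XTS_ALPHA;
}

/* Hypothetical check: 1 if both versions agree for n successive steps. */
static int
xts_check(const uint8_t seed[16], int n)
{
	uint8_t ref[16], out[16];
	__m128i v = _mm_loadu_si128((const __m128i *)seed);
	int i;

	memcpy(ref, seed, 16);
	for (i = 0; i < n; i++) {
		v = xts_crank_lfsr(v);
		xts_crank_lfsr_ref(ref);
		_mm_storeu_si128((__m128i *)out, v);
		if (memcmp(out, ref, 16) != 0)
			return (0);
	}
	return (1);
}
```

Calling xts_check() with an arbitrary 16-byte seed and a few hundred iterations exercises both the lane-to-lane carries and the wrap-around into the low byte. On i386 the file needs -msse2 to build; on amd64 SSE2 is enabled by default.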