From: John-Mark Gurney
Date: Tue, 23 Oct 2012 00:04:17 -0700
To: Konstantin Belousov
Subject: Re: using SSE2 in kernel C code (improving AES-NI module)
Message-ID: <20121023070417.GD1563@funkthat.com>
In-Reply-To: <20121021061011.GG35915@deviant.kiev.zoral.com.ua>
Mail-Followup-To: Konstantin Belousov, Peter Wemm, freebsd-arch@freebsd.org
Cc: freebsd-arch@freebsd.org
List-Id: Discussion related to FreeBSD architecture

Konstantin Belousov wrote this message on Sun, Oct 21, 2012 at 09:10 +0300:
> On Sat, Oct 20, 2012 at 07:47:26PM -0700, John-Mark Gurney wrote:
> > Peter Wemm wrote this message on Sat, Oct 20, 2012 at 11:10 -0700:
> > > Or, another option.. do something like genassym or the many other
> > > kernel build tools. aicasm builds and runs a userland tool to
> > > generate something to build into the kernel. With sufficient
> > > cross-contamination safeguards I wonder if something similar might be
> > > able to be done here.
> >
> > Well, looks like I may this working... Turns out I can't name the file
> > .s otherwise config puts it in SFILES which causes all sorts of problems..
> > So, I went w/ .nos, does any one else have any suggestions?
> >
> > how does this look to people:
> > aesni_wrap2.nos         optional aesni \
> >         dependency      "$S/crypto/aesni/aesni_wrap2.c" \
> >         compile-with    "${CC} -O3 -fPIC -S -o aesni_wrap2.nos $S/crypto/aesni/aesni_wrap2.c" \
> >         no-obj no-implicit-rule before-depend \
> >         clean           "aesni_wrap2.nos"
> > aesni_wrap2.o           optional aesni \
> >         dependency      "aesni_wrap2.nos" \
> >         compile-with    "${NORMAL_S} aesni_wrap2.nos" \
> >         no-implicit-rule \
> >         clean           "aesni_wrap2.o"
> >
> > We'll have to do something similar in the module Makefile, but that is
> > easier...
> >
> > Also, I thought we had a better way to note that some devices depend
> > upon others than just throwing a depend error... If you include aesni
> > w/o crypto, you get error about missing cryptodev_if.h...
> >
> Hm, if such thing is possible, why do you need to compile through the
> .S at all ? All you need is to specify the special compiling flags,
> including -msse and -msse2.

Thanks, I managed to get it down to one files.amd64 entry...

> Note, you shall not need -fPIC, at least for amd64. I would suggest to use
> -O2, as well as to try to honour the -g settings.

If I don't use -fpic, I get:
aesni_wrap2.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_32 against `.text'

when linking the kernel... If you can explain to me how to get rid of
this error, I'll do it..

> Most likely, you can put the ${CFLAGS} on the command line, followed
> by -msse -msse2.

I can't use CFLAGS because it removes access to the xmmintrin.h header
file... It looks like an option is to use:
-fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG}

In my testing, -O2 is significantly slower, hence the bump to -O3:
x O2.txt
+ O3.txt
    N           Min           Max        Median           Avg        Stddev
x  20     1741.3491      1754.987     1752.9267     1751.5602     3.5616947
+  20      2223.217     2244.4501     2242.7028     2240.3183     5.7020691
Difference at 95.0% confidence
        488.758 +/- 3.04271
        27.9042% +/- 0.173715%
        (Student's t, pooled s = 4.75391)

Those are MB/sec...

Index: files.amd64
===================================================================
--- files.amd64 (revision 241041)
+++ files.amd64 (working copy)
@@ -137,6 +137,11 @@
 crypto/aesni/aeskeys_amd64.S    optional aesni
 crypto/aesni/aesni.c            optional aesni
 crypto/aesni/aesni_wrap.c       optional aesni
+aesni_wrap2.o                   optional aesni \
+        dependency      "$S/crypto/aesni/aesni_wrap2.c" \
+        compile-with    "${CC} -c -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG} -o aesni_wrap2.o $S/crypto/aesni/aesni_wrap2.c" \
+        no-implicit-rule \
+        clean           "aesni_wrap2.o"
 crypto/blowfish/bf_enc.c        optional crypto | ipsec
 crypto/des/des_enc.c            optional crypto | ipsec | netsmb
 crypto/via/padlock.c            optional padlock

I still need to fix up i386, and will let people review a full patch
to address both arches before committing...
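
For anyone who wants to sanity-check the -O2 vs. -O3 comparison without the raw samples, the summary rows are enough to re-derive the difference line. This is only a rough sketch: it assumes a pooled-variance Student's t comparison with equal sample sizes, and the critical value 2.024 (two-sided 95%, 38 degrees of freedom) is my assumption, since it is not printed in the output. The function and variable names are made up for illustration.

```python
import math

# Summary rows copied from the ministat output (MB/sec, N = 20 each).
n = 20
avg_o2, sd_o2 = 1751.5602, 3.5616947
avg_o3, sd_o3 = 2240.3183, 5.7020691

# Assumption: two-sided 95% Student's t critical value for 38 degrees of
# freedom, rounded to 2.024. It does not appear in the output itself.
T_CRIT = 2.024


def compare(n, avg_a, sd_a, avg_b, sd_b, t_crit):
    """Return (difference, CI half-width, pooled s) for two equal-sized samples."""
    # Pooled standard deviation over both samples.
    pooled = math.sqrt(((n - 1) * sd_a ** 2 + (n - 1) * sd_b ** 2) / (2 * n - 2))
    diff = avg_b - avg_a
    # Half-width of the confidence interval on the difference of means.
    half_width = t_crit * pooled * math.sqrt(2.0 / n)
    return diff, half_width, pooled


diff, half_width, pooled = compare(n, avg_o2, sd_o2, avg_o3, sd_o3, T_CRIT)
print("pooled s   = %.5f" % pooled)
print("difference = %.3f +/- %.3f MB/sec" % (diff, half_width))
print("relative   = %.4f%% +/- %.4f%%"
      % (100 * diff / avg_o2, 100 * half_width / avg_o2))
```

The relative figure is taken against the -O2 average, which is how the roughly 28% speedup is expressed.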
-- 
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."