From owner-freebsd-arch@FreeBSD.ORG Sun Oct 14 09:53:54 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9F2DFA5C; Sun, 14 Oct 2012 09:53:54 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9C1408FC08; Sun, 14 Oct 2012 09:53:53 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA28687; Sun, 14 Oct 2012 12:53:51 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TNKtL-000Cbi-Dv; Sun, 14 Oct 2012 12:53:51 +0300 Message-ID: <507A8BAE.5060304@FreeBSD.org> Date: Sun, 14 Oct 2012 12:53:50 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121013 Thunderbird/16.0.1 MIME-Version: 1.0 To: freebsd-arch@FreeBSD.org Subject: Re: x86 boot code build References: <506C385C.3020400@FreeBSD.org> <506DEB4C.5020508@andric.com> In-Reply-To: <506DEB4C.5020508@andric.com> X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Dimitry Andric X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Oct 2012 09:53:54 -0000 on 04/10/2012 23:02 Dimitry Andric said the following: > On 2012-10-03 15:06, Andriy Gapon wrote: >> Currently we produce "slightly" different binaries for x86 boot code depending >> whether MACHINE_CPUARCH is i386 or amd64. I think that there is no good reason >> for this, since in both cases we use exactly the same code and target the same >> classes of machines. 
In other words, the binaries should be interchangeable[*].
>>
>> The difference boils down to using -march=i386 on amd64 while i386 uses default
>> compiler flags, which are equivalent to -march=i486 -mtune=generic.
>
> Yes, I also noticed this inconsistency during some other work in
> sys/boot, and I ended up with this diff in my backlog:
>
> Index: sys/boot/i386/Makefile.inc
> ===================================================================
> --- sys/boot/i386/Makefile.inc (revision 241194)
> +++ sys/boot/i386/Makefile.inc (working copy)
> @@ -5,12 +5,13 @@
>  BINDIR?= /boot
>
>  LOADER_ADDRESS?=0x200000
> -CFLAGS+= -ffreestanding -mpreferred-stack-boundary=2 \
> +CFLAGS+= -march=i386 -ffreestanding -mpreferred-stack-boundary=2 \
>      -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float
> +NO_CPU_CFLAGS=
>  LDFLAGS+= -nostdlib
>
>  .if ${MACHINE_CPUARCH} == "amd64"
> -CFLAGS+= -m32 -march=i386
> +CFLAGS+= -m32
>  ACFLAGS+= -m32
>  LDFLAGS+= -m elf_i386_fbsd
>  AFLAGS+= --32
>
>
>> If my analysis is correct, the only thing affected by the flags in the boot code
>> is use of leave instruction when -Os is _not_ specified.
>> For -march=i386 our gcc prefers using leave. For -march=i486 it thinks that
>> movs+pops are faster than leave and so prefers to not use it. If -Os is
>> specified, then leave is always used because it results in smaller machine code.
>>
>> So, as it is now, on amd64 we produce slightly smaller boot binaries where size
>> doesn't matter. Where size really matters (-Os) we produce identical binaries.
>>
>> If we decide that it makes sense to converge i386 and amd64 boot build options,
>> which should we pick?
>
> Well, do we still officially support any real i386 machines? If so, we
> should still use -march=i386 for the boot code. Otherwise, let's start
> using -march=i486 explicitly. So like:

Thank you for the patch!
Here is a slightly larger one (including some commented out changes in userboot):
http://people.freebsd.org/~avg/boot-march%3di386.diff

> Index: sys/boot/i386/Makefile.inc
> ===================================================================
> --- sys/boot/i386/Makefile.inc (revision 241194)
> +++ sys/boot/i386/Makefile.inc (working copy)
> @@ -5,12 +5,13 @@
>  BINDIR?= /boot
>
>  LOADER_ADDRESS?=0x200000
> -CFLAGS+= -ffreestanding -mpreferred-stack-boundary=2 \
> +CFLAGS+= -march=i486 -ffreestanding -mpreferred-stack-boundary=2 \
>      -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float
> +NO_CPU_CFLAGS=
>  LDFLAGS+= -nostdlib
>
>  .if ${MACHINE_CPUARCH} == "amd64"
> -CFLAGS+= -m32 -march=i386
> +CFLAGS+= -m32
>  ACFLAGS+= -m32
>  LDFLAGS+= -m elf_i386_fbsd
>  AFLAGS+= --32

--
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Sun Oct 14 10:34:58 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2EAA624B for ; Sun, 14 Oct 2012 10:34:58 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 76D8D8FC08 for ; Sun, 14 Oct 2012 10:34:57 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA28931; Sun, 14 Oct 2012 13:34:54 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TNLX2-000CfF-VB; Sun, 14 Oct 2012 13:34:53 +0300 Message-ID: <507A954A.4030505@FreeBSD.org> Date: Sun, 14 Oct 2012 13:34:50 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121013 Thunderbird/16.0.1 MIME-Version: 1.0 To: Poul-Henning Kamp Subject: Re: drivers for desktop hardware monitoring chips References: <1775.1349467612@critter.freebsd.dk>
In-Reply-To: <1775.1349467612@critter.freebsd.dk> X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Oct 2012 10:34:58 -0000

on 05/10/2012 23:06 Poul-Henning Kamp said the following:
> In message <506F06FA.4050804@FreeBSD.org>, Andriy Gapon writes:
>
>> Especially I do not want to call it _the_ "Sensors Framework".
>
> It doesn't really matter what you call it, it still sucks :-)

The code that lets me do something still sucks less than the code that doesn't exist ;-)

> See also:
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1863154+0+archive/2002/freebsd-current/20021006.freebsd-current

Interesting read!
But really, I do not have the impression that the code in question deserves any philosophical discussion.

There is a famous quote about premature optimization - could there be such a thing as premature "infrastructurization"? That is, trying to generalize something to an infrastructure level when there is no compelling reason to do that.
I mean, the fact that we have lived all these years without much sensors code, let alone a sensors framework, is pretty telling.
--
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Mon Oct 15 16:01:26 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DD464E5D; Mon, 15 Oct 2012 16:01:26 +0000 (UTC) (envelope-from marcel@xcllnt.net) Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4]) by mx1.freebsd.org (Postfix) with ESMTP id 9565F8FC14; Mon, 15 Oct 2012 16:01:26 +0000 (UTC) Received: from mantonsen-sslvpn-nc.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mail.xcllnt.net (8.14.5/8.14.5) with ESMTP id q9FG1Ijj025081 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Mon, 15 Oct 2012 09:01:19 -0700 (PDT) (envelope-from marcel@xcllnt.net) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Behavior of madvise(MADV_FREE) From: Marcel Moolenaar In-Reply-To: Date: Mon, 15 Oct 2012 09:01:19 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> <4835.1350062021@critter.freebsd.dk> To: Jason Evans X-Mailer: Apple Mail (2.1499) Cc: Poul-Henning Kamp , "freebsd-arch@freebsd.org Arch" , Tim LaBerge , Alan Cox X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 16:01:27 -0000

On Oct 12, 2012, at 3:05 PM, Jason Evans wrote:
> On Oct 12, 2012, at 1:54 PM, Marcel Moolenaar wrote:
>> BTW: MADV_DONTNEED in Linux seems to behave like MADV_FREE
>> in FreeBSD -- at least according to the manpage. Which makes
>> me wonder how standard madvise(2) is anyway.
>
> MADV_DONTNEED on Linux immediately dissociates the physical page from the VM mapping, such that subsequent access results in a zero-filled page being soft-faulted into place.
>
> MADV_FREE is *way* nicer than MADV_DONTNEED in the context of malloc. jemalloc has a really discouraging amount of complexity that is directly a result of working around the performance overhead of MADV_DONTNEED.

I've been letting this thread sink in -- responding to the last message.

Vendors like Juniper want reliable VM statistics to prevent over-provisioning. While the stats don't need to be exact at all times (i.e. instantaneous), having the stats catch up to a new steady state is very desirable. In other words: it's not that helpful to have lots of memory on the inactive queue indefinitely.

Also, moving the complexity of choosing exactly which hint to give the kernel under different scenarios into the applications isn't appealing at all. It just doesn't scale. If some VM changes warrant a new hint to madvise(), you may end up changing multiple daemons. It seems better to have just 1 hint (i.e. MADV_FREE) and have the kernel change its behaviour depending on the situation. When there's plenty of memory, you may even ignore the hint. Under severe memory pressure you may want to free up the page right away so that you can give it to some thread that's waiting for a page. At the edge of needing to swap, complex algorithms may be worthwhile -- or maybe not. I don't know.

This leads to:
1. Keep MADV_FREE as it behaves in FreeBSD right now or make it even more sloppy.
2. Have an idle thread that moves inactive pages to the cache or free queue if they've been inactive for X minutes, for some tunable X. Have it back off when the pageout daemon kicks in.
3. Have MADV_FREE behave like Linux's MADV_DONTNEED when the machine is under (significant/severe/some) memory pressure.

Thoughts?
--
Marcel Moolenaar
marcel@xcllnt.net

From owner-freebsd-arch@FreeBSD.ORG Wed Oct 17 09:46:06 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3D6DBE7; Wed, 17 Oct 2012 09:46:06 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 13F368FC08; Wed, 17 Oct 2012 09:46:04 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA03125; Wed, 17 Oct 2012 12:46:03 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TOQCQ-000PUG-To; Wed, 17 Oct 2012 12:46:03 +0300 Message-ID: <507E7E59.8060201@FreeBSD.org> Date: Wed, 17 Oct 2012 12:46:01 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121013 Thunderbird/16.0.1 MIME-Version: 1.0 To: freebsd-arch@FreeBSD.org Subject: kva size on amd64 X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2012 09:46:06 -0000

What are the main benefits, if any, of limiting KVA space size - or in fact tying it to physical memory size - on amd64?
This question is perhaps relevant to other platforms with "unlimited kva" too.
-- Andriy Gapon From owner-freebsd-arch@FreeBSD.ORG Fri Oct 19 17:41:01 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BDAAEFBB for ; Fri, 19 Oct 2012 17:41:01 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id F1A068FC08 for ; Fri, 19 Oct 2012 17:41:00 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA01546; Fri, 19 Oct 2012 20:40:58 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <508190AA.6050705@FreeBSD.org> Date: Fri, 19 Oct 2012 20:40:58 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121014 Thunderbird/16.0.1 MIME-Version: 1.0 To: Lars Engels , freebsd-arch@FreeBSD.org Subject: Re: drivers for desktop hardware monitoring chips References: <506F06FA.4050804@FreeBSD.org> <20121005162437.GF7416@e-new.0x20.net> <506F0DBE.6050307@FreeBSD.org> In-Reply-To: <506F0DBE.6050307@FreeBSD.org> X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Oct 2012 17:41:01 -0000 on 05/10/2012 19:41 Andriy Gapon said the following: > on 05/10/2012 19:24 Lars Engels said the following: >> Can you give us a link to the code in question? > > Yes, of course. 
> Original code can be found here: http://wiki.freebsd.org/GSoC2007/cnst-sensors

Here is an adaptation to recent head in a saner form (but still on github):
https://github.com/avg-I/freebsd/commits/sensors

--
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Fri Oct 19 23:38:34 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 63D1BE61 for ; Fri, 19 Oct 2012 23:38:34 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) by mx1.freebsd.org (Postfix) with ESMTP id 396DF8FC0A for ; Fri, 19 Oct 2012 23:38:33 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id q9JNcXJA061035 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 19 Oct 2012 16:38:33 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id q9JNcXwT061034 for freebsd-arch@FreeBSD.org; Fri, 19 Oct 2012 16:38:33 -0700 (PDT) (envelope-from jmg) Date: Fri, 19 Oct 2012 16:38:33 -0700 From: John-Mark Gurney To: freebsd-arch@FreeBSD.org Subject: using SSE2 in kernel C code (improving AES-NI module) Message-ID: <20121019233833.GS1967@funkthat.com> Mail-Followup-To: freebsd-arch@FreeBSD.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Fri, 19 Oct 2012 16:38:33 -0700 (PDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Oct 2012 23:38:34 -0000

So, the AES-NI module already uses SSE2 instructions, but it does so only in assembly. I have improved the performance of the AES-NI module's implementation, but this involves me using additional SSE2 instructions.

In order to keep my sanity, I did part of the new code in C using gcc native types and xmmintrin.h, but we do not support this header in the kernel. This means we cannot simply add the new code to the kernel...

Any good ideas on how to integrate this code into the kernel build?

I have used the trick of producing assembly of the C file with gcc -S, and then compiling the assembly into the kernel, but I'm not sure if that's the best way, and even if it is the best, how I'd do the generation as part of the kernel build... Or would it be ok to commit both, and require a regeneration each time the C file is updated?

In my testing in userland w/o the opencrypto framework overhead, the old code would only get about ~250MB/sec. With the new code I get ~2200MB/sec...

Sample code:
static inline __m128i
xts_crank_lfsr(__m128i inp)
{
	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
	__m128i xtweak, ret;

	/* set up xor mask */
	xtweak = _mm_shuffle_epi32(inp, 0x93);
	xtweak = _mm_srai_epi32(xtweak, 31);
	xtweak &= alphamask;

	/* next term */
	ret = _mm_slli_epi32(inp, 1);
	ret ^= xtweak;

	return ret;
}

--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
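For readers following the sample code: the function advances the XTS tweak by one block, which amounts to shifting a 128-bit value left by one bit and, if a bit falls off the top, XORing a reduction constant into the low word. The sketch below is a plain scalar C reference for that same step, not code from the patch. The names xts_next_tweak_ref, xts_next_tweak_ref_check and XTS_ALPHA_ASSUMED are made up for illustration, and 0x87 is assumed to be the value of AES_XTS_ALPHA (it is the usual XTS constant, but the message does not show the definition).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: AES_XTS_ALPHA is the usual XTS reduction constant. */
#define XTS_ALPHA_ASSUMED 0x87u

/*
 * Hypothetical scalar reference for one tweak update.  w[0] is the least
 * significant 32-bit word, matching lane 0 of the __m128i in the sample.
 */
static void
xts_next_tweak_ref(uint32_t w[4])
{
	/* Top bit of each word, captured before anything is shifted. */
	uint32_t c0 = w[0] >> 31, c1 = w[1] >> 31;
	uint32_t c2 = w[2] >> 31, c3 = w[3] >> 31;

	/* Shift every word left by one, as _mm_slli_epi32(inp, 1) does. */
	w[0] <<= 1;
	w[1] <<= 1;
	w[2] <<= 1;
	w[3] <<= 1;

	/* Carries move up one word; the 0x93 shuffle arranges this. */
	w[1] ^= c0;
	w[2] ^= c1;
	w[3] ^= c2;

	/* The bit shifted out of the top folds back in as the constant. */
	if (c3)
		w[0] ^= XTS_ALPHA_ASSUMED;
}

/* Small self-check covering the two interesting cases. */
static void
xts_next_tweak_ref_check(void)
{
	uint32_t a[4] = { 0x80000000u, 0, 0, 0 };
	uint32_t b[4] = { 0, 0, 0, 0x80000000u };

	xts_next_tweak_ref(a);	/* carry crosses from word 0 into word 1 */
	assert(a[0] == 0 && a[1] == 1 && a[2] == 0 && a[3] == 0);

	xts_next_tweak_ref(b);	/* top bit wraps around as the constant */
	assert(b[0] == XTS_ALPHA_ASSUMED && b[1] == 0 && b[2] == 0 && b[3] == 0);
}
```

A scalar version like this is mainly useful as something to compare a vectorized implementation against; it makes no attempt to be fast.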
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 05:49:00 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A0B047FF for ; Sat, 20 Oct 2012 05:49:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 197B88FC12 for ; Sat, 20 Oct 2012 05:48:58 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9K5mxEK048623 for ; Sat, 20 Oct 2012 08:48:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9K5mloF037101 for ; Sat, 20 Oct 2012 08:48:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9K5mlSJ037100 for freebsd-arch@FreeBSD.org; Sat, 20 Oct 2012 08:48:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 20 Oct 2012 08:48:47 +0300 From: Konstantin Belousov To: freebsd-arch@FreeBSD.org Subject: Re: using SSE2 in kernel C code (improving AES-NI module) Message-ID: <20121020054847.GB35915@deviant.kiev.zoral.com.ua> References: <20121019233833.GS1967@funkthat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="E+zqmlIEIVYE0XqN" Content-Disposition: inline In-Reply-To: <20121019233833.GS1967@funkthat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on 
skuns.kiev.zoral.com.ua X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 05:49:00 -0000

--E+zqmlIEIVYE0XqN
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
> So, the AES-NI module already uses SSE2 instructions, but it does so
> only in assembly. I have improved the performance of the AES-NI
> modules implementation, but this involves me using additional SSE2
> instructions.
>
> In order to keep my sanity, I did part of the new code in C using
> gcc native types and xmmintrin.h, but we do not support this header in
> the kernel.. This means we cannot simply add the new code to the
> kernel...
>
> Any good ideas on how to integrate this code into the kernel build?
>
> I have used the trick of producing assembly of the C file with gcc -S,
> and then compiling the assembly into the kernel, but I'm not sure if
> that's the best way, and even if it is the best, how I'd do the
> generation as part of the kernel build... Or would it be ok to commit
> both, and require a regeneration each time the C file is updated?
>
> In my testing in userland w/o the opencrypto framework overhead, the old
> code would only get about ~250MB/sec.. With the new code I get
> ~2200MB/sec...
>
> Sample code:
> static inline __m128i
> xts_crank_lfsr(__m128i inp)
> {
> 	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
> 	__m128i xtweak, ret;
>
> 	/* set up xor mask */
> 	xtweak = _mm_shuffle_epi32(inp, 0x93);
> 	xtweak = _mm_srai_epi32(xtweak, 31);
> 	xtweak &= alphamask;
>
> 	/* next term */
> 	ret = _mm_slli_epi32(inp, 1);
> 	ret ^= xtweak;
>
> 	return ret;
> }

The current structure of the aes-ni driver is partly dictated by the issue you noted. We cannot use SSE intrinsics in the kernel, and huge inline assembler fragments are hard to write.

I prefer to have separate .S files with the optimized code, hand-written. If needed, I can offer you help with the transition. I would need a full patch to rewrite the code.

--E+zqmlIEIVYE0XqN
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlCCOz8ACgkQC3+MBN1Mb4h/EgCcDyMBlXwl3CpOPrOLMTt1x4yG
29QAn30b9pBDFFEwI6M7HcLx36HWq6GI
=a4fj
-----END PGP SIGNATURE-----

--E+zqmlIEIVYE0XqN--

From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 17:11:25 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9EC1B6AE for ; Sat, 20 Oct 2012 17:11:25 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) by mx1.freebsd.org (Postfix) with ESMTP id 7231D8FC08 for ; Sat, 20 Oct 2012 17:11:25 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id q9KHBOIB075330 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Oct 2012 10:11:24 -0700 (PDT) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id q9KHBOc1075329; Sat, 20 Oct 2012 10:11:24 -0700 (PDT) (envelope-from jmg) Date: Sat, 20 Oct 2012 10:11:24 -0700 From: John-Mark Gurney To:
Konstantin Belousov Subject: Re: using SSE2 in kernel C code (improving AES-NI module) Message-ID: <20121020171124.GU1967@funkthat.com> Mail-Followup-To: Konstantin Belousov , freebsd-arch@freebsd.org References: <20121019233833.GS1967@funkthat.com> <20121020054847.GB35915@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121020054847.GB35915@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Sat, 20 Oct 2012 10:11:25 -0700 (PDT) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 17:11:26 -0000 Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300: > On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote: > > So, the AES-NI module already uses SSE2 instructions, but it does so > > only in assembly. I have improved the performance of the AES-NI > > modules implementation, but this involves me using additional SSE2 > > instructions. > > > > In order to keep my sanity, I did part of the new code in C using > > gcc native types and xmmintrin.h, but we do not support this header in > > the kernel.. This means we cannot simply add the new code to the > > kernel... > > > > Any good ideas on how to integrate this code into the kernel build? [...] 
>
> The current structure of the aes-ni driver is partly enforced by the
> issue you noted. We cannot use sse intrinsics in the kernel, and
> huge inline assembler fragments are hard to write.
>
> I prefer to have the separate .S files with the optimized code,
> hand-written. If needed, I offer you a help with transition. I would
> need a full patch to rewrite the code.

Are you sure you want to do this? It'll involve writing around 500 lines of assembly besides the constants... And it isn't simple like aesni_enc, where we have a single loop for the rounds... I've posted a tar.gz to overlay onto sys/crypto/aesni at:
https://www.funkthat.com/~jmg/aesni.repfile.tar.gz

It doesn't have the build infrastructure to build _wrap2.c into assembly and build a kernel/module w/ it yet, hence my original email...

I'd prefer to keep the C file, as it is MUCH easier to understand what is happening... It was also much easier to write and to try different optimization strategies...

A brief overview of the code... It turns out that the AES instructions have a throughput of 1 per clock, but a latency of 8 clocks on most processors... This means that if we pipeline the work and do 8 ECB blocks at once, we can significantly cut down the number of clocks...

The other part is to reduce the time it takes to calculate the tweak factor... I unrolled this calculation 8 times so that we can keep the results in registers to pass into the 8 block ECB function... This last part does make a difference...

Thanks for taking a look at it...

--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
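The throughput/latency argument above can be made concrete with a little arithmetic. The following is a toy model only, not a measurement and not code from the tarball: it assumes one AES round instruction can issue per cycle, that each round of a block must wait for that block's previous round to finish, and that a block needs 10 rounds (as for AES-128). The function name model_cycles and all of its parameters are invented for this sketch.

```c
#include <assert.h>

/*
 * Toy in-order issue model (an assumption, not a CPU simulator):
 * one instruction issues per cycle, and round r of a block cannot
 * issue until `latency` cycles after round r-1 of the same block.
 * Blocks are processed in groups of `width`, rounds interleaved.
 * Returns the cycle at which the last result becomes available.
 */
static unsigned
model_cycles(unsigned blocks, unsigned width, unsigned rounds,
    unsigned latency)
{
	unsigned ready[64];	/* when each block of the group is ready */
	unsigned clock = 0;	/* next free issue slot */
	unsigned done = 0;	/* completion time of the latest result */
	unsigned base, b, r, n, issue;

	assert(width > 0 && width <= 64);

	for (base = 0; base < blocks; base += width) {
		n = blocks - base < width ? blocks - base : width;
		for (b = 0; b < n; b++)
			ready[b] = 0;
		for (r = 0; r < rounds; r++) {
			for (b = 0; b < n; b++) {
				/* wait for the issue slot and the dependency */
				issue = clock > ready[b] ? clock : ready[b];
				ready[b] = issue + latency;
				clock = issue + 1;
				if (ready[b] > done)
					done = ready[b];
			}
		}
	}
	return done;
}
```

Under these assumptions, model_cycles(8, 1, 10, 8) gives 591 cycles for eight blocks done one at a time, while model_cycles(8, 8, 10, 8) gives 87 cycles with eight blocks interleaved, because the interleaved rounds keep the issue slot busy while earlier results are still in flight. Real hardware reorders instructions and has other limits, so the ratio is only indicative.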
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 18:10:39 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A0D09B0B for ; Sat, 20 Oct 2012 18:10:39 +0000 (UTC) (envelope-from peter@wemm.org) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 1AAC68FC20 for ; Sat, 20 Oct 2012 18:10:38 +0000 (UTC) Received: by mail-lb0-f182.google.com with SMTP id b5so1210605lbd.13 for ; Sat, 20 Oct 2012 11:10:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wemm.org; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=emNBav21AvL9imAZg6/lygDfj9VtOdh6AF2YRMUcLqM=; b=zsb7Z0i+TaUZYLlsqGVK+CW5ETBrXkxQN1xNeZ5bYtzHzEbKTNXCFUJR9gjrWUmj4r id5CjXPOWR2VCubewdrOWRsMqUkwiq4iJgfzc89vXarKPi2bqFeDuJusbfoVBCNWOkxG v2FlqyvGyFjLygz1sSrEiXaZx5Z14AW7L2u+k= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=emNBav21AvL9imAZg6/lygDfj9VtOdh6AF2YRMUcLqM=; b=FUjfL4Zd8UBIvbZHgTo2A5MR8oj+me1Mv319DtNp4mFJZ43+71TEjheLr7OkTVhACy Mo8fu16/L5eMlrq5s6fgKL1g17fSNhZNjahGvxC4c4vUWPrHcmUNgaW8/sIhrIYKQVUN V3jEcz6HuegRd33gSkAX+J5fWbPqbwfdbNzL+74R2/5PM+pBJyEtkpAtkVJ9UagqooMQ /kBdh0v315bVcTqTjT5OLiyyuSjKllcM5mL/X4wsg+CnnUic2ja8hfmdsi3g3X3EK1ab wQUp8JhAz1WwD7ZWu4fltO3U5dOHLeMjYxlCNjRPNcCWLJ3zIZAxP6XmpkY2NLRjuoGS MDQA== MIME-Version: 1.0 Received: by 10.112.99.1 with SMTP id em1mr1905839lbb.31.1350756637958; Sat, 20 Oct 2012 11:10:37 -0700 (PDT) Received: by 10.112.100.230 with HTTP; Sat, 20 Oct 2012 11:10:37 -0700 (PDT) In-Reply-To: <20121020171124.GU1967@funkthat.com> References: <20121019233833.GS1967@funkthat.com> <20121020054847.GB35915@deviant.kiev.zoral.com.ua> <20121020171124.GU1967@funkthat.com> Date: Sat, 20 Oct 2012 
11:10:37 -0700 Message-ID: Subject: Re: using SSE2 in kernel C code (improving AES-NI module) From: Peter Wemm To: Konstantin Belousov , freebsd-arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQmPZfpWdbnhd10EMai6Rwgb/i/YC25VoPRjO38i1TF9qapTyYHzXvXukc4ze4zIgc2fiPhl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 18:10:39 -0000

On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
>> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
>> > So, the AES-NI module already uses SSE2 instructions, but it does so
>> > only in assembly. I have improved the performance of the AES-NI
>> > modules implementation, but this involves me using additional SSE2
>> > instructions.
>> >
>> > In order to keep my sanity, I did part of the new code in C using
>> > gcc native types and xmmintrin.h, but we do not support this header in
>> > the kernel.. This means we cannot simply add the new code to the
>> > kernel...
>> >
>> > Any good ideas on how to integrate this code into the kernel build?
>
> [...]
>
>
>>
>> The current structure of the aes-ni driver is partly enforced by the
>> issue you noted. We cannot use sse intrinsics in the kernel, and
>> huge inline assembler fragments are hard to write.
>>
>> I prefer to have the separate .S files with the optimized code,
>> hand-written. If needed, I offer you a help with transition. I would
>> need a full patch to rewrite the code.
>
> Are you sure you want to do this? It'll involve writing around 500
> lines of assembly besides the constants... And it isn't simple like
> the aesni_enc where we have a single loop for the rounds... I've
> posted a tar.gz to overlay onto sys/crypto/aesni at:
> https://www.funkthat.com/~jmg/aesni.repfile.tar.gz

Rather than go straight to assembler, why not use the __builtins?

static inline __m128i
xts_crank_lfsr(__m128i inp)
{
        const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
        __m128i xtweak, ret;

        /* set up xor mask */
        xtweak = _mm_shuffle_epi32(inp, 0x93);
        xtweak = _mm_srai_epi32(xtweak, 31);
        xtweak &= alphamask;

        /* next term */
        ret = _mm_slli_epi32(inp, 1);
        ret ^= xtweak;

        return ret;
}

-->

static inline __m128i
xts_crank_lfsr(__m128i inp)
{
        const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA };
        __m128i xtweak, ret;

        /* set up xor mask */
        xtweak = __builtin_ia32_pshufd (inp, 0x93);
        xtweak = __builtin_ia32_psradi128(xtweak, 31);
        xtweak &= alphamask;

        /* next term */
        ret = __builtin_ia32_pslldi128(inp, 1);
        ret ^= xtweak;

        return ret;
}

I know I skipped the details like data types, but most of the meat of
those functions collapses to a simple wrapper around a __builtin.

Or, another option.. do something like genassym or the many other
kernel build tools. aicasm builds and runs a userland tool to
generate something to build into the kernel. With sufficient
cross-contamination safeguards I wonder if something similar might be
able to be done here.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution."
-- Robert Sewell From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 18:18:33 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 54CF4EE6 for ; Sat, 20 Oct 2012 18:18:33 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id DF5608FC08 for ; Sat, 20 Oct 2012 18:18:32 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9KIIcoR000583; Sat, 20 Oct 2012 21:18:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9KIIQhI040068; Sat, 20 Oct 2012 21:18:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9KIIQg4040067; Sat, 20 Oct 2012 21:18:26 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 20 Oct 2012 21:18:26 +0300 From: Konstantin Belousov To: Peter Wemm Subject: Re: using SSE2 in kernel C code (improving AES-NI module) Message-ID: <20121020181826.GE35915@deviant.kiev.zoral.com.ua> References: <20121019233833.GS1967@funkthat.com> <20121020054847.GB35915@deviant.kiev.zoral.com.ua> <20121020171124.GU1967@funkthat.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="q2efgaiYRy6GdUpU" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 
(2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 18:18:33 -0000 --q2efgaiYRy6GdUpU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, Oct 20, 2012 at 11:10:37AM -0700, Peter Wemm wrote:
> On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney wrote:
> > Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
> >> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
> >> > So, the AES-NI module already uses SSE2 instructions, but it does so
> >> > only in assembly. I have improved the performance of the AES-NI
> >> > modules implementation, but this involves me using additional SSE2
> >> > instructions.
> >> >
> >> > In order to keep my sanity, I did part of the new code in C using
> >> > gcc native types and xmmintrin.h, but we do not support this header in
> >> > the kernel.. This means we cannot simply add the new code to the
> >> > kernel...
> >> >
> >> > Any good ideas on how to integrate this code into the kernel build?
> >
> > [...]
> >
> >>
> >> The current structure of the aes-ni driver is partly enforced by the
> >> issue you noted. We cannot use sse intrinsics in the kernel, and
> >> huge inline assembler fragments are hard to write.
> >>
> >> I prefer to have the separate .S files with the optimized code,
> >> hand-written. If needed, I offer you a help with transition. I would
> >> need a full patch to rewrite the code.
> >
> > Are you sure you want to do this? It'll involve writing around 500
> > lines of assembly besides the constants... And it isn't simple like
> > the aesni_enc where we have a single loop for the rounds... I've
> > posted a tar.gz to overlay onto sys/crypto/aesni at:
> > https://www.funkthat.com/~jmg/aesni.repfile.tar.gz
>
> Rather than go straight to assembler, why not use the __builtins?
>
> static inline __m128i
> xts_crank_lfsr(__m128i inp)
> {
>         const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
>         __m128i xtweak, ret;
>
>         /* set up xor mask */
>         xtweak = _mm_shuffle_epi32(inp, 0x93);
>         xtweak = _mm_srai_epi32(xtweak, 31);
>         xtweak &= alphamask;
>
>         /* next term */
>         ret = _mm_slli_epi32(inp, 1);
>         ret ^= xtweak;
>
>         return ret;
> }
>
> -->
>
> static inline __m128i
> xts_crank_lfsr(__m128i inp)
> {
>         const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA };
>         __m128i xtweak, ret;
>
>         /* set up xor mask */
>         xtweak = __builtin_ia32_pshufd (inp, 0x93);
>         xtweak = __builtin_ia32_psradi128(xtweak, 31);
>         xtweak &= alphamask;
>
>         /* next term */
>         ret = __builtin_ia32_pslldi128(inp, 1);
>         ret ^= xtweak;
>
>         return ret;
> }
> I know I skipped the details like data types, but most of the meat of
> those functions collapses to a simple wrapper around a __builtin.

Are builtins available for -mno-sse compilation?

I think we can try to reimplement the builtins needed with inline
assembly.

>
> Or, another option.. do something like genassym or the many other
> kernel build tools. aicasm builds and runs a userland tool to
> generate something to build into the kernel. With sufficient
> cross-contamination safeguards I wonder if something similar might be
> able to be done here.
>
> -- 
> Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
> "All of this is for nothing if we don't go to the stars" - JMS/B5
> "If Java had true garbage collection, most programs would delete
> themselves upon execution."
-- Robert Sewell --q2efgaiYRy6GdUpU Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlCC6vIACgkQC3+MBN1Mb4hAjQCgz1gcbmOjWckG3SoEdiu+iI5G tS0An0idGjpvu5as8W3w9dz9EENqcrFd =RfrP -----END PGP SIGNATURE----- --q2efgaiYRy6GdUpU-- From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 18:44:06 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D67E46C3; Sat, 20 Oct 2012 18:44:06 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh11.mail.rice.edu (mh11.mail.rice.edu [128.42.199.30]) by mx1.freebsd.org (Postfix) with ESMTP id 9E8598FC0A; Sat, 20 Oct 2012 18:44:06 +0000 (UTC) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 4250A4C0274; Sat, 20 Oct 2012 13:44:05 -0500 (CDT) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 4086F4C0273; Sat, 20 Oct 2012 13:44:05 -0500 (CDT) X-Virus-Scanned: by amavis-2.7.0 at mh11.mail.rice.edu, auth channel Received: from mh11.mail.rice.edu ([127.0.0.1]) by mh11.mail.rice.edu (mh11.mail.rice.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id 4-Nsl71oZmSk; Sat, 20 Oct 2012 13:44:05 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh11.mail.rice.edu (Postfix) with ESMTPSA id 9E17C4C0245; Sat, 20 Oct 2012 13:44:04 -0500 (CDT) Message-ID: <5082F0F3.1070102@rice.edu> Date: Sat, 20 Oct 2012 13:44:03 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:8.0) Gecko/20111113 Thunderbird/8.0 MIME-Version: 1.0 To: Marcel Moolenaar Subject: Re: Behavior of madvise(MADV_FREE) References: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> 
<4835.1350062021@critter.freebsd.dk> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Poul-Henning Kamp , Tim LaBerge , Jason Evans , Alan Cox , "freebsd-arch@freebsd.org Arch" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 18:44:06 -0000

On 10/15/2012 11:01, Marcel Moolenaar wrote:
> On Oct 12, 2012, at 3:05 PM, Jason Evans wrote:
>
>> On Oct 12, 2012, at 1:54 PM, Marcel Moolenaar wrote:
>>> BTW: MADV_DONTNEED in Linux seems to behave like MADV_FREE
>>> in FreeBSD -- at least according to the manpage. Which makes
>>> me wonder how standard madvise(2) is anyway.
>> MADV_DONTNEED on Linux immediately dissociates the physical page from the VM mapping, such that subsequent access results in a zero-filled page being soft-faulted into place.
>>
>> MADV_FREE is *way* nicer than MADV_DONTNEED in the context of malloc. jemalloc has a really discouraging amount of complexity that is directly a result of working around the performance overhead of MADV_DONTNEED.
> I've been letting this thread sink in -- responding to last.
>
> Vendors, like Juniper want reliable VM statistics to prevent
> over-provisioning. While the stats don't need to be exact at
> all times (i.e. instantaneous), having the stats catch up to
> a new steady state is very desirable. In other words: it's
> not that helpful to have lots of memory on the inactive queue
> indefinitely.

I'm sympathetic. Once upon a time, I was often called upon to explain
to network administrators why their idle web cache didn't have oodles of
"free" memory and how this wasn't a problem.

> Also, moving the complexity of exactly which hint to give the
> kernel under different scenarios isn't that appealing at all.
> It just doesn't scale.

I think that you're being a bit too pessimistic here. If your use case
really corresponds to "this memory is free and will not be reused (or
reallocated for a very long time)", then that is qualitatively very
different from the way malloc(3) uses MADV_FREE. malloc(3)'s use of
MADV_FREE is highly speculative. It doesn't really know what the
application is going to do in the future.

I don't think that having two distinct hints that distinguish between
"speculative" and "non-speculative" uses would be problematic. The
distinction is real and also easy to explain. The only danger is that
application writers who don't really understand their application use
the wrong hint.

> ... If some VM changes warrant a new hint
> to madvise(), you may end up changing multiple daemons. It
> seems better to have just 1 hint (i.e. MADV_FREE) and have the
> kernel change its behaviour depending on the situation. When
> there's plenty of memory, you may even ignore the hint. Under
> severe memory pressure you may want to free up the page right
> away so that you can give it to some thread that's waiting
> for a page.

How is this really different from the existing behavior? If a thread is
waiting for a page, then the page daemon is running. In particular, it
is moving pages from the head of the inactive queue, where they were
placed by MADV_FREE, to the cache/free queue and waking up the waiting
thread when the aggregate cache/free target is met.

> At the edge of needing to swap, complex algorithms
> may be worthwhile -- or maybe not. I don't know.
>
> This leads to:
> 1. Keep MADV_FREE as it behaves in FreeBSD right now or make
>    it even more sloppy.

I'm not sure that I understand what you mean by "sloppy" here. Can you
elaborate?

> 2. Have an idle thread that moves inactive pages to the cache
>    or free queue if they've been inactive for X minutes, for
>    some tunable X. Have it back off when the pageout daemon
>    kicks in.

The existing page daemon already wakes up periodically and looks around
for something to do. In particular, have a look at
vm_pageout_page_stats(). That function tries to do something analogous
to what you propose. In part, it tries to prevent munmap(2)ed
file-backed pages from getting stuck in the active queue.

> 3. Have MADV_FREE behave like Linux's MADV_DONTNEED when the
>    machine is under (significant/severe/some) memory pressure.
>
> Thoughts?
>

From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 19:35:00 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 495BCA73; Sat, 20 Oct 2012 19:35:00 +0000 (UTC) (envelope-from phk@phk.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id EF71E8FC1C; Sat, 20 Oct 2012 19:34:59 +0000 (UTC) Received: from critter.freebsd.dk (critter-phk.freebsd.dk [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id 37E253B762; Sat, 20 Oct 2012 19:34:58 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.5/8.14.5) with ESMTP id q9KJYuSL092743; Sat, 20 Oct 2012 19:34:57 GMT (envelope-from phk@phk.freebsd.dk) To: Alan Cox Subject: Re: Behavior of madvise(MADV_FREE) In-reply-to: <5082F0F3.1070102@rice.edu> From: "Poul-Henning Kamp" References: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> <4835.1350062021@critter.freebsd.dk> <5082F0F3.1070102@rice.edu> Date: Sat, 20 Oct 2012 19:34:56 +0000 Message-ID: <92742.1350761696@critter.freebsd.dk> Cc: Tim LaBerge , "freebsd-arch@freebsd.org Arch" , Jason Evans , Marcel Moolenaar X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 19:35:00 -0000 -------- In message
<5082F0F3.1070102@rice.edu>, Alan Cox writes: >I'm sympathetic. Once upon a time, I was often called upon to explain >to network administrators why their idle web cache didn't have oodles of >"free" memory and how this wasn't a problem. You too ? :-) >I think that you're being a bit too pessimistic here. If your use case >really corresponds to "this memory is free and will not be reused (or >reallocated for a very long time)" Which brings me to a question I have wondered: Why not simply munmap(2) it until you need it again ? -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 20:12:26 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 13C495F8; Sat, 20 Oct 2012 20:12:26 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh11.mail.rice.edu (mh11.mail.rice.edu [128.42.199.30]) by mx1.freebsd.org (Postfix) with ESMTP id CB1E08FC0C; Sat, 20 Oct 2012 20:12:25 +0000 (UTC) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 0AAE24C0311; Sat, 20 Oct 2012 15:12:25 -0500 (CDT) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 08EF74C0310; Sat, 20 Oct 2012 15:12:25 -0500 (CDT) X-Virus-Scanned: by amavis-2.7.0 at mh11.mail.rice.edu, auth channel Received: from mh11.mail.rice.edu ([127.0.0.1]) by mh11.mail.rice.edu (mh11.mail.rice.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id F6ttEqlLRiuj; Sat, 20 Oct 2012 15:12:24 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated 
sender: alc) by mh11.mail.rice.edu (Postfix) with ESMTPSA id 748034C030F; Sat, 20 Oct 2012 15:12:24 -0500 (CDT) Message-ID: <508305A6.4090106@rice.edu> Date: Sat, 20 Oct 2012 15:12:22 -0500 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:8.0) Gecko/20111113 Thunderbird/8.0 MIME-Version: 1.0 To: Poul-Henning Kamp Subject: Re: Behavior of madvise(MADV_FREE) References: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> <4835.1350062021@critter.freebsd.dk> <5082F0F3.1070102@rice.edu> <92742.1350761696@critter.freebsd.dk> In-Reply-To: <92742.1350761696@critter.freebsd.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Tim LaBerge , "freebsd-arch@freebsd.org Arch" , Jason Evans , Alan Cox , Marcel Moolenaar X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 20:12:26 -0000 On 10/20/2012 14:34, Poul-Henning Kamp wrote: > -------- > In message<5082F0F3.1070102@rice.edu>, Alan Cox writes: > >> I'm sympathetic. Once upon a time, I was often called upon to explain >> to network administrators why their idle web cache didn't have oodles of >> "free" memory and how this wasn't a problem. > You too ? :-) > >> I think that you're being a bit too pessimistic here. If your use case >> really corresponds to "this memory is free and will not be reused (or >> reallocated for a very long time)" > Which brings me to a question I have wondered: Why not simply > munmap(2) it until you need it again ? > My recollection is that Marcel said that the memory was acquired via sbrk(2). 
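
The distinction drawn in this thread between a "speculative" release (the range will probably be reused soon, as in malloc(3)) and a "non-speculative" one (the range will not be needed for a long time) can be sketched in userland as follows. This is only an illustration under stated assumptions: the helper names region_alloc, region_release_speculative and region_release_final are made up, and the memory is assumed to come from mmap(2) rather than sbrk(2), since only a mapped range can be handed back with munmap(2).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc: expose madvise() and MAP_ANON */
#endif

#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <stddef.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical helper: obtain `len` bytes of private anonymous memory. */
static void *
region_alloc(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	return (p == MAP_FAILED ? NULL : p);
}

/*
 * Speculative release: the contents are no longer needed, but the
 * range may well be written again soon.  The mapping stays valid and
 * the kernel may reclaim the pages lazily.
 */
static int
region_release_speculative(void *p, size_t len)
{
#ifdef MADV_FREE
	return (madvise(p, len, MADV_FREE));
#else
	/*
	 * Fallback where MADV_FREE is not defined.  The semantics of
	 * MADV_DONTNEED differ between systems; on Linux the pages are
	 * dropped immediately, as described earlier in the thread.
	 */
	return (madvise(p, len, MADV_DONTNEED));
#endif
}

/*
 * Non-speculative release: the range will not be reused, so the
 * mapping itself is removed.  Not applicable to sbrk(2) memory.
 */
static int
region_release_final(void *p, size_t len)
{
	return (munmap(p, len));
}
```

After region_release_speculative() the range can still be read and written, but its old contents must be treated as undefined. After region_release_final() any access to the range is an error.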
From owner-freebsd-arch@FreeBSD.ORG Sat Oct 20 21:07:33 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 90962F58 for ; Sat, 20 Oct 2012 21:07:33 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) by mx1.freebsd.org (Postfix) with ESMTP id 133278FC0A for ; Sat, 20 Oct 2012 21:07:32 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 270343592E4; Sat, 20 Oct 2012 23:07:30 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id 104272848C; Sat, 20 Oct 2012 23:07:30 +0200 (CEST) Date: Sat, 20 Oct 2012 23:07:29 +0200 From: Jilles Tjoelker To: Konstantin Belousov Subject: Re: using SSE2 in kernel C code (improving AES-NI module) Message-ID: <20121020210729.GA84086@stack.nl> References: <20121019233833.GS1967@funkthat.com> <20121020054847.GB35915@deviant.kiev.zoral.com.ua> <20121020171124.GU1967@funkthat.com> <20121020181826.GE35915@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20121020181826.GE35915@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 21:07:33 -0000 On Sat, Oct 20, 2012 at 09:18:26PM +0300, Konstantin Belousov wrote: > On Sat, Oct 20, 2012 at 11:10:37AM -0700, Peter Wemm wrote: > > On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney wrote: > > > Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300: > > >> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote: > > >> > So, the AES-NI 
module already uses SSE2 instructions, but it does so
> > >> > only in assembly. I have improved the performance of the AES-NI
> > >> > modules implementation, but this involves me using additional SSE2
> > >> > instructions.
> > >> > In order to keep my sanity, I did part of the new code in C using
> > >> > gcc native types and xmmintrin.h, but we do not support this header in
> > >> > the kernel.. This means we cannot simply add the new code to the
> > >> > kernel...
> > >> > Any good ideas on how to integrate this code into the kernel build?
> > > [...]
> > >> The current structure of the aes-ni driver is partly enforced by the
> > >> issue you noted. We cannot use sse intrinsics in the kernel, and
> > >> huge inline assembler fragments are hard to write.
> > >> I prefer to have the separate .S files with the optimized code,
> > >> hand-written. If needed, I offer you a help with transition. I would
> > >> need a full patch to rewrite the code.
> > > Are you sure you want to do this? It'll involve writing around 500
> > > lines of assembly besides the constants... And it isn't simple like
> > > the aesni_enc where we have a single loop for the rounds... I've
> > > posted a tar.gz to overlay onto sys/crypto/aesni at:
> > > https://www.funkthat.com/~jmg/aesni.repfile.tar.gz
> > Rather than go straight to assembler, why not use the __builtins?
> > static inline __m128i
> > xts_crank_lfsr(__m128i inp)
> > {
> >         const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
> >         __m128i xtweak, ret;
> >
> >         /* set up xor mask */
> >         xtweak = _mm_shuffle_epi32(inp, 0x93);
> >         xtweak = _mm_srai_epi32(xtweak, 31);
> >         xtweak &= alphamask;
> >
> >         /* next term */
> >         ret = _mm_slli_epi32(inp, 1);
> >         ret ^= xtweak;
> >
> >         return ret;
> > }
> > -->
> > static inline __m128i
> > xts_crank_lfsr(__m128i inp)
> > {
> >         const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA };
> >         __m128i xtweak, ret;
> >
> >         /* set up xor mask */
> >         xtweak = __builtin_ia32_pshufd (inp, 0x93);
> >         xtweak = __builtin_ia32_psradi128(xtweak, 31);
> >         xtweak &= alphamask;
> >
> >         /* next term */
> >         ret = __builtin_ia32_pslldi128(inp, 1);
> >         ret ^= xtweak;
> >
> >         return ret;
> > }
> > I know I skipped the details like data types, but most of the meat of
> > those functions collapses to a simple wrapper around a __builtin.

As far as I understand, the __builtins are mostly a compiler
implementation detail. They are not as standardized as the intrinsics
from *mmintrin.h.

> Are builtins available for -mno-sse compilation ?

They are not. I did notice that Clang will compile
__builtin_ia32_movnti down to a regular MOV if SSE2 is not enabled, but
this seems rarely useful.

> I think we can try to reimplement the builtins needed with inline
> assembly.

This should be possible but slightly ugly.

> > Or, another option.. do something like genassym or the many other
> > kernel build tools. aicasm builds and runs a userland tool to
> > generate something to build into the kernel. With sufficient
> > cross-contamination safeguards I wonder if something similar might be
> > able to be done here.

Is the C compiler with additional flags -mmmx -msse2 also a possible
build tool? If *mmintrin.h are made available, that should work, right?

One detail is that GCC and Clang have their own versions of these header
files. GCC also needs a dummy mm_malloc.h; Clang's xmmintrin.h refrains
from including this in a free-standing environment.

Of course, all code compiled in such a way must only be run with a valid
FPU context, since the compiler may use SSE instructions anywhere.

-- 
Jilles Tjoelker
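
If the intrinsic version is compiled with SSE2 enabled as suggested above, one way to gain confidence in it is to compare it in userland against a portable byte-wise version of the same tweak update. The following is a sketch under assumptions, not the code from the posted tarball: AES_XTS_ALPHA is assumed to be 0x87 (the usual XTS reduction constant), the function names xts_crank_ref, xts_crank and xts_crank_matches_ref are made up, and _mm_and_si128/_mm_xor_si128 replace the &= and ^= operators of the quoted code so that nothing depends on compiler vector extensions. Where __SSE2__ is not defined the sketch simply falls back to the portable version.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define AES_XTS_ALPHA	0x87	/* assumed value of the reduction constant */

/*
 * Portable reference: treat t[] as a 128-bit little-endian value,
 * shift it left by one bit and fold the carried-out bit back in.
 */
static void
xts_crank_ref(uint8_t t[16])
{
	unsigned carry = 0, next;
	int i;

	for (i = 0; i < 16; i++) {
		next = t[i] >> 7;
		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)
		t[0] ^= AES_XTS_ALPHA;
}

/* Same update, using SSE2 intrinsics when the compiler enables them. */
static void
xts_crank(uint8_t t[16])
{
	assert(t != NULL);
#if defined(__SSE2__)
	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
	__m128i inp, xtweak, ret;

	inp = _mm_loadu_si128((const __m128i *)t);

	/* rotate the 32-bit lanes up by one and keep only each top bit */
	xtweak = _mm_shuffle_epi32(inp, 0x93);
	xtweak = _mm_srai_epi32(xtweak, 31);
	xtweak = _mm_and_si128(xtweak, alphamask);

	/* shift every lane left and merge in the carries */
	ret = _mm_xor_si128(_mm_slli_epi32(inp, 1), xtweak);
	_mm_storeu_si128((__m128i *)t, ret);
#else
	xts_crank_ref(t);
#endif
}

/* Returns 1 if both versions agree on the given input, 0 otherwise. */
static int
xts_crank_matches_ref(const uint8_t in[16])
{
	uint8_t a[16], b[16];

	memcpy(a, in, sizeof(a));
	memcpy(b, in, sizeof(b));
	xts_crank(a);
	xts_crank_ref(b);
	return (memcmp(a, b, sizeof(a)) == 0);
}
```

A check like this only says something about the arithmetic. It does not address the kernel-side concern raised above, namely that code built with SSE enabled must only run while a valid FPU context is held.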