From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Adrian Chadd, David Chisnall
Subject: Re: SSE in libthr
Date: Mon, 06 Apr 2015 15:56:35 -0400

On Saturday, March 28, 2015 10:41:48 AM Adrian Chadd wrote:
> Ok, so how do we reduce the amount of FPU save and restores, or make
> them cheaper?

Or make them more useful. If you are using SSE/AVX more often between
context switches in ways that are beneficial, then that might offset the
cost of the save and restore and result in a net win.
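
To make "beneficial" use concrete, here is a minimal sketch of an SSE2 strlen in C. It assumes an x86-64 target and a GCC or Clang compiler (for __builtin_ctz). The names sse2_strlen and sse2_strlen_selfcheck are hypothetical, and this is not FreeBSD libc's implementation. The routine compares 16 bytes per iteration instead of one. The flip side is that any thread calling it touches the XMM registers, which generally means the kernel must save and restore that thread's FPU/SSE state around context switches.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SSE2 strlen sketch (x86-64, GCC/Clang assumed). */
size_t sse2_strlen(const char *s)
{
    const char *p = s;
    const __m128i zero = _mm_setzero_si128();

    /* Scalar prologue: advance byte by byte until p is 16-byte aligned. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Vector loop: compare 16 bytes against zero per iteration. An aligned
       16-byte load never crosses a page boundary, so reading past the NUL
       within this block cannot fault (standard libc practice). */
    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0) /* lowest set bit = first NUL byte in the block */
            return (size_t)(p - s) + (size_t)__builtin_ctz((unsigned)mask);
        p += 16;
    }
}

/* Hypothetical self-check: vary start offset and length so both the
   scalar prologue and the vector loop are exercised against strlen(3). */
void sse2_strlen_selfcheck(void)
{
    _Alignas(16) char buf[128];

    for (size_t off = 0; off < 16; off++) {
        for (size_t len = 0; len < 64; len++) {
            memset(buf, 'x', sizeof buf);
            buf[off + len] = '\0';
            assert(sse2_strlen(buf + off) == len);
            assert(sse2_strlen(buf + off) == strlen(buf + off));
        }
    }
}
```

Like many libc string routines, it may read past the terminating NUL, but only within the aligned 16-byte block that contains it, so it never touches a page the string does not occupy. Tools such as AddressSanitizer may still flag those reads.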

I have variants of strlen, memcpy, and memset that use SSE. However,
microbenchmarks aren't super useful, as you have noted. If you would like
to try these out in some real workloads, I can provide a patch to libc.

-- 
John Baldwin