From owner-svn-src-head@freebsd.org Wed Aug 5 12:53:58 2015 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 203C69B4210; Wed, 5 Aug 2015 12:53:58 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 04D3B99D; Wed, 5 Aug 2015 12:53:58 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.70]) by repo.freebsd.org (8.14.9/8.14.9) with ESMTP id t75Crv7n020398; Wed, 5 Aug 2015 12:53:57 GMT (envelope-from vangyzen@FreeBSD.org) Received: (from vangyzen@localhost) by repo.freebsd.org (8.14.9/8.14.9/Submit) id t75CruZa020389; Wed, 5 Aug 2015 12:53:56 GMT (envelope-from vangyzen@FreeBSD.org) Message-Id: <201508051253.t75CruZa020389@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: vangyzen set sender to vangyzen@FreeBSD.org using -f From: Eric van Gyzen Date: Wed, 5 Aug 2015 12:53:56 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r286317 - in head: lib/libthr/arch/amd64 lib/libthr/arch/i386 libexec/rtld-elf/amd64 libexec/rtld-elf/i386 share/mk X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Aug 2015 12:53:58 -0000 Author: vangyzen Date: Wed Aug 5 12:53:55 2015 New Revision: 286317 URL: https://svnweb.freebsd.org/changeset/base/286317 Log: Disable SSE in libthr Clang emits SSE instructions on amd64 in the common path of pthread_mutex_unlock. If the thread does not otherwise use SSE, this usage incurs a context-switch of the FPU/SSE state, which reduces the performance of multiple real-world applications by a non-trivial amount (3-5% in one application). Instead of this change, I experimented with eagerly switching the FPU state at context-switch time. This did not help. Most of the cost seems to be in the read/write of memory--as kib@ stated--and not in the #NM handling. I tested on machines with and without XSAVEOPT. One counter-argument to this change is that most applications already use SIMD, and the number of applications and amount of SIMD usage are only increasing. This is absolutely true. I agree that--in general and in principle--this change is in the wrong direction. However, there are applications that do not use enough SSE to offset the extra context-switch cost. SSE does not provide a clear benefit in the current libthr code with the current compiler, but it does provide a clear loss in some cases. Therefore, disabling SSE in libthr is a non-loss for most, and a gain for some. I refrained from disabling SSE in libc--as was suggested--because I can't make the above argument for libc. It provides a wide variety of code; each case should be analyzed separately. https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html Suggestions from: dim, jmg, rpaulo Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell Inc. Modified: head/lib/libthr/arch/amd64/Makefile.inc head/lib/libthr/arch/i386/Makefile.inc head/libexec/rtld-elf/amd64/Makefile.inc head/libexec/rtld-elf/i386/Makefile.inc head/share/mk/bsd.cpu.mk Modified: head/lib/libthr/arch/amd64/Makefile.inc ============================================================================== --- head/lib/libthr/arch/amd64/Makefile.inc Wed Aug 5 11:24:40 2015 (r286316) +++ head/lib/libthr/arch/amd64/Makefile.inc Wed Aug 5 12:53:55 2015 (r286317) @@ -1,3 +1,9 @@ #$FreeBSD$ SRCS+= _umtx_op_err.S + +# With the current compiler and libthr code, using SSE in libthr +# does not provide enough performance improvement to outweigh +# the extra context switch cost. This can measurably impact +# performance when the application also does not use enough SSE. +CFLAGS+=${CFLAGS_NO_SIMD} Modified: head/lib/libthr/arch/i386/Makefile.inc ============================================================================== --- head/lib/libthr/arch/i386/Makefile.inc Wed Aug 5 11:24:40 2015 (r286316) +++ head/lib/libthr/arch/i386/Makefile.inc Wed Aug 5 12:53:55 2015 (r286317) @@ -1,3 +1,9 @@ # $FreeBSD$ SRCS+= _umtx_op_err.S + +# With the current compiler and libthr code, using SSE in libthr +# does not provide enough performance improvement to outweigh +# the extra context switch cost. This can measurably impact +# performance when the application also does not use enough SSE. +CFLAGS+=${CFLAGS_NO_SIMD} Modified: head/libexec/rtld-elf/amd64/Makefile.inc ============================================================================== --- head/libexec/rtld-elf/amd64/Makefile.inc Wed Aug 5 11:24:40 2015 (r286316) +++ head/libexec/rtld-elf/amd64/Makefile.inc Wed Aug 5 12:53:55 2015 (r286317) @@ -1,6 +1,6 @@ # $FreeBSD$ -CFLAGS+= -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float +CFLAGS+= ${CFLAGS_NO_SIMD} -msoft-float # Uncomment this to build the dynamic linker as an executable instead # of a shared library: #LDSCRIPT= ${.CURDIR}/${MACHINE_CPUARCH}/elf_rtld.x Modified: head/libexec/rtld-elf/i386/Makefile.inc ============================================================================== --- head/libexec/rtld-elf/i386/Makefile.inc Wed Aug 5 11:24:40 2015 (r286316) +++ head/libexec/rtld-elf/i386/Makefile.inc Wed Aug 5 12:53:55 2015 (r286317) @@ -1,6 +1,6 @@ # $FreeBSD$ -CFLAGS+= -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float +CFLAGS+= ${CFLAGS_NO_SIMD} -msoft-float # Uncomment this to build the dynamic linker as an executable instead # of a shared library: #LDSCRIPT= ${.CURDIR}/${MACHINE_CPUARCH}/elf_rtld.x Modified: head/share/mk/bsd.cpu.mk ============================================================================== --- head/share/mk/bsd.cpu.mk Wed Aug 5 11:24:40 2015 (r286316) +++ head/share/mk/bsd.cpu.mk Wed Aug 5 12:53:55 2015 (r286317) @@ -282,6 +282,27 @@ _CPUCFLAGS += -mfloat-abi=softfp CFLAGS += ${_CPUCFLAGS} .endif +# +# Prohibit the compiler from emitting SIMD instructions. +# These flags are added to CFLAGS in areas where the extra context-switch +# cost outweighs the advantages of SIMD instructions. +# +# gcc: +# Setting -mno-mmx implies -mno-3dnow +# Setting -mno-sse implies -mno-sse2, -mno-sse3, -mno-ssse3 and -mfpmath=387 +# +# clang: +# Setting -mno-mmx implies -mno-3dnow and -mno-3dnowa +# Setting -mno-sse implies -mno-sse2, -mno-sse3, -mno-ssse3, -mno-sse41 and +# -mno-sse42 +# (-mfpmath= is not supported) +# +.if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64" +CFLAGS_NO_SIMD.clang= -mno-avx +CFLAGS_NO_SIMD= -mno-mmx -mno-sse +.endif +CFLAGS_NO_SIMD += ${CFLAGS_NO_SIMD.${COMPILER_TYPE}} + # Add in any architecture-specific CFLAGS. # These come from make.conf or the command line or the environment. CFLAGS += ${CFLAGS.${MACHINE_ARCH}}