From owner-cvs-src@FreeBSD.ORG Thu Jun 19 15:42:04 2008
Message-ID: <3bbf2fe10806190842s381611del5c5dc27d2dd22a7e@mail.gmail.com>
Date: Thu, 19 Jun 2008 17:42:02 +0200
From: "Attilio Rao"
Sender: asmrookie@gmail.com
To: "Peter Wemm"
Cc: cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org
In-Reply-To: <200803232309.m2NN96Qa080896@repoman.freebsd.org>
References: <200803232309.m2NN96Qa080896@repoman.freebsd.org>
Subject: Re: cvs commit: src/sys/amd64/amd64 cpu_switch.S

2008/3/24, Peter Wemm :
> peter 2008-03-23 23:09:06 UTC
>
> FreeBSD src repository
>
> Modified files:
> sys/amd64/amd64 cpu_switch.S
> Log:
> First pass at (possibly futile) microoptimizing of cpu_switch. Results
> are mixed. Some pure context switch microbenchmarks show up to 29%
> improvement. Pipe based context switch microbenchmarks show up to 7%
> improvement. Real world tests are far less impressive as they are
> dominated more by actual work than switch overheads, but depending on
> the machine in question, workload, kernel options, phase of moon, etc, a
> few percent gain might be seen.
>
> Summary of changes:
> - don't reload MSR_[FG]SBASE registers when context switching between
>   non-threaded userland apps. These typically cost 120 clock cycles each
>   on an AMD cpu (less on Barcelona/Phenom). Intel cores are probably no
>   faster on this.
> - The above change only helps unthreaded userland apps that tend to use
>   the same value for gsbase. Threaded apps will get no benefit from this.
> - reorder things like accessing the pcb to be in memory order, to give
>   prefetching a better chance of working. Operations are now in increasing
>   memory address order, rather than reverse or random.
> - Push some lesser used code out of the main code paths. Hopefully
>   allowing better code density in cache lines. This is probably futile.
> - (part 2 of previous item) Reorder code so that branches have a more
>   realistic static branch prediction hint. Both Intel and AMD cpus
>   default to predicting branches to lower memory addresses as being
>   taken, and to higher memory addresses as not being taken. This is
>   overridden by the limited dynamic branch prediction subsystem. A trip
>   through userland might overflow this.
> - Futile attempt at spreading the use of the results of previous operations
>   in new operations. Hopefully this will allow the cpus to execute in
>   parallel better.
> - stop wasting 16 bytes at the top of kernel stack, below the PCB.
> - Never load the userland fs/gsbase registers for kthreads, but preserve
>   curpcb->pcb_[fg]sbase as caches for the cpu. (Thanks Jeff!)
>
> Microbenchmarking this code seems to be really sensitive to things like
> scheduling luck, timing, cache behavior, tlb behavior, kernel options,
> other random code changes, etc.
>
> While it doesn't help heavy userland workloads much, it does help high
> context switch loads a little, and should help those that involve
> switching via kthreads a bit more.
>
> A special thanks to Kris for the testing and reality checks, and Jeff for
> tormenting me into doing this. :)
>
> This is still work-in-progress.

It looks like this patch introduces a regression.
In particular, this chunk:

@@ -181,82 +166,138 @@ sw1:
 	cmpq	%rcx, %rdx
 	pause
 	je	1b
-	lfence
 #endif

is not totally right as we want to enforce an acq

-- 
Peace can only be achieved by understanding - A. Einstein
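
For readers following the hunk above: it is a spin-wait (compare, pause, loop) whose trailing lfence was removed. A spin-wait that is followed by reads of data another CPU published usually wants acquire ordering on the load that ends the loop. Below is a minimal sketch of that general pattern in C11 atomics. It is not the FreeBSD code: struct demo_thread, demo_blocked, demo_release, demo_acquire and demo_handoff_once are all hypothetical names made up for illustration, and the sketch makes no claim about which instruction the kernel should use.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a per-thread structure; not FreeBSD's struct thread. */
struct demo_thread {
    _Atomic(void *) owner;  /* who may currently touch saved_state */
    int saved_state;        /* data written before ownership is handed over */
};

/* Hypothetical sentinel meaning "the old CPU is still switching this thread out". */
static char demo_blocked;

/* Old CPU: finish saving state, then publish the new owner with release ordering. */
static void demo_release(struct demo_thread *td, void *new_owner, int state)
{
    td->saved_state = state;
    atomic_store_explicit(&td->owner, new_owner, memory_order_release);
}

/* New CPU: spin while the sentinel is visible. The acquire load that ends the
 * loop orders the later read of saved_state after the owner change. */
static int demo_acquire(struct demo_thread *td)
{
    while (atomic_load_explicit(&td->owner, memory_order_acquire) ==
        (void *)&demo_blocked) {
        /* busy-wait; the assembly in the hunk issues a pause hint here */
    }
    return td->saved_state;
}

/* Single-threaded walk through the handoff, only to show the call order. */
static int demo_handoff_once(void)
{
    struct demo_thread td;
    int new_owner_token = 0;
    int state;

    atomic_init(&td.owner, (void *)&demo_blocked);
    td.saved_state = 0;
    demo_release(&td, &new_owner_token, 42);
    state = demo_acquire(&td);
    assert(atomic_load(&td.owner) == (void *)&new_owner_token);
    return state;
}
```

demo_handoff_once() runs both sides on one thread, so it exercises the plumbing only. The ordering matters when demo_release() and demo_acquire() run on different CPUs.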