From: John Baldwin
To: freebsd-arch@freebsd.org
Cc: Adrian Chadd
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
Date: Thu, 09 Jan 2014 13:31:25 -0500
Message-ID: <9508909.MMfryVDtI5@ralph.baldwin.cx>

On Friday, January 03, 2014 04:55:48 PM Adrian Chadd wrote:
> Hi,
>
> So here's a fun one.
>
> When doing TCP traffic + socket affinity + thread pinning experiments,
> I seem to hit this very annoying scenario that caps my performance and
> scalability.
>
> Assume I've lined up everything relating to a socket to run on the
> same CPU (ie, TX, RX, TCP timers, userland thread):

Are you sure this is really the best setup?  Especially if you have free
CPUs in the system, the time you lose in context switches fighting over
the one CPU assigned to a flow while other CPUs sit idle is quite
wasteful.  I know that tying all of the work for a given flow to a single
CPU is all the rage right now, but I wonder if you have considered
assigning a pair of CPUs to each flow: one CPU to do the top half (TX and
the userland thread) and one CPU to do the bottom half (RX and timers).
This would remove the context switches you see and replace them with
spinning during the times when the two cores actually contend.  It may
also be fairly well suited to SMT (which I suspect you might have turned
off currently).  If you do have SMT turned off, then you can get a pair
of CPUs for each queue without having to reduce the number of queues you
are using.  I'm not sure this would work better than creating one queue
for every CPU, but I think it is probably something worth trying for your
use case at least.

BTW, the problem with just slapping critical_enter() into mutexes is that
you will run afoul of assertions the first time you contend on a mutex
and have to block.  It may be that only the assertions would break and
nothing else, but I'm not certain there aren't other assumptions that a
thread in a critical section never context switches for any reason,
voluntary or otherwise.

-- 
John Baldwin
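
P.S. For concreteness, a rough userland-side sketch of the pairing idea,
assuming cpuset_setaffinity(2) is used to pin the top-half (TX plus
userland) thread to one CPU of the pair; steering the bottom half (RX and
timers) to the partner CPU via interrupt/queue binding is not shown, and
the function name is made up for illustration:

    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>

    /* Pin the calling thread to a single CPU (the "top half" CPU). */
    static int
    pin_top_half(int cpu)
    {
            cpuset_t mask;

            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);

            /* An id of -1 with CPU_WHICH_TID means the current thread. */
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) != 0) {
                    warn("cpuset_setaffinity");
                    return (-1);
            }
            return (0);
    }

The bottom-half CPU would then be selected as the sibling of whatever CPU
the top-half thread was pinned to.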
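
And on the critical section point, a minimal sketch of the pattern that
trips the assertions, assuming an INVARIANTS kernel and a default (sleep)
mutex; the lock name is hypothetical and the exact assertion wording
varies:

    critical_enter();         /* raises td_critnest; switching is now illegal */
    mtx_lock(&sc->sc_mtx);    /* if contended, blocks on a turnstile and
                                 ends up in mi_switch() */
    /* ... */
    mtx_unlock(&sc->sc_mtx);
    critical_exit();

On a contended mtx_lock() the thread has to block, and the scheduler
asserts that it is not switching inside a caller-held critical section,
so an INVARIANTS kernel should trip an assertion there the first time the
lock is actually contended.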