Date: Thu, 9 Jan 2014 23:43:01 +0400
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Adrian Chadd
Cc: freebsd-arch@freebsd.org
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID: <20140109194301.GA79282@zxy.spb.ru>
References: <9508909.MMfryVDtI5@ralph.baldwin.cx>

On Thu, Jan 09, 2014 at 10:44:51AM -0800, Adrian Chadd wrote:

> On 9 January 2014 10:31, John Baldwin wrote:
> > On Friday, January 03, 2014 04:55:48 PM Adrian Chadd wrote:
> >> Hi,
> >>
> >> So here's a fun one.
> >>
> >> When doing TCP traffic + socket affinity + thread pinning experiments,
> >> I seem to hit this very annoying scenario that caps my performance and
> >> scalability.
> >>
> >> Assume I've lined up everything relating to a socket to run on the
> >> same CPU (ie, TX, RX, TCP timers, userland thread):
> >
> > Are you sure this is really the best setup?  Especially if you have free CPUs
> > in the system, the time you lose in context switches fighting over the one
> > assigned CPU for a flow when you have idle CPUs is quite wasteful.  I know
> > that tying all of the work for a given flow to a single CPU is all the rage
> > right now, but I wonder if you had considered assigning a pair of CPUs to a
> > flow: one CPU to do the top half (TX and the userland thread) and one CPU to
> > do the bottom half (RX and timers).  This would remove the context switches
> > you see and replace them with spinning in the times when the two cores
> > actually contend.  It may also be fairly well suited to SMT (which I suspect
> > you might have turned off currently).  If you do have SMT turned off, then
> > you can get a pair of CPUs for each queue without having to reduce the number
> > of queues you are using.  I'm not sure this would work better than creating
> > one queue for every CPU, but I think it is probably something worth trying
> > for your use case at least.
> >
> > BTW, the problem with just slapping critical_enter() into mutexes is you will
> > run afoul of assertions the first time you contend on a mutex and have to
> > block.  It may be that only the assertions would break and nothing else, but
> > I'm not certain there aren't other assumptions about critical sections not
> > ever context switching for any reason, voluntary or otherwise.
>
> It's the rage because it turns out it bounds the system behaviour rather nicely.
>
> The idea is to scale upwards of 60,000 active TCP sockets.  Some people
> are looking at upwards of 100,000 active concurrent sockets.  The
> amount of contention is non-trivial if it's not lined up.
>
> And yeah, I'm aware of the problem of just slapping critical sections
> around mutexes.  I've faced this stuff in Linux.  It's why doing this
> stuff is much more fragile on Linux.. :-P

For this setup, I would first look at the TCP timers (and the locking
around them), IMHO.
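
To make that concrete: the TCP timers are callouts, and a callout can be told
which CPU it fires on.  The fragment below is only a rough sketch of that idea,
not the real tcp_timer.c code -- struct flow_softc, flow_cpu and the function
names are all made up for illustration:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/callout.h>

/* Illustrative per-flow state; not an existing kernel structure. */
struct flow_softc {
	struct mtx	lock;		/* protects the flow state */
	struct callout	rexmt;		/* a retransmit-style timer */
	int		flow_cpu;	/* CPU this flow is pinned to */
};

static void
flow_timer(void *arg)
{
	struct flow_softc *sc = arg;

	/* callout_init_mtx() in flow_init() means sc->lock is held here. */
	/* ... timer work runs on sc->flow_cpu ... */
	(void)sc;
}

static void
flow_init(struct flow_softc *sc, int cpu)
{
	mtx_init(&sc->lock, "flow lock", NULL, MTX_DEF);
	callout_init_mtx(&sc->rexmt, &sc->lock, 0);
	sc->flow_cpu = cpu;
}

static void
flow_timer_arm(struct flow_softc *sc)
{
	mtx_assert(&sc->lock, MA_OWNED);
	/*
	 * Fire the timer on the flow's own CPU, so the timer, RX and TX
	 * paths contend on one core instead of bouncing the lock around.
	 */
	callout_reset_on(&sc->rexmt, hz, flow_timer, sc, sc->flow_cpu);
}

If the timer callouts land on a different CPU than the one owning the pcb,
you pay for the lock migration on every tick, which is exactly the kind of
contention that shows up at 60k+ sockets.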
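And on the pair-of-CPUs idea: the top half is easy to place from userland with
cpuset_setaffinity(2).  Something like the (untested) sketch below, with the
CPU number made up, pins the TX/userland thread to one CPU of the pair; the
matching NIC queue interrupt would be bound to the partner CPU separately,
e.g. with cpuset -l <cpu> -x <irq>:

#include <sys/param.h>
#include <sys/cpuset.h>

#include <err.h>
#include <stdio.h>

#define TX_CPU	2	/* top half: userland thread + TX (made-up number) */
/* The bottom half (RX + timers) would live on the partner CPU, e.g. TX_CPU + 1. */

int
main(void)
{
	cpuset_t mask;

	CPU_ZERO(&mask);
	CPU_SET(TX_CPU, &mask);

	/* Bind the calling thread (id -1 == current thread) to TX_CPU only. */
	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(mask), &mask) != 0)
		err(1, "cpuset_setaffinity");

	printf("pinned to CPU %d\n", TX_CPU);
	return (0);
}

That keeps the userland/TX work and the RX/timer work on adjacent cores, so the
contention John describes turns into short spins instead of context switches.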