From: Adrian Chadd
Date: Sat, 7 Jun 2014 16:45:04 -0400
Subject: Re: Best practice for accepting TCP connections on multicore?
To: Igor Mozolevsky
Cc: Hackers freeBSD, Daniel Janzon, Dirk Engling, Ian Lepore

On 7 June 2014 16:37, Igor Mozolevsky wrote:
>
> On 7 June 2014 21:18, Adrian Chadd wrote:
>>
>> > Not quite - the gist (and the point) of that slide with Rob's story
>> > was that by the time Rob wrote something that could comprehensively
>> > deal with states in an event-driven server, he ended up essentially
>> > re-inventing the wheel.
>>
>> I read the same slides you did. He didn't reinvent the wheel - threads
>> are a different concept - at any point the state can change and you
>> switch to a new thread. Event-driven, asynchronous programming isn't
>> quite like that.
>
> Not quite - unless you're dealing with stateless HTTP, you still need to
> know what the "current" state of the "current" connection is, which is
> the point of that slide.
>
>> > Paul Tyma's presentation posted earlier did conclude with various
>> > models for different types of daemons, which the OP might find at
>> > least interesting.
>>
>> Agreed, but again - it's all Java, it's all Linux, and it's 2008.
>
> Agreed, but threading models are platform-agnostic.
>
>> The current state is that threads and thread context switching are
>> more expensive than you'd like.
>> You really want to (a) avoid locking at all, (b) keep the CPU hot with
>> cached data, and (c) keep it from changing contexts.
>
> Agreed, but uncontended locking should be virtually cost-free (or close
> to that), modern CPUs have plenty of L2/L3 cache to keep enough data
> nearby, there are plenty of cores to keep cycling in the same
> thread-loop, and hyper-threading helps with ctx switching (or at least
> is supposed to). In any event, shuttling data between RAM and cache
> (especially with the on-die RAM controllers, and even if data has to go
> through QPI/HyperTransport) and the cost of changing contexts are tiny
> compared to that of disk and network IO.

I was doing 40gbit/sec testing over 2^16 connections (and was hoping to
get the chance to optimise this stuff to get to 2^17 active streaming
connections, but I ran out of CPU.) If you're not careful about keeping
work on a local CPU, you end up blowing your caches and hitting lock
contention pretty quickly. And QPI isn't free: there's a cost going
backwards and forwards with packet data and cache lines even for
uncontended data.

I'm not going to worry about QPI and socket awareness just for now -
that's a bigger problem to solve. I'll first worry about getting RSS
working for a single-socket setup and then convert a couple of drivers
over to be RSS aware. I'll then worry about multiple-socket awareness and
being aware of whether a NIC is local to a socket.

I'm hoping that with this work and the Verisign TCP locking changes,
we'll be able to handle 40gig bulk data on single-socket Sandy Bridge
Xeon hardware and/or > 100,000 TCP sessions a second with plenty of CPU
to spare. Then it's getting to 80 gig on Ivy Bridge class single-socket
hardware. I'm hoping we can aim for much higher (a million+ transactions
a second) on current-generation hardware, but that requires a bunch more
locking work.

And, well, whatever hardware I can play with. All I have at home is a
4-core Ivy Bridge desktop box with igb(4). :-P

-a
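
For reference, here is a rough, untested sketch of the "keep everything on
one core" accept model being discussed: one worker thread per CPU, each
pinned with pthread_setaffinity_np(3) and running its own kqueue(2) loop
against a shared non-blocking listen socket, so a connection accepted on a
core is serviced entirely on that core. The NCPU and PORT values are made
up for illustration and error handling is elided; this is only the
userland shape of the idea, not the RSS kernel work itself.

    /*
     * Sketch: one worker thread per CPU, each pinned to its core and
     * running its own kqueue loop, all accepting from one shared
     * non-blocking listen socket.  Once accepted, a connection is only
     * ever serviced by the thread (and therefore the core) that took it.
     *
     * NCPU and PORT are illustrative values only.
     */
    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/socket.h>
    #include <sys/cpuset.h>
    #include <netinet/in.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <pthread_np.h>
    #include <stdint.h>
    #include <string.h>

    #define NCPU    4               /* assumed core count */
    #define PORT    8080            /* assumed listen port */

    static int listen_fd;

    static void *
    worker(void *arg)
    {
            int cpu = (int)(intptr_t)arg;
            struct kevent kev;
            cpuset_t mask;
            int kq, fd;

            /* Pin this thread so everything it accepts stays on one core. */
            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);

            kq = kqueue();
            EV_SET(&kev, listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
            kevent(kq, &kev, 1, NULL, 0, NULL);

            for (;;) {
                    if (kevent(kq, NULL, 0, &kev, 1, NULL) < 1)
                            continue;
                    if (kev.ident == (uintptr_t)listen_fd) {
                            /* Another pinned thread may have beaten us to it. */
                            fd = accept(listen_fd, NULL, NULL);
                            if (fd < 0)
                                    continue;
                            EV_SET(&kev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
                            kevent(kq, &kev, 1, NULL, 0, NULL);
                    } else {
                            /* Read/write handling for the connection elided. */
                    }
            }
            return (NULL);
    }

    int
    main(void)
    {
            struct sockaddr_in sin;
            pthread_t tid[NCPU];
            int i;

            listen_fd = socket(AF_INET, SOCK_STREAM, 0);
            fcntl(listen_fd, F_SETFL, O_NONBLOCK);
            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_len = sizeof(sin);
            sin.sin_port = htons(PORT);
            sin.sin_addr.s_addr = htonl(INADDR_ANY);
            bind(listen_fd, (struct sockaddr *)&sin, sizeof(sin));
            listen(listen_fd, 128);

            for (i = 0; i < NCPU; i++)
                    pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
            for (i = 0; i < NCPU; i++)
                    pthread_join(tid[i], NULL);
            return (0);
    }

Build with something like "cc -o accept_sketch accept_sketch.c -lpthread".
Whether per-core accept like this beats a single dedicated accept thread
handing connections off depends on the workload and on whether the NIC's
RSS hash actually lands each flow's packets on the same core the
connection is being serviced on, which is exactly what the RSS work
described above is meant to line up.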