From owner-freebsd-net@FreeBSD.ORG Thu Feb 13 07:48:46 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 535B0BEB; Thu, 13 Feb 2014 07:48:46 +0000 (UTC) Received: from mail-qa0-x22a.google.com (mail-qa0-x22a.google.com [IPv6:2607:f8b0:400d:c00::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id F408E14E4; Thu, 13 Feb 2014 07:48:45 +0000 (UTC) Received: by mail-qa0-f42.google.com with SMTP id k4so15907890qaq.1 for ; Wed, 12 Feb 2014 23:48:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=GIBQgbflLf5rO4x+eVtCXQ1GsmIzB/ZnL7i8xdLYAHM=; b=gJ0GvDRFMakNvS3Dem1/wZK71qO240T2bzgSeRRqgVYKkEhhsI1uKrfFD1nCqGDaZE mYaj7aHhY0zxdOSYb/+d80OYhQQ5yWAq5YnyZFMKp6WEFu+OFEC+blcG+/sWa4HmZo6F l1ZUXofxGq8w87jtHXCKCB/AlxqzxGfcsn2xsHLT1ss6l+f3u6IPllnEj8SfpwgpOU1s mABJr1x6XcHQmIcXMo1wLkdILRrBq2Kujm+zXzsLKMrYNnFzYgmnECvS+hnlmXBRqC9Q yXfvpKUViu4SK8QX7o98y0cE3FNP6inpWu2xNsySTZWemj1xVrYzYJPybzcpYeg+W01n xyjw== MIME-Version: 1.0 X-Received: by 10.224.61.2 with SMTP id r2mr4720qah.49.1392277724680; Wed, 12 Feb 2014 23:48:44 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.224.16.10 with HTTP; Wed, 12 Feb 2014 23:48:44 -0800 (PST) In-Reply-To: <20140213054812.GK26785@FreeBSD.org> References: <20140213054812.GK26785@FreeBSD.org> Date: Wed, 12 Feb 2014 23:48:44 -0800 X-Google-Sender-Auth: dmXUEUTy4Ml4veCRlsMsh-q-qKA Message-ID: Subject: Re: flowtable, collisions, locking and CPU affinity From: Adrian Chadd To: Gleb Smirnoff Content-Type: text/plain; charset=ISO-8859-1 Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 13 Feb 2014 07:48:46 -0000 On 12 February 2014 21:48, Gleb Smirnoff wrote: > On Fri, Feb 07, 2014 at 04:12:56PM -0800, Adrian Chadd wrote: > A> I've been knee deep in the flowtable code looking at some of the less > A> .. predictable ways it behaves. > A> > A> One of them is the collisions that do pop up from time to time. > A> > A> I dug into it in quite some depth and found out what's going on. This > A> assumes it's a per-CPU flowtable. > A> > A> * A flowtable lookup is performed, on say CPU #0 > A> * the flowtable lookup fails, so it goes to do a flowtable insert > A> * .. but since in between the two, the flowtable "lock" is released so > A> it can do a route/adjacency lookup, and that grabs a lock > A> * .. then the flowtable insert is done on a totally different CPU > A> * .. which happens to _have_ the flowtable entry already, so it fails > A> as a collision which already has a matching entry. > A> > A> Now, the reason for this is primarily because there's no CPU pinning > A> in the lookup path and if there's contention during the route lookup > A> phase, the scheduler may decide to schedule the kernel thread on a > A> totally different CPU to the one that was running the code when the > A> lock was entered. > A> > A> Now, Gleb's recent changes seem to have made the instances of this > A> drop, but he didn't set out to fix it. So there's something about his > A> changes that has changed the locking/contention profile that I was > A> using to easily reproduce it. > A> > A> In any case - the reason it's happening above is because there's no > A> actual lock held over the whole lookup/insert path. It's a per-CPU > A> critical enter/exit path, so the only way to guarantee consistency is > A> to use sched_pin() for the entirety of the function. > A> > A> I'll go and test that out in a moment and see if it quietens the > A> collisions that I see in lab testing. > A> > A> Has anyone already debugged/diagnosed this? Can anyone think of an > A> alternate (better) way to fix this? > > Can't we just reuse the colliding entry? > > Can you evaluate patch attached (against head) in your testing > conditions? It's late and I'm exhausted, but I think one of the things that stood out to me was the uncertainty as to whether the thread being preempted by some work would end up being resumed on the same CPU, or whether it could resume on a different CPU. If that happened whilst the flowtable code ran on a different CPU, things could be freed from underneath the code that's about to use it. I'll look at this in some more depth tomorrow, but I think the safest thing to do right now is to sched_pin() to make the concurrency locking model consistent. The window of opportunity for things to get resumed on a different CPU may be almost-zero, but we're doing this over a million times a second on some forwarding based platforms. Thanks, -a