From owner-freebsd-net@FreeBSD.ORG Sat Feb 8 00:12:57 2014
Date: Fri, 7 Feb 2014 16:12:56 -0800
From: Adrian Chadd
To: "freebsd-arch@freebsd.org", FreeBSD Net
Subject: flowtable, collisions, locking and CPU affinity
List-Id: Networking and TCP/IP with FreeBSD

Hi,

I've been knee deep in the flowtable code looking at some of the less
predictable ways it behaves. One of them is the collisions that pop up
from time to time. I dug into it in some depth and found out what's
going on.

This assumes a per-CPU flowtable.

* A flowtable lookup is performed on, say, CPU #0.
* The lookup fails, so the code goes on to do a flowtable insert.
* In between the two, the flowtable "lock" is released so that a
  route/adjacency lookup can be done, and that lookup grabs a lock.
* The flowtable insert then runs on a totally different CPU.
* That CPU happens to _have_ the flowtable entry already, so the insert
  fails as a collision with an existing, matching entry.

The primary reason for this is that there's no CPU pinning in the lookup
path. If there's contention during the route lookup phase, the scheduler
may decide to run the kernel thread on a totally different CPU to the
one that was running the code when the flowtable "lock" was first taken.

Gleb's recent changes seem to have made instances of this drop, but he
didn't set out to fix it. So something about his changes has altered the
locking/contention profile that I was using to easily reproduce it.

In any case, the reason it's happening is that there's no actual lock
held over the whole lookup/insert path. It's a per-CPU critical
enter/exit path, so the only way to guarantee consistency is to use
sched_pin() for the entirety of the function.

I'll go and test that out in a moment and see if it quietens the
collisions that I see in lab testing.

Has anyone already debugged/diagnosed this? Can anyone think of an
alternate (better) way to fix this?

Thanks,

-a
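
For readers who want to see the sequence above in isolation, here is a
minimal userspace sketch of it. This is not the flowtable code: the
tables, the simulated "current CPU", and every sim_* / table_* /
flow_miss_path name are hypothetical stand-ins made up for illustration.
The only thing taken from the message is the idea that pinning (as
sched_pin() does in the kernel) keeps the miss and the insert on the
same per-CPU table. It also assumes flow hashes are nonzero, since 0
marks an empty bucket here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NCPU    2
#define NBUCKET 8

/* Hypothetical per-CPU table: one flow hash per bucket, 0 means empty. */
struct pcpu_table {
	uint32_t bucket[NBUCKET];
};

static struct pcpu_table tables[NCPU];
static int curcpu_sim;	/* simulated "CPU we are running on" */
static int pin_count;	/* simulated sched_pin() nesting depth */

/* Stand-ins for sched_pin()/sched_unpin(); they only track a counter. */
static void sim_sched_pin(void)   { pin_count++; }
static void sim_sched_unpin(void) { assert(pin_count > 0); pin_count--; }

/* Simulated scheduler decision: migrate only if the thread is unpinned. */
static void sim_maybe_migrate(int newcpu)
{
	assert(newcpu >= 0 && newcpu < NCPU);
	if (pin_count == 0)
		curcpu_sim = newcpu;
}

/* Look the flow up in the table of whatever CPU we are on right now. */
static bool table_lookup(uint32_t hash)
{
	return tables[curcpu_sim].bucket[hash % NBUCKET] == hash;
}

/* Insert into the current CPU's table; false means "collision". */
static bool table_insert(uint32_t hash)
{
	uint32_t *slot = &tables[curcpu_sim].bucket[hash % NBUCKET];

	if (*slot == hash)
		return false;
	*slot = hash;
	return true;
}

/*
 * Replay the miss path: lookup on CPU 0, a blocking route lookup during
 * which the scheduler tries to move us to CPU 1, then the insert.
 * Returns 1 on collision, 0 on a clean insert, -1 if the lookup hit.
 */
int flow_miss_path(uint32_t hash, bool pin)
{
	int result;

	/* Reset state; CPU 1 already knows this flow, CPU 0 does not. */
	memset(tables, 0, sizeof(tables));
	curcpu_sim = 0;
	pin_count = 0;
	tables[1].bucket[hash % NBUCKET] = hash;

	if (pin)
		sim_sched_pin();
	if (table_lookup(hash)) {
		result = -1;
	} else {
		sim_maybe_migrate(1);	/* contention in the route lookup */
		result = table_insert(hash) ? 0 : 1;
	}
	if (pin)
		sim_sched_unpin();
	return result;
}
```

Calling flow_miss_path(0x1234, false) replays the unpinned case and
reports a collision, because the insert lands on CPU 1's table. With
pin set to true the simulated migration is refused and the insert goes
into the same table that missed. The sketch says nothing about the cost
of pinning or about whether a better fix exists.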