From: Erich Weiler <weiler@soe.ucsc.edu>
Date: Fri, 26 Apr 2013 09:50:27 -0700
To: Ryan Stone
Cc: Paul Tatarsky, freebsd-net
Subject: Re: pf performance?
Message-ID: <517AB053.9070505@soe.ucsc.edu>
List-Id: Networking and TCP/IP with FreeBSD

> In other words, until I see like 100% system usage in one core, I
> would have room to grow?

> Probably not. Mutexes in FreeBSD use "adaptive spinning". This means
> that when a thread is unable to acquire a mutex, and the owner of the
> mutex is still running on another CPU core, the thread will spin and
> wait for the mutex to become free rather than block. This improves the
> latency of acquiring the mutex and saves many expensive trips through
> the scheduler, but the downside is that when you have one heavily
> contended mutex, you can have many cores wasted spinning on that lock.
> In your case it is likely that your interrupt threads are spinning on
> the pf lock. You can partially confirm this by profiling the system.
> The quick way is:

Truly fascinating! So my cores that are sucking up CPU on interrupts
may just be "spinning" waiting for a lock and not doing real work? That
would explain a lot.
> kldload hwpmc
> pmcstat -S unhalted-cycles -T
> (if unhalted-cycles does not work, try instructions)

We'll try this. It doesn't look like the hwpmc module exists on this
box, but we'll bring it in and give that a try.

> If _mtx_lock_sleep is the top function, then you are hitting mutex
> contention and more cores are unlikely to help. In this case you might
> actually be able to improve performance by *reducing* the number of
> threads that are contending in pf, for example by using fewer receive
> queues on your NIC.
>
> If this machine is dedicated to pf, then setting sysctl net.isr.direct=0
> might also improve performance, by forcing all packets to go through a
> single netisr thread (assuming that net.isr.maxthreads is 1). Note that
> this will apply to traffic that does not go through pf as well, so if
> this machine were doing other network things (e.g. serving NFS), it
> would bottleneck that other traffic behind your pf traffic.

It's only doing pf stuff, so this is something we'll try. Thanks for
the enlightening information!!
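For reference, the netisr change suggested above might look like this on a FreeBSD box of that era (2013-era tunable names as given in the email; exact names and defaults should be verified against your release's netisr(9) documentation):

```shell
# Confirm there is a single netisr thread (the suggestion assumes this):
sysctl net.isr.maxthreads

# Queue inbound packets to the netisr thread instead of processing them
# directly in the interrupt/driver context:
sysctl net.isr.direct=0

# To persist across reboots, add to /etc/sysctl.conf:
#   net.isr.direct=0
```

As noted in the quoted text, this serializes all inbound traffic through that one thread, so it only makes sense on a machine dedicated to pf.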