From: Kip Macy <kip.macy@gmail.com>
Date: Wed, 14 Mar 2007 18:45:20 -0700
To: Kris Kennaway, net@freebsd.org
Subject: Re: Scalability problem from route refcounting

Apologies in advance if you have already answered this question
elsewhere - can you point me to a HOWTO for replicating the test in my
local environment?

        -Kip

On 3/14/07, Kris Kennaway wrote:
> I have recently started looking at database performance over gigabit
> ethernet, and there seems to be a bottleneck coming from the way route
> reference counting is implemented. On an 8-core system it looks like
> we spend a lot of time waiting for the rtentry mutex:
>
>   max     total  wait_total    count  avg  wait_avg  cnt_hold  cnt_lock  name
> [...]
>   408    950496     1135994   301418    3         3     24876     55936  net/if_ethersubr.c:397 (sleep mutex:bge1)
>   974    968617     1515169   253772    3         5     14741     60581  dev/bge/if_bge.c:2949 (sleep mutex:bge1)
>  2415  18255976     1607511   253841   71         6    125174      3131  netinet/tcp_input.c:770 (sleep mutex:inp)
>   233   1850252     2080506   141817   13        14         0    126897  netinet/tcp_usrreq.c:756 (sleep mutex:inp)
>   384   6895050     2737492   299002   23         9     92100     73942  dev/bge/if_bge.c:3506 (sleep mutex:bge1)
>   626   5342286     2760193   301477   17         9     47616     54158  net/route.c:147 (sleep mutex:radix node head)
>   326   3562050     3381510   301477   11        11    133968    110104  net/route.c:197 (sleep mutex:rtentry)
>   146    947173     5173813   301477    3        17     44578    120961  net/route.c:1290 (sleep mutex:rtentry)
>   146    953718     5501119   301476    3        18     63285    121819  netinet/ip_output.c:610 (sleep mutex:rtentry)
>    50   4530645     7885304  1423098    3         5    642391    788230  kern/subr_turnstile.c:489 (spin mutex:turnstile chain)
>
> i.e. during a 30-second sample we spend a total of >14 seconds (across
> all CPUs) waiting to acquire the rtentry mutex.
>
> This appears to be because (among other things) we increment and then
> decrement the route refcount for each packet we send, and each
> adjustment requires acquiring the rtentry mutex for that route. So
> multiplexing traffic for lots of connections over a single route is
> being partly rate-limited by those mutex operations.
>
> That is not the end of the story, though: the bge driver is a serious
> bottleneck on its own. (I nulled out the route locking, since it is
> not relevant in my environment, at least for the purposes of this
> test, and that exposed bge as the next problem -- but other drivers
> may not be so bad.)
>
> Kris
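
For readers following along from the table, below is a minimal sketch
of the per-packet pattern Kris describes, written against the
RT_LOCK()/RT_ADDREF()/RT_REMREF() macros from net/route.h of that era.
send_one_packet() is a made-up helper for illustration, not actual
ip_output() code.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/socket.h>     /* struct sockaddr, needed by route.h */
    #include <net/route.h>

    /*
     * Hypothetical transmit path: every packet sent over a cached
     * route takes the per-rtentry mutex twice, once to bump the
     * refcount and once to drop it again.
     */
    static void
    send_one_packet(struct route *ro)
    {
            struct rtentry *rt = ro->ro_rt;

            RT_LOCK(rt);            /* one sleep mutex per rtentry */
            RT_ADDREF(rt);          /* rt->rt_refcnt++, lock held */
            RT_UNLOCK(rt);

            /* ... queue the packet on the outgoing interface ... */

            RT_LOCK(rt);            /* second acquisition, same packet */
            RT_REMREF(rt);          /* rt->rt_refcnt--, lock held */
            RT_UNLOCK(rt);
    }

Because every connection multiplexed over the same gateway shares a
single rtentry, those two lock/unlock pairs per packet serialize all
eight cores on one mutex (and bounce its cache line between them),
which matches the large wait_total figures for net/route.c and
netinet/ip_output.c in the table. As for replicating the measurement
itself, the column layout looks like the kernel's mutex/lock profiling
output (the MUTEX_PROFILING or LOCK_PROFILING kernel option together
with the corresponding debug.*.prof.* sysctls), so a local
reproduction would presumably start there.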