From: "Alexander V. Chernikov"
Date: Mon, 18 Aug 2014 19:03:05 +0400
To: arch@freebsd.org
Cc: "Andrey V. Elsukov", Gleb Smirnoff
Subject: superpages for UMA
Message-ID: <53F215A9.8010708@FreeBSD.org>
List-Id: Discussion related to FreeBSD architecture

Hello list.

Currently UMA(9) uses PAGE_SIZE kegs to store items in. This seems fine for most usage scenarios; however, there are some where a very large number of items is required. I ran into this problem while using ipfw tables (radix based) with ~50k records.

This is what the output of `pmcstat -TS DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK -w1` looks like:

PMC: [DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK] Samples: 2359 (100.0%) , 0 unresolved

%SAMP IMAGE      FUNCTION        CALLERS
 28.7 kernel     rn_match        ipfw_lookup_table:21.7 rtalloc_fib_nolock:7.0
 25.5 ipfw.ko    ipfw_chk        ipfw_check_hook
  6.0 kernel     rn_lookup       ipfw_lookup_table

Some numbers: a table entry occupies 128 bytes, so we can store no more than 30 records in a single page-sized keg. 50k records require more than 1500 kegs. As far as I understand, the second-level TLB on a modern Intel CPU may have 256 or 512 entries (for 4K pages), so using a large number of entries results in TLB misses happening constantly.

Other examples:

Route tables (in the current implementation): struct rte occupies more than 128 bytes, and storing a full view (> 500k routes) would result in TLB misses happening all of the time.

Various stateful packet processing: a modern SLB/firewall can have millions of states. Regardless of state size, PAGE_SIZE'd kegs are not the best choice.

All of these can be addressed: the ipfw tables/ipfw dynamic state allocation code can (and will) be rewritten to use uma+uma_zone_set_allocf (suggested by glebius), and radix should simply be changed to a different lookup algorithm (as is happening in ipfw tables).

However, we may consider adding another UMA flag to allocate 2M/1G-sized kegs per request. (Additionally, the Intel Haswell arch has 512 entries in the STLB, shared? between 4K/2M pages, so it should help the former.)

What do you think?
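
For reference, the page-count arithmetic above can be written down as a minimal userspace C sketch (not kernel code and not part of UMA). The helper names `pages_needed` and `tlb_reach_bytes` are hypothetical and exist only for illustration; the inputs (30 items per 4K page, 128-byte entries, 512 TLB entries) are the figures quoted in this message.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of fixed-size units (pages) needed to hold
 * `total` things when each unit holds `per_unit` of them (rounds up).
 */
uint64_t
pages_needed(uint64_t total, uint64_t per_unit)
{
	return ((total + per_unit - 1) / per_unit);
}

/*
 * Hypothetical helper: amount of memory, in bytes, that `tlb_entries`
 * translations can cover when every translation maps `page_size` bytes.
 */
uint64_t
tlb_reach_bytes(uint64_t tlb_entries, uint64_t page_size)
{
	return (tlb_entries * page_size);
}
```

With these, `pages_needed(50000, 30)` gives 1667 page-sized kegs, well above a 512-entry TLB whose reach at 4K pages is `tlb_reach_bytes(512, 4096)`, i.e. 2 MB. The same 50k entries at 128 bytes each (6,400,000 bytes, ignoring per-keg overhead) would fit in `pages_needed(50000 * 128, 2 * 1024 * 1024)`, i.e. 4 mappings of 2M each.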