From owner-freebsd-arch@FreeBSD.ORG Wed Dec 10 02:24:34 2008
Date: Tue, 9 Dec 2008 16:22:44 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
To: arch@freebsd.org
Message-ID: <20081209155714.K960@desktop>
Subject: UMA & mbuf cache utilization.
List-Id: Discussion related to FreeBSD architecture

Hello,

Nokia has graciously allowed me to release a patch which I developed to improve general mbuf and cluster cache behavior. This is based on others' observations that, due to simple alignment at 2KB and 256 bytes, we achieve a poor cache distribution for the header area of packets and for the most heavily used mbuf header fields. In addition, modern machines stripe memory accesses across several memories and even memory controllers.
Accessing heavily aligned locations such as these can also create load imbalances among memories.

To solve this problem I have added two new features to UMA. The first is the zone flag UMA_ZONE_CACHESPREAD. This flag changes the meaning of the alignment field so that item start addresses are staggered by at least align + 1 bytes. For clusters and mbufs this means adding uma_cache_align + 1 bytes to the amount of storage allocated, which creates a small constant amount of waste: roughly 3% and 12% respectively. It also means we must use contiguous physical and virtual memory spanning several pages to use the memory efficiently and land on as many cache lines as possible.

Because contiguous physical memory is not always available, the allocator needs a fallback mechanism. We don't want every mbuf allocation to simply check two zones, because once the available contiguous memory is depleted, the check on the first zone would always fail via the most expensive code path. To resolve this, I added the ability for secondary zones to stack on top of multiple primary zones. Secondary zones are zones which get their storage from another zone but handle their own caching, ctors, dtors, etc. With this feature, a secondary zone can allocate from either the contiguous memory pool or the non-contiguous single-page pool depending on availability. Failing over between them deep in the allocator is also much faster, because it is only required when we exhaust the already-cached mbuf memory.

For mbufs and clusters there are now three zones each: a contigmalloc-backed zone, a single-page allocator zone, and a secondary zone carrying the original zone_mbuf or zone_clust name. The packet zone also draws from both available mbuf zones. The individual backend zones are not exposed outside of kern_mbuf.c. Currently, each backend zone can have its own limit; the secondary zone only blocks when both are full.
Statistics-wise, the limit should be reported as the sum of the backend limits; however, that isn't presently done. The secondary zone cannot have its own limit independent of the backends at this time. I'm not sure whether that would be valuable.

I have test results from Nokia which show a dramatic improvement in several workloads, but which I am probably not at liberty to discuss. I'm in the process of convincing Kip to help me get some benchmark data on our stack.

Also as part of the patch I renamed a few functions, since many had non-obvious names, and grew new keg abstractions to tidy things up a bit. I suspect those of you with UMA experience (robert, bosko) will find the renaming a welcome improvement.

The patch is available at: http://people.freebsd.org/~jeff/mbuf_contig.diff

I would love to hear any feedback you may have. I have been developing this and testing various versions off and on for months; however, this is a fresh port to current and it is a little green, so it should be considered experimental. In particular, I'm most nervous about how the VM will respond to new pressure on contiguous physical pages. I'm also interested in hearing from embedded/limited-memory people about how we might want to limit or tune this.

Thanks,
Jeff