From: "ant" <andrit@ukr.net>
To: hackers@freebsd.org
Date: Thu, 30 Jun 2005 01:08:55 +0300
Subject: hot path optimizations in uma_zalloc() & uma_zfree()

Hi folks.

I tried making the bucket management in the per-CPU cache work the way
Solaris does it (see Jeff Bonwick's paper "Magazines and Vmem") and got
a performance gain of around 10% in my test program. Then I made another
minor code optimization and got another 10%. The program just creates
and destroys sockets in a loop.

I suppose the reason for the first gain lies in an increased CPU cache
hit rate. In the current FreeBSD code, allocations and frees deal with
separate buckets, and the buckets are swapped when one of them becomes
full or empty. In Solaris this work is pure LIFO: alloc() and free()
operate on a single bucket, the current one (called a magazine there),
which is why the cache hit rate is higher.
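To illustrate the difference, here is a minimal userland sketch of the
magazine scheme (simplified structures of my own, not the actual UMA or
Solaris code): alloc() and free() both work on the top of the same
bucket, so the item handed out next is the one freed most recently,
whose memory is still warm in the CPU cache.

	#include <stddef.h>

	#define	BUCKET_SIZE	128

	struct bucket {
		int	cnt;			/* number of cached items */
		void	*items[BUCKET_SIZE];	/* most recent item on top */
	};

	/* Pure LIFO: alloc pops what free pushed most recently. */
	static void *
	lifo_alloc(struct bucket *b)
	{

		return (b->cnt > 0 ? b->items[--b->cnt] : NULL);
	}

	static int
	lifo_free(struct bucket *b, void *item)
	{

		if (b->cnt < BUCKET_SIZE) {
			b->items[b->cnt++] = item;
			return (1);
		}
		return (0);	/* bucket full: caller must swap in an empty one */
	}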
The second optimization is very trivial, for example:

-	bucket->ub_cnt--;
-	item = bucket->ub_bucket[bucket->ub_cnt];
+	item = bucket->ub_bucket[--bucket->ub_cnt];

(see the patch)

The test program:

	#include <sys/types.h>
	#include <sys/socket.h>

	#include <stdlib.h>
	#include <unistd.h>

	int
	main(int argc, char *argv[])
	{
		int *fd, n, i, j, iters = 100000;

		n = atoi(argv[1]);	/* sockets created per iteration */
		fd = (int *)malloc(sizeof(*fd) * n);
		iters /= n;
		for (i = 0; i < iters; i++) {
			/*
			 * Create and then destroy n sockets.  PF_INET/
			 * SOCK_STREAM is assumed here; any socket type
			 * exercises the socket zones.
			 */
			for (j = 0; j < n; j++)
				fd[j] = socket(PF_INET, SOCK_STREAM, 0);
			for (j = 0; j < n; j++)
				close(fd[j]);
		}
		return (0);
	}

The results with the original code:

> time ./sockloop 1	# first arg is the number of sockets created per iteration
0.093u 2.650s 0:02.75 99.6% 5+180k 0+0io 0pf+0w
> time ./sockloop 1
0.108u 2.298s 0:02.41 99.1% 5+181k 0+0io 0pf+0w
> time ./sockloop 1
0.127u 2.278s 0:02.41 99.1% 5+177k 0+0io 0pf+0w
> time ./sockloop 10	# the iteration count is scaled by the argument (see code)
0.054u 2.239s 0:02.30 99.1% 5+181k 0+0io 0pf+0w
> time ./sockloop 10
0.069u 2.199s 0:02.27 99.1% 6+184k 0+0io 0pf+0w
> time ./sockloop 10
0.086u 2.185s 0:02.28 99.1% 5+178k 0+0io 0pf+0w
> time ./sockloop 100
0.101u 2.393s 0:02.51 99.2% 5+179k 0+0io 0pf+0w
> time ./sockloop 100
0.085u 2.505s 0:02.60 99.2% 5+180k 0+0io 0pf+0w
> time ./sockloop 100
0.054u 2.441s 0:02.50 99.6% 5+178k 0+0io 0pf+0w
> time ./sockloop 1000
0.093u 2.739s 0:02.84 99.2% 5+181k 0+0io 0pf+0w
> time ./sockloop 1000
0.085u 2.797s 0:02.89 99.3% 5+180k 0+0io 0pf+0w
> time ./sockloop 1000
0.117u 2.689s 0:02.82 98.9% 5+179k 0+0io 0pf+0w

The results with the first optimization only (bucket management):

> time ./sockloop 1
0.125u 1.938s 0:02.06 99.5% 5+180k 0+0io 0pf+0w
> time ./sockloop 1
0.070u 1.993s 0:02.06 100.0% 5+180k 0+0io 0pf+0w
> time ./sockloop 1
0.110u 1.953s 0:02.06 100.0% 5+177k 0+0io 0pf+0w
> time ./sockloop 10
0.093u 1.776s 0:01.87 99.4% 5+180k 0+0io 0pf+0w
> time ./sockloop 10
0.116u 1.754s 0:01.87 99.4% 5+181k 0+0io 0pf+0w
> time ./sockloop 10
0.093u 1.777s 0:01.87 99.4% 5+181k 0+0io 0pf+0w
> time ./sockloop 100
0.100u 2.182s 0:02.29 99.5% 5+180k 0+0io 0pf+0w
> time ./sockloop 100
0.093u 2.174s 0:02.27 99.5% 5+180k 0+0io 0pf+0w
> time ./sockloop 100
0.078u 2.158s 0:02.24 99.1% 5+180k 0+0io 0pf+0w
> time ./sockloop 1000
0.101u 2.403s 0:02.51 99.6% 5+180k 0+0io 0pf+0w
> time ./sockloop 1000
0.124u 2.381s 0:02.52 99.2% 5+180k 0+0io 0pf+0w
> time ./sockloop 1000
0.125u 2.373s 0:02.51 99.2% 5+178k 0+0io 0pf+0w

The results with both optimizations:

> time ./sockloop 1
0.062u 1.785s 0:01.85 99.4% 5+180k 0+0io 0pf+0w
> time ./sockloop 1
0.124u 1.722s 0:01.85 99.4% 5+180k 0+0io 0pf+0w
> time ./sockloop 1
0.087u 1.759s 0:01.85 98.9% 5+177k 0+0io 0pf+0w
> time ./sockloop 10
0.069u 1.684s 0:01.75 99.4% 5+181k 0+0io 0pf+0w
> time ./sockloop 10
0.070u 1.673s 0:01.74 100.0% 5+180k 0+0io 0pf+0w
> time ./sockloop 10
0.070u 1.672s 0:01.74 100.0% 5+177k 0+0io 0pf+0w
> time ./sockloop 100
0.077u 2.102s 0:02.18 99.5% 5+180k 0+0io 0pf+0w
> time ./sockloop 100
0.116u 2.062s 0:02.18 99.5% 5+180k 0+0io 0pf+0w
> time ./sockloop 100
0.055u 2.126s 0:02.19 99.0% 5+178k 0+0io 0pf+0w
> time ./sockloop 1000
0.077u 2.298s 0:02.39 98.7% 5+181k 0+0io 0pf+0w
> time ./sockloop 1000
0.070u 2.340s 0:02.42 99.5% 5+178k 0+0io 0pf+0w
> time ./sockloop 1000
0.054u 2.320s 0:02.39 99.1% 5+179k 0+0io 0pf+0w

The patch is against uma_core.c from RELENG_5, but I checked uma_core.c
in CURRENT and it is the same with respect to these changes. I don't
have commit rights, so the patch is just for review.
Here it is:

--- sys/vm/uma_core.c.orig	Wed Jun 29 21:46:52 2005
+++ sys/vm/uma_core.c	Wed Jun 29 23:09:32 2005
@@ -1830,8 +1830,7 @@
 	if (bucket) {
 		if (bucket->ub_cnt > 0) {
-			bucket->ub_cnt--;
-			item = bucket->ub_bucket[bucket->ub_cnt];
+			item = bucket->ub_bucket[--bucket->ub_cnt];
 #ifdef INVARIANTS
 			bucket->ub_bucket[bucket->ub_cnt] = NULL;
 #endif
@@ -2252,7 +2251,7 @@
 	cache = &zone->uz_cpu[cpu];

 zfree_start:
-	bucket = cache->uc_freebucket;
+	bucket = cache->uc_allocbucket;
 	if (bucket) {
 		/*
@@ -2263,8 +2262,7 @@
 		if (bucket->ub_cnt < bucket->ub_entries) {
 			KASSERT(bucket->ub_bucket[bucket->ub_cnt] == NULL,
 			    ("uma_zfree: Freeing to non free bucket index."));
-			bucket->ub_bucket[bucket->ub_cnt] = item;
-			bucket->ub_cnt++;
+			bucket->ub_bucket[bucket->ub_cnt++] = item;
 #ifdef INVARIANTS
 			ZONE_LOCK(zone);
 			if (keg->uk_flags & UMA_ZONE_MALLOC)
@@ -2275,7 +2273,7 @@
 #endif
 			CPU_UNLOCK(cpu);
 			return;
-		} else if (cache->uc_allocbucket) {
+		} else if (cache->uc_freebucket) {
 #ifdef UMA_DEBUG_ALLOC
 			printf("uma_zfree: Swapping buckets.\n");
 #endif
@@ -2283,8 +2281,7 @@
 			 * We have run out of space in our freebucket.
 			 * See if we can switch with our alloc bucket.
 			 */
-			if (cache->uc_allocbucket->ub_cnt <
-			    cache->uc_freebucket->ub_cnt) {
+			if (cache->uc_freebucket->ub_cnt == 0) {
 				bucket = cache->uc_freebucket;
 				cache->uc_freebucket = cache->uc_allocbucket;
 				cache->uc_allocbucket = bucket;

If somebody decides to commit the first optimization (the bucket
management change), some adjustments must also be made to keep the
statistics gathering correct.

Regards,
	Andriy Tkachuk.
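P.S. For review convenience, here is a condensed userland sketch of how
the free path behaves after the patch (stand-in structures of my own,
not the kernel code): items are pushed back onto the alloc bucket, and
the spare bucket is swapped in only when it is known to be empty.

	#include <stddef.h>

	struct bucket {
		int	cnt;		/* cached items */
		int	entries;	/* capacity */
		void	**items;
	};

	struct cache {
		struct bucket	*allocbucket;	/* LIFO top for alloc and free */
		struct bucket	*freebucket;	/* empty spare */
	};

	/* Returns 1 if the item was cached, 0 if the slow path must run. */
	static int
	cache_free(struct cache *c, void *item)
	{
		struct bucket *b;

	retry:
		b = c->allocbucket;
		if (b != NULL && b->cnt < b->entries) {
			b->items[b->cnt++] = item;	/* LIFO push */
			return (1);
		}
		if (b != NULL && c->freebucket != NULL &&
		    c->freebucket->cnt == 0) {
			/* Alloc bucket is full: swap in the empty spare. */
			c->allocbucket = c->freebucket;
			c->freebucket = b;
			goto retry;
		}
		return (0);	/* let the caller free to the zone itself */
	}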