From: Gleb Smirnoff <glebius@FreeBSD.org>
Date: Tue, 15 Jan 2019 00:02:06 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r343026 - in head: lib/libmemstat sys/vm
Message-Id: <201901150002.x0F026EW092943@repo.freebsd.org>

Author: glebius
Date: Tue Jan 15 00:02:06 2019
New Revision: 343026
URL: https://svnweb.freebsd.org/changeset/base/343026

Log:
  o Move the zone limit from the keg level up to the zone level.  This
    means that two zones sharing a keg may now have different limits.
    Now this is going to work:

      zone = uma_zcreate();
      uma_zone_set_max(zone, limit);
      zone2 = uma_zsecond_create(zone);
      uma_zone_set_max(zone2, limit2);

    Kegs no longer have the uk_maxpages field, but zones have uz_items.
    When set, it may be rounded up to the minimum possible CPU bucket
    cache size.  For small limits the bucket cache can also be
    reconfigured to be smaller.  The counter uz_items is updated whenever
    items transition from the keg to a bucket cache or directly to a
    consumer.  If the zone has uz_max_items set and it is reached, then
    we go to sleep.

  o Since the new limits don't play well with multi-keg zones, remove
    them.  The idea of multi-keg zones was introduced exactly 10 years
    ago, and never had any practical use.  In discussion with Jeff we
    came to a wild agreement that if we ever want to reintroduce the
    idea of a smart allocator that would be able to choose between two
    (or more) totally different backing stores, that choice should be
    made one level higher than UMA, e.g. in malloc(9) or in mget(), or
    whatever, and the choice should be controlled by the caller.

  o The sleeping code is improved to account for the number of sleepers
    and wake them one by one, to avoid the thundering herd problem.

  o The UMA_ZONE_NOBUCKETCACHE flag is removed, and the
    uma_zone_set_maxcache() KPI is added instead.  Having no bucket
    cache basically means setting maxcache to 0.

  o Now, with many fields added and many removed (no multi-keg zones!),
    make sure that struct uma_zone is perfectly aligned.

  Reviewed by:	markj, jeff
  Tested by:	pho
  Differential Revision:	https://reviews.freebsd.org/D17773
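Note: for illustration only (not part of the commit), a sketch of how the
reworked KPIs fit together; the zone names, item type, and limit values
below are made up:

    /*
     * Hypothetical consumer: two zones share one keg but carry
     * independent limits, which keg-level uk_maxpages could not express.
     */
    struct example { uint64_t e_data[8]; };
    uma_zone_t master, strict;

    master = uma_zcreate("example", sizeof(struct example),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
    /* The return value is the effective limit after rounding up. */
    (void)uma_zone_set_max(master, 10000);

    strict = uma_zsecond_create("example strict", NULL, NULL,
        NULL, NULL, master);
    (void)uma_zone_set_max(strict, 128);	/* per-zone limit, same keg */
    /* Replaces the removed UMA_ZONE_NOBUCKETCACHE flag. */
    (void)uma_zone_set_maxcache(strict, 0);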
Modified:
  head/lib/libmemstat/memstat_uma.c
  head/sys/vm/uma.h
  head/sys/vm/uma_core.c
  head/sys/vm/uma_int.h
  head/sys/vm/vm_page.c

Modified: head/lib/libmemstat/memstat_uma.c
==============================================================================
--- head/lib/libmemstat/memstat_uma.c	Mon Jan 14 21:46:56 2019	(r343025)
+++ head/lib/libmemstat/memstat_uma.c	Tue Jan 15 00:02:06 2019	(r343026)
@@ -448,12 +448,7 @@ skip_percpu:
 		mtp->mt_memalloced = mtp->mt_numallocs * mtp->mt_size;
 		mtp->mt_memfreed = mtp->mt_numfrees * mtp->mt_size;
 		mtp->mt_bytes = mtp->mt_memalloced - mtp->mt_memfreed;
-		if (kz.uk_ppera > 1)
-			mtp->mt_countlimit = kz.uk_maxpages /
-			    kz.uk_ipers;
-		else
-			mtp->mt_countlimit = kz.uk_maxpages *
-			    kz.uk_ipers;
+		mtp->mt_countlimit = uz.uz_max_items;
 		mtp->mt_byteslimit = mtp->mt_countlimit * mtp->mt_size;
 		mtp->mt_count = mtp->mt_numallocs - mtp->mt_numfrees;
 		for (i = 0; i < ndomains; i++) {
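Note: mt_countlimit is now read directly from uz_max_items instead of
being derived from keg pages, so existing libmemstat consumers keep
working unchanged.  A minimal sketch of such a consumer, assuming the
standard libmemstat accessors (compile with -lmemstat; error reporting
trimmed):

    #include <sys/types.h>

    #include <memstat.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            struct memory_type_list *mtlp;
            struct memory_type *mtp;

            if ((mtlp = memstat_mtl_alloc()) == NULL)
                    return (1);
            /* Query all UMA zones and print their item limits. */
            if (memstat_sysctl_uma(mtlp, 0) < 0)
                    return (1);
            for (mtp = memstat_mtl_first(mtlp); mtp != NULL;
                mtp = memstat_mtl_next(mtp))
                    printf("%s: limit %ju items\n",
                        memstat_get_name(mtp),
                        (uintmax_t)memstat_get_countlimit(mtp));
            memstat_mtl_free(mtlp);
            return (0);
    }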
Modified: head/sys/vm/uma.h
==============================================================================
--- head/sys/vm/uma.h	Mon Jan 14 21:46:56 2019	(r343025)
+++ head/sys/vm/uma.h	Tue Jan 15 00:02:06 2019	(r343026)
@@ -218,17 +218,6 @@ uma_zone_t uma_zsecond_create(char *name, uma_ctor cto
     uma_init zinit, uma_fini zfini, uma_zone_t master);
 
 /*
- * Add a second master to a secondary zone.  This provides multiple data
- * backends for objects with the same size.  Both masters must have
- * compatible allocation flags.  Presently, UMA_ZONE_MALLOC type zones are
- * the only supported.
- *
- * Returns:
- *	Error on failure, 0 on success.
- */
-int uma_zsecond_add(uma_zone_t zone, uma_zone_t master);
-
-/*
  * Create cache-only zones.
  *
  * This allows uma's per-cpu cache facilities to handle arbitrary
@@ -285,10 +274,6 @@ uma_zone_t uma_zcache_create(char *name, int size, uma
 				 * NUMA aware Zone.  Implements a best
 				 * effort first-touch policy.
 				 */
-#define	UMA_ZONE_NOBUCKETCACHE	0x20000	/*
-				 * Don't cache full buckets.  Limit
-				 * UMA to per-cpu state.
-				 */
 
 /*
  * These flags are shared between the keg and zone.  In zones wishing to add
@@ -510,6 +495,18 @@ int uma_zone_reserve_kva(uma_zone_t zone, int nitems);
  *	int  The effective value of nitems after rounding up based on page size
  */
 int uma_zone_set_max(uma_zone_t zone, int nitems);
+
+/*
+ * Sets a high limit on the number of items allowed in zone's bucket cache
+ *
+ * Arguments:
+ *	zone  The zone to limit
+ *	nitems  The requested upper limit on the number of items allowed
+ *
+ * Returns:
+ *	int  The effective value of nitems set
+ */
+int uma_zone_set_maxcache(uma_zone_t zone, int nitems);
 
 /*
  * Obtains the effective limit on the number of items in a zone

Modified: head/sys/vm/uma_core.c
==============================================================================
--- head/sys/vm/uma_core.c	Mon Jan 14 21:46:56 2019	(r343025)
+++ head/sys/vm/uma_core.c	Tue Jan 15 00:02:06 2019	(r343026)
@@ -219,7 +219,12 @@ struct uma_bucket_zone bucket_zones[] = {
 /*
  * Flags and enumerations to be passed to internal functions.
  */
-enum zfreeskip { SKIP_NONE = 0, SKIP_DTOR, SKIP_FINI };
+enum zfreeskip {
+	SKIP_NONE =	0,
+	SKIP_CNT =	0x00000001,
+	SKIP_DTOR =	0x00010000,
+	SKIP_FINI =	0x00020000,
+};
 
 #define	UMA_ANYDOMAIN	-1	/* Special value for domain search. */
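Note: the new zfreeskip constants are chosen to work both as ordered
thresholds and as OR-able flags: later hunks still compare with
"skip < SKIP_FINI" while also passing combinations such as
SKIP_DTOR | SKIP_CNT, which is safe because SKIP_CNT lives in a low bit
well below the dtor/fini values.  A hypothetical compile-time check of
those invariants (assumes the enum above is in scope; not part of the
commit):

    /* OR-ing SKIP_CNT in must never change which of dtor/fini is skipped. */
    _Static_assert((SKIP_DTOR | SKIP_CNT) < SKIP_FINI,
        "SKIP_CNT must stay below the dtor/fini thresholds");
    _Static_assert((SKIP_NONE | SKIP_CNT) < SKIP_DTOR,
        "SKIP_CNT alone must still run the destructor");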
@@ -255,17 +260,17 @@ static void hash_free(struct uma_hash *hash);
 static void uma_timeout(void *);
 static void uma_startup3(void);
 static void *zone_alloc_item(uma_zone_t, void *, int, int);
+static void *zone_alloc_item_locked(uma_zone_t, void *, int, int);
 static void zone_free_item(uma_zone_t, void *, void *, enum zfreeskip);
 static void bucket_enable(void);
 static void bucket_init(void);
 static uma_bucket_t bucket_alloc(uma_zone_t zone, void *, int);
 static void bucket_free(uma_zone_t zone, uma_bucket_t, void *);
 static void bucket_zone_drain(void);
-static uma_bucket_t zone_alloc_bucket(uma_zone_t, void *, int, int);
+static uma_bucket_t zone_alloc_bucket(uma_zone_t, void *, int, int, int);
 static uma_slab_t zone_fetch_slab(uma_zone_t, uma_keg_t, int, int);
-static uma_slab_t zone_fetch_slab_multi(uma_zone_t, uma_keg_t, int, int);
 static void *slab_alloc_item(uma_keg_t keg, uma_slab_t slab);
-static void slab_free_item(uma_keg_t keg, uma_slab_t slab, void *item);
+static void slab_free_item(uma_zone_t zone, uma_slab_t slab, void *item);
 static uma_keg_t uma_kcreate(uma_zone_t zone, size_t size, uma_init uminit,
     uma_fini fini, int align, uint32_t flags);
 static int zone_import(uma_zone_t, void **, int, int, int);
@@ -472,6 +477,7 @@ zone_try_fetch_bucket(uma_zone_t zone, uma_zone_domain
 		zdom->uzd_nitems -= bucket->ub_cnt;
 		if (ws && zdom->uzd_imin > zdom->uzd_nitems)
 			zdom->uzd_imin = zdom->uzd_nitems;
+		zone->uz_bkt_count -= bucket->ub_cnt;
 	}
 	return (bucket);
 }
@@ -482,11 +488,14 @@ zone_put_bucket(uma_zone_t zone, uma_zone_domain_t zdo
 {
 
 	ZONE_LOCK_ASSERT(zone);
+	KASSERT(zone->uz_bkt_count < zone->uz_bkt_max, ("%s: zone %p overflow",
+	    __func__, zone));
 
 	LIST_INSERT_HEAD(&zdom->uzd_buckets, bucket, ub_link);
 	zdom->uzd_nitems += bucket->ub_cnt;
 	if (ws && zdom->uzd_imax < zdom->uzd_nitems)
 		zdom->uzd_imax = zdom->uzd_nitems;
+	zone->uz_bkt_count += bucket->ub_cnt;
 }
 
 static void
@@ -509,15 +518,6 @@ zone_maxaction(uma_zone_t zone)
 		taskqueue_enqueue(taskqueue_thread, &zone->uz_maxaction);
 }
 
-static void
-zone_foreach_keg(uma_zone_t zone, void (*kegfn)(uma_keg_t))
-{
-	uma_klink_t klink;
-
-	LIST_FOREACH(klink, &zone->uz_kegs, kl_link)
-		kegfn(klink->kl_keg);
-}
-
 /*
  * Routine called by timeout which is used to fire off some time interval
  * based calculations.  (stats, hash size, etc.)
@@ -562,8 +562,9 @@ zone_domain_update_wss(uma_zone_domain_t zdom)
  *	Returns nothing.
  */
 static void
-keg_timeout(uma_keg_t keg)
+zone_timeout(uma_zone_t zone)
 {
+	uma_keg_t keg = zone->uz_keg;
 
 	KEG_LOCK(keg);
 	/*
@@ -601,20 +602,11 @@ keg_timeout(uma_keg_t keg)
 			return;
 		}
 	}
-	KEG_UNLOCK(keg);
-}
 
-static void
-zone_timeout(uma_zone_t zone)
-{
-	int i;
-
-	zone_foreach_keg(zone, &keg_timeout);
-
-	ZONE_LOCK(zone);
-	for (i = 0; i < vm_ndomains; i++)
+	for (int i = 0; i < vm_ndomains; i++)
 		zone_domain_update_wss(&zone->uz_domain[i]);
-	ZONE_UNLOCK(zone);
+
+	KEG_UNLOCK(keg);
 }
 
 /*
@@ -744,6 +736,11 @@ bucket_drain(uma_zone_t zone, uma_bucket_t bucket)
 		for (i = 0; i < bucket->ub_cnt; i++)
 			zone->uz_fini(bucket->ub_bucket[i], zone->uz_size);
 	zone->uz_release(zone->uz_arg, bucket->ub_bucket, bucket->ub_cnt);
+	ZONE_LOCK(zone);
+	zone->uz_items -= bucket->ub_cnt;
+	if (zone->uz_sleepers && zone->uz_items < zone->uz_max_items)
+		wakeup_one(zone);
+	ZONE_UNLOCK(zone);
 	bucket->ub_cnt = 0;
 }
@@ -1029,7 +1026,7 @@ zone_drain_wait(uma_zone_t zone, int waitok)
 	 * we're running.  Normally the uma_rwlock would protect us but we
 	 * must be able to release and acquire the right lock for each keg.
 	 */
-	zone_foreach_keg(zone, &keg_drain);
+	keg_drain(zone->uz_keg);
 	ZONE_LOCK(zone);
 	zone->uz_flags &= ~UMA_ZFLAG_DRAINING;
 	wakeup(zone);
@@ -1068,7 +1065,8 @@ keg_alloc_slab(uma_keg_t keg, uma_zone_t zone, int dom
 
 	KASSERT(domain >= 0 && domain < vm_ndomains,
 	    ("keg_alloc_slab: domain %d out of range", domain));
-	mtx_assert(&keg->uk_lock, MA_OWNED);
+	KEG_LOCK_ASSERT(keg);
+	MPASS(zone->uz_lockptr == &keg->uk_lock);
 
 	allocf = keg->uk_allocf;
 	KEG_UNLOCK(keg);
@@ -1164,8 +1162,7 @@ startup_alloc(uma_zone_t zone, vm_size_t bytes, int do
 	void *mem;
 	int pages;
 
-	keg = zone_first_keg(zone);
-
+	keg = zone->uz_keg;
 	/*
 	 * If we are in BOOT_BUCKETS or higher, than switch to real
 	 * allocator.  Zones with page sized slabs switch at BOOT_PAGEALLOC.
@@ -1303,7 +1300,7 @@ noobj_alloc(uma_zone_t zone, vm_size_t bytes, int doma
 	uma_keg_t keg;
 
 	TAILQ_INIT(&alloctail);
-	keg = zone_first_keg(zone);
+	keg = zone->uz_keg;
 
 	npages = howmany(bytes, PAGE_SIZE);
 	while (npages > 0) {
@@ -1525,8 +1522,6 @@ keg_large_init(uma_keg_t keg)
 {
 
 	KASSERT(keg != NULL, ("Keg is null in keg_large_init"));
-	KASSERT((keg->uk_flags & UMA_ZFLAG_CACHEONLY) == 0,
-	    ("keg_large_init: Cannot large-init a UMA_ZFLAG_CACHEONLY keg"));
 	KASSERT((keg->uk_flags & UMA_ZONE_PCPU) == 0,
 	    ("%s: Cannot large-init a UMA_ZONE_PCPU keg", __func__));
@@ -1747,15 +1742,14 @@ zone_ctor(void *mem, int size, void *udata, int flags)
 	zone->uz_sleeps = 0;
 	zone->uz_count = 0;
 	zone->uz_count_min = 0;
+	zone->uz_count_max = BUCKET_MAX;
 	zone->uz_flags = 0;
 	zone->uz_warning = NULL;
 	/* The domain structures follow the cpu structures. */
 	zone->uz_domain = (struct uma_zone_domain *)&zone->uz_cpu[mp_ncpus];
+	zone->uz_bkt_max = ULONG_MAX;
 	timevalclear(&zone->uz_ratecheck);
-	keg = arg->keg;
 
-	ZONE_LOCK_INIT(zone, (arg->flags & UMA_ZONE_MTXCLASS));
-
 	/*
 	 * This is a pure cache zone, no kegs.
 	 */
@@ -1768,6 +1762,7 @@ zone_ctor(void *mem, int size, void *udata, int flags)
 		zone->uz_release = arg->release;
 		zone->uz_arg = arg->arg;
 		zone->uz_lockptr = &zone->uz_lock;
+		ZONE_LOCK_INIT(zone, (arg->flags & UMA_ZONE_MTXCLASS));
 		rw_wlock(&uma_rwlock);
 		LIST_INSERT_HEAD(&uma_cachezones, zone, uz_link);
 		rw_wunlock(&uma_rwlock);
@@ -1780,6 +1775,7 @@ zone_ctor(void *mem, int size, void *udata, int flags)
 	zone->uz_import = (uma_import)zone_import;
 	zone->uz_release = (uma_release)zone_release;
 	zone->uz_arg = zone; 
+	keg = arg->keg;
 
 	if (arg->flags & UMA_ZONE_SECONDARY) {
 		KASSERT(arg->keg != NULL, ("Secondary zone on zero'd keg"));
@@ -1818,12 +1814,7 @@ zone_ctor(void *mem, int size, void *udata, int flags)
 			return (error);
 	}
 
-	/*
-	 * Link in the first keg.
-	 */
-	zone->uz_klink.kl_keg = keg;
-	LIST_INSERT_HEAD(&zone->uz_kegs, &zone->uz_klink, kl_link);
-	zone->uz_lockptr = &keg->uk_lock;
+	zone->uz_keg = keg;
 	zone->uz_size = keg->uk_size;
 	zone->uz_flags |= (keg->uk_flags &
 	    (UMA_ZONE_INHERIT | UMA_ZFLAG_INHERIT));
@@ -1889,12 +1880,10 @@ keg_dtor(void *arg, int size, void *udata)
 static void
 zone_dtor(void *arg, int size, void *udata)
 {
-	uma_klink_t klink;
 	uma_zone_t zone;
 	uma_keg_t keg;
 
 	zone = (uma_zone_t)arg;
-	keg = zone_first_keg(zone);
 
 	if (!(zone->uz_flags & UMA_ZFLAG_INTERNAL))
 		cache_drain(zone);
@@ -1910,25 +1899,17 @@ zone_dtor(void *arg, int size, void *udata)
 	 */
 	zone_drain_wait(zone, M_WAITOK);
 	/*
-	 * Unlink all of our kegs.
-	 */
-	while ((klink = LIST_FIRST(&zone->uz_kegs)) != NULL) {
-		klink->kl_keg = NULL;
-		LIST_REMOVE(klink, kl_link);
-		if (klink == &zone->uz_klink)
-			continue;
-		free(klink, M_TEMP);
-	}
-	/*
 	 * We only destroy kegs from non secondary zones.
 	 */
-	if (keg != NULL && (zone->uz_flags & UMA_ZONE_SECONDARY) == 0) {
+	if ((keg = zone->uz_keg) != NULL &&
+	    (zone->uz_flags & UMA_ZONE_SECONDARY) == 0) {
 		rw_wlock(&uma_rwlock);
 		LIST_REMOVE(keg, uk_link);
 		rw_wunlock(&uma_rwlock);
 		zone_free_item(kegs, keg, NULL, SKIP_NONE);
 	}
-	ZONE_LOCK_FINI(zone);
+	if (zone->uz_lockptr == &zone->uz_lock)
+		ZONE_LOCK_FINI(zone);
 }
 
 /*
@@ -2219,7 +2200,7 @@ uma_zsecond_create(char *name, uma_ctor ctor, uma_dtor
 	uma_zone_t res;
 	bool locked;
 
-	keg = zone_first_keg(master);
+	keg = master->uz_keg;
 	memset(&args, 0, sizeof(args));
 	args.name = name;
 	args.size = keg->uk_size;
@@ -2263,90 +2244,11 @@ uma_zcache_create(char *name, int size, uma_ctor ctor,
 	args.release = zrelease;
 	args.arg = arg;
 	args.align = 0;
-	args.flags = flags;
+	args.flags = flags | UMA_ZFLAG_CACHE;
 
 	return (zone_alloc_item(zones, &args, UMA_ANYDOMAIN, M_WAITOK));
 }
 
-static void
-zone_lock_pair(uma_zone_t a, uma_zone_t b)
-{
-	if (a < b) {
-		ZONE_LOCK(a);
-		mtx_lock_flags(b->uz_lockptr, MTX_DUPOK);
-	} else {
-		ZONE_LOCK(b);
-		mtx_lock_flags(a->uz_lockptr, MTX_DUPOK);
-	}
-}
-
-static void
-zone_unlock_pair(uma_zone_t a, uma_zone_t b)
-{
-
-	ZONE_UNLOCK(a);
-	ZONE_UNLOCK(b);
-}
-
-int
-uma_zsecond_add(uma_zone_t zone, uma_zone_t master)
-{
-	uma_klink_t klink;
-	uma_klink_t kl;
-	int error;
-
-	error = 0;
-	klink = malloc(sizeof(*klink), M_TEMP, M_WAITOK | M_ZERO);
-
-	zone_lock_pair(zone, master);
-	/*
-	 * zone must use vtoslab() to resolve objects and must already be
-	 * a secondary.
-	 */
-	if ((zone->uz_flags & (UMA_ZONE_VTOSLAB | UMA_ZONE_SECONDARY))
-	    != (UMA_ZONE_VTOSLAB | UMA_ZONE_SECONDARY)) {
-		error = EINVAL;
-		goto out;
-	}
-	/*
-	 * The new master must also use vtoslab().
-	 */
-	if ((zone->uz_flags & UMA_ZONE_VTOSLAB) != UMA_ZONE_VTOSLAB) {
-		error = EINVAL;
-		goto out;
-	}
-
-	/*
-	 * The underlying object must be the same size.  rsize
-	 * may be different.
-	 */
-	if (master->uz_size != zone->uz_size) {
-		error = E2BIG;
-		goto out;
-	}
-	/*
-	 * Put it at the end of the list.
-	 */
-	klink->kl_keg = zone_first_keg(master);
-	LIST_FOREACH(kl, &zone->uz_kegs, kl_link) {
-		if (LIST_NEXT(kl, kl_link) == NULL) {
-			LIST_INSERT_AFTER(kl, klink, kl_link);
-			break;
-		}
-	}
-	klink = NULL;
-	zone->uz_flags |= UMA_ZFLAG_MULTI;
-	zone->uz_slab = zone_fetch_slab_multi;
-
-out:
-	zone_unlock_pair(zone, master);
-	if (klink != NULL)
-		free(klink, M_TEMP);
-
-	return (error);
-}
-
-
 /* See uma.h */
 void
 uma_zdestroy(uma_zone_t zone)
@@ -2408,7 +2310,7 @@ uma_zalloc_arg(uma_zone_t zone, void *udata, int flags
 	uma_bucket_t bucket;
 	uma_cache_t cache;
 	void *item;
-	int cpu, domain, lockfail;
+	int cpu, domain, lockfail, maxbucket;
 #ifdef INVARIANTS
 	bool skipdbg;
 #endif
@@ -2486,7 +2388,7 @@ zalloc_start:
 #endif
 	    zone->uz_ctor(item, zone->uz_size, udata, flags) != 0) {
 		atomic_add_long(&zone->uz_fails, 1);
-		zone_free_item(zone, item, udata, SKIP_DTOR);
+		zone_free_item(zone, item, udata, SKIP_DTOR | SKIP_CNT);
 		return (NULL);
 	}
 #ifdef INVARIANTS
@@ -2529,8 +2431,10 @@ zalloc_start:
 		domain = UMA_ANYDOMAIN;
 
 	/* Short-circuit for zones without buckets and low memory. */
-	if (zone->uz_count == 0 || bucketdisable)
+	if (zone->uz_count == 0 || bucketdisable) {
+		ZONE_LOCK(zone);
 		goto zalloc_item;
+	}
 
 	/*
 	 * Attempt to retrieve the item from the per-CPU cache has failed, so
@@ -2578,8 +2482,17 @@ zalloc_start:
 	 * We bump the uz count when the cache size is insufficient to
 	 * handle the working set.
 	 */
-	if (lockfail && zone->uz_count < BUCKET_MAX)
+	if (lockfail && zone->uz_count < zone->uz_count_max)
 		zone->uz_count++;
+
+	if (zone->uz_max_items > 0) {
+		if (zone->uz_items >= zone->uz_max_items)
+			goto zalloc_item;
+		maxbucket = MIN(zone->uz_count,
+		    zone->uz_max_items - zone->uz_items);
+	} else
+		maxbucket = zone->uz_count;
+	zone->uz_items += maxbucket;
 	ZONE_UNLOCK(zone);
 
 	/*
@@ -2587,11 +2500,19 @@ zalloc_start:
 	 * works we'll restart the allocation from the beginning and it
 	 * will use the just filled bucket.
	 */
-	bucket = zone_alloc_bucket(zone, udata, domain, flags);
+	bucket = zone_alloc_bucket(zone, udata, domain, flags, maxbucket);
 	CTR3(KTR_UMA, "uma_zalloc: zone %s(%p) bucket zone returned %p",
 	    zone->uz_name, zone, bucket);
+	ZONE_LOCK(zone);
 	if (bucket != NULL) {
-		ZONE_LOCK(zone);
+		if (bucket->ub_cnt < maxbucket) {
+			MPASS(zone->uz_flags & UMA_ZFLAG_CACHE ||
+			    zone->uz_items >= maxbucket - bucket->ub_cnt);
+			zone->uz_items -= maxbucket - bucket->ub_cnt;
+			if (zone->uz_sleepers > 0 &&
+			    zone->uz_items < zone->uz_max_items)
+				wakeup_one(zone);
+		}
 		critical_enter();
 		cpu = curcpu;
 		cache = &zone->uz_cpu[cpu];
@@ -2606,7 +2527,7 @@ zalloc_start:
 		    domain == PCPU_GET(domain))) {
 			cache->uc_allocbucket = bucket;
 			zdom->uzd_imax += bucket->ub_cnt;
-		} else if ((zone->uz_flags & UMA_ZONE_NOBUCKETCACHE) != 0) {
+		} else if (zone->uz_bkt_count >= zone->uz_bkt_max) {
 			critical_exit();
 			ZONE_UNLOCK(zone);
 			bucket_drain(zone, bucket);
@@ -2616,13 +2537,18 @@ zalloc_start:
 			zone_put_bucket(zone, zdom, bucket, false);
 		ZONE_UNLOCK(zone);
 		goto zalloc_start;
+	} else {
+		zone->uz_items -= maxbucket;
+		if (zone->uz_sleepers > 0 &&
+		    zone->uz_items + 1 < zone->uz_max_items)
+			wakeup_one(zone);
 	}
 
 	/*
 	 * We may not be able to get a bucket so return an actual item.
 	 */
zalloc_item:
-	item = zone_alloc_item(zone, udata, domain, flags);
+	item = zone_alloc_item_locked(zone, udata, domain, flags);
 
 	return (item);
 }
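Note: the allocation path above uses a reserve-then-refund scheme:
uz_items is charged for the full prospective bucket (maxbucket) before
the zone lock is dropped, and whatever the import could not fill is
credited back afterwards, waking one sleeper if the refund opens room
under the limit.  A standalone model of that accounting with made-up
names (not kernel code, for illustration only):

    #include <assert.h>
    #include <stdio.h>

    #define MIN(a, b)	((a) < (b) ? (a) : (b))

    static long uz_items;			/* items charged to the zone */
    static long uz_max_items = 1024;	/* 0 would mean unlimited */

    /* Charge the full prospective bucket before dropping the zone lock. */
    static int
    bucket_reserve(int uz_count)
    {
            int maxbucket;

            maxbucket = uz_max_items > 0 ?
                (int)MIN(uz_count, uz_max_items - uz_items) : uz_count;
            uz_items += maxbucket;
            return (maxbucket);
    }

    /* Refund whatever the bucket import could not fill. */
    static void
    bucket_refund(int maxbucket, int filled)
    {
            assert(filled <= maxbucket);
            uz_items -= maxbucket - filled;
    }

    int
    main(void)
    {
            int r = bucket_reserve(128);	/* charge 128 up front */
            bucket_refund(r, 100);		/* import filled only 100 */
            printf("%ld\n", uz_items);	/* 100 items accounted */
            return (0);
    }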
@@ -2665,6 +2591,7 @@ keg_first_slab(uma_keg_t keg, int domain, bool rr)
 
 	KASSERT(domain >= 0 && domain < vm_ndomains,
 	    ("keg_first_slab: domain %d out of range", domain));
+	KEG_LOCK_ASSERT(keg);
 
 	slab = NULL;
 	start = domain;
@@ -2690,7 +2617,7 @@ keg_fetch_free_slab(uma_keg_t keg, int domain, bool rr
 {
 	uint32_t reserve;
 
-	mtx_assert(&keg->uk_lock, MA_OWNED);
+	KEG_LOCK_ASSERT(keg);
 	reserve = (flags & M_USE_RESERVE) != 0 ? 0 : keg->uk_reserve;
 	if (keg->uk_free <= reserve)
@@ -2708,7 +2635,7 @@ keg_fetch_slab(uma_keg_t keg, uma_zone_t zone, int rdo
 	bool rr;
 
 restart:
-	mtx_assert(&keg->uk_lock, MA_OWNED);
+	KEG_LOCK_ASSERT(keg);
 
 	/*
 	 * Use the keg's policy if upper layers haven't already specified a
@@ -2741,23 +2668,10 @@ restart:
 		if (flags & M_NOVM)
 			break;
 
-		if (keg->uk_maxpages && keg->uk_pages >= keg->uk_maxpages) {
-			keg->uk_flags |= UMA_ZFLAG_FULL;
-			/*
-			 * If this is not a multi-zone, set the FULL bit.
-			 * Otherwise slab_multi() takes care of it.
-			 */
-			if ((zone->uz_flags & UMA_ZFLAG_MULTI) == 0) {
-				zone->uz_flags |= UMA_ZFLAG_FULL;
-				zone_log_warning(zone);
-				zone_maxaction(zone);
-			}
-			if (flags & M_NOWAIT)
-				return (NULL);
-			zone->uz_sleeps++;
-			msleep(keg, &keg->uk_lock, PVM, "keglimit", 0);
-			continue;
-		}
+		KASSERT(zone->uz_max_items == 0 ||
+		    zone->uz_items <= zone->uz_max_items,
+		    ("%s: zone %p overflow", __func__, zone));
+
 		slab = keg_alloc_slab(keg, zone, domain, aflags);
 		/*
 		 * If we got a slab here it's safe to mark it partially used
@@ -2800,7 +2714,7 @@ zone_fetch_slab(uma_zone_t zone, uma_keg_t keg, int do
 	uma_slab_t slab;
 
 	if (keg == NULL) {
-		keg = zone_first_keg(zone);
+		keg = zone->uz_keg;
 		KEG_LOCK(keg);
 	}
@@ -2815,87 +2729,6 @@ zone_fetch_slab(uma_zone_t zone, uma_keg_t keg, int do
 	return (NULL);
 }
 
-/*
- * uma_zone_fetch_slab_multi:  Fetches a slab from one available keg.  Returns
- * with the keg locked.  On NULL no lock is held.
- *
- * The last pointer is used to seed the search.  It is not required.
- */
-static uma_slab_t
-zone_fetch_slab_multi(uma_zone_t zone, uma_keg_t last, int domain, int rflags)
-{
-	uma_klink_t klink;
-	uma_slab_t slab;
-	uma_keg_t keg;
-	int flags;
-	int empty;
-	int full;
-
-	/*
-	 * Don't wait on the first pass.  This will skip limit tests
-	 * as well.  We don't want to block if we can find a provider
-	 * without blocking.
-	 */
-	flags = (rflags & ~M_WAITOK) | M_NOWAIT;
-	/*
-	 * Use the last slab allocated as a hint for where to start
-	 * the search.
-	 */
-	if (last != NULL) {
-		slab = keg_fetch_slab(last, zone, domain, flags);
-		if (slab)
-			return (slab);
-		KEG_UNLOCK(last);
-	}
-	/*
-	 * Loop until we have a slab incase of transient failures
-	 * while M_WAITOK is specified.  I'm not sure this is 100%
-	 * required but we've done it for so long now.
-	 */
-	for (;;) {
-		empty = 0;
-		full = 0;
-		/*
-		 * Search the available kegs for slabs.  Be careful to hold the
-		 * correct lock while calling into the keg layer.
-		 */
-		LIST_FOREACH(klink, &zone->uz_kegs, kl_link) {
-			keg = klink->kl_keg;
-			KEG_LOCK(keg);
-			if ((keg->uk_flags & UMA_ZFLAG_FULL) == 0) {
-				slab = keg_fetch_slab(keg, zone, domain, flags);
-				if (slab)
-					return (slab);
-			}
-			if (keg->uk_flags & UMA_ZFLAG_FULL)
-				full++;
-			else
-				empty++;
-			KEG_UNLOCK(keg);
-		}
-		if (rflags & (M_NOWAIT | M_NOVM))
-			break;
-		flags = rflags;
-		/*
-		 * All kegs are full.  XXX We can't atomically check all kegs
-		 * and sleep so just sleep for a short period and retry.
-		 */
-		if (full && !empty) {
-			ZONE_LOCK(zone);
-			zone->uz_flags |= UMA_ZFLAG_FULL;
-			zone->uz_sleeps++;
-			zone_log_warning(zone);
-			zone_maxaction(zone);
-			msleep(zone, zone->uz_lockptr, PVM,
-			    "zonelimit", hz/100);
-			zone->uz_flags &= ~UMA_ZFLAG_FULL;
-			ZONE_UNLOCK(zone);
-			continue;
-		}
-	}
-	return (NULL);
-}
-
 static void *
 slab_alloc_item(uma_keg_t keg, uma_slab_t slab)
 {
@@ -2904,7 +2737,7 @@ slab_alloc_item(uma_keg_t keg, uma_slab_t slab)
 	uint8_t freei;
 
 	MPASS(keg == slab->us_keg);
-	mtx_assert(&keg->uk_lock, MA_OWNED);
+	KEG_LOCK_ASSERT(keg);
 
 	freei = BIT_FFS(SLAB_SETSIZE, &slab->us_free) - 1;
 	BIT_CLR(SLAB_SETSIZE, freei, &slab->us_free);
@@ -2971,10 +2804,9 @@ zone_import(uma_zone_t zone, void **bucket, int max, i
 }
 
 static uma_bucket_t
-zone_alloc_bucket(uma_zone_t zone, void *udata, int domain, int flags)
+zone_alloc_bucket(uma_zone_t zone, void *udata, int domain, int flags, int max)
 {
 	uma_bucket_t bucket;
-	int max;
 
 	CTR1(KTR_UMA, "zone_alloc:_bucket domain %d)", domain);
@@ -2983,7 +2815,6 @@ zone_alloc_bucket(uma_zone_t zone, void *udata, int do
 	if (bucket == NULL)
 		return (NULL);
 
-	max = MIN(bucket->ub_entries, zone->uz_count);
 	bucket->ub_cnt = zone->uz_import(zone->uz_arg, bucket->ub_bucket,
 	    max, domain, flags);
@@ -3038,13 +2869,45 @@ zone_alloc_bucket(uma_zone_t zone, void *udata, int do
 static void *
 zone_alloc_item(uma_zone_t zone, void *udata, int domain, int flags)
 {
+
+	ZONE_LOCK(zone);
+	return (zone_alloc_item_locked(zone, udata, domain, flags));
+}
+
+/*
+ * Returns with zone unlocked.
+ */
+static void *
+zone_alloc_item_locked(uma_zone_t zone, void *udata, int domain, int flags)
+{
 	void *item;
 #ifdef INVARIANTS
 	bool skipdbg;
 #endif
 
-	item = NULL;
+	ZONE_LOCK_ASSERT(zone);
 
+	if (zone->uz_max_items > 0 && zone->uz_items >= zone->uz_max_items) {
+		zone_log_warning(zone);
+		zone_maxaction(zone);
+		if (flags & M_NOWAIT) {
+			ZONE_UNLOCK(zone);
+			return (NULL);
+		}
+		zone->uz_sleeps++;
+		zone->uz_sleepers++;
+		while (zone->uz_items >= zone->uz_max_items)
+			mtx_sleep(zone, zone->uz_lockptr, PVM, "zonelimit", 0);
+		zone->uz_sleepers--;
+		if (zone->uz_sleepers > 0 &&
+		    zone->uz_items + 1 < zone->uz_max_items)
+			wakeup_one(zone);
+	}
+
+	zone->uz_items++;
+	zone->uz_allocs++;
+	ZONE_UNLOCK(zone);
+
 	if (domain != UMA_ANYDOMAIN) {
 		/* avoid allocs targeting empty domains */
 		if (VM_DOMAIN_EMPTY(domain))
@@ -3052,7 +2915,6 @@ zone_alloc_item(uma_zone_t zone, void *udata, int doma
 	}
 	if (zone->uz_import(zone->uz_arg, &item, 1, domain, flags) != 1)
 		goto fail;
-	atomic_add_long(&zone->uz_allocs, 1);
 
 #ifdef INVARIANTS
 	skipdbg = uma_dbg_zskip(zone, item);
@@ -3065,7 +2927,7 @@ zone_alloc_item(uma_zone_t zone, void *udata, int doma
 	 */
 	if (zone->uz_init != NULL) {
 		if (zone->uz_init(item, zone->uz_size, flags) != 0) {
-			zone_free_item(zone, item, udata, SKIP_FINI);
+			zone_free_item(zone, item, udata, SKIP_FINI | SKIP_CNT);
 			goto fail;
 		}
 	}
@@ -3075,7 +2937,7 @@ zone_alloc_item(uma_zone_t zone, void *udata, int doma
 	    zone->uz_dtor != trash_dtor) &&
 #endif
 	    zone->uz_ctor(item, zone->uz_size, udata, flags) != 0) {
-		zone_free_item(zone, item, udata, SKIP_DTOR);
+		zone_free_item(zone, item, udata, SKIP_DTOR | SKIP_CNT);
 		goto fail;
 	}
 #ifdef INVARIANTS
@@ -3091,9 +2953,13 @@ zone_alloc_item(uma_zone_t zone, void *udata, int doma
 	return (item);
 
 fail:
+	ZONE_LOCK(zone);
+	zone->uz_items--;
+	zone->uz_allocs--;
+	ZONE_UNLOCK(zone);
+	atomic_add_long(&zone->uz_fails, 1);
 	CTR2(KTR_UMA, "zone_alloc_item failed from %s(%p)",
 	    zone->uz_name, zone);
-	atomic_add_long(&zone->uz_fails, 1);
 	return (NULL);
 }
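Note: the sleep path in zone_alloc_item_locked() above implements the
"wake them one by one" scheme from the commit message: each sleeper
re-checks the limit, and on the way out passes the baton to at most one
further sleeper if there is still room.  A standalone pthreads model of
the same idea (made-up names; msleep()/wakeup_one() replaced by a condvar),
for illustration only:

    #include <pthread.h>

    static pthread_mutex_t zlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t zcv = PTHREAD_COND_INITIALIZER;
    static long uz_items, uz_max_items = 4, uz_sleepers;

    /* Take one item slot, sleeping while the zone is at its limit. */
    static void
    item_reserve(void)
    {
            pthread_mutex_lock(&zlock);
            if (uz_items >= uz_max_items) {
                    uz_sleepers++;
                    while (uz_items >= uz_max_items)	/* msleep() */
                            pthread_cond_wait(&zcv, &zlock);
                    uz_sleepers--;
                    /* Chain the wakeup instead of waking the whole herd. */
                    if (uz_sleepers > 0 && uz_items + 1 < uz_max_items)
                            pthread_cond_signal(&zcv);	/* wakeup_one() */
            }
            uz_items++;
            pthread_mutex_unlock(&zlock);
    }

    /* Release one slot and wake a single waiter, if any. */
    static void
    item_release(void)
    {
            pthread_mutex_lock(&zlock);
            uz_items--;
            if (uz_sleepers > 0 && uz_items < uz_max_items)
                    pthread_cond_signal(&zcv);
            pthread_mutex_unlock(&zlock);
    }

    int
    main(void)
    {
            item_reserve();
            item_release();
            return (0);
    }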
@@ -3104,7 +2970,8 @@ uma_zfree_arg(uma_zone_t zone, void *item, void *udata
 	uma_cache_t cache;
 	uma_bucket_t bucket;
 	uma_zone_domain_t zdom;
-	int cpu, domain, lockfail;
+	int cpu, domain;
+	bool lockfail;
 #ifdef INVARIANTS
 	bool skipdbg;
 #endif
@@ -3150,7 +3017,7 @@ uma_zfree_arg(uma_zone_t zone, void *item, void *udata
 	 * The race here is acceptable.  If we miss it we'll just have to wait
 	 * a little longer for the limits to be reset.
 	 */
-	if (zone->uz_flags & UMA_ZFLAG_FULL)
+	if (zone->uz_sleepers > 0)
 		goto zfree_item;
 
 	/*
@@ -3200,11 +3067,11 @@ zfree_start:
 	if (zone->uz_count == 0 || bucketdisable)
 		goto zfree_item;
 
-	lockfail = 0;
+	lockfail = false;
 	if (ZONE_TRYLOCK(zone) == 0) {
 		/* Record contention to size the buckets. */
 		ZONE_LOCK(zone);
-		lockfail = 1;
+		lockfail = true;
 	}
 	critical_enter();
 	cpu = curcpu;
@@ -3233,9 +3100,9 @@ zfree_start:
 	    "uma_zfree: zone %s(%p) putting bucket %p on free list",
 	    zone->uz_name, zone, bucket);
 	/* ub_cnt is pointing to the last free item */
-	KASSERT(bucket->ub_cnt != 0,
-	    ("uma_zfree: Attempting to insert an empty bucket onto the full list.\n"));
-	if ((zone->uz_flags & UMA_ZONE_NOBUCKETCACHE) != 0) {
+	KASSERT(bucket->ub_cnt == bucket->ub_entries,
+	    ("uma_zfree: Attempting to insert not full bucket onto the full list.\n"));
+	if (zone->uz_bkt_count >= zone->uz_bkt_max) {
 		ZONE_UNLOCK(zone);
 		bucket_drain(zone, bucket);
 		bucket_free(zone, bucket, udata);
@@ -3248,7 +3115,7 @@ zfree_start:
 	 * We bump the uz count when the cache size is insufficient to
 	 * handle the working set.
 	 */
-	if (lockfail && zone->uz_count < BUCKET_MAX)
+	if (lockfail && zone->uz_count < zone->uz_count_max)
 		zone->uz_count++;
 	ZONE_UNLOCK(zone);
@@ -3279,8 +3146,6 @@ zfree_start:
 	 */
zfree_item:
 	zone_free_item(zone, item, udata, SKIP_DTOR);
-
-	return;
 }
 
 void
@@ -3303,12 +3168,15 @@ uma_zfree_domain(uma_zone_t zone, void *item, void *ud
 }
 
 static void
-slab_free_item(uma_keg_t keg, uma_slab_t slab, void *item)
+slab_free_item(uma_zone_t zone, uma_slab_t slab, void *item)
 {
+	uma_keg_t keg;
 	uma_domain_t dom;
 	uint8_t freei;
 
-	mtx_assert(&keg->uk_lock, MA_OWNED);
+	keg = zone->uz_keg;
+	MPASS(zone->uz_lockptr == &keg->uk_lock);
+	KEG_LOCK_ASSERT(keg);
 	MPASS(keg == slab->us_keg);
 
 	dom = &keg->uk_domain[slab->us_domain];
@@ -3338,11 +3206,9 @@ zone_release(uma_zone_t zone, void **bucket, int cnt)
 	uma_slab_t slab;
 	uma_keg_t keg;
 	uint8_t *mem;
-	int clearfull;
 	int i;
 
-	clearfull = 0;
-	keg = zone_first_keg(zone);
+	keg = zone->uz_keg;
 	KEG_LOCK(keg);
 	for (i = 0; i < cnt; i++) {
 		item = bucket[i];
@@ -3356,37 +3222,11 @@ zone_release(uma_zone_t zone, void **bucket, int cnt)
 			}
 		} else {
 			slab = vtoslab((vm_offset_t)item);
-			if (slab->us_keg != keg) {
-				KEG_UNLOCK(keg);
-				keg = slab->us_keg;
-				KEG_LOCK(keg);
-			}
+			MPASS(slab->us_keg == keg);
 		}
-		slab_free_item(keg, slab, item);
-		if (keg->uk_flags & UMA_ZFLAG_FULL) {
-			if (keg->uk_pages < keg->uk_maxpages) {
-				keg->uk_flags &= ~UMA_ZFLAG_FULL;
-				clearfull = 1;
-			}
-
-			/*
-			 * We can handle one more allocation.  Since we're
-			 * clearing ZFLAG_FULL, wake up all procs blocked
-			 * on pages.  This should be uncommon, so keeping this
-			 * simple for now (rather than adding count of blocked
-			 * threads etc).
-			 */
-			wakeup(keg);
-		}
+		slab_free_item(zone, slab, item);
 	}
 	KEG_UNLOCK(keg);
-	if (clearfull) {
-		ZONE_LOCK(zone);
-		zone->uz_flags &= ~UMA_ZFLAG_FULL;
-		wakeup(zone);
-		ZONE_UNLOCK(zone);
-	}
-
 }
 
 /*
@@ -3423,42 +3263,69 @@ zone_free_item(uma_zone_t zone, void *item, void *udat
 	if (skip < SKIP_FINI && zone->uz_fini)
 		zone->uz_fini(item, zone->uz_size);
 
-	atomic_add_long(&zone->uz_frees, 1);
 	zone->uz_release(zone->uz_arg, &item, 1);
+
+	if (skip & SKIP_CNT)
+		return;
+
+	ZONE_LOCK(zone);
+	zone->uz_frees++;
+	zone->uz_items--;
+	if (zone->uz_sleepers > 0 && zone->uz_items < zone->uz_max_items)
+		wakeup_one(zone);
+	ZONE_UNLOCK(zone);
 }
 
 /* See uma.h */
 int
 uma_zone_set_max(uma_zone_t zone, int nitems)
 {
-	uma_keg_t keg;
+	struct uma_bucket_zone *ubz;
 
-	keg = zone_first_keg(zone);
-	if (keg == NULL)
-		return (0);
-	KEG_LOCK(keg);
-	keg->uk_maxpages = (nitems / keg->uk_ipers) * keg->uk_ppera;
-	if (keg->uk_maxpages * keg->uk_ipers < nitems)
-		keg->uk_maxpages += keg->uk_ppera;
-	nitems = (keg->uk_maxpages / keg->uk_ppera) * keg->uk_ipers;
-	KEG_UNLOCK(keg);
+	/*
+	 * If limit is very low we may need to limit how
+	 * much items are allowed in CPU caches.
	 */
+	ubz = &bucket_zones[0];
+	for (; ubz->ubz_entries != 0; ubz++)
+		if (ubz->ubz_entries * 2 * mp_ncpus > nitems)
+			break;
+	if (ubz == &bucket_zones[0])
+		nitems = ubz->ubz_entries * 2 * mp_ncpus;
+	else
+		ubz--;
+
+	ZONE_LOCK(zone);
+	zone->uz_count_max = zone->uz_count = ubz->ubz_entries;
+	if (zone->uz_count_min > zone->uz_count_max)
+		zone->uz_count_min = zone->uz_count_max;
+	zone->uz_max_items = nitems;
+	ZONE_UNLOCK(zone);
 
 	return (nitems);
 }
 
 /* See uma.h */
 int
+uma_zone_set_maxcache(uma_zone_t zone, int nitems)
+{
+
+	ZONE_LOCK(zone);
+	zone->uz_bkt_max = nitems;
+	ZONE_UNLOCK(zone);
+
+	return (nitems);
+}
+
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
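Note: a standalone model of the new rounding in uma_zone_set_max(), for
illustration only.  The loop stops at the first bucket zone whose
entries * 2 * mp_ncpus exceeds nitems, then steps back one size so the
per-CPU caches cannot swallow the whole limit; only when even the
smallest bucket exceeds the request is the limit itself rounded up.  The
entries table below is an assumption patterned after bucket_zones[]; the
real table may differ:

    #include <stdio.h>

    static const int bucket_entries[] = { 4, 6, 8, 12, 16, 32, 64, 128, 256, 0 };

    /* Mirror the kernel's scan: mp_ncpus becomes the ncpus argument. */
    static int
    set_max_model(int nitems, int ncpus, int *count_max)
    {
            const int *e = bucket_entries;

            while (*e != 0 && *e * 2 * ncpus <= nitems)
                    e++;
            if (e == bucket_entries)
                    nitems = *e * 2 * ncpus;	/* round the limit up */
            else
                    e--;				/* step back one size */
            *count_max = *e;
            return (nitems);
    }

    int
    main(void)
    {
            int cmax;

            /* 128 * 2 * 4 = 1024 > 1000, so settle on 64-entry buckets. */
            printf("%d\n", set_max_model(1000, 4, &cmax)); /* 1000, cmax 64 */
            /* Even 4 * 2 * 4 = 32 > 10, so the limit is rounded up to 32. */
            printf("%d\n", set_max_model(10, 4, &cmax));   /* 32, cmax 4 */
            return (0);
    }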