From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: jeff@freebsd.org, Bryan Venteicher
Subject: Re: Change uma_mtx to rwlock
Date: Mon, 29 Sep 2014 11:27:16 -0400
Message-ID: <1458140.gGPpU3NGiG@ralph.baldwin.cx>

On Saturday, September 27, 2014 07:59:47 PM Bryan Venteicher wrote:
> Hi,
>
> I'd appreciate some comments attached patch that changes the uma_mtx to a
> rwlock.
>
> At $JOB, we have machines with ~400GB RAM, with much of that being
> allocated through UMA zones.
> We've observed that timeouts were sometimes
> unexpectedly delayed by a half second or more. We tracked one of the
> reasons for this down to when the page daemon was running, calling
> uma_reclaim() -> zone_foreach(). zone_foreach() holds the uma_mtx while
> zone_drain()'ing each zone. If uma_timeout() fires, it will block on the
> uma_mtx when it tries to zone_timeout() each zone.

The only nit I see is in zone_drain_wait().  It would be nice to not need the
hack of checking for a read or write lock and just require the one it actually
needs depending on the callers.  However, checking the code in HEAD, this
appears to just be broken.  Specifically, zone_drain_wait() is called in two
places:

    void
    zone_drain(uma_zone_t zone)
    {

        zone_drain_wait(zone, M_NOWAIT);
    }

    ...

    static void
    zone_dtor(void *arg, int size, void *udata)
    {
        ...
        mtx_lock(&uma_mtx);
        LIST_REMOVE(zone, uz_link);
        mtx_unlock(&uma_mtx);
        /*
         * XXX there are some races here where
         * the zone can be drained but zone lock
         * released and then refilled before we
         * remove it... we dont care for now
         */
        zone_drain_wait(zone, M_WAITOK);
        ...
    }

Neither one calls it with the uma_mtx locked!  This appears to have been
broken since that function was introduced in r187681.  I think it might be
best to first remove the unlock/lock of uma_mtx from zone_drain_wait() (so it
can be MFC'd).  That then simplifies that one part of your patch (which I
think is otherwise fine).

--
John Baldwin
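
For anyone who wants to see the problem in isolation, here is a rough
userland analogue, not kernel code.  It assumes POSIX threads and an
error-checking mutex standing in for uma_mtx; list_mtx, list_mtx_init(),
drain_wait_model() and model_check() are hypothetical names made up for this
sketch.  It models only the pattern described above: a function that drops
and retakes a list lock around a sleep, called by code that never took the
lock in the first place.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for uma_mtx; error-checking so misuse is reported. */
static pthread_mutex_t list_mtx;

static int
list_mtx_init(void)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return (error);
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&list_mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	return (error);
}

/*
 * Models the unlock/sleep/lock sequence: it is only valid when the
 * caller already holds list_mtx.
 */
static int
drain_wait_model(void)
{
	int error;

	error = pthread_mutex_unlock(&list_mtx);
	if (error != 0)
		return (error);	/* EPERM: the caller never held the lock */
	/* The real function would sleep here, waiting for the drain. */
	return (pthread_mutex_lock(&list_mtx));
}

/* Exercises both cases: caller without the lock, then caller with it. */
static void
model_check(void)
{
	int error;

	error = list_mtx_init();
	assert(error == 0);

	/* Like both callers quoted above: the list lock is not held. */
	error = drain_wait_model();
	assert(error == EPERM);

	/* What the unlock/lock pair implicitly expects from its caller. */
	error = pthread_mutex_lock(&list_mtx);
	assert(error == 0);
	error = drain_wait_model();
	assert(error == 0);
	error = pthread_mutex_unlock(&list_mtx);
	assert(error == 0);
}
```

Calling model_check() from a test program (linked with -pthread where the
platform needs it) runs both cases.  The error-checking mutex type is there
only so the misuse shows up as an error return; it says nothing about how the
kernel's mtx(9) reacts to the same mistake.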