Subject: Re: svn commit: r356159 - head/sys/vm
From: Oliver Pinter <oliver.pntr@gmail.com>
Date: Sun, 29 Dec 2019 23:18:42 +0100
To: Mark Johnston
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
In-Reply-To: <20191229165032.GC30375@raichu>
References: <201912281904.xBSJ4T19064948@repo.freebsd.org> <20191229165032.GC30375@raichu>

Thanks for the detailed answer Mark!
On Sunday, December 29, 2019, Mark Johnston wrote:

> On Sun, Dec 29, 2019 at 03:39:55AM +0100, Oliver Pinter wrote:
> > Are there any performance measurements from before and after? It
> > would be nice to see them.
>
> I did not do extensive benchmarking.  The aim of the patch set was
> simply to remove the use of the hashed page lock, since it shows up
> prominently in lock profiles of some workloads.  The problem is that
> we acquire these locks any time a page's LRU state is updated, and
> the use of the hash lock means that we get false sharing.  The
> solution is to implement these state updates using atomic operations
> on the page structure itself, making data contention much less
> likely.  Another option was to embed a mutex into the vm_page
> structure, but this would bloat a structure which is already too
> large.
>
> A secondary goal was to reduce the number of locks held during page
> queue scans.  Such scans frequently call pmap_ts_referenced() to
> collect info about recent references to the page.  This operation
> can be expensive since it may require a TLB shootdown, and it can
> block for a long time on the pmap lock, for example if the lock
> holder is copying the page tables as part of a fork().  Now the
> active queue scan body is executed without any locks held, so a page
> daemon thread blocked on a pmap lock no longer has the potential to
> block other threads by holding on to a shared page lock.  Before,
> the page daemon could block faulting threads for a long time,
> hurting latency.  I don't have any benchmarks that capture this, but
> it's something that I've observed in production workloads.
>
> I used some microbenchmarks to verify that the change did not
> penalize the single-threaded case.  Here are some results on a
> 64-core arm64 system I have been playing with:
> https://people.freebsd.org/~markj/arm64_page_lock/
>
> The benchmark from will-it-scale simply maps 128MB of anonymous
> memory, faults on each page, and unmaps it, in a loop.  In the fault
> handler we allocate a page and insert it into the active queue, and
> the unmap operation removes all of those pages from the queue.  I
> collected the throughput for 1, 2, 4, 8, 16 and 32 concurrent
> processes.
>
> With my patches we see some modest gains at low concurrency.  At
> higher levels of concurrency we actually get lower throughput than
> before, as contention moves from the page locks and the page queue
> lock to just the page queue lock.  I don't believe this is a real
> regression: first, the benchmark is quite extreme relative to any
> useful workload, and second, arm64 suffers from using a much smaller
> batch size than amd64 for batched page queue operations.  Increasing
> that batch size pushes the results out somewhat.  Some earlier
> testing on a 2-socket Xeon system showed a similar pattern with
> smaller differences.
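
To make the lockless-update idea concrete for readers of the list: a
minimal sketch of the technique Mark describes might look like the
following.  This assumes the LRU state (queue index, activity counter,
flags) is packed into a single 32-bit word in the page structure and
updated with a compare-and-swap loop; all names, the field layout, and
the use of C11 atomics here are illustrative only, not the actual
sys/vm code.

/*
 * Hypothetical sketch, not the real sys/vm implementation.  The LRU
 * state lives in one 32-bit word inside the page structure, so it can
 * be updated atomically without taking a hashed page lock.
 */
#include <stdatomic.h>
#include <stdint.h>

#define PQ_ACTIVE   1       /* index of the active queue (illustrative) */
#define PGA_REQUEUE 0x0001  /* queue-list update deferred to a batch */

struct page_astate {
	uint8_t  queue;      /* which LRU queue the page belongs to */
	uint8_t  act_count;  /* activity counter used by queue scans */
	uint16_t flags;      /* pending-operation flags */
};

struct page {
	_Atomic uint32_t astate;  /* struct page_astate, packed */
	/* ... rest of the page structure ... */
};

static inline uint32_t
astate_pack(struct page_astate as)
{
	return ((uint32_t)as.queue | (uint32_t)as.act_count << 8 |
	    (uint32_t)as.flags << 16);
}

static inline struct page_astate
astate_unpack(uint32_t val)
{
	struct page_astate as;

	as.queue = val & 0xff;
	as.act_count = (val >> 8) & 0xff;
	as.flags = val >> 16;
	return (as);
}

/*
 * Move a page to the active queue without holding any lock.  The CAS
 * loop retries until it wins or until another thread has already done
 * the work.  The actual linked-list insertion is deferred: we only set
 * a flag here, and a later batched operation under the page queue lock
 * performs the list manipulation (matching the "batched page queue
 * operations" mentioned above).
 */
static void
page_activate(struct page *p)
{
	struct page_astate as;
	uint32_t old, new;

	old = atomic_load(&p->astate);
	do {
		as = astate_unpack(old);
		if (as.queue == PQ_ACTIVE)
			return;          /* someone beat us to it */
		as.queue = PQ_ACTIVE;
		as.flags |= PGA_REQUEUE;
		new = astate_pack(as);
	} while (!atomic_compare_exchange_weak(&p->astate, &old, new));
}

Since contending threads touch only the page's own cache line rather
than a shared hashed lock, the false sharing Mark mentions goes away.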
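
And for anyone who wants to reproduce the numbers, the benchmark as
described boils down to roughly the loop below.  This is a sketch of a
will-it-scale page-fault-style test, not the actual harness (which, as
I understand it, forks one copy per process count and measures
iterations per second); error handling and the counter are mine.

#include <sys/mman.h>

#include <err.h>
#include <unistd.h>

int
main(void)
{
	const size_t maplen = 128UL * 1024 * 1024;  /* 128MB, as in the mail */
	const size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	volatile unsigned long iterations = 0;
	size_t off;
	char *p;

	for (;;) {
		p = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
		    MAP_ANON | MAP_PRIVATE, -1, 0);
		if (p == MAP_FAILED)
			err(1, "mmap");
		/*
		 * Writing one byte per page faults it in: the fault
		 * handler allocates the page and inserts it into the
		 * active queue.
		 */
		for (off = 0; off < maplen; off += pgsz)
			p[off] = 1;
		/* Unmapping removes all of those pages from the queue. */
		if (munmap(p, maplen) != 0)
			err(1, "munmap");
		iterations++;	/* a harness would sample this counter */
	}
}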