From: Mateusz Guzik <mjguzik@gmail.com>
Date: Mon, 11 Sep 2017 09:30:10 +0200
Subject: Re: svn commit: r323393 - in head/sys: sys vm
To: Bruce Evans
Cc: Mateusz Guzik, src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
In-Reply-To: <20170911111127.B870@besplex.bde.org>

On Mon, Sep 11, 2017 at 11:56:21AM +1000, Bruce Evans wrote:
> On Sun, 10 Sep 2017, Mateusz Guzik wrote:
>
> > Log:
> >   Move vmmeter atomic counters into dedicated cache lines
> >
> >   Prior to the change they were subject to extreme false sharing.
> >   In particular this change shaves about 3 seconds real time off a
> >   -j 80 buildkernel.
>
> Changes that small are hard to measure.  Kernel builds take about 3
> seconds real time altogether (cheating slightly -- this is for a
> FreeBSD-4 kernel where the sources are held constant to provide a
> benchmark and a bloatometer for newer kernels and userlands which
> take 10-20 times longer).

Well, for me building 11.0 on tmpfs with 11.0 world reliably takes 1:25
(and now 1:22).
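For anyone who has not run into it, the effect is easy to reproduce in
userspace.  The sketch below is purely illustrative -- the pthread
harness, the names and the hardcoded 64-byte line size are assumptions
of the demo, not anything from the tree.  On a multicore machine the
"shared" pair runs markedly slower than the "padded" pair, solely
because of the layout:

	/*
	 * Illustrative userspace sketch only, not kernel code.  Two
	 * threads increment separate counters.  In the "shared" layout
	 * the counters occupy one cache line, so each store invalidates
	 * the line in the other core's cache; the "padded" layout gives
	 * every counter a line of its own.
	 */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>
	#include <time.h>

	#define	CACHE_LINE_SIZE	64		/* assumed line size */
	#define	ITERATIONS	100000000

	static struct {
		atomic_uint a;			/* a and b share a line */
		atomic_uint b;
	} shared;

	static struct {
		_Alignas(CACHE_LINE_SIZE) atomic_uint a; /* one line each */
		_Alignas(CACHE_LINE_SIZE) atomic_uint b;
	} padded;

	static void *
	bump(void *arg)
	{
		atomic_uint *p = arg;

		for (int i = 0; i < ITERATIONS; i++)
			atomic_fetch_add_explicit(p, 1,
			    memory_order_relaxed);
		return (NULL);
	}

	static double
	run_pair(atomic_uint *a, atomic_uint *b)
	{
		struct timespec s, e;
		pthread_t t1, t2;

		clock_gettime(CLOCK_MONOTONIC, &s);
		pthread_create(&t1, NULL, bump, a);
		pthread_create(&t2, NULL, bump, b);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		clock_gettime(CLOCK_MONOTONIC, &e);
		return ((e.tv_sec - s.tv_sec) +
		    (e.tv_nsec - s.tv_nsec) / 1e9);
	}

	int
	main(void)
	{
		printf("shared: %.2fs\n", run_pair(&shared.a, &shared.b));
		printf("padded: %.2fs\n", run_pair(&padded.a, &padded.b));
		return (0);
	}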
> > Modified: head/sys/sys/vmmeter.h
> > ==============================================================================
> > --- head/sys/sys/vmmeter.h	Sun Sep 10 18:08:25 2017	(r323392)
> > +++ head/sys/sys/vmmeter.h	Sun Sep 10 19:00:38 2017	(r323393)
> > @@ -60,6 +60,12 @@ struct vmtotal {
> >  #if defined(_KERNEL) || defined(_WANT_VMMETER)
> >  #include <sys/counter.h>
> >
> > +#ifdef _KERNEL
> > +#define VMMETER_ALIGNED	__aligned(CACHE_LINE_SIZE)
> > +#else
> > +#define VMMETER_ALIGNED
> > +#endif
> > +
>
> Using this breaks the "ABI" which is used by at least vmstat(1).
>
> This has some style bugs:
> - verbose macro name.  Using this unimproves the formatting later.
> - space instead of tab after #define.  This file is vaguely
>   KNF-conformant and had only 1 instance of this style bug before (for
>   the idempotency ifndef, which has 3 other style bugs:
>   - missing blank line before its #endif
>   - tab instead of space before the comment on its #endif
>   - backwards comment on its #endif).
>
> >  /*
> >   * System wide statistics counters.
> >   * Locking:
> > @@ -126,14 +132,15 @@ struct vmmeter {
> >  	u_int v_free_target;	/* (c) pages desired free */
> >  	u_int v_free_min;	/* (c) pages desired free */
> >  	u_int v_free_count;	/* (f) pages free */
> > -	u_int v_wire_count;	/* (a) pages wired down */
> > -	u_int v_active_count;	/* (q) pages active */
> >  	u_int v_inactive_target; /* (c) pages desired inactive */
> > -	u_int v_inactive_count;	/* (q) pages inactive */
> > -	u_int v_laundry_count;	/* (q) pages eligible for laundering */
>
> Moving these also breaks the "ABI".

The original patch attempted to preserve it, but I was told it does not
matter.  Apparently that's indeed true, as evidenced by another change
made a few months ago which introduced the counter type.

> >  	u_int v_pageout_free_min;   /* (c) min pages reserved for kernel */
> >  	u_int v_interrupt_free_min; /* (c) reserved pages for int code */
> >  	u_int v_free_severe;	/* (c) severe page depletion point */
> > +	u_int v_wire_count VMMETER_ALIGNED; /* (a) pages wired down */
>
> The more complicated declaration unimproves the formatting, especially
> since the macro name is so long.  See way below.
>
> Is there a technical reason to move to the end?  style(9) requires
> sorting on alignment, with the most aligned fields first, not last
> (it says to sort on size, but means alignment).  Is there a reason
> to reverse this here?

There is.  The annotation only guarantees that the address of the field
will be aligned.  Whatever follows in the struct will start filling the
same cache line, unless annotated as well.  I.e., in order to prevent
the next field from collapsing into the line, the top of the struct
would have to look like this:

	u_int v_wire_count VMMETER_ALIGNED;	/* (a) pages wired down */
	u_int v_active_count VMMETER_ALIGNED;	/* (a) pages active */
	u_int v_inactive_count VMMETER_ALIGNED;	/* (a) pages inactive */
	u_int v_laundry_count VMMETER_ALIGNED;	/* (a) pages eligible for
						   laundering */
	counter_u64_t v_swtch VMMETER_ALIGNED;

That's significantly crappier.  The __exclusive_cache_line annotation of
the vm_cnt object itself puts it into a dedicated section where the
linker guarantees whatever follows will not share the line.
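For reference, the annotation amounts to roughly the following.  This is
a sketch from memory -- see sys/sys/cdefs.h in the tree for the
authoritative definition, and treat the exact section name and the
example variable as assumptions:

	/* Roughly how the annotation is built (see sys/cdefs.h). */
	#define	__exclusive_cache_line	__aligned(CACHE_LINE_SIZE)	\
				    __section(".data.exclusive_cache_line")

	/*
	 * Every object carrying the annotation lands in that dedicated
	 * section and starts on a cache line boundary, so the only
	 * possible neighbours within a line are other such padded
	 * objects, never an unrelated hot variable from ordinary
	 * .data.  Usage is then simply:
	 */
	static u_int	vm_example_counter __exclusive_cache_line;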
> > +	u_int v_active_count VMMETER_ALIGNED; /* (a) pages active */
> > +	u_int v_inactive_count VMMETER_ALIGNED; /* (a) pages inactive */
> > +	u_int v_laundry_count VMMETER_ALIGNED; /* (a) pages eligible for
> > +						   laundering */
>
> This does more than move the other fields and unimprove their
> formatting.  It also changes the lock annotations from (q) to (a).
>
> The formatting is ugliest for the last field.

Agreed.

> Naming the macro something like VMMA would make things fit better.  It
> is stupid to abbreviate VIRTUAL_MEMORY to VM but spell out ALIGNED,
> especially when the former is global and the latter is local.  If the
> namespace were documented, then the documentation would say that vm,
> v_ and VM* are reserved, perhaps without underscores, so it would not
> be namespace pollution to use VMMA.

I'm beyond indifferent on how to name the macro.  This name was
suggested in the review and I just rolled with it.  If you want it
changed, please argue with alc and/or markj.

That said, looking at the struct now, I think its use should be retired
from the kernel.  It can remain in the headers for userspace use.

First, there is a bunch of counter(9) fields.  I don't know the original
reasoning.  I would expect these counters to be statically defined in a
per-cpu struct.

Then we are left with a bunch of read-only fields.  They can be grouped
together in a __read_mostly annotated object.

One field protected by the free page queue lock is v_free_count.  It can
be annotated by hand in whatever way makes sense.

And then there are the 4 atomics.  They can just be regular globals with
__exclusive_cache_line.

However, I'm not interested in pursuing this.  My primary motivation for
the change here was to get rid of an obvious & important bottleneck and
to stabilize performance over time.  Interestingly, as different kernel
versions produce different binaries, the vm_cnt object's offset within
whatever cache line it used was changing.  This in turn was altering
false sharing, making performance fluctuate a little bit.

-- 
Mateusz Guzik