From owner-svn-src-head@freebsd.org Sat Dec 10 21:11:40 2016 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E3A89C71149; Sat, 10 Dec 2016 21:11:40 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wm0-x243.google.com (mail-wm0-x243.google.com [IPv6:2a00:1450:400c:c09::243]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 789381E2E; Sat, 10 Dec 2016 21:11:40 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wm0-x243.google.com with SMTP id a20so3044546wme.2; Sat, 10 Dec 2016 13:11:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=V5NAYGeD9vHZ7kGiCrCbfAtGQlpR3pRhuNqqk0/8neg=; b=0sgg3VjJV8/RWiV3A9oTxjUHvBmTHeLCcefH9Eta7tWLks/LMIiUG1QMjzDQp9AHbk j1QRMtFTkRInZJY2s+l/mQwqqcSSTQg1zKol4g9ROWA0UP2ixfWiPvdrn4UA2MDdpYct wmqTe+1T5IhrrS/5uxh6dDwtvKQKUwWKhnfaHQPXKmT/Wbht2WD8VUgxiZ6A2RyGCi4X IBCnW9fC5ArOBmNLL5GAw1XOMFzCSWwOj11KfnqrLynRbSm7D9flmU18VqWMAM+6xr8W Z23AVd/jS/fUBPMkfY7KUXmjd7TTac7/EOCsyFG7f1EBVGjNoseke+QWV+7h9WCigotU JliQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=V5NAYGeD9vHZ7kGiCrCbfAtGQlpR3pRhuNqqk0/8neg=; b=Ha/Z5wHLndo268+PfHQsZZyPfyUHfYZybunYAmyoGdSytcPmyu/El0eD30C8bjyyzU Dq1WzLcNjv4emBSmf+qqlTrCJBWVuKcg43g8wRsQ5+lffCD4ejcGEPCfyJIQwVwSi3I0 0loqYDExHgMHaV4hwisrLadxB9kfq80q9QJ93DcdEn84ew+vTk5R6Ulpa0DRCvAUEIq4 HTeGBGxtWifais38lPbTQhHTLnbIhbsAGTfkPO3uQbSM9qn4algK9553BL25K0U5E3Pk 0sNCHmrVtowlSiIpOUZtQIl/hC18UcRYciTDLTpi0Army3JMED15lo5BS5GmPJHwC5q/ eMRA== X-Gm-Message-State: AKaTC018omPejbP0jLbwv39Doj3y1PEzmNzfxjSUSFSkn1KWXPEonq7Lmnp2fS20QFjMUQ== X-Received: by 10.28.146.201 with SMTP id u192mr12584109wmd.142.1481404298736; Sat, 10 Dec 2016 13:11:38 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id n5sm27465037wmf.0.2016.12.10.13.11.37 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Sat, 10 Dec 2016 13:11:38 -0800 (PST) Date: Sat, 10 Dec 2016 22:11:35 +0100 From: Mateusz Guzik To: Gleb Smirnoff Cc: Fabien Thomas , svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org Subject: Re: svn commit: r309143 - head/sys/libkern Message-ID: <20161210211135.GB29677@dft-labs.eu> References: <201611251349.uAPDnX09042110@repo.freebsd.org> <20161125160010.GA3307@dft-labs.eu> <20161207194121.GM27748@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20161207194121.GM27748@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 10 Dec 2016 21:11:41 -0000 On Wed, Dec 07, 2016 at 11:41:21AM -0800, Gleb Smirnoff wrote: > On Fri, Nov 25, 2016 at 05:00:10PM +0100, Mateusz Guzik wrote: > M> On Fri, Nov 25, 2016 at 01:49:33PM +0000, Fabien Thomas wrote: > M> > Author: fabient > M> > Date: Fri Nov 25 13:49:33 2016 > M> > New Revision: 309143 > M> > URL: https://svnweb.freebsd.org/changeset/base/309143 > M> > > M> > Log: > M> > In a dual processor system (2*6 cores) during IPSec throughput tests, > M> > we see a lot of contention on the arc4 lock, used to generate the IV > M> > of the ESP output packets. > M> > > M> > The idea of this patch is to split this mutex in order to reduce the > M> > contention on this lock. > M> > > M> > +MALLOC_DEFINE(M_ARC4RANDOM, "arc4random", "arc4random structures"); > M> > > M> > -static u_int8_t arc4_randbyte(void); > M> > +struct arc4_s { > M> > + u_int8_t i, j; > M> > + int numruns; > M> > + u_int8_t sbox[256]; > M> > + time_t t_reseed; > M> > + > M> > + struct mtx mtx; > M> > +}; > M> > + > M> > M> Why is the lock at the end? Right now you got false sharing with the > M> next entry. > M> > M> That said, I suggest: > M> 1. moving thelock to the beginning > M> 2. annotating the struct with __aligned(CACHE_LINE_SIZE) > > It could be even better not to allocate them with regular malloc at all, > but to put them into the per cpu areas. I haven't tried, but looks like > the DPCPU mechanism defined in sys/pcpu.h would fit that well. > Oh. Now that I read your comment I re-read the patch and it looks like indeed the patch allocate the struct for each online cpu, so perhaps it should indeed migrate and automagically get numa friendly at some point. This prompts me to look some more and I think this does not really scale, just got rid of an issue immediately visible with lock profiling. arc4rand contains: [..] if (reseed || atomic_cmpset_int(&arc4rand_iniseed_state, ARC4_ENTR_HAVE, ARC4_ENTR_SEED)) { ARC4_FOREACH(arc4) arc4_randomstir(arc4); } Which on amd64 always dirties the cacheline containing arc4rand_iniseed_state. In my kernel the relevant cacheline is: ffffffff81583f80 b vn_io_fault_prefault ffffffff81583f88 b vn_io_faults_cnt ffffffff81583f90 B arc4rand_iniseed_state ffffffff81583f98 b arc4inst ^^^^^^ the array ffffffff81583fa0 b inet_ntoa.buf ffffffff81583fb0 b static_ltree That is, not only competing cpus do the atomic op, they additionally ping pong the cacheline when computing the address of the target object. At the very least, attempting atomic_cmpset_int only if it the initial condition holds should improve things. The lock array pointer can be __read_mostly-ified after the macro gets sorted out. -- Mateusz Guzik