From owner-freebsd-amd64@freebsd.org Wed Sep 25 17:03:02 2019
Date: Wed, 25 Sep 2019 13:02:55 -0400
From: Mark Johnston
To: Mark Millard
Cc: freebsd-amd64@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: head -r352341 example context on ThreadRipper 1950X: cpuset -n prefer:1 with -l 0-15 vs. -l 16-31 odd performance?
Message-ID: <20190925170255.GA43643@raichu>
In-Reply-To: <704D4CE4-865E-4C3C-A64E-9562F4D9FC4E@yahoo.com>
List-Id: Porting FreeBSD to the AMD64 platform

On Mon, Sep 23, 2019 at 01:28:15PM -0700, Mark Millard via freebsd-amd64 wrote:
> Note: I have access to only one
> FreeBSD amd64 context, and
> it is also my only access to a NUMA context: 2 memory
> domains. A Threadripper 1950X context. Also: I have only
> a head FreeBSD context on any architecture, not 12.x or
> before. So I have limited compare/contrast material.
>
> I present the below basically to ask if the NUMA handling
> has been validated, or if it is going to be, at least for
> contexts that might apply to ThreadRipper 1950X and
> analogous contexts. My results suggest they are not (or
> libc++'s now times get messed up such that it looks like
> NUMA mishandling, since this is based on odd benchmark
> results that involve mean time for laps, using a median
> of such across multiple trials).
>
> I ran a benchmark on both Fedora 30 and FreeBSD 13 on this
> 1950X and got expected results on Fedora but odd ones on
> FreeBSD. The benchmark is a variation on the old HINT
> benchmark, spanning the old multi-threading variation. I
> later tried Fedora because the FreeBSD results looked odd.
> The other architectures I tried FreeBSD benchmarking with
> did not look odd like this. (powerpc64 on an old PowerMac 2
> socket with 2 cores per socket, aarch64 Cortex-A57 Overdrive
> 1000, Cortex-A53 Pine64+ 2GB, armv7 Cortex-A7 Orange Pi+ 2nd
> Ed. For these I used 4 threads, not more.)
>
> I tend to write in terms of plots made from the data instead
> of the raw benchmark data.
>
> FreeBSD testing based on:
> cpuset -l0-15 -n prefer:1
> cpuset -l16-31 -n prefer:1
>
> Fedora 30 testing based on:
> numactl --preferred 1 --cpunodebind 0
> numactl --preferred 1 --cpunodebind 1
>
> While I have more results, I reference primarily DSIZE
> and ISIZE being unsigned long long and also both being
> unsigned long as examples. Variations in results are not
> from the type differences for any LP64 architectures.
> (But they give an idea of benchmark variability in the
> test context.)
>
> The Fedora results solidly show the bandwidth limitation
> of using one memory controller.
> They also show the latency
> consequences for the remote memory domain case vs. the
> local memory domain case. There is not a lot of
> variability between the examples of the 2 type-pairs used
> for Fedora.
>
> Not true for FreeBSD on the 1950X:
>
> A) The latency-constrained part of the graph looks to
>    normally be using the local memory domain when
>    -l0-15 is in use for 8 threads.
>
> B) Both the -l0-15 and the -l16-31 parts of the
>    graph for 8 threads that should be bandwidth
>    limited show mostly examples that would have to
>    involve both memory controllers for the bandwidth
>    to get the results shown, as far as I can tell.
>    There is also wide variability ranging between the
>    expected 1-controller result and, say, what a
>    2-controller round-robin would be expected to produce.
>
> C) Even the single-threaded result shows a higher
>    result for larger total bytes for the kernel
>    vectors. Fedora does not.
>
> I think that (B) is the most solid evidence for
> something being odd.

The implication seems to be that your benchmark program is using pages
from both domains despite a policy which preferentially allocates pages
from domain 1, so you would first want to determine if this is actually
what's happening.  As far as I know, we currently don't have a good way
of characterizing per-domain memory usage within a process.

If your benchmark uses a large fraction of the system's memory, you
could use the vm.phys_free sysctl to get a sense of how much memory
from each domain is free.  Another possibility is to use DTrace to
trace the requested domain in vm_page_alloc_domain_after().  For
example, the following DTrace one-liner counts the number of pages
allocated per domain by ls(1):

# dtrace -n 'fbt::vm_page_alloc_domain_after:entry /progenyof($target)/{@[args[2]] = count();}' -c "cpuset -n rr ls"
...
        0               71
        1               72
# dtrace -n 'fbt::vm_page_alloc_domain_after:entry /progenyof($target)/{@[args[2]] = count();}' -c "cpuset -n prefer:1 ls"
...
        1              143
# dtrace -n 'fbt::vm_page_alloc_domain_after:entry /progenyof($target)/{@[args[2]] = count();}' -c "cpuset -n prefer:0 ls"
...
        0              143

This approach might not work for various reasons depending on how
exactly your benchmark program works.
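For illustration, the vm.phys_free comparison could be scripted roughly
as below.  Note this is a sketch under assumptions: the simplified
"domain <n>: <pages>" lines are fabricated stand-ins for snapshots of
real per-domain free-memory output (which is more detailed), so the
awk parsing would need adapting, and the page counts are invented
purely to show the shape of the result.

```shell
#!/bin/sh
# Sketch: diff two per-domain free-memory snapshots to see which domain
# the benchmark's pages came from.  On a live FreeBSD system the
# snapshots would come from "sysctl vm.phys_free" taken before and
# during the benchmark run; the "domain <n>: <pages>" lines below are
# fabricated stand-ins for that output.
before=$(mktemp)
after=$(mktemp)

printf 'domain 0: 4000000\ndomain 1: 3900000\n' > "$before"
printf 'domain 0: 3995000\ndomain 1: 2100000\n' > "$after"

# First pass records each domain's free pages from the "before" file;
# second pass reports how many pages each domain lost by the "after"
# snapshot.
result=$(awk 'NR==FNR { free[$2] = $3; next }
              { printf "domain %s lost %d pages\n", $2, free[$2] - $3 }' \
             "$before" "$after")
echo "$result"

rm -f "$before" "$after"
```

If the prefer:1 policy is really being honoured, nearly all of the lost
pages should come out of domain 1, as in these fabricated numbers.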