Date: Thu, 11 Aug 2016 20:39:32 +0200
From: Ben RUBSON <ben.rubson@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: Unstable local network throughput
Message-ID: <D1BEE28D-BFF0-486A-BFA9-095CD39267B8@gmail.com>
In-Reply-To: <CAJ-VmongwvbY3QqKBV+FJCHOfSdr-=v9CmLH1z=Tqwz19AtUpg@mail.gmail.com>
References: <3C0D892F-2BE8-4650-B9FC-93C8EE0443E1@gmail.com>
 <bed13ae3-0b8f-b1af-7418-7bf1b9fc74bc@selasky.org>
 <3B164B7B-CBFB-4518-B57D-A96EABB71647@gmail.com>
 <5D6DF8EA-D9AA-4617-8561-2D7E22A738C3@gmail.com>
 <BD0B68D1-CDCD-4E09-AF22-34318B6CEAA7@gmail.com>
 <CAJ-VmomW0Wth-uQU-OPTfRAsXW1kTDy-VyO2w-pgNosb-N1o=Q@mail.gmail.com>
 <B4D77A84-8F02-43E7-AD65-5B92423FC344@gmail.com>
 <CAJ-Vmo=Mfcvd41gtrt8GJfEtP-DQFfXt7pZ8eRLQzu73M=sX4A@mail.gmail.com>
 <7DD30CE7-32E6-4D26-91D4-C1D4F2319655@gmail.com>
 <CAJ-VmongwvbY3QqKBV+FJCHOfSdr-=v9CmLH1z=Tqwz19AtUpg@mail.gmail.com>
> On 11 Aug 2016, at 18:36, Adrian Chadd <adrian.chadd@gmail.com> wrote:
> 
> Hi!
> 
> mlx4_core0: <mlx4_core> mem
> 0xfbe00000-0xfbefffff,0xfb000000-0xfb7fffff irq 64 at device 0.0
> numa-domain 1 on pci16
> mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6
> (Aug 11 2016)
> 
> so the NIC is in numa-domain 1. Try pinning the worker threads to
> numa-domain 1 when you run the test:
> 
> numactl -l first-touch-rr -m 1 -c 1 ./test-program
> 
> You can also try pinning the NIC threads to numa-domain 1 versus 0 (so
> the second set of CPUs, not the first set.)
> 
> vmstat -ia | grep mlx (get the list of interrupt thread ids)
> then for each:
> 
> cpuset -d 1 -x <irq id>
> 
> Run pcm-memory.x each time so we can see the before and after effects
> on local versus remote memory access.
> 
> Thanks!

While waiting for the correct commands to use, I ran some tests with:

cpuset -l 0-11 <iperf_command>
or
cpuset -l 12-23 <iperf_command>

and:

c=0
vmstat -ia | grep mlx | sed 's/^irq\(.*\):.*/\1/' | while read i
do
	cpuset -l $c -x $i ; ((c++)) ; [[ $c -gt 11 ]] && c=0
done

or

c=12
vmstat -ia | grep mlx | sed 's/^irq\(.*\):.*/\1/' | while read i
do
	cpuset -l $c -x $i ; ((c++)) ; [[ $c -gt 23 ]] && c=12
done

Results:

No pinning
http://pastebin.com/raw/CrK1CQpm

Pinning workers to 0-11, pinning NIC IRQs to 0-11
http://pastebin.com/raw/kLEQ6TKL

Pinning workers to 12-23, pinning NIC IRQs to 12-23
http://pastebin.com/raw/qGxw9KL2

Pinning workers to 12-23, pinning NIC IRQs to 0-11
http://pastebin.com/raw/tFjii629

Comments:

Strangely, the best iperf throughput results come with no pinning at
all, whereas before running the kernel with your new options, the best
results were with everything pinned to 0-11.

Feel free to ask me for further testing.

Ben
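For reference, the round-robin IRQ-to-CPU assignment above can be sketched as a small POSIX-sh dry run that prints the cpuset commands instead of executing them (the sample vmstat lines, IRQ numbers, and the plan_pinning helper are made up for illustration; on a real box you would replace sample_vmstat with `vmstat -ia | grep mlx` and actually run cpuset):

```shell
#!/bin/sh
# Dry-run sketch: print the round-robin "cpuset -l <cpu> -x <irq>"
# commands that would pin each mlx4 IRQ to CPUs first..last.
# The sample vmstat output below is hypothetical.

sample_vmstat() {
    # Stand-in for: vmstat -ia | grep mlx
    printf 'irq264: mlx4_core0   123456 100\n'
    printf 'irq265: mlx4_core0   123457 101\n'
    printf 'irq266: mlx4_core0   123458 102\n'
}

plan_pinning() {
    first=$1; last=$2; c=$first
    # Same sed as in the mail: strip "irq" prefix and ":..." suffix,
    # leaving only the IRQ number.
    sample_vmstat | sed 's/^irq\(.*\):.*/\1/' | while read -r i
    do
        echo "cpuset -l $c -x $i"
        c=$((c + 1))
        [ "$c" -gt "$last" ] && c=$first
    done
    return 0
}

plan_pinning 12 23
```

This prints `cpuset -l 12 -x 264`, `cpuset -l 13 -x 265`, `cpuset -l 14 -x 266`, wrapping back to $first once $last is exceeded, which mirrors the bash `((c++))` / `[[ ... ]]` loop in the mail.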