Date:      Thu, 11 Aug 2016 20:39:32 +0200
From:      Ben RUBSON <ben.rubson@gmail.com>
To:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: Unstable local network throughput
Message-ID:  <D1BEE28D-BFF0-486A-BFA9-095CD39267B8@gmail.com>
In-Reply-To: <CAJ-VmongwvbY3QqKBV+FJCHOfSdr-=v9CmLH1z=Tqwz19AtUpg@mail.gmail.com>
References:  <3C0D892F-2BE8-4650-B9FC-93C8EE0443E1@gmail.com> <bed13ae3-0b8f-b1af-7418-7bf1b9fc74bc@selasky.org> <3B164B7B-CBFB-4518-B57D-A96EABB71647@gmail.com> <5D6DF8EA-D9AA-4617-8561-2D7E22A738C3@gmail.com> <BD0B68D1-CDCD-4E09-AF22-34318B6CEAA7@gmail.com> <CAJ-VmomW0Wth-uQU-OPTfRAsXW1kTDy-VyO2w-pgNosb-N1o=Q@mail.gmail.com> <B4D77A84-8F02-43E7-AD65-5B92423FC344@gmail.com> <CAJ-Vmo=Mfcvd41gtrt8GJfEtP-DQFfXt7pZ8eRLQzu73M=sX4A@gmail.com> <7DD30CE7-32E6-4D26-91D4-C1D4F2319655@gmail.com> <CAJ-VmongwvbY3QqKBV+FJCHOfSdr-=v9CmLH1z=Tqwz19AtUpg@mail.gmail.com>


> On 11 Aug 2016, at 18:36, Adrian Chadd <adrian.chadd@gmail.com> wrote:
>
> Hi!
>
> mlx4_core0: <mlx4_core> mem
> 0xfbe00000-0xfbefffff,0xfb000000-0xfb7fffff irq 64 at device 0.0
> numa-domain 1 on pci16
> mlx4_core: Initializing mlx4_core: Mellanox ConnectX VPI driver v2.1.6
> (Aug 11 2016)
>
> so the NIC is in numa-domain 1. Try pinning the worker threads to
> numa-domain 1 when you run the test:
>
> numactl -l first-touch-rr -m 1 -c 1 ./test-program
>
> You can also try pinning the NIC threads to numa-domain 1 versus 0 (so
> the second set of CPUs, not the first set.)
>
> vmstat -ia | grep mlx (get the list of interrupt thread ids)
> then for each:
>
> cpuset -d 1 -x <irq id>
>
> Run pcm-memory.x each time so we can see the before and after effects
> on local versus remote memory access.
>
> Thanks!
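
Just to check I understand the commands correctly, I guess the
domain-based pinning you describe would look roughly like this
(untested sketch, assuming cpuset's -d flag takes the NUMA domain as
in your example, and that the numactl policy names are FreeBSD's):

  # run the iperf workers with CPUs and memory from domain 1 (the NIC's domain)
  numactl -l first-touch-rr -m 1 -c 1 <iperf_command>

  # move every mlx4 interrupt to domain 1 as well
  vmstat -ia | grep mlx | sed 's/^irq\(.*\):.*/\1/' | while read i
  do
    cpuset -d 1 -x $i
  done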

While waiting for the correct commands to use, I made some tests with:

  cpuset -l 0-11 <iperf_command>
or
  cpuset -l 12-23 <iperf_command>

and:

  c=0
  vmstat -ia | grep mlx | sed 's/^irq\(.*\):.*/\1/' | while read i
  do
    cpuset -l $c -x $i ; ((c++)) ; [[ $c -gt 11 ]] && c=0
  done
or
  c=12
  vmstat -ia | grep mlx | sed 's/^irq\(.*\):.*/\1/' | while read i
  do
    cpuset -l $c -x $i ; ((c++)) ; [[ $c -gt 23 ]] && c=12
  done
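
A quick way to double-check that the IRQ masks really changed should be
something like this (sketch; if I read the man page right, cpuset -g -x
prints the CPU mask currently bound to an IRQ):

  vmstat -ia | grep mlx | sed 's/^irq\(.*\):.*/\1/' | xargs -n1 cpuset -g -x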

Results:

No pinning
http://pastebin.com/raw/CrK1CQpm

Pinning workers to 0-11
Pinning NIC IRQ to 0-11
http://pastebin.com/raw/kLEQ6TKL

Pinning workers to 12-23
Pinning NIC IRQ to 12-23
http://pastebin.com/raw/qGxw9KL2

Pinning workers to 12-23
Pinning NIC IRQ to 0-11
http://pastebin.com/raw/tFjii629

Comments:

Strangely, the best iperf throughput results are obtained when there is
no pinning, whereas before running the kernel with your new options, the
best results were with everything pinned to 0-11.

Feel free to ask me for further testing.

Ben



