FreeBSD Mail Archives

Date:      Tue, 2 Jan 2018 23:07:01 +0000
From:      Charlie Smurthwaite <charlie@atech.media>
To:        Vincenzo Maffione <v.maffione@gmail.com>
Cc:        "freebsd-net@freebsd.org" <net@freebsd.org>
Subject:   Re: Linux netmap memory allocation
Message-ID:  <f3f94485-2f71-26d0-5a81-10e3166d3538@atech.media>
In-Reply-To: <CA%2B_eA9hs-GUCRH%2B5FAs1SPyR8S8GFndq_ScgDAmJ8njgOsQBCQ@mail.gmail.com>
References:  <7b85fc73-9cc8-0a60-5264-d26f47af5eae@atech.media> <CA%2B_eA9hthoig%2B_UZQNZhM-aBndM44f0wz-NKqWUoYpBA8Ss0jQ@mail.gmail.com> <6c5de1ed-0545-31b3-d0e2-4258fa4ccf1c@atech.media> <CA%2B_eA9hxQuej8L3SdY%2BhgpnDH3tccgsqOBtw1S=RkvURxu=Ktg@mail.gmail.com> <da1e5904-30c8-b06b-6e7f-0bf26fc99a17@atech.media> <CA%2B_eA9hs-GUCRH%2B5FAs1SPyR8S8GFndq_ScgDAmJ8njgOsQBCQ@mail.gmail.com>

Hi Vincenzo,

I am using poll(), and I am not specifying NETMAP_NO_TX_POLL, and have found that sometimes frames are sent only when the TX buffer is full, and sometimes they are not sent at all. They are never sent as expected on every invocation of poll(). If I run ioctl(NIOCTXSYNC) manually, everything works correctly. I assume I have simply missed something from my nmreq.

I don't think you have missed anything within nmreq. I see that you are waiting for POLLIN only (and this is right in your router case), so poll() will actually invoke txsync on interface #i only when netmap intercepts an RX or TX interrupt on interface #i. This means that packets may stall for a long time in the TX rings if you don't call ioctl(TXSYNC). The manual is not wrong, however. You can look at the apps/bridge/bridge.c example to understand where this "poll automatically calls txsync" thing is useful.
Thank you for the clarification. I have now altered my code to call TXSYNC after each iteration, but only if I have modified the TX ring for that interface. This seems to work perfectly. The patch can be seen at https://github.com/catphish/netmap-router/commit/2961ab16f14a8b2a2561c9d73f73857e523cc177
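Here is a minimal C sketch of the "TXSYNC only the interfaces whose TX ring was modified" approach described above. The names struct iface, mark_tx_dirty() and collect_dirty_tx() are hypothetical. They are not taken from router.c or from netmap. The only real netmap call involved is ioctl(fd, NIOCTXSYNC, NULL), which the caller would issue for each descriptor returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface bookkeeping (not part of the netmap API). */
struct iface {
    int  fd;        /* netmap file descriptor for this interface */
    bool tx_dirty;  /* true if this iteration queued frames on its TX ring */
};

/* Call after placing frames on an interface's TX ring. */
static void mark_tx_dirty(struct iface *ifp)
{
    ifp->tx_dirty = true;
}

/* After each poll() iteration, gather the descriptors that need a TXSYNC
 * and clear their flags. out_fds must have room for n entries. The caller
 * then issues ioctl(out_fds[k], NIOCTXSYNC, NULL) for every returned
 * descriptor, so untouched interfaces cost no system call. */
static size_t collect_dirty_tx(struct iface *ifs, size_t n, int *out_fds)
{
    size_t count = 0;
    assert(n == 0 || (ifs != NULL && out_fds != NULL));
    for (size_t i = 0; i < n; i++) {
        if (ifs[i].tx_dirty) {
            out_fds[count++] = ifs[i].fd;
            ifs[i].tx_dirty = false;
        }
    }
    return count;
}
```

Collecting the descriptors and then syncing them keeps the policy testable without a NIC. In the real loop, the ioctl() calls would follow immediately after collect_dirty_tx().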



You also mentioned: "whether netmap calls or does not call txsync/rxsync on certain rings depends on the parameters passed to nm_open()". I do not use the nm_open helper method, but I am extremely interested to know what parameters would affect this behaviour, as this would seem very relevant to my problem.

Yes, we do not normally use the low-level interface (ioctl(REGIF)), because it's just simpler to use the nm_open() interface. In the first parameter of nm_open() you can ask to open just one RX/TX ring pair, e.g. with "enp1f0s1-3". Then you usually want to mmap() just once (as you do in your program); with nm_open(), you do that with the NM_OPEN_NO_MMAP flag.
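Here is a sketch of opening one descriptor per RX/TX ring pair with nm_open() while mapping the shared memory only once. It assumes the legacy helpers in net/netmap_user.h, which are compiled here only when HAVE_NETMAP is defined (a hypothetical build macro). As far as I know, nm_open() expects a "netmap:" prefix in front of the interface name, so the "enp1f0s1-3" example becomes "netmap:enp1f0s1-3". The names ring_port_name() and open_ring_pairs() are illustrative.

```c
#include <assert.h>
#include <stdio.h>

#ifdef HAVE_NETMAP            /* hypothetical build flag */
#define NETMAP_WITH_LIBS      /* exposes nm_open() in netmap_user.h */
#include <net/netmap_user.h>
#endif

/* Build the nm_open() name for a single RX/TX ring pair, e.g.
 * ("enp1f0s1", 3) -> "netmap:enp1f0s1-3". Returns the length written,
 * or -1 if buf is too small. */
static int ring_port_name(char *buf, size_t len, const char *ifname, int ring)
{
    assert(buf != NULL && ifname != NULL && ring >= 0);
    int n = snprintf(buf, len, "netmap:%s-%d", ifname, ring);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

#ifdef HAVE_NETMAP
/* Open nrings descriptors, one per ring pair. Only the first nm_open()
 * maps the shared memory; the later ones pass NM_OPEN_NO_MMAP with the
 * first descriptor as parent, so they reuse that mapping. */
static int open_ring_pairs(const char *ifname, struct nm_desc **d, int nrings)
{
    char name[64];
    for (int r = 0; r < nrings; r++) {
        if (ring_port_name(name, sizeof(name), ifname, r) < 0)
            return -1;
        if (r == 0)
            d[r] = nm_open(name, NULL, 0, NULL);
        else
            d[r] = nm_open(name, NULL, NM_OPEN_NO_MMAP, d[0]);
        if (d[r] == NULL)
            return -1;
    }
    return 0;
}
#endif
```

Each descriptor then only covers its own ring pair, which fits a one-thread-per-ring design.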
I did look at nm_open, and even read the source of nm_open to discover how to implement the shared memory, but (for no good reason) I preferred to set up the interface manually.

If you are interested, or if it helps explain my question, my complete code (hopefully well commented but far from complete) can be found here: https://github.com/catphish/netmap-router/blob/58a9b957c19b0a012088c491bd58bc3161a56ff1/router.c

Specifically, if the ioctl call at line 92 is removed, the code does not work. Packets are either not transmitted at all or are only transmitted when the buffer is full, and which of these two behaviours occurs seems to be random. However, I would expect it to work because I do not specify NETMAP_NO_TX_POLL, and I would therefore hope that the poll() call on line 80 would have the same effect.

Yes, that depends on when netmap_poll() is called by the kernel, which in turn depends on when something is ready for receive on the file descriptor.
Looking at your program, I think you need to call ioctl(TXSYNC), at least because you don't want to introduce artificial/unbounded latency. However, since these calls are expensive, you could use them only when necessary (e.g. when nm_ring_space(txring) == 0 or when you actually forwarded some packets on txring).
Per the patch above, I now call TXSYNC on an interface only after pushing a batch of packets to it, and this seems to work perfectly, or at least gives a good balance between performance and latency. If nm_ring_space(txring) == 0, I just drop frames until the next batch. I don't TXSYNC partway through a batch; it hasn't yet seemed necessary, but I may need to look into this later.
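The TX policy in the last two paragraphs can be modelled without a NIC. Vincenzo's rule is to sync when the ring is full or when something was forwarded. Charlie's rule is to fill what fits, drop the rest, and issue one TXSYNC per batch.

This is a sketch only. struct txring_model is a hypothetical stand-in for the head/cur/tail/num_slots fields of struct netmap_ring. ring_space() mirrors the arithmetic of netmap's nm_ring_space(), and queue_batch() and need_txsync() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the index fields of a netmap TX ring;
 * it models no slots or buffers. */
struct txring_model {
    uint32_t num_slots;
    uint32_t head;
    uint32_t cur;
    uint32_t tail;
};

/* Same arithmetic as netmap's nm_ring_space(): free slots from cur up to
 * tail, wrapping around the end of the ring. */
static uint32_t ring_space(const struct txring_model *r)
{
    int ret = (int)r->tail - (int)r->cur;
    assert(r->num_slots > 0);
    if (ret < 0)
        ret += (int)r->num_slots;
    return (uint32_t)ret;
}

/* Queue as many of the n frames as fit and drop the rest until the next
 * batch. head and cur advance together; advancing head is what hands the
 * slots to the kernel at the next TXSYNC. Returns the number queued and
 * stores the number dropped in *dropped. */
static uint32_t queue_batch(struct txring_model *r, uint32_t n,
                            uint32_t *dropped)
{
    uint32_t avail = ring_space(r);
    uint32_t queued = n < avail ? n : avail;
    /* A real router would fill the slots starting at cur here. */
    r->cur = (r->cur + queued) % r->num_slots;
    r->head = r->cur;
    *dropped = n - queued;
    return queued;
}

/* TXSYNC if frames were queued this round, or if the ring is full so the
 * kernel can reclaim slots that have already been transmitted. */
static bool need_txsync(const struct txring_model *r, uint32_t queued)
{
    return queued > 0 || ring_space(r) == 0;
}
```

In a real loop, a true need_txsync() result would be followed by ioctl(fd, NIOCTXSYNC, NULL) on that interface's descriptor.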

I'm running this on a 6-core 2.8GHz Xeon with a 4-port i350-T4 NIC. I thought I'd just post some stats of the performance I observe using my code (excluding the routing table lookup, as this isn't relevant to netmap). Not really looking for any advice here, just thought I'd share my results.

All examples are with 1.488Mpps (1 x 1Gbps) input and no packet loss observed:
1 thread - CPU usage = 100%, batch size = 4
2 threads - CPU usage = 54% (27% x 2), batch size = 12
4 threads - CPU usage = 98% (25% x 4), batch size = 8
6 threads - CPU usage = 124% (21% x 6), batch size = 8

And again with 2.976Mpps (2 x 1Gbps) input and no packet loss observed:
1 thread - CPU usage = 100%, batch size = 12
2 threads - CPU usage = 68% (34% x 2), batch size = 21
4 threads - CPU usage = 100% (25% x 4), batch size = 17
6 threads - CPU usage = 105% (18% x 6), batch size = 16
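The 1.488Mpps figure is the theoretical maximum for minimum-size (64-byte) Ethernet frames on a 1Gbps link. It counts the 20 bytes of preamble, start-of-frame delimiter and inter-frame gap that accompany every frame. The message does not state the frame size, so 64 bytes is an assumption here. A small helper (line_rate_pps() is an illustrative name) reproduces both input rates:

```c
#include <assert.h>

/* Bytes on the wire per Ethernet frame in addition to the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte
 * inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Maximum frames per second on a link of link_bps bits/s for frames of
 * frame_bytes bytes (destination MAC through FCS, minimum 64). */
static double line_rate_pps(double link_bps, unsigned frame_bytes)
{
    assert(link_bps > 0.0 && frame_bytes >= 64u);
    return link_bps / ((double)(frame_bytes + ETH_WIRE_OVERHEAD) * 8.0);
}
```

line_rate_pps(1e9, 64) is about 1,488,095 frames/s, which is one frame every 672 ns per port. Doubling it gives the 2.976Mpps of the two-port test.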

These results seem excellent and demonstrate that netmap is scaling as expected with both threads and packet volume. The higher thread count will be more beneficial when I am doing more processing on each packet.


I hope this all makes sense, and again, I hope I have simply missed something from the nmreq I pass to NIOCREGIF.

It is worth mentioning that, with the exception of this problem / confusion, I am getting extremely good results from this code and netmap in general.

That's nice to hear :)
Your program looks simple enough that we could even add it to the examples (as an example of routing logic).
I'd be very happy to contribute to the documentation in any way that may be helpful. I have added a permissive licence to my GitHub repository just in case my code is of use to anyone else. It is currently somewhat incomplete as an IPv4 router, as it doesn't update MAC addresses on frames before forwarding them and the interface names are hardcoded, but when it's more complete I'd be very happy for it to be contributed to the examples. Of course, anyone is free to use my code for any purpose too.

Thanks for all your assistance! I'm happy enough with this that I will move on to looking at my IP routing code.

Charlie



Charlie Smurthwaite
Technical Director

tel. email. charlie@atech.media web. https://atech.media

This e-mail has been sent by aTech Media Limited (or one of its associated group companies, Dial 9 Communications Limited or Viaduct Hosting Limited). Its contents are confidential; therefore, if you have received this message in error, we would appreciate it if you could let us know and delete the message. aTech Media Limited is a UK limited company, registration number 5523199. Dial 9 Communications Limited is a UK limited company, registration number 7740921. Viaduct Hosting Limited is a UK limited company, registration number 8514362. All companies are registered at Unit 9 Winchester Place, North Street, Poole, Dorset, BH15 1NX.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?f3f94485-2f71-26d0-5a81-10e3166d3538>