From owner-freebsd-net@freebsd.org  Wed Jan  3 08:48:02 2018
In-Reply-To: <f3f94485-2f71-26d0-5a81-10e3166d3538@atech.media>
References: <7b85fc73-9cc8-0a60-5264-d26f47af5eae@atech.media>
 <CA+_eA9hthoig+_UZQNZhM-aBndM44f0wz-NKqWUoYpBA8Ss0jQ@mail.gmail.com>
 <6c5de1ed-0545-31b3-d0e2-4258fa4ccf1c@atech.media>
 <CA+_eA9hxQuej8L3SdY+hgpnDH3tccgsqOBtw1S=RkvURxu=Ktg@mail.gmail.com>
 <da1e5904-30c8-b06b-6e7f-0bf26fc99a17@atech.media>
 <CA+_eA9hs-GUCRH+5FAs1SPyR8S8GFndq_ScgDAmJ8njgOsQBCQ@mail.gmail.com>
 <f3f94485-2f71-26d0-5a81-10e3166d3538@atech.media>
From: Vincenzo Maffione <v.maffione@gmail.com>
Date: Wed, 3 Jan 2018 09:48:00 +0100
Message-ID: <CA+_eA9g5HxE9VVFEsKW-yXAtr_8-_qSQMpyaRLNUy0zApOXydw@mail.gmail.com>
Subject: Re: Linux netmap memory allocation
To: Charlie Smurthwaite <charlie@atech.media>
Cc: "freebsd-net@freebsd.org" <net@freebsd.org>
Content-Type: text/plain; charset="UTF-8"

Hi Charlie,

2018-01-03 0:07 GMT+01:00 Charlie Smurthwaite <charlie@atech.media>:

> Hi Vincenzo,
>
>
>> I am using poll(), and I am not specifying NETMAP_NO_TX_POLL, and have
>> found that sometimes frames are sent only when the TX buffer is full, and
>> sometimes they are not sent at all. They are never sent as expected on
>> every invocation of poll(). If I run ioctl(NIOCTXSYNC) manually, everything
>> works correctly. I assume I have simply missed something from my nmreq.
>>
>
> I don't think you have missed anything within nmreq.  I see that you are
> waiting for POLLIN only (and this is right in your router case), so poll()
> will actually invoke txsync on interface #i only when netmap intercepts an
> RX or TX interrupt on interface #i. This means that packets may stall for a
> long time in the TX rings if you don't call ioctl(TXSYNC). The manual is
> not wrong, however. You can look at the apps/bridge/bridge.c example to
> understand where this "poll automatically calls txsync" thing is useful.
>
> Thank you for the clarification. I have now altered my code to call TXSYNC
> after each iteration, but only if I have modified the TX ring for that
> interface. This seems to work perfectly. The patch can be seen at
> https://github.com/catphish/netmap-router/commit/2961ab16f14a8b2a2561c9d73f73857e523cc177
>

I see, it looks good.
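
To make the pattern concrete, here is a minimal sketch of "sync only the
ports whose TX ring you touched". It is not taken from your patch: struct
port, flush_dirty_ports() and fake_txsync() are hypothetical names, and the
callback stands in for ioctl(fd, NIOCTXSYNC, NULL) so that the logic can be
tried without netmap loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; a real program would keep the netmap fd here. */
struct port {
    int      fd;        /* descriptor handed to the sync callback */
    bool     tx_dirty;  /* set when this iteration queued frames for TX */
    unsigned syncs;     /* number of successful TX syncs, for inspection */
};

/* Stand-in for ioctl(fd, NIOCTXSYNC, NULL); returns 0 on success. */
typedef int (*txsync_fn)(int fd);

/*
 * Sync only the ports whose TX ring was modified in this iteration.
 * Returns the number of syncs issued.
 */
unsigned flush_dirty_ports(struct port *ports, size_t n, txsync_fn txsync)
{
    unsigned issued = 0;

    assert(ports != NULL && txsync != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!ports[i].tx_dirty)
            continue;               /* nothing queued: skip the syscall */
        if (txsync(ports[i].fd) == 0) {
            ports[i].tx_dirty = false;
            ports[i].syncs++;
            issued++;
        }
    }
    return issued;
}

/* Stub for experiments without netmap: pretend the sync succeeded. */
int fake_txsync(int fd)
{
    (void)fd;
    return 0;
}
```

In a real forwarding loop you would set tx_dirty whenever you advance the
head of a TX ring, and call flush_dirty_ports() once per poll() iteration.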

>
>
>
>> You also mentioned: "whether netmap calls or does not call txsync/rxsync
>> on certain rings depends on the parameters passed to nm_open()". I do not
>> use the nm_open helper method, but I am extremely interested to know what
>> parameters would affect this behaviour, as this would seem very relevant to
>> my problem.
>>
>
> Yes, we do not normally use the low level interface (ioctl(REGIF)),
> because it's just simpler to use the nm_open() interface. Within the first
> parameter of nm_open() you can ask to open just one RX/TX ring pair,
> e.g. with "enp1f0s1-3". Then you usually want to mmap() just once (as you
> do in your program); with nm_open(), you do that with the NM_OPEN_NO_MMAP
> flag.
>
> I did look at nm_open, and even read the source of nm_open to discover how
> to implement the shared memory, but (for no good reason) I preferred to set
> up the interface manually.
>

That's ok.

>
>> If you are interested or if it helps explain my question, my complete
>> code (hopefully well commented but far from complete) can be found here:
>> https://github.com/catphish/netmap-router/blob/58a9b957c19b0a012088c491bd58bc3161a56ff1/router.c
>>
>> Specifically, if the ioctl call at line 92 is removed, the code does not
>> work: packets are not transmitted, or are only transmitted when the buffer
>> is full, and which of these 2 behaviours occurs seems to be random. However,
>> I would expect it to work because I do not specify NETMAP_NO_TX_POLL, and I
>> would therefore hope that the poll() call on line 80 would have the same
>> effect.
>>
>
> Yes, that depends on when netmap_poll() is called by the kernel, which in
> turn depends on when something is ready for receive on the file descriptor.
> Looking at your program, I think you need to call ioctl(TXSYNC), at least
> because you don't want to introduce artificial/unbounded latency. However,
> since these calls are expensive, you could use them only when necessary
> (e.g. when nm_ring_space(txring) == 0 or when you actually forwarded
> some packets on txring).
>
> Per the patch above I now call TXSYNC on an interface only after pushing a
> batch of packets to it and this seems to work perfectly, at least with a
> good balance between performance and latency. If nm_ring_space(txring) == 0
> I just drop frames until the next batch. I don't TXSYNC part way through a
> batch; it hasn't yet seemed necessary, but I may need to look into this
> later.
>

Right, there are some heuristics you can try. Calling TXSYNC if you find
nm_ring_space(txring) == 0 while forwarding is a common one, as you
suggest. It can be beneficial or not, depending on your machine, NIC and
workload, so one should just try.


>
> I'm running this on a 6-core 2.8GHz Xeon with a 4-port i350-T4 NIC. I
> thought I'd just post some stats of the performance I observe using my code
> (excluding the routing table lookup as this isn't relevant to netmap). Not
> really looking for any advice here, just thought I'd share my results.
>
> All examples are with 1.488Mpps (1 x 1Gbps) input and no packet loss
> observed:
> 1 thread - CPU usage = 100%, batch size = 4
> 2 threads - CPU usage = 54% (27% x 2), batch size = 12
> 4 threads - CPU usage = 98% (25% x 4), batch size = 8
> 6 threads - CPU usage = 124% (21% x 6), batch size = 8
>
> And again with 2.976Mpps (2 x 1Gbps) input and no packet loss observed:
> 1 thread - CPU usage = 100%, batch size = 12
> 2 threads - CPU usage = 68% (34% x 2), batch size = 21
> 4 threads - CPU usage = 100% (25% x 4), batch size = 17
> 6 threads - CPU usage = 105% (18% x 6), batch size = 16
>
> These results seem excellent and demonstrate that netmap is scaling as
> expected with both threads and packet volume. The higher thread count will
> be more beneficial when I am doing more processing on each packet.
>

Yes, as you can see, a larger batch size is very beneficial to CPU utilization
and packet rate, because poll/ioctl are rather expensive. You could try to
achieve larger batches, which may give better results. If you don't mind adding
a controlled latency, you could experiment with adding something like
"usleep(30)" in your forwarding loop: this should lead to larger batches.

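As a rough back-of-the-envelope check of what a fixed sleep buys, the frames
that pile up while the loop sleeps are about (arrival rate) x (sleep time).
frames_per_sleep() below is a hypothetical helper, and the estimate assumes
a steady arrival rate and that usleep() returns on time; in practice it may
sleep longer, so treat the result as a lower bound.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frames expected to arrive during one sleep:
 *   rate (frames per second) * sleep (microseconds) / 1e6
 * Integer division rounds down.
 */
uint64_t frames_per_sleep(uint64_t pps, uint64_t sleep_us)
{
    assert(sleep_us == 0 || pps <= UINT64_MAX / sleep_us); /* no overflow */
    return pps * sleep_us / 1000000u;
}
```

For the rates you measured, frames_per_sleep(1488000, 30) gives 44 and
frames_per_sleep(2976000, 30) gives 89, so the RX rings need comfortably
more free slots than that to avoid drops while sleeping.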

>
>
>> I hope this all makes sense, and again, I hope I have simply missed
>> something from the nmreq I pass to NIOCREGIF.
>>
>> It is worth mentioning that with the exception of this problem /
>> confusion, I am getting extremely good results from this code and netmap in
>> general.
>>
>
> That's nice to hear :)
> Your program looks simple enough that we could even add it to the examples
> (as an example of routing logic).
>
> I'd be very happy to contribute to the documentation in any way that may
> be helpful. I have added a permissive licence to my Github repository just
> in case my code is of use to anyone else. It is currently somewhat
> incomplete as an IPv4 router as it doesn't update MAC addresses on frames
> before forwarding them, and because the interface names are hardcoded, but
> when it's more complete I'd be very happy for it to be contributed to the
> examples. Of course anyone is free to use my code for any purpose too.
>
> Thanks for all your assistance! I'm happy enough with this that I will
> move on to looking at my IP routing code.
>

Ok, thanks!

Vincenzo

>
> Charlie
>


-- 
Vincenzo Maffione