From: Vincenzo Maffione
Date: Tue, 2 Jan 2018 12:36:18 +0100
Subject: Re: Linux netmap memory allocation
To: Charlie Smurthwaite
Cc: "freebsd-net@freebsd.org"
List-Id: Networking and TCP/IP with FreeBSD

2018-01-01 23:05 GMT+01:00 Charlie Smurthwaite:

>
> On 01/01/18 21:05, Vincenzo Maffione wrote:
>
> 2018-01-01 17:14 GMT+01:00 Charlie Smurthwaite:
>
>> Hi,
>>
>> Thank you for your reply. I was able to resolve this.
>>
>> 1) I do indeed open one FD per NIC
>> 2) I no longer specify nr_arg1, nr_arg2 nor nr_arg3.
>> Instead I just verify that all NICs return with identical nr_arg2 so
>> that the memory is shared between them.
>> 3) I properly initialized my memory; my failure to do so was causing me
>> a lot of confusion.
>>
>> The resulting memory space is large enough for all the NICs, and
>> everything works perfectly with zero-copy forwarding, great!
>>
>> The only thing I am still having trouble with is the ability to
>> simultaneously trigger a TX and an RX sync on all NICs. I have tried
>> select, poll, and epoll, and in all cases, RX rings are updated but TX
>> rings are not and TX packets are not pushed out (this occurs using both
>> native and emulated netmap modes). I notice the documentation says "Note
>> that on epoll and kqueue, NETMAP_NO_TX_POLL and NETMAP_DO_RX_POLL only have
>> an effect when some event is posted for the file descriptor.", but the
>> behaviour seems the same on poll and select as well as epoll; perhaps this
>> is a Linux-specific implementation detail?
>>
>> I have also found that all of these mechanisms seem to incur a very high
>> cost in terms of CPU time (making them no more efficient than busy waiting
>> at 1Mpps+). My current approach is as follows, but I feel like there should
>> be a better option:
>>
>> for (int n = 0; n < /* number of NICs */; n++) {
>>   // usleep(10); // More CPU time seems to be saved with a careful
>>   //             // sleep than with select/poll/epoll
>>   ioctl(fds[n], NIOCTXSYNC);
>>   ioctl(fds[n], NIOCRXSYNC);
>>   rxring = rxrings[n];
>>   while (!nm_ring_empty(rxring)) {
>>     // Forward any packets waiting in this NIC's RX ring to the
>>     // appropriate TX ring
>>   }
>> }
>>
>
> If you are using poll() or select() you should not use ioctl(NIOC*XSYNC),
> as the txsync/rxsync operations are automatically performed within the
> poll()/select() syscall (at least assuming you did not specify
> NETMAP_NO_TX_POLL).
> Also, whether netmap calls or does not call txsync/rxsync on certain rings
> depends on the parameters passed to nm_open().
> Make sure you check for nm_ring_space(txring) when forwarding.
>
> Cheers,
> Vincenzo
>
>
> Hi Vincenzo,
>
> Thanks again for your assistance. You state the following (as does the
> manual):
>
> "If you are using poll() or select() you should not use ioctl(NIOC*XSYNC),
> as the txsync/rxsync operations are automatically performed within the
> poll()/select() syscall (at least assuming you did not specify
> NETMAP_NO_TX_POLL)."
>
> However, this is not happening for me :(
>
> I am using poll(), and I am not specifying NETMAP_NO_TX_POLL, and have
> found that sometimes frames are sent only when the TX buffer is full, and
> sometimes they are not sent at all. They are never sent as expected on
> every invocation of poll(). If I run ioctl(NIOCTXSYNC) manually, everything
> works correctly. I assume I have simply missed something from my nmreq.
>

I don't think you have missed anything within nmreq. I see that you are
waiting for POLLIN only (and this is right in your router case), so poll()
will actually invoke txsync on interface #i only when netmap intercepts an
RX or TX interrupt on interface #i. This means that packets may stall for
a long time in the TX rings if you don't call ioctl(NIOCTXSYNC).
The manual is not wrong, however. You can look at the apps/bridge/bridge.c
example to understand where this "poll automatically calls txsync" thing
is useful.

> You also mentioned: "whether netmap calls or does not call txsync/rxsync
> on certain rings depends on the parameters passed to nm_open()". I do not
> use the nm_open helper method, but I am extremely interested to know what
> parameters would affect this behaviour, as this would seem very relevant to
> my problem.
>

Yes, we do not normally use the low-level interface (ioctl(NIOCREGIF)),
because it's just simpler to use the nm_open() interface. Within the first
parameter of nm_open() you can specify to open just one RX/TX ring pair,
e.g. with "netmap:enp1f0s1-3" (ring pair 3 of that interface).
Then you usually want to mmap() just once (as you do in your program); with
nm_open(), you do that with the NM_OPEN_NO_MMAP flag.

> If you are interested or if it helps explain my question, my complete code
> (hopefully well commented but far from complete) can be found here:
> https://github.com/catphish/netmap-router/blob/58a9b957c19b0a012088c491bd58bc3161a56ff1/router.c
>
> Specifically, if the ioctl call at line 92 is removed, the code does not
> work (packets are not transmitted, or are only transmitted when the buffer
> is full; which of these two behaviours occurs seems to be random). However,
> I would expect it to work because I do not specify NETMAP_NO_TX_POLL, and I
> would therefore hope that the poll() call on line 80 would have the same
> effect.
>

Yes, that depends on when netmap_poll() is called by the kernel, which in
turn depends on when something is ready for receive on the file descriptor.
Looking at your program, I think you need to call ioctl(NIOCTXSYNC), at least
because you don't want to introduce artificial/unbounded latency.
However, since these calls are expensive, you could use them only when
necessary (e.g. when nm_ring_space(txring) == 0 or when you have actually
forwarded some packets on txring).

> I hope this all makes sense, and again, I hope I have simply missed
> something from the nmreq I pass to NIOCREGIF.
>
> It is worth mentioning that with the exception of this problem /
> confusion, I am getting extremely good results from this code and netmap in
> general.
>

That's nice to hear :)
Your program looks simple enough that we could even add it to the examples
(as an example of routing logic).

Cheers,
  Vincenzo

>
> Charlie
>
> *Charlie Smurthwaite*
> Technical Director
>
> *tel.* *email.* charlie@atech.media *web.* https://atech.media
>
> *This e-mail has been sent by aTech Media Limited (or one of its
> associated group companies, Dial 9 Communications Limited or Viaduct Hosting
> Limited).
> Its contents are confidential; therefore, if you have received this
> message in error, we would appreciate it if you could let us know and
> delete the message. aTech Media Limited is a UK limited company,
> registration number 5523199. Dial 9 Communications Limited is a UK limited
> company, registration number 7740921. Viaduct Hosting Limited is a UK
> limited company, registration number 8514362. All companies are registered
> at Unit 9 Winchester Place, North Street, Poole, Dorset, BH15 1NX.*
>

--
Vincenzo Maffione
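
The suggestion in the reply above (issue the TX sync only when the TX ring is
full or when packets were actually queued) could be sketched roughly as
follows. This is a self-contained model rather than netmap code, so it
compiles without the netmap headers: struct ring_model, ring_space(),
flush_tx(), needs_txsync() and forward_batch() are all hypothetical names,
standing in for a netmap TX ring, nm_ring_space(), ioctl(fd, NIOCTXSYNC), the
decision rule, and the forwarding loop respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a netmap TX ring: slot accounting only. */
struct ring_model {
    uint32_t num_slots;  /* total slots in the ring */
    uint32_t queued;     /* slots filled but not yet flushed */
    uint32_t sync_calls; /* number of simulated TX syncs so far */
};

/* Models nm_ring_space(): free slots available to the application. */
static uint32_t ring_space(const struct ring_model *r)
{
    assert(r->queued <= r->num_slots);
    return r->num_slots - r->queued;
}

/* Models ioctl(fd, NIOCTXSYNC): assume the NIC drains everything queued. */
static void flush_tx(struct ring_model *r)
{
    r->queued = 0;
    r->sync_calls++;
}

/* Sync only when the ring is full or when something was forwarded. */
static bool needs_txsync(const struct ring_model *r, uint32_t forwarded)
{
    return ring_space(r) == 0 || forwarded > 0;
}

/*
 * Queue up to `pending` packets on the ring, then sync at most once.
 * Returns the number of packets that fit.
 */
static uint32_t forward_batch(struct ring_model *r, uint32_t pending)
{
    uint32_t forwarded = 0;

    while (pending > 0 && ring_space(r) > 0) {
        r->queued++;
        forwarded++;
        pending--;
    }
    if (needs_txsync(r, forwarded))
        flush_tx(r);
    return forwarded;
}
```

With a four-slot ring, for instance, forward_batch(&tx, 0) performs no sync,
while forward_batch(&tx, 6) queues four packets and syncs once. In a real
program the expensive ioctl would sit where flush_tx() is, so idle loop
iterations would skip the syscall entirely.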