From: Xiaoye Sun
Date: Tue, 21 Nov 2017 00:51:02 -0600
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
To: Luigi Rizzo
Cc: Victor Detoni, Pavel Odintsov, "freebsd-net@freebsd.org"

Hi,

I recently found another problem with netmap. I think it may be related
to the problems discussed in this thread, so I am posting it here.

In my setup, the sender program has a netmap ring (a pair of RX/TX
rings) for the NIC and a ring for the host stack. The sender puts
customized packets directly into the NIC TX ring; each packet carries a
unique sequence number, and the packets are sent in increasing
sequence-number order. The sender also forwards packets from the host RX
ring to the NIC TX ring using "zerocopy", that is, by swapping the
buffer indices.

However, the receiver sees duplicated customized packets. For example,
with a ring size of 32 (32 slots per ring), the receiver sees the
sequence numbers in the order
1,2,3,4,5,...,68,69,*70*,71,72,73,...,99,100,*70*,101,102,103,...
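A receiver-side check for this kind of duplication could look like the
minimal sketch below. It is not the code used in this experiment: the
names `seq_tracker`, `seq_tracker_init`, `seq_tracker_observe` and
`seq_tracker_free` are hypothetical, and the sketch assumes the sequence
numbers are bounded by a maximum known in advance. For every packet it
reports whether the sequence number was seen before and, if so, how many
arrivals separate the two copies.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: remembers when each sequence number last arrived. */
struct seq_tracker {
    uint64_t *last_pos;  /* last_pos[seq] = 1-based arrival position, 0 = never seen */
    uint32_t  max_seq;   /* largest sequence number that can be tracked */
    uint64_t  received;  /* number of packets observed so far */
};

/* Allocate the tracking table. Returns 0 on success, -1 on allocation failure. */
int seq_tracker_init(struct seq_tracker *t, uint32_t max_seq)
{
    t->last_pos = calloc((size_t)max_seq + 1, sizeof(*t->last_pos));
    if (t->last_pos == NULL)
        return -1;
    t->max_seq = max_seq;
    t->received = 0;
    return 0;
}

/*
 * Record one received sequence number.
 * Returns 0 if the number is new, the distance (in packets) to the
 * previous copy if it is a duplicate, or -1 if seq is out of range.
 */
int64_t seq_tracker_observe(struct seq_tracker *t, uint32_t seq)
{
    uint64_t prev;

    if (seq > t->max_seq)
        return -1;
    t->received++;
    prev = t->last_pos[seq];
    t->last_pos[seq] = t->received;
    return prev ? (int64_t)(t->received - prev) : 0;
}

/* Release the tracking table. */
void seq_tracker_free(struct seq_tracker *t)
{
    free(t->last_pos);
    t->last_pos = NULL;
}
```

Feeding it the example order (1 to 100, then 70 again) makes the second
70 return a distance of 31, which can then be compared with the ring
size.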
An interesting thing I found is that the "gap" between the two copies of
a duplicated packet (70 in the example) is always very close to the ring
size, 32 in this example. In my experiment I use a ring with 4096 slots,
and the gap is always more than 4090 and close to 4096.

I verified that the duplication is caused by the sender, not the
receiver. Assuming my sender's implementation is correct, the
duplication may happen in netmap and the NIC driver (ixgbe).

Thinking back to the original problem in this thread, I think these
problems may be related. It seems to me that multiple threads could be
pulling packets from the NIC TX ring (or the thread may have been moved
to another CPU when the problem occurs), and these threads may run on
different cores, so that outdated content in the buffer may be sent out
when new content is written to the buffer.

I am wondering if there is a way to pin the NIC driver of the netmap
module to a specific core. Or is there a way to find the root cause of
this problem?

Best,
Xiaoye

On Wed, Feb 10, 2016 at 10:18 AM, Xiaoye Sun wrote:

> Hi Luigi,
>
> Thanks Luigi!
> Pinning the process to one CPU core solves the reorder problem!!!
> Let me check if the duplicated packet problem is solved also.
>
> Thanks!
>
> Best,
> Xiaoye
>
> On Wed, Feb 10, 2016 at 7:21 AM, Luigi Rizzo wrote:
>
>> On Tue, Feb 9, 2016 at 1:12 PM, Xiaoye Sun wrote:
>> > Hi Luigi,
>> >
>> > Have you seen the previous email? Any comments?
>>
>> Hi,
>> to summarize, you are seeing reordering when
>> reinjecting packets into the host stack from bridge.c
>>
>> On Linux, the NIOCTXSYNC towards the host stack calls netif_rx() one
>> packet at a time (on freebsd that would be ifp->if_input()), and
>> the calls are synchronous.
>>
>> In order to get reordering, you should have the following
>> sequence of events:
>>
>> 1. bridge.c calls ioctl(NIOCTXSYNC)
>> 2. netif_rx() queues packet instead of dispatching them to the socket
>> 3. bridge.c builds another batch and calls ioctl(NIOCTXSYNC)
>> 4. netif_rx() passes packets to the socket overtaking those in #2
>>
>> I don't know whether netif_rx() can defer
>> processing and how to prevent that.
>> If this is the problem,
>> one thing you could try is pin the bridge process
>> to a specific core and see if that makes the problem
>> disappear.
>>
>> cheers
>> luigi
>>
>> >
>> > On Fri, Feb 5, 2016 at 3:29 PM, Xiaoye Sun wrote:
>> >>
>> >> Hi Victor,
>> >> Thanks for the help. The command you provided worked perfectly for me.
>> >>
>> >> Hi Luigi,
>> >>
>> >> Thanks for your clarification.
>> >>
>> >> The experiment I did was NOT running on 3 nodes. They ran on two nodes.
>> >> node 1 ran [1. sender]; node 2 ran [2. bridge.c] and [3. receiver (not
>> >> using netmap)]; [2. bridge.c] saw packets in order. [3. receiver] saw
>> >> packets out-of-order. I saw replication packets (even corrupted packets)
>> >> in the setup I mentioned in my first email in this thread. I did not see
>> >> replication packet in the sender-bridge-receiver setup. Let's solve the
>> >> reorder problem first and then solve the replication packet problem.
>> >>
>> >> I also tried the experiment setup having 3 nodes running sender, bridge,
>> >> receiver (both non-netmap based and netmap based XYZ) respectively. In
>> >> the 3 nodes experiment, there is NO packet reorder on any node. The
>> >> difference between the 2 nodes experiment and the 3 nodes experiment is
>> >> that in the bridge of node 2 in the 2-nodes experiment the bridge
>> >> interacts with the host stack, while netmap does not interact with the
>> >> host stack in the 3-node experiment.
>> >>
>> >> This makes me conclude that there might be some problem with
>> >> the interaction between netmap and host stack. What is your opinion?
>> >>
>> >> Thanks!
>> >>
>> >> Xiaoye
>> >>
>> >> On Thu, Feb 4, 2016 at 6:26 PM, Victor Detoni wrote:
>> >>>
>> >>> I'm sorry, I made mistake.
>> >>> To workaround this try `ip link set $IFACE promisc on`
>> >>>
>> >>> On Thu, Feb 4, 2016 at 10:04 PM, Xiaoye Sun wrote:
>> >>>>
>> >>>> Yes. all the interfaces are up. Are you able to get ARP request when
>> >>>> the interfaces are down?
>> >>>>
>> >>>> On Thursday, February 4, 2016, Victor Detoni wrote:
>> >>>>>
>> >>>>> Both interfaces are up? Like ifconfig... up
>> >>>>>
>> >>>>> I had this the same problem and I solve with commands above
>> >>>>>
>> >>>>> On Thursday, February 4, 2016, Xiaoye Sun wrote:
>> >>>>>>
>> >>>>>> Hi Luigi,
>> >>>>>>
>> >>>>>> Thanks for your explanation.
>> >>>>>>
>> >>>>>> I used three machines to do this experiment. They are directly
>> >>>>>> connected.
>> >>>>>>
>> >>>>>> [(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].
>> >>>>>>
>> >>>>>> First, I tried to run bridge.c on machine2 using the command
>> >>>>>> *bridge -i netmap:eth2 -i netmap:eth3*. (sender receiver or XYZ were
>> >>>>>> not running on machine 1 or 3)
>> >>>>>>
>> >>>>>> For my understanding, in this setup, machine2 will be transparent to
>> >>>>>> machine1&3 since it forwards packet from its eth2 to eth3 and vice
>> >>>>>> versa without any modification to the packets.
>> >>>>>>
>> >>>>>> I tried to ping machine 3 from machine 1 using the command like
>> >>>>>> *ping 10.11.10.3*. However, it still does not succeed.
>> >>>>>> This is because before machine1 sends the ping message to machine3,
>> >>>>>> it will first send an ARP request message to get the mac address of
>> >>>>>> machine3.
>> >>>>>> machine3 gets that ARP request, and sends the reply back (I use
>> >>>>>> tcpdump to verify that machine3 gets the ARP request and sends out
>> >>>>>> the ARP reply).
>> >>>>>> However, machine1 does not get the ARP reply.
>> >>>>>> >> >>>>>> I checked that the bridge can only forwarding packet in one >> direction >> >>>>>> at >> >>>>>> the same time. it gets the ARP request but doesn't see the ARP >> reply >> >>>>>> (*pkt_queued* always returns 0 for one nic...). >> >>>>>> >> >>>>>> This behavior looks very weird to me. Do you think there is a >> >>>>>> compatibility >> >>>>>> issues between netmap and the os I am using? Is there a verified >> linux >> >>>>>> distribution (also the version) that perfectly works well with >> netmap? >> >>>>>> >> >>>>>> The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 >> >>>>>> (2015-05-24) >> >>>>>> x86_64 GNU/Linux. >> >>>>>> Linux kernel version is *3.16.0-4-amd64* >> >>>>>> >> >>>>>> >> >>>>>> Thanks! >> >>>>>> Xiaoye >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo >> >>>>>> wrote: >> >>>>>> >> >>>>>> > On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun > > >> >>>>>> > wrote: >> >>>>>> > > >> >>>>>> > > >> >>>>>> > > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo < >> rizzo@iet.unipi.it> >> >>>>>> > > wrote: >> >>>>>> > >> >> >>>>>> > >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun < >> Xiaoye.Sun@rice.edu> >> >>>>>> > >> wrote: >> >>>>>> > >> > Hi Luigi, >> >>>>>> > >> > >> >>>>>> > >> > I have to clarify about the *jumping issue* about the slot >> >>>>>> > >> > indexes. >> >>>>>> > >> > In the bridge.c program, the slot index never jumps and it >> >>>>>> > >> > increases >> >>>>>> > >> > sequentially. >> >>>>>> > >> > In the receiver.c program, the udp packet seq jumps and I >> >>>>>> > >> > showed the >> >>>>>> > >> > slot >> >>>>>> > >> > index that each udp packet uses. So the slot index jumps >> >>>>>> > >> > together with >> >>>>>> > >> > the >> >>>>>> > >> > udp seq (at the receiver program only). 
>> >>>>>> > >> >> >>>>>> > >> So let me understand, is the "slot" some information written >> >>>>>> > >> in the packet by bridge.c (referring to the rx or tx slot, >> >>>>>> > >> I am not sure) and then read and printed by receiver.c >> >>>>>> > >> (which gets the packet through recvfrom so there isn't >> >>>>>> > >> really any slot index) ? >> >>>>>> > >> >> >>>>>> > > It works in the other way: >> >>>>>> > > The bridge.c checks the seq numbers of the udp packets in >> netmap >> >>>>>> > > slots >> >>>>>> > (in >> >>>>>> > > nic rx ring) before the swap; then it records the seq number, >> slot >> >>>>>> > > number(both rx and tx (tx indexes were not shown in the >> previous >> >>>>>> > > email >> >>>>>> > since >> >>>>>> > > they all look correct)) and buf_idx (rx and tx). The bridge.c >> does >> >>>>>> > > not >> >>>>>> > > change anything in the buffer and it knows the slot and buf_idx >> >>>>>> > > that a >> >>>>>> > > packet uses. Please refer to the added code in *process_rings* >> >>>>>> > > function >> >>>>>> > > http://www.owlnet.rice.edu/~xs6/bridge.c >> >>>>>> > > The receiver.c checks the seq numbers only and print out the >> seq >> >>>>>> > > numbers >> >>>>>> > it >> >>>>>> > > receive sequentially. >> >>>>>> > > With these information, I manually match the seq number I got >> from >> >>>>>> > > receiver.c and the seq number I got from bridge.c. So we know >> what >> >>>>>> > > is the >> >>>>>> > > seq order the receiver sees and which slot a packet uses when >> >>>>>> > > bridge.c >> >>>>>> > swaps >> >>>>>> > > the buf_idxs. >> >>>>>> > > >> >>>>>> > >> Do you see any ordering inversion when the receiver >> >>>>>> > >> gets packets through the NETMAP API (e.g. using bridge.c >> >>>>>> > >> instead of receiver.c) ? 
>> >>>>>> > >> >> >>>>>> > > There is no ordering inversion seen by bridge.c (As I said in >> the >> >>>>>> > previous >> >>>>>> > > paragraph, the bridge.c checks the seq number and I did not see >> >>>>>> > > any order >> >>>>>> > > inversion in THIS simple experiment (In my multicast protocol >> >>>>>> > > (mentioned >> >>>>>> > in >> >>>>>> > > the first email), there is ordering inversion. But let us solve >> >>>>>> > > the >> >>>>>> > simple >> >>>>>> > > bridge.c's problem first. I think they are two relatively >> >>>>>> > > independent >> >>>>>> > > issues.)). >> >>>>>> > >> >>>>>> > Sorry there was a misunderstanding. >> >>>>>> > I wanted you to check the following setup: >> >>>>>> > >> >>>>>> > [1: send.c] ->- [2: bridge.c] ->- [3: XYZ] >> >>>>>> > >> >>>>>> > where in XYZ you replace your receiver.c with some >> >>>>>> > netmap-based receiver (it could be pkt-gen in rx mode, >> >>>>>> > or possibly even another instance of bridge.c where >> >>>>>> > you connect the output port to a vale switch so >> >>>>>> > traffic is dropped), and then in XYZ print the content >> >>>>>> > of the packets. >> >>>>>> > >> >>>>>> > From your previous report we know that node 2: sees packets >> >>>>>> > in order, and node 3: sees packets out of order. >> >>>>>> > However, if the problem were due to bridge.c sending >> >>>>>> > the old buffer and not the new one, you'd see not only >> >>>>>> > reordering but also replication of packets. >> >>>>>> > >> >>>>>> > The fact that you see only the reordering in 3: makes >> >>>>>> > me think that the problem is in that node, and it could >> >>>>>> > be the network stack in 3: that does something strange. >> >>>>>> > So if you can run something netmap based in 3: and make >> >>>>>> > sure there is only one queue to read from, we could >> >>>>>> > at least figure out what is going on. 
>> >>>>>> > >> >>>>>> > cheers >> >>>>>> > luigi >> >>>>>> > >> >>>>>> > >> >>>>>> > is that >> >>>>>> > > >> >>>>>> > >> >> >>>>>> > >> Are you using native netmap drivers or the emulated mode ? >> >>>>>> > >> You can check that by playing with the "admode" sysctl entry >> >>>>>> > >> (or sysfs on linux) - try setting to 1 and 2 and see if >> >>>>>> > >> the behaviour changes. >> >>>>>> > >> >> >>>>>> > >> dev.netmap.admode: 0 >> >>>>>> > >> Controls the use of native or emulated adapter >> mode. >> >>>>>> > >> 0 uses the best available option, >> >>>>>> > >> 1 forces native and fails if not available, >> >>>>>> > >> 2 forces emulated hence never fails. >> >>>>>> > >> >> >>>>>> > > I was using admode 0. I changed the admode to 1 and 2 using the >> >>>>>> > > command >> >>>>>> > like >> >>>>>> > > *echo 1 > /sys/module/netmap/parameters/admode* and restart >> the >> >>>>>> > > bridge >> >>>>>> > > program. The behavior keeps the same. >> >>>>>> > > >> >>>>>> > >> >> >>>>>> > >> cheers >> >>>>>> > >> luigi >> >>>>>> > >> >> >>>>>> > >> > >> >>>>>> > >> > There is really one ring (tx and rx) for NIC and one ring >> (tx >> >>>>>> > >> > and rx) >> >>>>>> > >> > for >> >>>>>> > >> > the host. >> >>>>>> > >> > I also doubt that there might be multiple tx rings for the >> >>>>>> > >> > host. It >> >>>>>> > >> > seems >> >>>>>> > >> > like that bridge program swap packet to multiple host rings >> and >> >>>>>> > >> > the >> >>>>>> > udp >> >>>>>> > >> > recv >> >>>>>> > >> > program drains packets from these rings. But this is not the >> >>>>>> > >> > case >> >>>>>> > here. 
>> >>>>>> > >> > >> >>>>>> > >> > The bridge program prints a line like this >> >>>>>> > >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 >> 0x0/1.* >> >>>>>> > >> > this is printed by the following line the original program >> >>>>>> > >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", >> pa->req.nr_name, >> >>>>>> > >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name, >> >>>>>> > >> > pb->first_rx_ring, >> >>>>>> > >> > pb->req.nr_rx_rings);* >> >>>>>> > >> > >> >>>>>> > >> > I think this shows that there is really one NIC ring and one >> >>>>>> > >> > HOST >> >>>>>> > ring. >> >>>>>> > >> > >> >>>>>> > >> > Is there another way to verify the number of ring that >> netmap >> >>>>>> > >> > has? >> >>>>>> > >> > >> >>>>>> > >> > Thanks! >> >>>>>> > >> > Xiaoye >> >>>>>> > >> > >> >>>>>> > >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo >> >>>>>> > >> > >> >>>>>> > wrote: >> >>>>>> > >> >> >> >>>>>> > >> >> Hi, >> >>>>>> > >> >> there must be some wrong with your setting because >> >>>>>> > >> >> slot indexes must be sequential and in your case they >> >>>>>> > >> >> are not (see the jump from 295 to 474 and then >> >>>>>> > >> >> back from 485 to 296, and the numerous interleavings >> >>>>>> > >> >> that you are seeing later). >> >>>>>> > >> >> >> >>>>>> > >> >> I have no idea of the cause but typically this pattern >> >>>>>> > >> >> is what you see when there are multiple input rings and >> >>>>>> > >> >> not just one. >> >>>>>> > >> >> >> >>>>>> > >> >> Cheers >> >>>>>> > >> >> Luigi >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun >> >>>>>> > >> >> >> >>>>>> > >> >> wrote: >> >>>>>> > >> >> > Hi Luigi, >> >>>>>> > >> >> > >> >>>>>> > >> >> > Thanks for the detailed advice. 
>> >>>>>> > >> >> > >> >>>>>> > >> >> > With more detailed experiments, actually I found that the >> >>>>>> > >> >> > udp >> >>>>>> > >> >> > sender/receiver packet reorder issue *might* be >> irrelevant >> >>>>>> > >> >> > to the >> >>>>>> > >> >> > original >> >>>>>> > >> >> > issue I posted. However, I think we should solve the udp >> >>>>>> > >> >> > sender/receiver >> >>>>>> > >> >> > issue first. >> >>>>>> > >> >> > I run the experiment with more detailed log. Here is my >> >>>>>> > >> >> > findings. >> >>>>>> > >> >> > >> >>>>>> > >> >> > 1. I am running a netmap version available since about >> Oct >> >>>>>> > >> >> > 13rd >> >>>>>> > from >> >>>>>> > >> >> > github >> >>>>>> > >> >> > (https://github.com/luigirizzo/netmap). So I think this >> is >> >>>>>> > >> >> > not the >> >>>>>> > >> >> > one >> >>>>>> > >> >> > related to the buffer allocation issue. I tried to >> running >> >>>>>> > >> >> > the >> >>>>>> > newest >> >>>>>> > >> >> > version, however, that version causes problem when I exit >> >>>>>> > >> >> > the >> >>>>>> > bridge >> >>>>>> > >> >> > program >> >>>>>> > >> >> > (something like kernel error which make the os crash). >> >>>>>> > >> >> > >> >>>>>> > >> >> > 2 & 3. I changed the receiver.c & bridge.c so that I can >> get >> >>>>>> > >> >> > more >> >>>>>> > >> >> > information (more detailed log). >> >>>>>> > >> >> > The reorder happens multiple times (about 10 times) >> within a >> >>>>>> > second. >> >>>>>> > >> >> > Here is >> >>>>>> > >> >> > one example trace collected from the above two programs. >> >>>>>> > (remembering >> >>>>>> > >> >> > that >> >>>>>> > >> >> > we have udp sender running on one machine; netmap bridge >> and >> >>>>>> > >> >> > udp >> >>>>>> > >> >> > receiver >> >>>>>> > >> >> > are running on another machine). 
>> >>>>>> > >> >> > There is only one pair of rings each with 512 slots (511 >> >>>>>> > >> >> > slot >> >>>>>> > usable) >> >>>>>> > >> >> > on >> >>>>>> > >> >> > the >> >>>>>> > >> >> > receiver machine. >> >>>>>> > >> >> > >> >>>>>> > >> >> > =================== packet trace collected from >> receiver.c >> >>>>>> > >> >> > =================== >> >>>>>> > >> >> > ===== together with the slot and buf_idx of the >> >>>>>> > >> >> > corresponding >> >>>>>> > netmap >> >>>>>> > >> >> > ring >> >>>>>> > >> >> > slots ====== >> >>>>>> > >> >> > [seq] [slot] [buf_idx] >> >>>>>> > >> >> > 8208 294 1833 >> >>>>>> > >> >> > 8209 295 1834 >> >>>>>> > >> >> > 8388 474 2013 >> >>>>>> > >> >> > ... (packet received in order) >> >>>>>> > >> >> > 8398 484 2023 >> >>>>>> > >> >> > 8399 485 2024 >> >>>>>> > >> >> > 8210 296 1835 >> >>>>>> > >> >> > 8211 297 1836 >> >>>>>> > >> >> > ... (packet received in order) >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > 8222 308 1847 >> >>>>>> > >> >> > 8400 486 2025 >> >>>>>> > >> >> > 8223 309 1848 >> >>>>>> > >> >> > 8401 487 2026 >> >>>>>> > >> >> > 8224 310 1849 >> >>>>>> > >> >> > 8402 488 2027 >> >>>>>> > >> >> > 8225 311 1850 >> >>>>>> > >> >> > 8403 489 2028 >> >>>>>> > >> >> > 8226 312 1851 >> >>>>>> > >> >> > 8404 450 2029 >> >>>>>> > >> >> > 8227 313 1852 >> >>>>>> > >> >> > 8228 314 1853 >> >>>>>> > >> >> > >> >>>>>> > >> >> > ============================== >> ===================================== >> >>>>>> > >> >> > As we can see that the udp receiver got packet 8210 >> after it >> >>>>>> > >> >> > got >> >>>>>> > >> >> > 8399, >> >>>>>> > >> >> > which >> >>>>>> > >> >> > is the first reorder. Then, the receiver got 8211 to 8222 >> >>>>>> > >> >> > sequentially. >> >>>>>> > >> >> > Then >> >>>>>> > >> >> > it got packet from 8223-8227 and 8400-8404 interleaved. 
>> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > ==================== event order seen by netmap bridge >> >>>>>> > >> >> > ================== >> >>>>>> > >> >> > get 8209 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8210 >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > get 8228 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8229 >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > get 8383 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8384 >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > get 8387 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8388 >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > get 8393 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8394 >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > get 8399 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8400 >> >>>>>> > >> >> > ... >> >>>>>> > >> >> > get 8404 >> >>>>>> > >> >> > poll called >> >>>>>> > >> >> > get 8405 >> >>>>>> > >> >> > >> >>>>>> > >> >> > ============================== >> ===================================== >> >>>>>> > >> >> > As we can see, from the event ordering see by the >> bridge.c, >> >>>>>> > >> >> > all the >> >>>>>> > >> >> > packets >> >>>>>> > >> >> > are receiver in order, which means the the reorder >> happens >> >>>>>> > >> >> > when the >> >>>>>> > >> >> > bridge >> >>>>>> > >> >> > code swap the buf_idx between the nic ring(slot) and the >> >>>>>> > >> >> > host >> >>>>>> > >> >> > ring(slot). >> >>>>>> > >> >> > The reordered seq usually right before or after the poll >> >>>>>> > >> >> > function >> >>>>>> > >> >> > call. 
>> >>>>>> > >> >> > >> >>>>>> > >> >> > Best, >> >>>>>> > >> >> > Xiaoye >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > >> >>>>>> > >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo >> >>>>>> > >> >> > >> >>>>>> > >> >> > wrote: >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> wrote: >> >>>>>> > >> >> >> > Hi Luigi, >> >>>>>> > >> >> >> > >> >>>>>> > >> >> >> > Thanks for your advice. >> >>>>>> > >> >> >> > I forgot to mention that I use the command "ethtool -L >> >>>>>> > >> >> >> > eth1 >> >>>>>> > >> >> >> > combined >> >>>>>> > >> >> >> > 1" >> >>>>>> > >> >> >> > to >> >>>>>> > >> >> >> > set the number of rings of the nic to 1. The host >> also >> >>>>>> > >> >> >> > only has >> >>>>>> > >> >> >> > one >> >>>>>> > >> >> >> > ring. >> >>>>>> > >> >> >> > I understand the situation where the first tx ring is >> >>>>>> > >> >> >> > full so >> >>>>>> > the >> >>>>>> > >> >> >> > bridge >> >>>>>> > >> >> >> > will swap the packets to the second tx ring and then >> the >> >>>>>> > host/nic >> >>>>>> > >> >> >> > might >> >>>>>> > >> >> >> > drain either rings. But this is not the case in the >> >>>>>> > >> >> >> > experiment. >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> ok good to know that. >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> So if we have ruled out multiqueue and iommu, let's >> look at >> >>>>>> > >> >> >> the internal allocator and at bridge.c >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> 1. are you running the most recent version of netmap ? >> >>>>>> > >> >> >> Some older version (probably 1-2 years ago) had a bug >> >>>>>> > >> >> >> in the buffer allocator and some buffers were >> allocated >> >>>>>> > >> >> >> twice. >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> 2. 
can you tweak your receiver.c to report some more >> info >> >>>>>> > >> >> >> on how often you get out of sequence packets, how >> much >> >>>>>> > >> >> >> out of sequence they are ? >> >>>>>> > >> >> >> Also it would be useful to report gaps on the >> increasing >> >>>>>> > >> >> >> side >> >>>>>> > >> >> >> (i.e. new_seq != old_seq +1 ) >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> 3. can you tweak bridge.c so that it writes into the >> packet >> >>>>>> > >> >> >> the netmap buffer indexes and slots on the rx and tx >> >>>>>> > >> >> >> side, >> >>>>>> > >> >> >> so when you detect a sequence error we can figure out >> >>>>>> > >> >> >> where it is happening. >> >>>>>> > >> >> >> Ideally you could also add the sequence number >> detection >> >>>>>> > >> >> >> code in bridge.c so we can check whether the errors >> >>>>>> > >> >> >> appear >> >>>>>> > >> >> >> on the input or output sides. >> >>>>>> > >> >> >> >> >>>>>> > >> >> >> cheers >> >>>>>> > >> >> >> luigi >> >>>>>> > >> >> >> >> >>>>>> > >> >> > >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >> -- >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >>>>>> > -----------------------------------------+------------------ >> ------------- >> >>>>>> > >> >> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. >> >>>>>> > >> >> dell'Informazione >> >>>>>> > >> >> http://www.iet.unipi.it/~luigi/ . Universita` di >> Pisa >> >>>>>> > >> >> TEL +39-050-2217533 . via Diotisalvi 2 >> >>>>>> > >> >> Mobile +39-338-6809875 . 56122 PISA >> (Italy) >> >>>>>> > >> >> >> >>>>>> > >> >> >> >>>>>> > >> >>>>>> > -----------------------------------------+------------------ >> ------------- >> >>>>>> > >> >> >> >>>>>> > >> > >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> > >> >> >>>>>> > >> -- >> >>>>>> > >> >> >>>>>> > >> >>>>>> > -----------------------------------------+------------------ >> ------------- >> >>>>>> > >> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. 
>> >>>>>> > >> dell'Informazione
>> >>>>>> > >> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>> >>>>>> > >> TEL +39-050-2217533 . via Diotisalvi 2
>> >>>>>> > >> Mobile +39-338-6809875 . 56122 PISA (Italy)
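
The suggestion made earlier in this thread, pinning the bridge process
to a specific core, can be sketched on Linux as follows. This is only a
minimal illustration under stated assumptions: `pin_to_core` is a
hypothetical helper name, it relies on the Linux-specific
sched_setaffinity() call (FreeBSD would need a different interface), and
it pins only the calling thread of a user-space program such as
bridge.c. It does not control where driver or kernel threads run.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for cpu_set_t and sched_setaffinity() on glibc */
#endif
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to a single CPU core.
 * Returns 0 on success, -1 on error with errno set.
 */
int pin_to_core(int core)
{
    cpu_set_t set;

    if (core < 0 || core >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);        /* start from an empty CPU mask */
    CPU_SET(core, &set);   /* allow exactly one core */
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling pin_to_core() once at the start of main(), before the netmap
ports are opened, would keep the forwarding loop on one core. The same
effect can usually be had without code changes by starting the program
under taskset.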