From: Victor Detoni
To: Xiaoye Sun
Cc: Luigi Rizzo, Pavel Odintsov, "freebsd-net@freebsd.org"
Date: Thu, 4 Feb 2016 22:26:01 -0200
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
List-Id: Networking and TCP/IP with FreeBSD

I'm sorry, I made a mistake. To work around this, try
`ip link set $IFACE promisc on`.

On Thu, Feb 4, 2016 at 10:04 PM, Xiaoye Sun wrote:

> Yes, all the interfaces are up. Are you able to get ARP requests when the
> interfaces are down?
>
> On Thursday, February 4, 2016, Victor Detoni wrote:
>
>> Are both interfaces up? Like ifconfig... up
>>
>> I had the same problem and I solved it with the commands above.
>>
>> On Thursday, February 4, 2016, Xiaoye Sun wrote:
>>
>>> Hi Luigi,
>>>
>>> Thanks for your explanation.
>>>
>>> I used three machines to do this experiment. They are directly connected.
>>>
>>> [(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].
>>>
>>> First, I tried to run bridge.c on machine2 using the command
>>> *bridge -i netmap:eth2 -i netmap:eth3*.
>>> (The sender, receiver, and XYZ were not running on machine 1 or 3.)
>>>
>>> As I understand it, in this setup machine2 is transparent to machine1
>>> and machine3, since it forwards packets from its eth2 to eth3 and vice
>>> versa without any modification to the packets.
>>>
>>> I tried to ping machine3 from machine1 using a command like
>>> *ping 10.11.10.3*. However, it still does not succeed.
>>> This is because before machine1 sends the ping message to machine3, it
>>> first sends an ARP request to get the MAC address of machine3.
>>> machine3 gets that ARP request and sends the reply back (I used tcpdump
>>> to verify that machine3 gets the ARP request and sends out the ARP reply).
>>> However, machine1 does not get the ARP reply.
>>>
>>> I found that the bridge forwards packets in only one direction at a
>>> time: it gets the ARP request but doesn't see the ARP reply
>>> (*pkt_queued* always returns 0 for one NIC...).
>>>
>>> This behavior looks very weird to me. Do you think there is a
>>> compatibility issue between netmap and the OS I am using? Is there a
>>> verified Linux distribution (and version) that works well with netmap?
>>>
>>> The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
>>> x86_64 GNU/Linux.
>>> The Linux kernel version is *3.16.0-4-amd64*.
>>>
>>> Thanks!
>>> Xiaoye
>>>
>>> On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo wrote:
>>>
>>> > On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun wrote:
>>> > >
>>> > > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo wrote:
>>> > >>
>>> > >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun wrote:
>>> > >> > Hi Luigi,
>>> > >> >
>>> > >> > I have to clarify the *jumping issue* with the slot indexes.
>>> > >> > In the bridge.c program, the slot index never jumps; it increases
>>> > >> > sequentially.
>>> > >> > In the receiver.c program, the UDP packet seq jumps, and I showed
>>> > >> > the slot index that each UDP packet uses. So the slot index jumps
>>> > >> > together with the UDP seq (at the receiver program only).
>>> > >>
>>> > >> So let me understand, is the "slot" some information written
>>> > >> in the packet by bridge.c (referring to the rx or tx slot,
>>> > >> I am not sure) and then read and printed by receiver.c
>>> > >> (which gets the packet through recvfrom so there isn't
>>> > >> really any slot index) ?
>>> > >>
>>> > > It works the other way:
>>> > > bridge.c checks the seq numbers of the UDP packets in the netmap
>>> > > slots (in the NIC rx ring) before the swap; then it records the seq
>>> > > number, the slot number (both rx and tx; the tx indexes were not shown
>>> > > in the previous email since they all look correct) and the buf_idx
>>> > > (rx and tx). bridge.c does not change anything in the buffer, and it
>>> > > knows the slot and buf_idx that a packet uses. Please refer to the
>>> > > added code in the *process_rings* function:
>>> > > http://www.owlnet.rice.edu/~xs6/bridge.c
>>> > > receiver.c checks the seq numbers only and prints them out in the
>>> > > order it receives them.
>>> > > With this information, I manually match the seq numbers I got from
>>> > > receiver.c with the seq numbers I got from bridge.c. So we know the
>>> > > seq order the receiver sees and which slot a packet uses when
>>> > > bridge.c swaps the buf_idxs.
>>> > >
>>> > >> Do you see any ordering inversion when the receiver
>>> > >> gets packets through the NETMAP API (e.g. using bridge.c
>>> > >> instead of receiver.c) ?
>>> > >>
>>> > > There is no ordering inversion seen by bridge.c. As I said in the
>>> > > previous paragraph, bridge.c checks the seq number, and I did not see
>>> > > any order inversion in THIS simple experiment. (In my multicast
>>> > > protocol, mentioned in the first email, there is ordering inversion.
>>> > > But let us solve the simple bridge.c problem first. I think they are
>>> > > two relatively independent issues.)
>>> >
>>> > Sorry there was a misunderstanding.
>>> > I wanted you to check the following setup:
>>> >
>>> > [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
>>> >
>>> > where in XYZ you replace your receiver.c with some
>>> > netmap-based receiver (it could be pkt-gen in rx mode,
>>> > or possibly even another instance of bridge.c where
>>> > you connect the output port to a vale switch so
>>> > traffic is dropped), and then in XYZ print the content
>>> > of the packets.
>>> >
>>> > From your previous report we know that node 2: sees packets
>>> > in order, and node 3: sees packets out of order.
>>> > However, if the problem were due to bridge.c sending
>>> > the old buffer and not the new one, you'd see not only
>>> > reordering but also replication of packets.
>>> >
>>> > The fact that you see only the reordering in 3: makes
>>> > me think that the problem is in that node, and it could
>>> > be the network stack in 3: that does something strange.
>>> > So if you can run something netmap based in 3: and make
>>> > sure there is only one queue to read from, we could
>>> > at least figure out what is going on.
>>> >
>>> > cheers
>>> > luigi
>>> >
>>> >
>>> > is that
>>> > >
>>> > >>
>>> > >> Are you using native netmap drivers or the emulated mode ?
>>> > >> You can check that by playing with the "admode" sysctl entry
>>> > >> (or sysfs on linux) - try setting to 1 and 2 and see if
>>> > >> the behaviour changes.
>>> > >>
>>> > >> dev.netmap.admode: 0
>>> > >> Controls the use of native or emulated adapter mode.
>>> > >> 0 uses the best available option,
>>> > >> 1 forces native and fails if not available,
>>> > >> 2 forces emulated hence never fails.
>>> > >>
>>> > > I was using admode 0. I changed admode to 1 and 2 using a command
>>> > > like *echo 1 > /sys/module/netmap/parameters/admode* and restarted the
>>> > > bridge program. The behavior stays the same.
>>> > >
>>> > >>
>>> > >> cheers
>>> > >> luigi
>>> > >>
>>> > >> >
>>> > >> > There is really one ring (tx and rx) for the NIC and one ring (tx
>>> > >> > and rx) for the host.
>>> > >> > I also suspected that there might be multiple tx rings for the host,
>>> > >> > with the bridge program swapping packets to multiple host rings and
>>> > >> > the UDP recv program draining packets from these rings. But this is
>>> > >> > not the case here.
>>> > >> >
>>> > >> > The bridge program prints a line like this:
>>> > >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
>>> > >> > It is printed by the following line in the original program:
>>> > >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
>>> > >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name,
>>> > >> > pb->first_rx_ring, pb->req.nr_rx_rings);*
>>> > >> >
>>> > >> > I think this shows that there is really one NIC ring and one HOST
>>> > >> > ring.
>>> > >> >
>>> > >> > Is there another way to verify the number of rings that netmap has?
>>> > >> >
>>> > >> > Thanks!
>>> > >> > Xiaoye
>>> > >> >
>>> > >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo wrote:
>>> > >> >>
>>> > >> >> Hi,
>>> > >> >> there must be something wrong with your setting because
>>> > >> >> slot indexes must be sequential and in your case they
>>> > >> >> are not (see the jump from 295 to 474 and then
>>> > >> >> back from 485 to 296, and the numerous interleavings
>>> > >> >> that you are seeing later).
>>> > >> >>
>>> > >> >> I have no idea of the cause but typically this pattern
>>> > >> >> is what you see when there are multiple input rings and
>>> > >> >> not just one.
>>> > >> >>
>>> > >> >> Cheers
>>> > >> >> Luigi
>>> > >> >>
>>> > >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun wrote:
>>> > >> >> > Hi Luigi,
>>> > >> >> >
>>> > >> >> > Thanks for the detailed advice.
>>> > >> >> >
>>> > >> >> > With more detailed experiments, I found that the UDP
>>> > >> >> > sender/receiver packet reorder issue *might* be unrelated to the
>>> > >> >> > original issue I posted. However, I think we should solve the UDP
>>> > >> >> > sender/receiver issue first.
>>> > >> >> > I ran the experiment with a more detailed log. Here are my findings.
>>> > >> >> >
>>> > >> >> > 1. I am running a netmap version available since about Oct 13
>>> > >> >> > from GitHub (https://github.com/luigirizzo/netmap). So I think this
>>> > >> >> > is not the one affected by the buffer allocation issue. I tried
>>> > >> >> > running the newest version; however, that version causes a problem
>>> > >> >> > when I exit the bridge program (something like a kernel error which
>>> > >> >> > makes the OS crash).
>>> > >> >> >
>>> > >> >> > 2 & 3. I changed receiver.c and bridge.c so that I can get more
>>> > >> >> > information (a more detailed log).
>>> > >> >> > The reorder happens multiple times (about 10 times) within a second.
>>> > >> >> > Here is one example trace collected from the above two programs
>>> > >> >> > (remember that we have the UDP sender running on one machine; the
>>> > >> >> > netmap bridge and UDP receiver are running on another machine).
>>> > >> >> > There is only one pair of rings, each with 512 slots (511 slots
>>> > >> >> > usable), on the receiver machine.
>>> > >> >> >
>>> > >> >> > =================== packet trace collected from receiver.c ===================
>>> > >> >> > ===== together with the slot and buf_idx of the corresponding netmap ring slots ======
>>> > >> >> > [seq] [slot] [buf_idx]
>>> > >> >> > 8208 294 1833
>>> > >> >> > 8209 295 1834
>>> > >> >> > 8388 474 2013
>>> > >> >> > ... (packet received in order)
>>> > >> >> > 8398 484 2023
>>> > >> >> > 8399 485 2024
>>> > >> >> > 8210 296 1835
>>> > >> >> > 8211 297 1836
>>> > >> >> > ... (packet received in order)
>>> > >> >> > ...
>>> > >> >> > 8222 308 1847
>>> > >> >> > 8400 486 2025
>>> > >> >> > 8223 309 1848
>>> > >> >> > 8401 487 2026
>>> > >> >> > 8224 310 1849
>>> > >> >> > 8402 488 2027
>>> > >> >> > 8225 311 1850
>>> > >> >> > 8403 489 2028
>>> > >> >> > 8226 312 1851
>>> > >> >> > 8404 450 2029
>>> > >> >> > 8227 313 1852
>>> > >> >> > 8228 314 1853
>>> > >> >> > ===================================================================
>>> > >> >> > As we can see, the UDP receiver got packet 8210 after it got 8399,
>>> > >> >> > which is the first reorder. Then the receiver got 8211 to 8222
>>> > >> >> > sequentially. Then it got packets 8223-8227 and 8400-8404
>>> > >> >> > interleaved.
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > ==================== event order seen by netmap bridge ==================
>>> > >> >> > get 8209
>>> > >> >> > poll called
>>> > >> >> > get 8210
>>> > >> >> > ...
>>> > >> >> > ...
>>> > >> >> > get 8228
>>> > >> >> > poll called
>>> > >> >> > get 8229
>>> > >> >> > ...
>>> > >> >> > ...
>>> > >> >> > get 8383
>>> > >> >> > poll called
>>> > >> >> > get 8384
>>> > >> >> > ...
>>> > >> >> > get 8387
>>> > >> >> > poll called
>>> > >> >> > get 8388
>>> > >> >> > ...
>>> > >> >> > get 8393
>>> > >> >> > poll called
>>> > >> >> > get 8394
>>> > >> >> > ...
>>> > >> >> > get 8399
>>> > >> >> > poll called
>>> > >> >> > get 8400
>>> > >> >> > ...
>>> > >> >> > get 8404
>>> > >> >> > poll called
>>> > >> >> > get 8405
>>> > >> >> > ===================================================================
>>> > >> >> > As we can see from the event ordering seen by bridge.c, all the
>>> > >> >> > packets are received in order, which means that the reorder happens
>>> > >> >> > when the bridge code swaps the buf_idx between the NIC ring (slot)
>>> > >> >> > and the host ring (slot).
>>> > >> >> > The reordered seqs are usually right before or after the poll
>>> > >> >> > function call.
>>> > >> >> >
>>> > >> >> > Best,
>>> > >> >> > Xiaoye
>>> > >> >> >
>>> > >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo <rizzo@iet.unipi.it>
>>> > >> >> > wrote:
>>> > >> >> >>
>>> > >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu>
>>> > >> >> >> wrote:
>>> > >> >> >> > Hi Luigi,
>>> > >> >> >> >
>>> > >> >> >> > Thanks for your advice.
>>> > >> >> >> > I forgot to mention that I use the command
>>> > >> >> >> > "ethtool -L eth1 combined 1" to set the number of rings of the
>>> > >> >> >> > NIC to 1. The host also only has one ring.
>>> > >> >> >> > I understand the situation where the first tx ring is full, so
>>> > >> >> >> > the bridge swaps the packets to the second tx ring and then the
>>> > >> >> >> > host/NIC might drain either ring. But this is not the case in
>>> > >> >> >> > the experiment.
>>> > >> >> >>
>>> > >> >> >> ok good to know that.
>>> > >> >> >>
>>> > >> >> >> So if we have ruled out multiqueue and iommu, let's look at
>>> > >> >> >> the internal allocator and at bridge.c
>>> > >> >> >>
>>> > >> >> >> 1. are you running the most recent version of netmap ?
>>> > >> >> >> Some older version (probably 1-2 years ago) had a bug
>>> > >> >> >> in the buffer allocator and some buffers were allocated
>>> > >> >> >> twice.
>>> > >> >> >>
>>> > >> >> >> 2. can you tweak your receiver.c to report some more info
>>> > >> >> >> on how often you get out of sequence packets, how much
>>> > >> >> >> out of sequence they are ?
>>> > >> >> >> Also it would be useful to report gaps on the increasing side
>>> > >> >> >> (i.e. new_seq != old_seq +1 )
>>> > >> >> >>
>>> > >> >> >> 3. can you tweak bridge.c so that it writes into the packet
>>> > >> >> >> the netmap buffer indexes and slots on the rx and tx side,
>>> > >> >> >> so when you detect a sequence error we can figure out
>>> > >> >> >> where it is happening.
>>> > >> >> >> Ideally you could also add the sequence number detection
>>> > >> >> >> code in bridge.c so we can check whether the errors appear
>>> > >> >> >> on the input or output sides.
>>> > >> >> >>
>>> > >> >> >> cheers
>>> > >> >> >> luigi
>>> > >> >> >>
>>> > >> >> >
>>> > >> >>
>>> > >> >> --
>>> > >> >> -----------------------------------------+-------------------------------
>>> > >> >> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione
>>> > >> >> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>>> > >> >> TEL +39-050-2217533 . via Diotisalvi 2
>>> > >> >> Mobile +39-338-6809875 . 56122 PISA (Italy)
>>> > >> >> -----------------------------------------+-------------------------------
>>> > >> >>
>>> > >> >
>>> > >>
>>> > >
>>> >
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
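
For reference, the check Luigi asks for in his point 2 (how often packets arrive out of sequence, how far out they are, and gaps where new_seq != old_seq + 1) could be sketched roughly as below. This is only an illustration and is not taken from receiver.c or bridge.c: the names `seq_stats`, `seq_update` and `seq_report` are made up here, and it assumes the sequence number has already been extracted from the UDP payload as a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of receiver.c or bridge.c. */
struct seq_stats {
    int      started;   /* 0 until the first packet has been seen */
    uint32_t expected;  /* old_seq + 1, i.e. the next seq we expect */
    uint64_t in_order;  /* packets with new_seq == old_seq + 1 */
    uint64_t gaps;      /* packets with new_seq >  old_seq + 1 (jump forward) */
    uint64_t late;      /* packets with new_seq <  old_seq + 1 (late or duplicate) */
    uint32_t max_late;  /* largest distance behind the expected seq */
};

/* Feed one received sequence number. Wraparound is ignored for brevity. */
static void seq_update(struct seq_stats *s, uint32_t seq)
{
    assert(s != NULL);
    if (!s->started) {
        /* The first packet only sets the baseline. */
        s->started = 1;
        s->in_order++;
        s->expected = seq + 1;
    } else if (seq == s->expected) {
        s->in_order++;
        s->expected = seq + 1;
    } else if (seq > s->expected) {
        /* Gap on the increasing side: some seqs were skipped. */
        s->gaps++;
        s->expected = seq + 1;
    } else {
        /* Arrived after a packet with a higher seq; keep `expected` as is. */
        uint32_t behind = s->expected - seq;
        s->late++;
        if (behind > s->max_late)
            s->max_late = behind;
    }
}

/* Print a one-line summary of the counters. */
static void seq_report(const struct seq_stats *s)
{
    printf("in order %llu, forward gaps %llu, late %llu, max distance %u\n",
        (unsigned long long)s->in_order,
        (unsigned long long)s->gaps,
        (unsigned long long)s->late,
        (unsigned)s->max_late);
}
```

A receiver would zero-initialise one `struct seq_stats`, call `seq_update()` once per packet, and call `seq_report()` periodically or on exit. Running the same counter in bridge.c on the rx side would show whether the reordering is already present on input, which is the comparison Luigi suggests in his point 3.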