From owner-freebsd-net@freebsd.org Thu Feb 4 23:31:37 2016
Date: Thu, 4 Feb 2016 21:31:36 -0200
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
From: Victor Detoni
To: Xiaoye Sun
Cc: Luigi Rizzo, Pavel Odintsov, "freebsd-net@freebsd.org"
Content-Type: text/plain; charset=UTF-8
List-Id: Networking and TCP/IP with FreeBSD

Are both interfaces up? Something like ifconfig... up

I had the same problem and solved it with the command above.

On Thursday, February 4, 2016, Xiaoye Sun wrote:

> Hi Luigi,
>
> Thanks for your explanation.
>
> I used three machines to do this experiment. They are directly connected.
>
> [(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].
>
> First, I tried to run bridge.c on machine2 using the command
> *bridge -i netmap:eth2 -i netmap:eth3*. (Neither the sender, the receiver,
> nor XYZ was running on machine 1 or 3.)
>
> As I understand it, in this setup machine2 is transparent to machines 1
> and 3, since it forwards packets from its eth2 to eth3 and vice versa
> without any modification to the packets.
>
> I tried to ping machine3 from machine1 using a command like
> *ping 10.11.10.3*. However, it still does not succeed.
> This is because, before machine1 sends the ping message to machine3, it
> first sends an ARP request to get the MAC address of machine3.
> machine3 gets that ARP request and sends the reply back (I used tcpdump to
> verify that machine3 gets the ARP request and sends out the ARP reply).
> However, machine1 does not get the ARP reply.
>
> I found that the bridge can only forward packets in one direction at a
> time: it gets the ARP request but doesn't see the ARP reply
> (*pkt_queued* always returns 0 for one NIC...).
>
> This behavior looks very weird to me. Do you think there is a compatibility
> issue between netmap and the OS I am using? Is there a verified Linux
> distribution (and version) that works well with netmap?
>
> The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
> x86_64 GNU/Linux.
> The Linux kernel version is *3.16.0-4-amd64*.
>
> Thanks!
> Xiaoye
>
> On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo wrote:
>
> > On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun wrote:
> > >
> > > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo wrote:
> > >>
> > >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun wrote:
> > >> > Hi Luigi,
> > >> >
> > >> > I have to clarify the *jumping issue* with the slot indexes.
> > >> > In the bridge.c program, the slot index never jumps; it increases
> > >> > sequentially.
> > >> > In the receiver.c program, the udp packet seq jumps, and I showed
> > >> > the slot index that each udp packet uses. So the slot index jumps
> > >> > together with the udp seq (at the receiver program only).
> > >>
> > >> So let me understand, is the "slot" some information written
> > >> in the packet by bridge.c (referring to the rx or tx slot,
> > >> I am not sure) and then read and printed by receiver.c
> > >> (which gets the packet through recvfrom so there isn't
> > >> really any slot index) ?
> > >>
> > > It works the other way around:
> > > bridge.c checks the seq numbers of the udp packets in the netmap slots
> > > (in the NIC rx ring) before the swap; then it records the seq number,
> > > the slot number (both rx and tx; the tx indexes were not shown in the
> > > previous email since they all look correct) and the buf_idx (rx and
> > > tx). bridge.c does not change anything in the buffer, and it knows the
> > > slot and buf_idx that a packet uses. Please refer to the code added to
> > > the *process_rings* function:
> > > http://www.owlnet.rice.edu/~xs6/bridge.c
> > > receiver.c checks the seq numbers only and prints out the seq numbers
> > > in the order it receives them.
> > > With this information, I manually match the seq numbers I got from
> > > receiver.c with the seq numbers I got from bridge.c. So we know the
> > > seq order the receiver sees and which slot a packet uses when bridge.c
> > > swaps the buf_idxs.
> > >
> > >> Do you see any ordering inversion when the receiver
> > >> gets packets through the NETMAP API (e.g. using bridge.c
> > >> instead of receiver.c) ?
> > >>
> > > There is no ordering inversion seen by bridge.c. As I said in the
> > > previous paragraph, bridge.c checks the seq number, and I did not see
> > > any order inversion in THIS simple experiment. (In my multicast
> > > protocol, mentioned in the first email, there is ordering inversion.
> > > But let us solve the simple bridge.c problem first. I think they are
> > > two relatively independent issues.)
> >
> > Sorry there was a misunderstanding.
> > I wanted you to check the following setup:
> >
> > [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
> >
> > where in XYZ you replace your receiver.c with some
> > netmap-based receiver (it could be pkt-gen in rx mode,
> > or possibly even another instance of bridge.c where
> > you connect the output port to a vale switch so
> > traffic is dropped), and then in XYZ print the content
> > of the packets.
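
Whatever ends up running as XYZ on node 3, it has to read the same sequence number that bridge.c logs on node 2, straight from the raw frame in the netmap buffer. Below is a minimal sketch of such a parser, not code from the thread. The function name udp_seq_from_frame is hypothetical, and it ASSUMES that the sender stores a 32-bit sequence number in network byte order in the first four bytes of the UDP payload and that frames are untagged Ethernet carrying unfragmented IPv4. The thread does not describe the payload layout, so adjust the offset to match send.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Extract a 32-bit sequence number from a raw Ethernet frame, e.g. the
 * buffer that a netmap slot points to.
 * ASSUMPTIONS (not stated in the thread): untagged Ethernet + IPv4 + UDP,
 * no IP fragmentation, and the sequence number sits in the first four
 * bytes of the UDP payload in network byte order.
 * Returns 0 on success, -1 if the frame does not match that layout.
 */
int udp_seq_from_frame(const uint8_t *buf, size_t len, uint32_t *seq)
{
    const size_t eth_len = 14;               /* dst MAC, src MAC, ethertype */

    assert(buf != NULL && seq != NULL);
    if (len < eth_len + 20)                  /* too short for an IPv4 header */
        return -1;
    if (buf[12] != 0x08 || buf[13] != 0x00)  /* ethertype 0x0800 = IPv4 */
        return -1;

    const uint8_t *ip = buf + eth_len;
    if ((ip[0] >> 4) != 4)                   /* IP version field */
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* IPv4 header length in bytes */
    if (ihl < 20 || ip[9] != 17)             /* protocol 17 = UDP */
        return -1;
    if (len < eth_len + ihl + 8 + 4)         /* UDP header + 4-byte seq */
        return -1;

    const uint8_t *p = ip + ihl + 8;         /* first byte of UDP payload */
    *seq = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return 0;
}
```

Calling the same helper on every received buffer on node 2 and on node 3 keeps the two logs directly comparable, since both sides then read the field from the same offset.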
> >
> > From your previous report we know that node 2: sees packets
> > in order, and node 3: sees packets out of order.
> > However, if the problem were due to bridge.c sending
> > the old buffer and not the new one, you'd see not only
> > reordering but also replication of packets.
> >
> > The fact that you see only the reordering in 3: makes
> > me think that the problem is in that node, and it could
> > be the network stack in 3: that does something strange.
> > So if you can run something netmap based in 3: and make
> > sure there is only one queue to read from, we could
> > at least figure out what is going on.
> >
> > cheers
> > luigi
> >
> > > is that
> > >
> > >>
> > >> Are you using native netmap drivers or the emulated mode ?
> > >> You can check that by playing with the "admode" sysctl entry
> > >> (or sysfs on linux) - try setting to 1 and 2 and see if
> > >> the behaviour changes.
> > >>
> > >> dev.netmap.admode: 0
> > >> Controls the use of native or emulated adapter mode.
> > >> 0 uses the best available option,
> > >> 1 forces native and fails if not available,
> > >> 2 forces emulated hence never fails.
> > >>
> > > I was using admode 0. I changed admode to 1 and then to 2 using a
> > > command like *echo 1 > /sys/module/netmap/parameters/admode* and
> > > restarted the bridge program each time. The behavior stays the same.
> > >
> > >>
> > >> cheers
> > >> luigi
> > >>
> > >> >
> > >> > There is really one ring (tx and rx) for the NIC and one ring (tx
> > >> > and rx) for the host.
> > >> > I also suspected that there might be multiple tx rings for the
> > >> > host: it looked as if the bridge program swapped packets into
> > >> > multiple host rings and the udp recv program drained packets from
> > >> > these rings. But this is not the case here.
> > >> >
> > >> > The bridge program prints a line like this:
> > >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
> > >> > It is printed by the following line of the original program:
> > >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
> > >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name,
> > >> > pb->first_rx_ring, pb->req.nr_rx_rings);*
> > >> >
> > >> > I think this shows that there is really one NIC ring and one HOST
> > >> > ring.
> > >> >
> > >> > Is there another way to verify the number of rings that netmap has?
> > >> >
> > >> > Thanks!
> > >> > Xiaoye
> > >> >
> > >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo wrote:
> > >> >>
> > >> >> Hi,
> > >> >> there must be something wrong with your setup because
> > >> >> slot indexes must be sequential and in your case they
> > >> >> are not (see the jump from 295 to 474 and then
> > >> >> back from 485 to 296, and the numerous interleavings
> > >> >> that you are seeing later).
> > >> >>
> > >> >> I have no idea of the cause but typically this pattern
> > >> >> is what you see when there are multiple input rings and
> > >> >> not just one.
> > >> >>
> > >> >> Cheers
> > >> >> Luigi
> > >> >>
> > >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun wrote:
> > >> >> > Hi Luigi,
> > >> >> >
> > >> >> > Thanks for the detailed advice.
> > >> >> >
> > >> >> > With more detailed experiments, I actually found that the udp
> > >> >> > sender/receiver packet reorder issue *might* be unrelated to the
> > >> >> > original issue I posted. However, I think we should solve the
> > >> >> > udp sender/receiver issue first.
> > >> >> > I ran the experiment with a more detailed log. Here are my
> > >> >> > findings.
> > >> >> >
> > >> >> > 1. I am running a netmap version available since about Oct 13th
> > >> >> > from github (https://github.com/luigirizzo/netmap).
> > >> >> > So I think this is not the version affected by the buffer
> > >> >> > allocation issue. I tried running the newest version; however,
> > >> >> > that version causes a problem when I exit the bridge program
> > >> >> > (something like a kernel error which makes the OS crash).
> > >> >> >
> > >> >> > 2 & 3. I changed receiver.c and bridge.c so that I can get more
> > >> >> > information (a more detailed log).
> > >> >> > The reorder happens multiple times (about 10 times) within a
> > >> >> > second. Here is one example trace collected from the above two
> > >> >> > programs. (Remember that the udp sender runs on one machine; the
> > >> >> > netmap bridge and the udp receiver run on another machine.)
> > >> >> > There is only one pair of rings, each with 512 slots (511 slots
> > >> >> > usable), on the receiver machine.
> > >> >> >
> > >> >> > =================== packet trace collected from receiver.c ===================
> > >> >> > ===== together with the slot and buf_idx of the corresponding netmap ring slots ======
> > >> >> > [seq]   [slot]  [buf_idx]
> > >> >> > 8208    294     1833
> > >> >> > 8209    295     1834
> > >> >> > 8388    474     2013
> > >> >> > ... (packet received in order)
> > >> >> > 8398    484     2023
> > >> >> > 8399    485     2024
> > >> >> > 8210    296     1835
> > >> >> > 8211    297     1836
> > >> >> > ... (packet received in order)
> > >> >> > ...
> > >> >> > 8222    308     1847
> > >> >> > 8400    486     2025
> > >> >> > 8223    309     1848
> > >> >> > 8401    487     2026
> > >> >> > 8224    310     1849
> > >> >> > 8402    488     2027
> > >> >> > 8225    311     1850
> > >> >> > 8403    489     2028
> > >> >> > 8226    312     1851
> > >> >> > 8404    450     2029
> > >> >> > 8227    313     1852
> > >> >> > 8228    314     1853
> > >> >> > ===================================================================
> > >> >> > As we can see, the udp receiver got packet 8210 after it got
> > >> >> > 8399, which is the first reorder. Then the receiver got 8211 to
> > >> >> > 8222 sequentially. Then it got packets 8223-8227 and 8400-8404
> > >> >> > interleaved.
> > >> >> >
> > >> >> > ==================== event order seen by netmap bridge ==================
> > >> >> > get 8209
> > >> >> > poll called
> > >> >> > get 8210
> > >> >> > ...
> > >> >> > ...
> > >> >> > get 8228
> > >> >> > poll called
> > >> >> > get 8229
> > >> >> > ...
> > >> >> > ...
> > >> >> > get 8383
> > >> >> > poll called
> > >> >> > get 8384
> > >> >> > ...
> > >> >> > get 8387
> > >> >> > poll called
> > >> >> > get 8388
> > >> >> > ...
> > >> >> > get 8393
> > >> >> > poll called
> > >> >> > get 8394
> > >> >> > ...
> > >> >> > get 8399
> > >> >> > poll called
> > >> >> > get 8400
> > >> >> > ...
> > >> >> > get 8404
> > >> >> > poll called
> > >> >> > get 8405
> > >> >> > ===================================================================
> > >> >> > As we can see from the event ordering seen by bridge.c, all the
> > >> >> > packets are received in order, which means that the reorder
> > >> >> > happens when the bridge code swaps the buf_idx between the NIC
> > >> >> > ring (slot) and the host ring (slot). The reordered seq numbers
> > >> >> > are usually right before or after a poll function call.
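
The kind of reordering shown in the receiver trace above can be counted rather than read off by eye, along the lines Luigi asked for elsewhere in this thread (report gaps where new_seq != old_seq + 1, and how often packets arrive out of sequence). The following is a minimal sketch under stated assumptions, not the code used in receiver.c. The names seq_stats and seq_stats_update are hypothetical, sequence numbers are assumed to be 32-bit and not to wrap during the run, and a duplicate is only detected when it immediately follows the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running counters for one stream of sequence numbers (hypothetical type). */
struct seq_stats {
    int      started;    /* 0 until the first packet has been seen */
    uint32_t last;       /* sequence number of the previous packet */
    unsigned in_order;   /* packets with seq == last + 1 */
    unsigned fwd_gaps;   /* packets with seq >  last + 1 (jump ahead) */
    unsigned duplicates; /* packets with seq == last (immediate repeat) */
    unsigned backward;   /* packets with seq <  last (jump back) */
};

/*
 * Classify one received sequence number against the previous one.
 * ASSUMPTION: sequence numbers do not wrap around during the experiment.
 */
void seq_stats_update(struct seq_stats *st, uint32_t seq)
{
    assert(st != NULL);
    if (!st->started)
        st->started = 1;            /* first packet: nothing to compare */
    else if (seq == st->last + 1)
        st->in_order++;
    else if (seq > st->last)
        st->fwd_gaps++;             /* new_seq != old_seq + 1, going up */
    else if (seq == st->last)
        st->duplicates++;
    else
        st->backward++;             /* receiver saw an older packet */
    st->last = seq;
}
```

Fed with the twelve consecutive seq values listed at the end of the receiver trace (8222, 8400, 8223, 8401, 8224, 8402, 8225, 8403, 8226, 8404, 8227, 8228), a zero-initialized struct seq_stats ends up with fwd_gaps = 5, backward = 5, in_order = 1 and duplicates = 0. A non-zero duplicates count would be the replication Luigi mentions as the signature of the bridge sending an old buffer.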
> > >> >> >
> > >> >> > Best,
> > >> >> > Xiaoye
> > >> >> >
> > >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo wrote:
> > >> >> >>
> > >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu>
> > >> >> >> wrote:
> > >> >> >> > Hi Luigi,
> > >> >> >> >
> > >> >> >> > Thanks for your advice.
> > >> >> >> > I forgot to mention that I use the command
> > >> >> >> > "ethtool -L eth1 combined 1" to set the number of rings of
> > >> >> >> > the NIC to 1. The host also has only one ring.
> > >> >> >> > I understand the situation where the first tx ring is full,
> > >> >> >> > so the bridge swaps the packets to the second tx ring and
> > >> >> >> > then the host/NIC might drain either ring. But this is not
> > >> >> >> > the case in the experiment.
> > >> >> >>
> > >> >> >> ok good to know that.
> > >> >> >>
> > >> >> >> So if we have ruled out multiqueue and iommu, let's look at
> > >> >> >> the internal allocator and at bridge.c
> > >> >> >>
> > >> >> >> 1. are you running the most recent version of netmap ?
> > >> >> >> Some older version (probably 1-2 years ago) had a bug
> > >> >> >> in the buffer allocator and some buffers were allocated
> > >> >> >> twice.
> > >> >> >>
> > >> >> >> 2. can you tweak your receiver.c to report some more info
> > >> >> >> on how often you get out of sequence packets, how much
> > >> >> >> out of sequence they are ?
> > >> >> >> Also it would be useful to report gaps on the increasing side
> > >> >> >> (i.e. new_seq != old_seq +1 )
> > >> >> >>
> > >> >> >> 3. can you tweak bridge.c so that it writes into the packet
> > >> >> >> the netmap buffer indexes and slots on the rx and tx side,
> > >> >> >> so when you detect a sequence error we can figure out
> > >> >> >> where it is happening.
> > >> >> >> Ideally you could also add the sequence number detection
> > >> >> >> code in bridge.c so we can check whether the errors appear
> > >> >> >> on the input or output sides.
> > >> >> >>
> > >> >> >> cheers
> > >> >> >> luigi
> > >> >> >>
> > >> >>
> > >> >> --
> > >> >> -----------------------------------------+-------------------------------
> > >> >>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
> > >> >>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
> > >> >>  TEL      +39-050-2217533               . via Diotisalvi 2
> > >> >>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
> > >> >> -----------------------------------------+-------------------------------