Date: Thu, 4 Feb 2016 18:04:13 -0600
From: Xiaoye Sun <Xiaoye.Sun@rice.edu>
To: Victor Detoni <victordetoni@gmail.com>
Cc: Luigi Rizzo <rizzo@iet.unipi.it>, Pavel Odintsov <pavel.odintsov@gmail.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
Message-ID: <CAJnByzgjEEAzmWZu7BsSWHXmpjUtZcqXFGN8umCqmvgME1Jv%2BA@mail.gmail.com>
In-Reply-To: <CANpwN=uHk-VwOoFz7NaPE9A-0B=MAapqxJ-uyCBtn=oMdacYnw@mail.gmail.com>
References: <CAJnByzj6Dj3vouZ2NbxqvCV-2-7TVtTR4FaWKuCFaaRN2X%2ByAA@mail.gmail.com>
 <CALgsdbd3XuE3wMYp4ey%2B1aer%2BHSVNojLYoVqwqTBPAXXdf9i%2BQ@mail.gmail.com>
 <CAJnByzirLXdCe-kwHV2s_E6ytGJG0Dth=0Ms12RrEk7FK_%2B8Og@mail.gmail.com>
 <CA%2BhQ2%2BgMWY0eabjHGw0=PJCAkS-wO=RBrN5brSbaqWc3_AOYoQ@mail.gmail.com>
 <CAJnByziBS8o6LtmpUrUu5xtRUd008Z2hnCsp=WVFv35r2J0rHw@mail.gmail.com>
 <CA%2BhQ2%2Bim9nFfYnqDS2HgRbAzdf5D0iaLCmCYhfXQVVRMouUFuw@mail.gmail.com>
 <CAJnByzht-qfDcm8oEg1aSRyVBZ1ygPvc2eMuoyJcq4geueTZ0Q@mail.gmail.com>
 <CA%2BhQ2%2BiERgWJ=cdFB-cByfT3r11T1kKr-5HiuCYZY-rxbjf=XA@mail.gmail.com>
 <CAJnByziDzdR2C6DcSRNPtrWACLq0XFpe4X1Ek9yXtFP9ivqWQw@mail.gmail.com>
 <CA%2BhQ2%2BhjnuGo1xKgc8CQ7gP35tiaZG7%2BroZBmX8aBgb8qWnLgg@mail.gmail.com>
 <CAJnByzh-VrRZeYdpkRFtCUGEN_arFBkemcN7byb51XV6UPswyg@mail.gmail.com>
 <CA%2BhQ2%2BiMw3kxjpcZy77vgOEsfk2UY0-farh9C8RKXZHMU7D8kw@mail.gmail.com>
 <CAJnByzgsuNBhdfPJsGrrHcU79xjK%2Bdq2RENgUkbZcehFm8MUxg@mail.gmail.com>
 <CAJnByzgNZ9YsYd7tBgYxiQPvuS_VZbhZNGvsPS-0apCDga7XFA@mail.gmail.com>
 <CANpwN=uHk-VwOoFz7NaPE9A-0B=MAapqxJ-uyCBtn=oMdacYnw@mail.gmail.com>
Yes, all the interfaces are up. Are you able to get the ARP request when
the interfaces are down?

On Thursday, February 4, 2016, Victor Detoni <victordetoni@gmail.com> wrote:

> Both interfaces are up? Like ifconfig... up
>
> I had the same problem and I solved it with the command above
>
> On Thursday, February 4, 2016, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>
>> Hi Luigi,
>>
>> Thanks for your explanation.
>>
>> I used three machines to do this experiment. They are directly connected.
>>
>> [(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].
>>
>> First, I tried to run bridge.c on machine2 using the command
>> *bridge -i netmap:eth2 -i netmap:eth3*. (The sender, receiver, and XYZ
>> were not running on machine 1 or 3.)
>>
>> As I understand it, in this setup machine2 will be transparent to
>> machine1 and machine3, since it forwards packets from its eth2 to eth3
>> and vice versa without any modification to the packets.
>>
>> I tried to ping machine3 from machine1 using a command like
>> *ping 10.11.10.3*. However, it still does not succeed.
>> This is because before machine1 sends the ping message to machine3, it
>> first sends an ARP request to get the MAC address of machine3.
>> machine3 gets that ARP request and sends the reply back (I used tcpdump
>> to verify that machine3 gets the ARP request and sends out the ARP
>> reply). However, machine1 does not get the ARP reply.
>>
>> I checked and found that the bridge can only forward packets in one
>> direction at a time: it gets the ARP request but doesn't see the ARP
>> reply (*pkt_queued* always returns 0 for one NIC...).
>>
>> This behavior looks very weird to me. Do you think there is a
>> compatibility issue between netmap and the OS I am using? Is there a
>> verified Linux distribution (and version) that works well with netmap?
>>
>> The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
>> x86_64 GNU/Linux.
>> Linux kernel version is *3.16.0-4-amd64*
>>
>> Thanks!
>> Xiaoye
>>
>> On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>>
>> > On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>> > >
>> > > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>> > >>
>> > >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>> > >> > Hi Luigi,
>> > >> >
>> > >> > I have to clarify the *jumping issue* with the slot indexes.
>> > >> > In the bridge.c program, the slot index never jumps and it
>> > >> > increases sequentially.
>> > >> > In the receiver.c program, the udp packet seq jumps and I showed
>> > >> > the slot index that each udp packet uses. So the slot index jumps
>> > >> > together with the udp seq (at the receiver program only).
>> > >>
>> > >> So let me understand, is the "slot" some information written
>> > >> in the packet by bridge.c (referring to the rx or tx slot,
>> > >> I am not sure) and then read and printed by receiver.c
>> > >> (which gets the packet through recvfrom so there isn't
>> > >> really any slot index) ?
>> > >>
>> > > It works the other way around:
>> > > bridge.c checks the seq numbers of the udp packets in the netmap slots
>> > > (in the nic rx ring) before the swap; then it records the seq number,
>> > > slot number (both rx and tx; tx indexes were not shown in the previous
>> > > email since they all look correct) and buf_idx (rx and tx). bridge.c
>> > > does not change anything in the buffer, and it knows the slot and
>> > > buf_idx that a packet uses. Please refer to the added code in the
>> > > *process_rings* function
>> > > http://www.owlnet.rice.edu/~xs6/bridge.c
>> > > receiver.c checks the seq numbers only and prints out the seq numbers
>> > > it receives sequentially.
>> > > With this information, I manually match the seq numbers I got from
>> > > receiver.c against the seq numbers I got from bridge.c. So we know the
>> > > seq order the receiver sees and which slot a packet uses when bridge.c
>> > > swaps the buf_idxs.
>> > >
>> > >> Do you see any ordering inversion when the receiver
>> > >> gets packets through the NETMAP API (e.g. using bridge.c
>> > >> instead of receiver.c) ?
>> > >>
>> > > There is no ordering inversion seen by bridge.c. (As I said in the
>> > > previous paragraph, bridge.c checks the seq number and I did not see
>> > > any order inversion in THIS simple experiment. In my multicast protocol
>> > > (mentioned in the first email), there is ordering inversion. But let us
>> > > solve the simple bridge.c problem first. I think they are two
>> > > relatively independent issues.)
>> >
>> > Sorry there was a misunderstanding.
>> > I wanted you to check the following setup:
>> >
>> > [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
>> >
>> > where in XYZ you replace your receiver.c with some
>> > netmap-based receiver (it could be pkt-gen in rx mode,
>> > or possibly even another instance of bridge.c where
>> > you connect the output port to a vale switch so
>> > traffic is dropped), and then in XYZ print the content
>> > of the packets.
>> >
>> > From your previous report we know that node 2: sees packets
>> > in order, and node 3: sees packets out of order.
>> > However, if the problem were due to bridge.c sending
>> > the old buffer and not the new one, you'd see not only
>> > reordering but also replication of packets.
>> >
>> > The fact that you see only the reordering in 3: makes
>> > me think that the problem is in that node, and it could
>> > be the network stack in 3: that does something strange.
>> > So if you can run something netmap based in 3: and make
>> > sure there is only one queue to read from, we could
>> > at least figure out what is going on.
>> >
>> > cheers
>> > luigi
>> >
>> > is that
>> > >
>> > >>
>> > >> Are you using native netmap drivers or the emulated mode ?
>> > >> You can check that by playing with the "admode" sysctl entry
>> > >> (or sysfs on linux) - try setting to 1 and 2 and see if
>> > >> the behaviour changes.
>> > >>
>> > >> dev.netmap.admode: 0
>> > >>     Controls the use of native or emulated adapter mode.
>> > >>     0 uses the best available option,
>> > >>     1 forces native and fails if not available,
>> > >>     2 forces emulated hence never fails.
>> > >>
>> > > I was using admode 0. I changed the admode to 1 and 2 using a command
>> > > like *echo 1 > /sys/module/netmap/parameters/admode* and restarted the
>> > > bridge program. The behavior stays the same.
>> > >
>> > >>
>> > >> cheers
>> > >> luigi
>> > >>
>> > >> >
>> > >> > There is really one ring (tx and rx) for the NIC and one ring (tx
>> > >> > and rx) for the host.
>> > >> > I also suspected that there might be multiple tx rings for the
>> > >> > host: it would look as if the bridge program swaps packets to
>> > >> > multiple host rings and the udp recv program drains packets from
>> > >> > these rings. But this is not the case here.
>> > >> >
>> > >> > The bridge program prints a line like this
>> > >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
>> > >> > This is printed by the following line in the original program
>> > >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
>> > >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name,
>> > >> > pb->first_rx_ring,
>> > >> > pb->req.nr_rx_rings);*
>> > >> >
>> > >> > I think this shows that there is really one NIC ring and one HOST
>> > >> > ring.
>> > >> >
>> > >> > Is there another way to verify the number of rings that netmap has?
>> > >> >
>> > >> > Thanks!
>> > >> > Xiaoye
>> > >> >
>> > >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>> > >> >>
>> > >> >> Hi,
>> > >> >> there must be something wrong with your setting because
>> > >> >> slot indexes must be sequential and in your case they
>> > >> >> are not (see the jump from 295 to 474 and then
>> > >> >> back from 485 to 296, and the numerous interleavings
>> > >> >> that you are seeing later).
>> > >> >>
>> > >> >> I have no idea of the cause but typically this pattern
>> > >> >> is what you see when there are multiple input rings and
>> > >> >> not just one.
>> > >> >>
>> > >> >> Cheers
>> > >> >> Luigi
>> > >> >>
>> > >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>> > >> >> > Hi Luigi,
>> > >> >> >
>> > >> >> > Thanks for the detailed advice.
>> > >> >> >
>> > >> >> > With more detailed experiments, I actually found that the udp
>> > >> >> > sender/receiver packet reorder issue *might* be unrelated to the
>> > >> >> > original issue I posted. However, I think we should solve the udp
>> > >> >> > sender/receiver issue first.
>> > >> >> > I ran the experiment with more detailed logging. Here are my
>> > >> >> > findings.
>> > >> >> >
>> > >> >> > 1. I am running a netmap version available since about Oct 13
>> > >> >> > from github (https://github.com/luigirizzo/netmap). So I think
>> > >> >> > this is not the one related to the buffer allocation issue. I
>> > >> >> > tried running the newest version; however, that version causes a
>> > >> >> > problem when I exit the bridge program (something like a kernel
>> > >> >> > error which makes the OS crash).
>> > >> >> >
>> > >> >> > 2 & 3. I changed receiver.c & bridge.c so that I can get more
>> > >> >> > information (more detailed log).
>> > >> >> > The reorder happens multiple times (about 10 times) within a
>> > >> >> > second.
>> > >> >> > Here is one example trace collected from the above two programs.
>> > >> >> > (Remember that the udp sender runs on one machine; the netmap
>> > >> >> > bridge and udp receiver run on another machine.)
>> > >> >> > There is only one pair of rings, each with 512 slots (511 slots
>> > >> >> > usable), on the receiver machine.
>> > >> >> >
>> > >> >> > ============ packet trace collected from receiver.c ============
>> > >> >> > ===== together with the slot and buf_idx of the corresponding
>> > >> >> > ===== netmap ring slots
>> > >> >> > [seq]  [slot]  [buf_idx]
>> > >> >> > 8208   294     1833
>> > >> >> > 8209   295     1834
>> > >> >> > 8388   474     2013
>> > >> >> > ... (packet received in order)
>> > >> >> > 8398   484     2023
>> > >> >> > 8399   485     2024
>> > >> >> > 8210   296     1835
>> > >> >> > 8211   297     1836
>> > >> >> > ... (packet received in order)
>> > >> >> > ...
>> > >> >> > 8222   308     1847
>> > >> >> > 8400   486     2025
>> > >> >> > 8223   309     1848
>> > >> >> > 8401   487     2026
>> > >> >> > 8224   310     1849
>> > >> >> > 8402   488     2027
>> > >> >> > 8225   311     1850
>> > >> >> > 8403   489     2028
>> > >> >> > 8226   312     1851
>> > >> >> > 8404   450     2029
>> > >> >> > 8227   313     1852
>> > >> >> > 8228   314     1853
>> > >> >> > ================================================================
>> > >> >> > As we can see, the udp receiver got packet 8210 after it got
>> > >> >> > 8399, which is the first reorder. Then the receiver got 8211 to
>> > >> >> > 8222 sequentially. Then it got packets 8223-8227 and 8400-8404
>> > >> >> > interleaved.
>> > >> >> >
>> > >> >> > ============ event order seen by netmap bridge ============
>> > >> >> > get 8209
>> > >> >> > poll called
>> > >> >> > get 8210
>> > >> >> > ...
>> > >> >> > ...
>> > >> >> > get 8228
>> > >> >> > poll called
>> > >> >> > get 8229
>> > >> >> > ...
>> > >> >> > ...
>> > >> >> > get 8383
>> > >> >> > poll called
>> > >> >> > get 8384
>> > >> >> > ...
>> > >> >> > get 8387
>> > >> >> > poll called
>> > >> >> > get 8388
>> > >> >> > ...
>> > >> >> > get 8393
>> > >> >> > poll called
>> > >> >> > get 8394
>> > >> >> > ...
>> > >> >> > get 8399
>> > >> >> > poll called
>> > >> >> > get 8400
>> > >> >> > ...
>> > >> >> > get 8404
>> > >> >> > poll called
>> > >> >> > get 8405
>> > >> >> > ================================================================
>> > >> >> > As we can see from the event ordering seen by bridge.c, all the
>> > >> >> > packets are received in order, which means the reorder happens
>> > >> >> > when the bridge code swaps the buf_idx between the nic ring
>> > >> >> > (slot) and the host ring (slot).
>> > >> >> > The reordered seq is usually right before or after the poll
>> > >> >> > function call.
>> > >> >> >
>> > >> >> > Best,
>> > >> >> > Xiaoye
>> > >> >> >
>> > >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>> > >> >> >>
>> > >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>> > >> >> >> > Hi Luigi,
>> > >> >> >> >
>> > >> >> >> > Thanks for your advice.
>> > >> >> >> > I forgot to mention that I use the command
>> > >> >> >> > "ethtool -L eth1 combined 1" to set the number of rings of
>> > >> >> >> > the nic to 1. The host also has only one ring.
>> > >> >> >> > I understand the situation where the first tx ring is full,
>> > >> >> >> > so the bridge will swap the packets to the second tx ring and
>> > >> >> >> > then the host/nic might drain either ring.
>> > >> >> >> > But this is not the case in the experiment.
>> > >> >> >>
>> > >> >> >> ok good to know that.
>> > >> >> >>
>> > >> >> >> So if we have ruled out multiqueue and iommu, let's look at
>> > >> >> >> the internal allocator and at bridge.c
>> > >> >> >>
>> > >> >> >> 1. are you running the most recent version of netmap ?
>> > >> >> >>    Some older version (probably 1-2 years ago) had a bug
>> > >> >> >>    in the buffer allocator and some buffers were allocated
>> > >> >> >>    twice.
>> > >> >> >>
>> > >> >> >> 2. can you tweak your receiver.c to report some more info
>> > >> >> >>    on how often you get out of sequence packets, how much
>> > >> >> >>    out of sequence they are ?
>> > >> >> >>    Also it would be useful to report gaps on the increasing
>> > >> >> >>    side (i.e. new_seq != old_seq +1 )
>> > >> >> >>
>> > >> >> >> 3. can you tweak bridge.c so that it writes into the packet
>> > >> >> >>    the netmap buffer indexes and slots on the rx and tx side,
>> > >> >> >>    so when you detect a sequence error we can figure out
>> > >> >> >>    where it is happening.
>> > >> >> >>    Ideally you could also add the sequence number detection
>> > >> >> >>    code in bridge.c so we can check whether the errors appear
>> > >> >> >>    on the input or output sides.
>> > >> >> >>
>> > >> >> >> cheers
>> > >> >> >> luigi
>> > >> >> >>
>> > >> >> >
>> > >> >>
>> > >> >> --
>> > >> >> -----------------------------------------+-------------------------------
>> > >> >> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>> > >> >> http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>> > >> >> TEL      +39-050-2217533               . via Diotisalvi 2
>> > >> >> Mobile   +39-338-6809875               . 56122 PISA (Italy)
>> > >> >> -----------------------------------------+-------------------------------
>> > >> >
>> > >>
>> > >
>> >
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>
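For reference, the sequence check asked for in point 2 of the quoted thread (report how often packets arrive out of sequence, how far out of sequence they are, and forward gaps where new_seq != old_seq + 1) could be sketched roughly as below. This is only a minimal sketch, not the code in receiver.c or bridge.c: the names seq_stats, seq_stats_init, seq_stats_update and seq_stats_report are hypothetical, and it assumes each UDP payload carries a 32-bit sequence number that has already been extracted in host byte order and does not wrap around during the test.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of netmap or of the programs in this thread.
 * Assumption: sequence numbers are 32-bit, host byte order, no wraparound. */
struct seq_stats {
    int started;            /* 0 until the first packet is seen */
    uint32_t last_seq;      /* sequence number of the previous packet */
    uint32_t max_seq;       /* highest sequence number seen so far */
    unsigned long packets;  /* packets observed */
    unsigned long gaps;     /* forward jumps: new_seq > old_seq + 1 */
    unsigned long reorders; /* backward jumps or duplicates: new_seq <= old_seq */
    uint32_t max_backward;  /* largest distance behind max_seq on a backward jump */
};

void seq_stats_init(struct seq_stats *st)
{
    *st = (struct seq_stats){0};
}

/* Feed one sequence number.
 * Returns 0 if in sequence, 1 for a forward gap, -1 for a backward jump. */
int seq_stats_update(struct seq_stats *st, uint32_t seq)
{
    int verdict = 0;

    st->packets++;
    if (!st->started) {
        st->started = 1;
        st->last_seq = seq;
        st->max_seq = seq;
        return 0;
    }
    if (seq != st->last_seq + 1) {
        if (seq > st->last_seq) {
            st->gaps++;
            verdict = 1;
        } else {
            uint32_t behind = st->max_seq - seq;

            st->reorders++;
            if (behind > st->max_backward)
                st->max_backward = behind;
            verdict = -1;
        }
    }
    if (seq > st->max_seq)
        st->max_seq = seq;
    st->last_seq = seq;
    return verdict;
}

/* Print a one-line summary of the counters collected so far. */
void seq_stats_report(const struct seq_stats *st)
{
    printf("packets %lu gaps %lu reorders %lu max_backward %lu\n",
           st->packets, st->gaps, st->reorders,
           (unsigned long)st->max_backward);
}
```

The receive loop would call seq_stats_update() once per packet and seq_stats_report() periodically. Running the same check in both bridge.c and the receiver would show on which side the first backward jump appears.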
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJnByzgjEEAzmWZu7BsSWHXmpjUtZcqXFGN8umCqmvgME1Jv%2BA>