Date:      Thu, 4 Feb 2016 22:26:01 -0200
From:      Victor Detoni <victordetoni@gmail.com>
To:        Xiaoye Sun <Xiaoye.Sun@rice.edu>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>, Pavel Odintsov <pavel.odintsov@gmail.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: swaping ring slots between NIC ring and Host ring does not always success
Message-ID:  <CANpwN=tfqitQW0BTXA7bU%2BTfmP8=wr7gE8wAP=hjAamjD7ny9Q@mail.gmail.com>
In-Reply-To: <CAJnByzgjEEAzmWZu7BsSWHXmpjUtZcqXFGN8umCqmvgME1Jv%2BA@mail.gmail.com>
References:  <CAJnByzj6Dj3vouZ2NbxqvCV-2-7TVtTR4FaWKuCFaaRN2X%2ByAA@mail.gmail.com> <CALgsdbd3XuE3wMYp4ey%2B1aer%2BHSVNojLYoVqwqTBPAXXdf9i%2BQ@mail.gmail.com> <CAJnByzirLXdCe-kwHV2s_E6ytGJG0Dth=0Ms12RrEk7FK_%2B8Og@mail.gmail.com> <CA%2BhQ2%2BgMWY0eabjHGw0=PJCAkS-wO=RBrN5brSbaqWc3_AOYoQ@mail.gmail.com> <CAJnByziBS8o6LtmpUrUu5xtRUd008Z2hnCsp=WVFv35r2J0rHw@mail.gmail.com> <CA%2BhQ2%2Bim9nFfYnqDS2HgRbAzdf5D0iaLCmCYhfXQVVRMouUFuw@mail.gmail.com> <CAJnByzht-qfDcm8oEg1aSRyVBZ1ygPvc2eMuoyJcq4geueTZ0Q@mail.gmail.com> <CA%2BhQ2%2BiERgWJ=cdFB-cByfT3r11T1kKr-5HiuCYZY-rxbjf=XA@mail.gmail.com> <CAJnByziDzdR2C6DcSRNPtrWACLq0XFpe4X1Ek9yXtFP9ivqWQw@mail.gmail.com> <CA%2BhQ2%2BhjnuGo1xKgc8CQ7gP35tiaZG7%2BroZBmX8aBgb8qWnLgg@mail.gmail.com> <CAJnByzh-VrRZeYdpkRFtCUGEN_arFBkemcN7byb51XV6UPswyg@mail.gmail.com> <CA%2BhQ2%2BiMw3kxjpcZy77vgOEsfk2UY0-farh9C8RKXZHMU7D8kw@mail.gmail.com> <CAJnByzgsuNBhdfPJsGrrHcU79xjK%2Bdq2RENgUkbZcehFm8MUxg@mail.gmail.com> <CAJnByzgNZ9YsYd7tBgYxiQPvuS_VZbhZNGvsPS-0apCDga7XFA@mail.gmail.com> <CANpwN=uHk-VwOoFz7NaPE9A-0B=MAapqxJ-uyCBtn=oMdacYnw@mail.gmail.com> <CAJnByzgjEEAzmWZu7BsSWHXmpjUtZcqXFGN8umCqmvgME1Jv%2BA@mail.gmail.com>

I'm sorry, I made a mistake. To work around this, try `ip link set $IFACE
promisc on`.
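
One likely reason this helps (an assumption on my side, not something confirmed in this thread): a NIC that is not in promiscuous mode normally passes up only broadcast/multicast frames and unicast frames addressed to its own MAC. The ARP request is a broadcast, so the bridge sees it; the ARP reply is unicast to machine1's MAC and may be dropped by the NIC on machine2 before netmap ever sees it. A small C model of that filter, where the name `nic_accepts` is made up for illustration and is not driver or netmap code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a NIC receive filter (not netmap or driver code).
 * Simplification: every group-addressed frame is accepted, whereas real
 * NICs also apply a multicast filter. */
bool nic_accepts(const uint8_t dst[6], const uint8_t own[6], bool promisc)
{
    assert(dst != NULL && own != NULL);
    if (promisc)
        return true;                 /* promiscuous: accept every frame */
    if (dst[0] & 0x01)
        return true;                 /* group bit set: broadcast/multicast */
    return memcmp(dst, own, 6) == 0; /* unicast: only frames for our MAC */
}
```

With `promisc` false, a broadcast destination passes and a unicast frame for another host's MAC does not, which would match a bridge that forwards the ARP request but never sees the reply.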



On Thu, Feb 4, 2016 at 10:04 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:

> Yes, all the interfaces are up. Are you able to get ARP requests when the
> interfaces are down?
>
>
> On Thursday, February 4, 2016, Victor Detoni <victordetoni@gmail.com>
> wrote:
>
>> Are both interfaces up? Like ifconfig... up
>>
>> I had the same problem and solved it with the command above.
>>
>> On Thursday, February 4, 2016, Xiaoye Sun <Xiaoye.Sun@rice.edu>
>> wrote:
>>
>>> Hi Luigi,
>>>
>>> Thanks for your explanation.
>>>
>>> I used three machines to do this experiment. They are directly connected.
>>>
>>> [(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].
>>>
>>> First, I tried to run bridge.c on machine2 using the command *bridge -i
>>> netmap:eth2 -i netmap:eth3*. (Neither sender, receiver, nor XYZ was running
>>> on machine1 or machine3.)
>>>
>>> As I understand it, in this setup machine2 will be transparent to
>>> machine1 and machine3, since it forwards packets from its eth2 to eth3 and
>>> vice versa without any modification to the packets.
>>>
>>> I tried to ping machine3 from machine1 using a command like *ping
>>> 10.11.10.3*. However, it still does not succeed.
>>> This is because, before machine1 sends the ping message to machine3, it
>>> first sends an ARP request to get the MAC address of machine3.
>>> machine3 gets that ARP request and sends the reply back (I used tcpdump to
>>> verify that machine3 gets the ARP request and sends out the ARP reply).
>>> However, machine1 does not get the ARP reply.
>>>
>>> I checked, and the bridge only forwards packets in one direction:
>>> it gets the ARP request but doesn't see the ARP reply
>>> (*pkt_queued* always returns 0 for one nic...).
>>>
>>> This behavior looks very weird to me. Do you think there is a
>>> compatibility issue between netmap and the OS I am using? Is there a
>>> verified Linux distribution (and version) that works well with netmap?
>>>
>>> The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
>>> x86_64 GNU/Linux.
>>> Linux kernel version is *3.16.0-4-amd64*
>>>
>>>
>>> Thanks!
>>> Xiaoye
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>>>
>>> > On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>>> > >
>>> > >
>>> > > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>>> > >>
>>> > >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
>>> > >> > Hi Luigi,
>>> > >> >
>>> > >> > I have to clarify the *jumping issue* with the slot indexes.
>>> > >> > In the bridge.c program, the slot index never jumps; it increases
>>> > >> > sequentially.
>>> > >> > In the receiver.c program, the udp packet seq jumps, and I showed the
>>> > >> > slot index that each udp packet uses. So the slot index jumps together
>>> > >> > with the udp seq (in the receiver program only).
>>> > >>
>>> > >> So let me understand, is the "slot" some information written
>>> > >> in the packet by bridge.c (referring to the rx or tx slot,
>>> > >> I am not sure) and then read and printed by receiver.c
>>> > >> (which gets the packet through recvfrom so there isn't
>>> > >> really any slot index) ?
>>> > >>
>>> > > It works the other way around:
>>> > > bridge.c checks the seq numbers of the udp packets in the netmap slots
>>> > > (in the nic rx ring) before the swap; then it records the seq number,
>>> > > the slot number (both rx and tx; the tx indexes were not shown in the
>>> > > previous email since they all look correct) and the buf_idx (rx and tx).
>>> > > bridge.c does not change anything in the buffer, and it knows the slot
>>> > > and buf_idx that a packet uses. Please refer to the added code in the
>>> > > *process_rings* function:
>>> > > http://www.owlnet.rice.edu/~xs6/bridge.c
>>> > > receiver.c checks the seq numbers only and prints the seq numbers it
>>> > > receives, in order of arrival.
>>> > > With this information, I manually match the seq numbers I got from
>>> > > receiver.c against the seq numbers I got from bridge.c. So we know the
>>> > > seq order the receiver sees and which slot a packet uses when bridge.c
>>> > > swaps the buf_idxs.
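
For readers who have not looked at bridge.c, the zero-copy swap being discussed can be sketched roughly as follows. This is a simplified model, not the code from the link above: `slot_model` and `SLOT_BUF_CHANGED` are hypothetical stand-ins for netmap's slot structure and its `NS_BUF_CHANGED` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_BUF_CHANGED 0x0001 /* hypothetical; stands in for NS_BUF_CHANGED */

/* Hypothetical, reduced model of a ring slot. */
struct slot_model {
    uint32_t buf_idx; /* index of the packet buffer this slot points to */
    uint16_t len;     /* length of the packet in that buffer */
    uint16_t flags;   /* per-slot flags */
};

/* Hand the received buffer to the tx slot without copying the payload:
 * exchange the buffer indexes, carry the length over, and mark both
 * slots so the kernel knows their buffers changed. */
void swap_slots(struct slot_model *rx, struct slot_model *tx)
{
    assert(rx != NULL && tx != NULL);
    uint32_t tmp = tx->buf_idx;
    tx->buf_idx = rx->buf_idx;
    rx->buf_idx = tmp;
    tx->len = rx->len;
    tx->flags |= SLOT_BUF_CHANGED;
    rx->flags |= SLOT_BUF_CHANGED;
}
```

The point of logging slot and buf_idx on both sides is that after the swap the packet lives wherever its buf_idx went, so a mismatch between the two logs would show up as a buffer index appearing in an unexpected slot.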
>>> > >
>>> > >> Do you see any ordering inversion when the receiver
>>> > >> gets packets through the NETMAP API (e.g. using bridge.c
>>> > >> instead of receiver.c) ?
>>> > >>
>>> > > There is no ordering inversion seen by bridge.c. As I said in the
>>> > > previous paragraph, bridge.c checks the seq number, and I did not see
>>> > > any order inversion in THIS simple experiment. (In my multicast
>>> > > protocol, mentioned in the first email, there is ordering inversion,
>>> > > but let us solve the simple bridge.c problem first. I think they are
>>> > > two relatively independent issues.)
>>> >
>>> > Sorry there was a misunderstanding.
>>> > I wanted you to check the following setup:
>>> >
>>> > [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
>>> >
>>> > where in XYZ you replace your receiver.c with some
>>> > netmap-based receiver (it could be pkt-gen in rx mode,
>>> > or possibly even another instance of bridge.c where
>>> > you connect the output port to a vale switch so
>>> > traffic is dropped), and then in XYZ print the content
>>> > of the packets.
>>> >
>>> > From your previous report we know that node 2: sees packets
>>> > in order, and node 3: sees packets out of order.
>>> > However, if the problem were due to bridge.c sending
>>> > the old buffer and not the new one, you'd see not only
>>> > reordering but also replication of packets.
>>> >
>>> > The fact that you see only the reordering in 3: makes
>>> > me think that the problem is in that node, and it could
>>> > be the network stack in 3: that does something strange.
>>> > So if you can run something netmap based in 3: and make
>>> > sure there is only one queue to read from, we could
>>> > at least figure out what is going on.
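
The distinction drawn above (a stale buffer would cause replicated packets, not just reordered ones) can be checked mechanically on a captured list of sequence numbers. A minimal sketch, assuming the trace is short enough for a quadratic scan; `count_duplicates` is a made-up helper and not part of receiver.c or bridge.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count entries in seq[0..n-1] whose value already appeared earlier
 * in the trace. O(n^2), which is fine for a short captured trace. */
size_t count_duplicates(const uint32_t *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    size_t dups = 0;
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (seq[j] == seq[i]) {
                dups++;
                break; /* count each repeated entry once */
            }
        }
    }
    return dups;
}
```

A result of zero on a trace that is nevertheless out of order is consistent with pure reordering, which is the case described in this thread.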
>>> >
>>> > cheers
>>> > luigi
>>> >
>>> >
>>> > is that
>>> > >
>>> > >>
>>> > >> Are you using native netmap drivers or the emulated mode ?
>>> > >> You can check that by playing with the "admode" sysctl entry
>>> > >> (or sysfs on linux) - try setting to 1 and 2 and see if
>>> > >> the behaviour changes.
>>> > >>
>>> > >>      dev.netmap.admode: 0
>>> > >>              Controls the use of native or emulated adapter mode.
>>> > >>              0 uses the best available option,
>>> > >>              1 forces native and fails if not available,
>>> > >>              2 forces emulated hence never fails.
>>> > >>
>>> > > I was using admode 0. I changed admode to 1 and 2 using a command like
>>> > > *echo 1 > /sys/module/netmap/parameters/admode* and restarted the
>>> > > bridge program. The behavior stays the same.
>>> > >
>>> > >>
>>> > >> cheers
>>> > >> luigi
>>> > >>
>>> > >> >
>>> > >> > There is really one ring (tx and rx) for the NIC and one ring (tx and
>>> > >> > rx) for the host.
>>> > >> > I also suspected that there might be multiple tx rings for the host:
>>> > >> > it looks as if the bridge program swaps packets to multiple host rings
>>> > >> > and the udp recv program drains packets from these rings. But this is
>>> > >> > not the case here.
>>> > >> >
>>> > >> > The bridge program prints a line like this:
>>> > >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
>>> > >> > This is printed by the following line in the original program:
>>> > >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
>>> > >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name,
>>> > >> > pb->first_rx_ring, pb->req.nr_rx_rings);*
>>> > >> >
>>> > >> > I think this shows that there is really one NIC ring and one HOST ring.
>>> > >> >
>>> > >> > Is there another way to verify the number of rings that netmap has?
>>> > >> >
>>> > >> > Thanks!
>>> > >> > Xiaoye
>>> > >> >
>>> > >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo <rizzo@iet.unipi.it>
>>> > >> > wrote:
>>> > >> >>
>>> > >> >> Hi,
>>> > >> >> there must be something wrong with your setup because
>>> > >> >> slot indexes must be sequential and in your case they
>>> > >> >> are not (see the jump from 295 to 474 and then
>>> > >> >> back from 485 to 296, and the numerous interleavings
>>> > >> >> that you are seeing later).
>>> > >> >>
>>> > >> >> I have no idea of the cause but typically this pattern
>>> > >> >> is what you see when there are multiple input rings and
>>> > >> >> not just one.
>>> > >> >>
>>> > >> >> Cheers
>>> > >> >> Luigi
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun <Xiaoye.Sun@rice.edu>
>>> > >> >> wrote:
>>> > >> >> > Hi Luigi,
>>> > >> >> >
>>> > >> >> > Thanks for the detailed advice.
>>> > >> >> >
>>> > >> >> > With more detailed experiments, I actually found that the udp
>>> > >> >> > sender/receiver packet reorder issue *might* be unrelated to the
>>> > >> >> > original issue I posted. However, I think we should solve the udp
>>> > >> >> > sender/receiver issue first.
>>> > >> >> > I ran the experiment with a more detailed log. Here are my findings.
>>> > >> >> >
>>> > >> >> > 1. I am running a netmap version available since about Oct 13 from
>>> > >> >> > github (https://github.com/luigirizzo/netmap). So I think this is not
>>> > >> >> > the one related to the buffer allocation issue. I tried running the
>>> > >> >> > newest version; however, that version causes a problem when I exit
>>> > >> >> > the bridge program (something like a kernel error which makes the os
>>> > >> >> > crash).
>>> > >> >> >
>>> > >> >> > 2 & 3. I changed receiver.c & bridge.c so that I can get more
>>> > >> >> > information (a more detailed log).
>>> > >> >> > The reorder happens multiple times (about 10 times) within a second.
>>> > >> >> > Here is one example trace collected from the above two programs
>>> > >> >> > (remember that we have the udp sender running on one machine; the
>>> > >> >> > netmap bridge and udp receiver are running on another machine).
>>> > >> >> > There is only one pair of rings, each with 512 slots (511 slots
>>> > >> >> > usable), on the receiver machine.
>>> > >> >> >
>>> > >> >> > =================== packet trace collected from receiver.c ===================
>>> > >> >> > ===== together with the slot and buf_idx of the corresponding netmap ring slots ======
>>> > >> >> > [seq]   [slot]   [buf_idx]
>>> > >> >> > 8208   294    1833
>>> > >> >> > 8209   295    1834
>>> > >> >> > 8388   474    2013
>>> > >> >> > ... (packet received in order)
>>> > >> >> > 8398   484    2023
>>> > >> >> > 8399   485    2024
>>> > >> >> > 8210   296    1835
>>> > >> >> > 8211   297    1836
>>> > >> >> > ... (packet received in order)
>>> > >> >> > ...
>>> > >> >> > 8222   308    1847
>>> > >> >> > 8400   486    2025
>>> > >> >> > 8223   309    1848
>>> > >> >> > 8401   487    2026
>>> > >> >> > 8224   310    1849
>>> > >> >> > 8402   488    2027
>>> > >> >> > 8225   311    1850
>>> > >> >> > 8403   489    2028
>>> > >> >> > 8226   312    1851
>>> > >> >> > 8404   450    2029
>>> > >> >> > 8227   313    1852
>>> > >> >> > 8228   314    1853
>>> > >> >> >
>>> > >> >> > ===================================================================
>>> > >> >> > As we can see, the udp receiver got packet 8210 after it got 8399,
>>> > >> >> > which is the first reorder. Then the receiver got 8211 to 8222
>>> > >> >> > sequentially. Then it got packets 8223-8227 and 8400-8404
>>> > >> >> > interleaved.
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > ==================== event order seen by netmap bridge ==================
>>> > >> >> > get 8209
>>> > >> >> > poll called
>>> > >> >> > get 8210
>>> > >> >> > ...
>>> > >> >> > ...
>>> > >> >> > get 8228
>>> > >> >> > poll called
>>> > >> >> > get 8229
>>> > >> >> > ...
>>> > >> >> > ...
>>> > >> >> > get 8383
>>> > >> >> > poll called
>>> > >> >> > get 8384
>>> > >> >> > ...
>>> > >> >> > get 8387
>>> > >> >> > poll called
>>> > >> >> > get 8388
>>> > >> >> > ...
>>> > >> >> > get 8393
>>> > >> >> > poll called
>>> > >> >> > get 8394
>>> > >> >> > ...
>>> > >> >> > get 8399
>>> > >> >> > poll called
>>> > >> >> > get 8400
>>> > >> >> > ...
>>> > >> >> > get 8404
>>> > >> >> > poll called
>>> > >> >> > get 8405
>>> > >> >> >
>>> > >> >> > ===================================================================
>>> > >> >> > As we can see from the event ordering seen by bridge.c, all the
>>> > >> >> > packets are received in order, which means that the reorder happens
>>> > >> >> > when the bridge code swaps the buf_idx between the nic ring (slot)
>>> > >> >> > and the host ring (slot).
>>> > >> >> > The reordered seq is usually right before or after the poll function
>>> > >> >> > call.
>>> > >> >> >
>>> > >> >> > Best,
>>> > >> >> > Xiaoye
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo <rizzo@iet.unipi.it>
>>> > >> >> > wrote:
>>> > >> >> >>
>>> > >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu>
>>> > >> >> >> wrote:
>>> > >> >> >> > Hi Luigi,
>>> > >> >> >> >
>>> > >> >> >> > Thanks for your advice.
>>> > >> >> >> > I forgot to mention that I use the command "ethtool -L eth1
>>> > >> >> >> > combined 1" to set the number of rings of the nic to 1. The host
>>> > >> >> >> > also has only one ring.
>>> > >> >> >> > I understand the situation where the first tx ring is full, so
>>> > >> >> >> > the bridge will swap the packets to the second tx ring, and then
>>> > >> >> >> > the host/nic might drain either ring. But this is not the case in
>>> > >> >> >> > the experiment.
>>> > >> >> >>
>>> > >> >> >> ok good to know that.
>>> > >> >> >>
>>> > >> >> >> So if we have ruled out multiqueue and iommu, let's look at
>>> > >> >> >> the internal allocator and at bridge.c
>>> > >> >> >>
>>> > >> >> >> 1. are you running the most recent version of netmap ?
>>> > >> >> >>    Some older version (probably 1-2 years ago) had a bug
>>> > >> >> >>    in the buffer allocator and some buffers were allocated
>>> > >> >> >>    twice.
>>> > >> >> >>
>>> > >> >> >> 2. can you tweak your receiver.c to report some more info
>>> > >> >> >>    on how often you get out of sequence packets, how much
>>> > >> >> >>    out of sequence they are ?
>>> > >> >> >>    Also it would be useful to report gaps on the increasing side
>>> > >> >> >>    (i.e. new_seq != old_seq +1 )
>>> > >> >> >>
>>> > >> >> >> 3. can you tweak bridge.c so that it writes into the packet
>>> > >> >> >>    the netmap buffer indexes and slots on the rx and tx side,
>>> > >> >> >>    so when you detect a sequence error we can figure out
>>> > >> >> >>    where it is happening.
>>> > >> >> >>    Ideally you could also add the sequence number detection
>>> > >> >> >>    code in bridge.c so we can check whether the errors appear
>>> > >> >> >>    on the input or output sides.
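
The kind of bookkeeping suggested in point 2 could look roughly like this. It is a sketch only: `seq_stats` and `seq_update` are made-up names, the code is not taken from receiver.c, and it ignores wraparound of the sequence counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow counters for sequence checking. */
struct seq_stats {
    uint32_t last;     /* last sequence number seen */
    int      started;  /* 0 until the first packet arrives */
    uint64_t in_order; /* new_seq == old_seq + 1 */
    uint64_t fwd_gaps; /* new_seq >  old_seq + 1 */
    uint64_t backward; /* new_seq <= old_seq (reorder or duplicate) */
    uint32_t max_back; /* largest backward distance seen */
};

/* Feed one received sequence number into the counters.
 * Counter wraparound is not handled. */
void seq_update(struct seq_stats *s, uint32_t seq)
{
    assert(s != NULL);
    if (!s->started) {
        s->started = 1;
        s->last = seq;
        return;
    }
    if (seq == s->last + 1) {
        s->in_order++;
    } else if (seq > s->last) {
        s->fwd_gaps++;
    } else {
        uint32_t dist = s->last - seq;
        s->backward++;
        if (dist > s->max_back)
            s->max_back = dist;
    }
    s->last = seq;
}
```

Calling `seq_update` once per received packet and printing the counters every second would show both how often packets arrive out of sequence and how far out of sequence they are.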
>>> > >> >> >>
>>> > >> >> >> cheers
>>> > >> >> >> luigi
>>> > >> >> >>
>>> > >> >> >
>>> > >> >>
>>> > >> >>
>>> > >> >>
>>> > >> >> --
>>> > >> >>
>>> > >> >> -----------------------------------------+-------------------------------
>>> > >> >>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>>> > >> >>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>>> > >> >>  TEL      +39-050-2217533               . via Diotisalvi 2
>>> > >> >>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
>>> > >> >> -----------------------------------------+-------------------------------
>>> > >> >>
>>> > >> >
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>


