Date:      Thu, 4 Feb 2016 16:23:03 -0600
From:      Xiaoye Sun <Xiaoye.Sun@rice.edu>
To:        Luigi Rizzo <rizzo@iet.unipi.it>, Pavel Odintsov <pavel.odintsov@gmail.com>, freebsd-net@freebsd.org
Subject:   Fwd: swaping ring slots between NIC ring and Host ring does not always success
Message-ID:  <CAJnByzgNZ9YsYd7tBgYxiQPvuS_VZbhZNGvsPS-0apCDga7XFA@mail.gmail.com>
In-Reply-To: <CAJnByzgsuNBhdfPJsGrrHcU79xjK%2Bdq2RENgUkbZcehFm8MUxg@mail.gmail.com>
References:  <CAJnByzj6Dj3vouZ2NbxqvCV-2-7TVtTR4FaWKuCFaaRN2X%2ByAA@mail.gmail.com> <CALgsdbd3XuE3wMYp4ey%2B1aer%2BHSVNojLYoVqwqTBPAXXdf9i%2BQ@mail.gmail.com> <CAJnByzirLXdCe-kwHV2s_E6ytGJG0Dth=0Ms12RrEk7FK_%2B8Og@mail.gmail.com> <CA%2BhQ2%2BgMWY0eabjHGw0=PJCAkS-wO=RBrN5brSbaqWc3_AOYoQ@mail.gmail.com> <CAJnByziBS8o6LtmpUrUu5xtRUd008Z2hnCsp=WVFv35r2J0rHw@mail.gmail.com> <CA%2BhQ2%2Bim9nFfYnqDS2HgRbAzdf5D0iaLCmCYhfXQVVRMouUFuw@mail.gmail.com> <CAJnByzht-qfDcm8oEg1aSRyVBZ1ygPvc2eMuoyJcq4geueTZ0Q@mail.gmail.com> <CA%2BhQ2%2BiERgWJ=cdFB-cByfT3r11T1kKr-5HiuCYZY-rxbjf=XA@mail.gmail.com> <CAJnByziDzdR2C6DcSRNPtrWACLq0XFpe4X1Ek9yXtFP9ivqWQw@mail.gmail.com> <CA%2BhQ2%2BhjnuGo1xKgc8CQ7gP35tiaZG7%2BroZBmX8aBgb8qWnLgg@mail.gmail.com> <CAJnByzh-VrRZeYdpkRFtCUGEN_arFBkemcN7byb51XV6UPswyg@mail.gmail.com> <CA%2BhQ2%2BiMw3kxjpcZy77vgOEsfk2UY0-farh9C8RKXZHMU7D8kw@mail.gmail.com> <CAJnByzgsuNBhdfPJsGrrHcU79xjK%2Bdq2RENgUkbZcehFm8MUxg@mail.gmail.com>

Hi Luigi,

Thanks for your explanation.

I used three machines for this experiment, connected directly as follows:

[(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].

First, I ran bridge.c on machine2 with the command *bridge -i netmap:eth2
-i netmap:eth3* (sender, receiver, and XYZ were not running on machine1 or
machine3).

As I understand it, in this setup machine2 should be transparent to
machine1 and machine3, since it forwards packets from its eth2 to eth3 and
vice versa without modifying them.

I tried to ping machine3 from machine1 with a command like *ping
10.11.10.3*, but it still does not succeed.
Before machine1 sends the ping to machine3, it first sends an ARP request
to get machine3's MAC address. machine3 gets that ARP request and sends the
reply back (I used tcpdump to verify that machine3 receives the ARP request
and sends out the ARP reply). However, machine1 never gets the ARP reply.

I found that the bridge only forwards packets in one direction at a time:
it gets the ARP request but never sees the ARP reply (*pkt_queued* always
returns 0 for one NIC...).
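A note on the *pkt_queued* symptom. If I read upstream bridge.c correctly, `pkt_queued()` adds up `nm_ring_space()` over a port's RX rings, and the main loop chooses poll events from the two counts. The sketch below models that logic with simplified stand-in types. `struct ring_model`, `ring_space()` and `choose_events()` are hypothetical names, not the real netmap structs. If the count for one port never leaves 0, the bridge never observes a received packet on that port, so that direction stays idle. That would suggest looking at why the RX ring does not advance rather than at the swap itself.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>

/* Simplified stand-in for a netmap ring: only the fields the space
 * computation needs (NOT the real struct netmap_ring). */
struct ring_model {
    uint32_t num_slots;
    uint32_t cur;   /* next slot the application will look at */
    uint32_t tail;  /* first slot still owned by the kernel */
};

/* Same arithmetic as nm_ring_space(): slots from cur up to tail, with
 * wraparound. On an RX ring this is the number of packets waiting. */
static uint32_t ring_space(const struct ring_model *r)
{
    assert(r->num_slots > 0 && r->cur < r->num_slots && r->tail < r->num_slots);
    int ret = (int)r->tail - (int)r->cur;
    if (ret < 0)
        ret += (int)r->num_slots;
    return (uint32_t)ret;
}

/* Hypothetical model of the event selection in the bridge loop: if port A
 * has packets queued, wait for TX room on B; otherwise wait for A to
 * receive. The same rule applies symmetrically to B. */
static void choose_events(uint32_t queued_a, uint32_t queued_b,
                          short *events_a, short *events_b)
{
    *events_a = 0;
    *events_b = 0;
    if (queued_a)
        *events_b |= POLLOUT;
    else
        *events_a |= POLLIN;
    if (queued_b)
        *events_a |= POLLOUT;
    else
        *events_b |= POLLIN;
}
```

With both counts at 0, both ports are polled only for POLLIN. Only a nonzero count causes POLLOUT to be requested on the opposite port.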

This behavior looks very strange to me. Do you think there is a
compatibility issue between netmap and the OS I am using? Is there a
verified Linux distribution (and version) that works well with netmap?

The OS I use is Debian (3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1
(2015-05-24) x86_64 GNU/Linux), with Linux kernel version *3.16.0-4-amd64*.


Thanks!
Xiaoye






On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:

> On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
> >
> >
> > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> >>
> >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
> >> > Hi Luigi,
> >> >
> >> > I have to clarify the *jumping issue* with the slot indexes.
> >> > In the bridge.c program, the slot index never jumps; it increases
> >> > sequentially.
> >> > In the receiver.c program, the UDP packet seq jumps, and I showed the
> >> > slot index that each UDP packet uses. So the slot index jumps together
> >> > with the UDP seq (in the receiver program only).
> >>
> >> So let me understand, is the "slot" some information written
> >> in the packet by bridge.c (referring to the rx or tx slot,
> >> I am not sure) and then read and printed by receiver.c
> >> (which gets the packet through recvfrom so there isn't
> >> really any slot index) ?
> >>
> > It works the other way around:
> > bridge.c checks the seq numbers of the UDP packets in the netmap slots (in
> > the NIC rx ring) before the swap. It then records the seq number, the slot
> > number (both rx and tx; the tx indexes were not shown in the previous email
> > since they all look correct) and the buf_idx (rx and tx). bridge.c does not
> > change anything in the buffer, and it knows the slot and buf_idx that each
> > packet uses. Please refer to the code added in the *process_rings* function:
> > http://www.owlnet.rice.edu/~xs6/bridge.c
> > receiver.c only checks the seq numbers and prints them in the order it
> > receives them.
> > With this information, I manually matched the seq numbers I got from
> > receiver.c with those I got from bridge.c. So we know the seq order the
> > receiver sees and which slot each packet used when bridge.c swapped
> > the buf_idxs.
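The manual matching described above can be automated. The following is a minimal sketch that assumes the bridge log has already been parsed into (seq, rx slot, rx buf_idx) records. `struct bridge_rec`, `find_by_seq()` and `match_slots()` are hypothetical names. The sample values in the test come from the receiver.c trace quoted further down in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the bridge.c log (hypothetical layout). */
struct bridge_rec {
    uint32_t seq;
    uint32_t rx_slot;
    uint32_t rx_buf_idx;
};

/* Find the bridge record for a seq; NULL if the bridge never logged it.
 * A linear scan keeps the sketch short. */
static const struct bridge_rec *
find_by_seq(const struct bridge_rec *log, size_t n, uint32_t seq)
{
    assert(log != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (log[i].seq == seq)
            return &log[i];
    return NULL;
}

/* For each seq in receiver order, store the RX slot it used at the
 * bridge (UINT32_MAX if unknown). Returns how many seqs were matched. */
static size_t
match_slots(const struct bridge_rec *log, size_t n_log,
            const uint32_t *recv_seq, size_t n_recv, uint32_t *slot_out)
{
    size_t matched = 0;
    for (size_t i = 0; i < n_recv; i++) {
        const struct bridge_rec *r = find_by_seq(log, n_log, recv_seq[i]);
        slot_out[i] = r ? r->rx_slot : UINT32_MAX;
        if (r)
            matched++;
    }
    return matched;
}
```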
> >
> >> Do you see any ordering inversion when the receiver
> >> gets packets through the NETMAP API (e.g. using bridge.c
> >> instead of receiver.c) ?
> >>
> > bridge.c sees no ordering inversion. (As I said in the previous
> > paragraph, bridge.c checks the seq numbers, and I did not see any order
> > inversion in THIS simple experiment. In my multicast protocol (mentioned in
> > the first email) there is ordering inversion, but let us solve the simple
> > bridge.c problem first; I think they are two relatively independent
> > issues.)
>
> Sorry there was a misunderstanding.
> I wanted you to check the following setup:
>
> [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
>
> where in XYZ you replace your receiver.c with some
> netmap-based receiver (it could be pkt-gen in rx mode,
> or possibly even another instance of bridge.c where
> you connect the output port to a vale switch so
> traffic is dropped), and then in XYZ print the content
> of the packets.
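For a netmap-based receiver on node 3, each packet is available as a raw frame (via `NETMAP_BUF(ring, slot->buf_idx)` and `slot->len`). The following is a minimal sketch for extracting the sequence number from such a frame. It assumes untagged Ethernet II, IPv4 and UDP, with a 4-byte big-endian sequence number at the start of the UDP payload. That payload layout is an assumption about send.c, and `udp_seq_from_frame()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14
#define ETHERTYPE_IPV4  0x0800
#define IPPROTO_UDP_NUM 17
#define UDP_HDR_LEN     8

/* Returns 0 and stores the sequence number in *seq on success, or -1 if
 * the frame is not an untagged IPv4/UDP frame long enough to carry it. */
static int udp_seq_from_frame(const uint8_t *buf, size_t len, uint32_t *seq)
{
    assert(buf != NULL && seq != NULL);
    if (len < ETH_HDR_LEN + 20)
        return -1;
    uint16_t ethertype = (uint16_t)((buf[12] << 8) | buf[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return -1;
    const uint8_t *ip = buf + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;  /* IPv4 header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != IPPROTO_UDP_NUM)
        return -1;
    size_t off = ETH_HDR_LEN + ihl + UDP_HDR_LEN;  /* start of UDP payload */
    if (len < off + 4)
        return -1;
    const uint8_t *p = buf + off;
    *seq = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return 0;
}
```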
>
> From your previous report we know that node 2: sees packets
> in order, and node 3: sees packets out of order.
> However, if the problem were due to bridge.c sending
> the old buffer and not the new one, you'd see not only
> reordering but also replication of packets.
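Luigi's argument can be turned into a mechanical check. A stale-buffer bug in the swap would produce duplicate sequence numbers, while pure reordering produces only late arrivals. The minimal sketch below counts both. It also counts forward gaps (new_seq > max_seq + 1), which is the check Luigi asks for later in the thread. `classify()` and `struct trace_stats` are hypothetical, and the sketch assumes sequence numbers fall within a known window [base, base + seen_len).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct trace_stats {
    size_t duplicates;  /* seq already seen: what a stale buffer would cause */
    size_t reordered;   /* first arrival of a seq below the highest seen */
    size_t gaps;        /* forward jumps: new_seq > max_seq + 1 */
};

/* seen[] must be zeroed and cover every seq in [base, base + seen_len). */
static struct trace_stats classify(const uint32_t *seq, size_t n,
                                   uint32_t base, bool *seen, size_t seen_len)
{
    struct trace_stats st = {0, 0, 0};
    uint32_t max_seq = 0;
    bool have_max = false;
    for (size_t i = 0; i < n; i++) {
        assert(seq[i] >= base && seq[i] - base < seen_len);
        size_t k = seq[i] - base;
        if (seen[k]) {
            st.duplicates++;
            continue;
        }
        seen[k] = true;
        if (have_max && seq[i] < max_seq) {
            st.reordered++;
        } else {
            if (have_max && seq[i] > max_seq + 1)
                st.gaps++;
            max_seq = seq[i];
            have_max = true;
        }
    }
    return st;
}
```

Applied to the rows visible in the receiver.c excerpt quoted below, this check finds gaps and reordered packets but no duplicates. Some of those gaps come from the rows elided in the excerpt.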
>
> The fact that you see only the reordering in 3: makes
> me think that the problem is in that node, and it could
> be the network stack in 3: that does something strange.
> So if you can run something netmap based in 3: and make
> sure there is only one queue to read from, we could
> at least figure out what is going on.
>
> cheers
> luigi
>
>
> is that
> >
> >>
> >> Are you using native netmap drivers or the emulated mode ?
> >> You can check that by playing with the "admode" sysctl entry
> >> (or sysfs on linux) - try setting to 1 and 2 and see if
> >> the behaviour changes.
> >>
> >>      dev.netmap.admode: 0
> >>              Controls the use of native or emulated adapter mode.
> >>              0 uses the best available option,
> >>              1 forces native and fails if not available,
> >>              2 forces emulated hence never fails.
> >>
> > I was using admode 0. I changed admode to 1 and 2 with a command like
> > *echo 1 > /sys/module/netmap/parameters/admode* and restarted the bridge
> > program. The behavior stays the same.
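The `echo` command above can be written as a small C helper. This minimal sketch assumes the Linux sysfs path quoted in this message; on FreeBSD the same knob is the `dev.netmap.admode` sysctl. `set_admode()` and `get_admode()` are hypothetical names. Writing the real path requires root and a loaded netmap module, and, as described above, the bridge was restarted after the change.

```c
#include <assert.h>
#include <stdio.h>

/* Linux sysfs path quoted above (hypothetical macro name). */
#define NETMAP_ADMODE_PATH "/sys/module/netmap/parameters/admode"

/* 0 = best available, 1 = force native, 2 = force emulated.
 * Returns 0 on success, -1 on a bad mode or an I/O error. */
static int set_admode(const char *path, int mode)
{
    assert(path != NULL);
    if (mode < 0 || mode > 2)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", mode) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Returns the current mode, or -1 if it cannot be read. */
static int get_admode(const char *path)
{
    int mode = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

For example, `set_admode(NETMAP_ADMODE_PATH, 1)` would correspond to `echo 1 > /sys/module/netmap/parameters/admode`.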
> >
> >>
> >> cheers
> >> luigi
> >>
> >> >
> >> > There really is one ring (tx and rx) for the NIC and one ring (tx and rx)
> >> > for the host.
> >> > I also suspected that there might be multiple tx rings for the host, so
> >> > that the bridge program swaps packets into multiple host rings and the
> >> > UDP recv program drains packets from these rings. But this is not the
> >> > case here.
> >> >
> >> > The bridge program prints a line like this:
> >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
> >> > It is printed by the following line of the original program:
> >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
> >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name,
> >> > pb->first_rx_ring, pb->req.nr_rx_rings);*
> >> >
> >> > I think this shows that there really is one NIC ring and one host ring.
> >> >
> >> > Is there another way to verify the number of rings that netmap has?
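One independent check on Linux is the sysfs queue listing. Each network interface exposes its queues as `rx-N` and `tx-N` subdirectories of `/sys/class/net/<ifname>/queues/`, and `ethtool -l <ifname>` also reports the channel counts. The minimal sketch below counts those subdirectories. `count_queues()` and `is_queue_name()` are hypothetical helpers. This approach only sees the NIC queues. netmap's own view, including the host ring, is what `nr_rx_rings`/`nr_tx_rings` report in the `D(...)` line above.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* True if name is prefix followed by one or more digits,
 * e.g. "rx-0" or "rx-12" for prefix "rx-". */
static int is_queue_name(const char *name, const char *prefix)
{
    size_t plen = strlen(prefix);
    if (strncmp(name, prefix, plen) != 0 || name[plen] == '\0')
        return 0;
    for (const char *p = name + plen; *p != '\0'; p++)
        if (*p < '0' || *p > '9')
            return 0;
    return 1;
}

/* Counts queue directories such as rx-0, rx-1 under a directory like
 * /sys/class/net/eth3/queues. Returns -1 if it cannot be opened. */
static int count_queues(const char *queues_dir, const char *prefix)
{
    assert(queues_dir != NULL && prefix != NULL);
    DIR *d = opendir(queues_dir);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (is_queue_name(e->d_name, prefix))
            n++;
    closedir(d);
    return n;
}
```

After `ethtool -L eth3 combined 1`, `count_queues("/sys/class/net/eth3/queues", "rx-")` would be expected to return 1 on a driver that honors the setting.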
> >> >
> >> > Thanks!
> >> > Xiaoye
> >> >
> >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> >> >>
> >> >> Hi,
> >> >> there must be something wrong with your setting because
> >> >> slot indexes must be sequential and in your case they
> >> >> are not (see the jump from 295 to 474 and then
> >> >> back from 485 to 296, and the numerous interleavings
> >> >> that you are seeing later).
> >> >>
> >> >> I have no idea of the cause but typically this pattern
> >> >> is what you see when there are multiple input rings and
> >> >> not just one.
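Luigi's point that slot indexes must be sequential can be checked automatically. The following minimal sketch assumes the slots are listed in the order the packets were received. `first_slot_jump()` is a hypothetical helper. Its wraparound follows the same rule as netmap's `nm_ring_next()`, where the index after `num_slots - 1` is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first entry whose slot is not the successor
 * (mod num_slots) of the previous one, or n if all slots are contiguous. */
static size_t first_slot_jump(const uint32_t *slot, size_t n, uint32_t num_slots)
{
    assert(num_slots > 0);
    for (size_t i = 1; i < n; i++) {
        uint32_t prev = slot[i - 1];
        uint32_t expected = (prev + 1 == num_slots) ? 0 : prev + 1;
        if (slot[i] != expected)
            return i;
    }
    return n;
}
```

Applied to the slot column of the receiver.c trace quoted below (294, 295, 474, ...), it flags position 2, which is the jump from 295 to 474.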
> >> >>
> >> >> Cheers
> >> >> Luigi
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
> >> >> > Hi Luigi,
> >> >> >
> >> >> > Thanks for the detailed advice.
> >> >> >
> >> >> > After more detailed experiments, I found that the UDP sender/receiver
> >> >> > packet reordering issue *might* be unrelated to the original issue I
> >> >> > posted. However, I think we should solve the UDP sender/receiver issue
> >> >> > first.
> >> >> > I ran the experiment with a more detailed log. Here are my findings.
> >> >> >
> >> >> > 1. I am running a netmap version from github
> >> >> > (https://github.com/luigirizzo/netmap) dating from about Oct 13, so I
> >> >> > think it is not affected by the buffer allocation bug. I tried to run
> >> >> > the newest version, but that version causes a problem when I exit the
> >> >> > bridge program (something like a kernel error that makes the OS crash).
> >> >> >
> >> >> > 2 & 3. I changed receiver.c and bridge.c so that I can get more
> >> >> > information (a more detailed log).
> >> >> > The reordering happens multiple times (about 10 times) per second.
> >> >> > Here is one example trace collected from these two programs (remember
> >> >> > that the UDP sender runs on one machine, while the netmap bridge and
> >> >> > the UDP receiver run on another machine).
> >> >> > There is only one pair of rings, each with 512 slots (511 usable), on
> >> >> > the receiver machine.
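The reason only 511 of 512 slots are usable is a common ring-buffer convention. A ring that uses equal indexes to mean "empty" must always leave one slot unused, because otherwise "full" and "empty" would look identical. The minimal sketch below is a generic model of that convention with hypothetical names (`struct idx_ring`, `ring_push()`, `ring_used()`). It is not netmap's actual head/cur/tail protocol, which netmap(4) documents.

```c
#include <assert.h>
#include <stdint.h>

/* Generic ring index arithmetic with one slot kept empty, so that
 * head == tail unambiguously means "empty". */
struct idx_ring {
    uint32_t num_slots;
    uint32_t head;  /* next slot to fill */
    uint32_t tail;  /* next slot to drain */
};

static uint32_t ring_next(const struct idx_ring *r, uint32_t i)
{
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of filled slots. */
static uint32_t ring_used(const struct idx_ring *r)
{
    return (r->head + r->num_slots - r->tail) % r->num_slots;
}

/* Claims one slot; returns -1 when the ring is full, i.e. when advancing
 * head would make it equal to tail. */
static int ring_push(struct idx_ring *r)
{
    assert(r->num_slots > 1);
    if (ring_next(r, r->head) == r->tail)
        return -1;
    r->head = ring_next(r, r->head);
    return 0;
}
```

Filling an empty 512-slot ring this way stops after 511 pushes.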
> >> >> >
> >> >> > =================== packet trace collected from receiver.c ===================
> >> >> > ===== together with the slot and buf_idx of the corresponding netmap ring slots ======
> >> >> > [seq]   [slot]   [buf_idx]
> >> >> > 8208   294    1833
> >> >> > 8209   295    1834
> >> >> > 8388   474    2013
> >> >> > ... (packet received in order)
> >> >> > 8398   484    2023
> >> >> > 8399   485    2024
> >> >> > 8210   296    1835
> >> >> > 8211   297    1836
> >> >> > ... (packet received in order)
> >> >> > ...
> >> >> > 8222   308    1847
> >> >> > 8400   486    2025
> >> >> > 8223   309    1848
> >> >> > 8401   487    2026
> >> >> > 8224   310    1849
> >> >> > 8402   488    2027
> >> >> > 8225   311    1850
> >> >> > 8403   489    2028
> >> >> > 8226   312    1851
> >> >> > 8404   450    2029
> >> >> > 8227   313    1852
> >> >> > 8228   314    1853
> >> >> > ===================================================================
> >> >> > As we can see, the UDP receiver got packet 8210 after it got 8399,
> >> >> > which is the first reordering. Then the receiver got 8211 to 8222
> >> >> > sequentially. After that, it got packets 8223-8227 and 8400-8404
> >> >> > interleaved.
> >> >> >
> >> >> >
> >> >> > ==================== event order seen by netmap bridge ==================
> >> >> > get 8209
> >> >> > poll called
> >> >> > get 8210
> >> >> > ...
> >> >> > ...
> >> >> > get 8228
> >> >> > poll called
> >> >> > get 8229
> >> >> > ...
> >> >> > ...
> >> >> > get 8383
> >> >> > poll called
> >> >> > get 8384
> >> >> > ...
> >> >> > get 8387
> >> >> > poll called
> >> >> > get 8388
> >> >> > ...
> >> >> > get 8393
> >> >> > poll called
> >> >> > get 8394
> >> >> > ...
> >> >> > get 8399
> >> >> > poll called
> >> >> > get 8400
> >> >> > ...
> >> >> > get 8404
> >> >> > poll called
> >> >> > get 8405
> >> >> > ===================================================================
> >> >> > As we can see from the event order seen by bridge.c, all the packets
> >> >> > are received in order, which means that the reordering happens when
> >> >> > the bridge code swaps the buf_idx between the NIC ring (slot) and the
> >> >> > host ring (slot). The reordered seqs usually appear right before or
> >> >> > after the poll function call.
> >> >> >
> >> >> > Best,
> >> >> > Xiaoye
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> >> >> >>
> >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu> wrote:
> >> >> >> > Hi Luigi,
> >> >> >> >
> >> >> >> > Thanks for your advice.
> >> >> >> > I forgot to mention that I use the command "ethtool -L eth1 combined 1"
> >> >> >> > to set the number of rings of the NIC to 1. The host also has only one
> >> >> >> > ring.
> >> >> >> > I understand the situation where the first tx ring is full, so the
> >> >> >> > bridge swaps the packets into the second tx ring and then the host/NIC
> >> >> >> > might drain either ring. But that is not the case in this experiment.
> >> >> >>
> >> >> >> ok good to know that.
> >> >> >>
> >> >> >> So if we have ruled out multiqueue and iommu, let's look at
> >> >> >> the internal allocator and at bridge.c
> >> >> >>
> >> >> >> 1. are you running the most recent version of netmap ?
> >> >> >>    Some older version (probably 1-2 years ago) had a bug
> >> >> >>    in the buffer allocator and some buffers were allocated
> >> >> >>    twice.
> >> >> >>
> >> >> >> 2. can you tweak your receiver.c to report some more info
> >> >> >>    on how often you get out of sequence packets, how much
> >> >> >>    out of sequence they are ?
> >> >> >>    Also it would be useful to report gaps on the increasing side
> >> >> >>    (i.e. new_seq != old_seq +1 )
> >> >> >>
> >> >> >> 3. can you tweak bridge.c so that it writes into the packet
> >> >> >>    the netmap buffer indexes and slots on the rx and tx side,
> >> >> >>    so when you detect a sequence error we can figure out
> >> >> >>    where it is happening.
> >> >> >>    Ideally you could also add the sequence number detection
> >> >> >>    code in bridge.c so we can check whether the errors appear
> >> >> >>    on the input or output sides.
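Item 3 could be implemented as follows: the bridge writes its slot and buffer indexes into the packet, and the receiver reads them back. In this minimal sketch, `struct bridge_stamp`, the offset, and both helpers are hypothetical. The stamp is stored in host byte order, which works when the bridge and receiver run on the same machine, as in this setup. The offset must lie past the sequence number. For example, byte 46 works if the sequence number occupies bytes 42 to 45 of an untagged frame with a 20-byte IPv4 header. Rewriting the payload also invalidates a nonzero UDP checksum, so the receiving stack may drop the packet unless the sender uses checksum 0 (permitted for UDP over IPv4) or the bridge recomputes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical debug record written by the bridge; not a netmap API. */
struct bridge_stamp {
    uint32_t rx_slot, rx_buf_idx;
    uint32_t tx_slot, tx_buf_idx;
};

/* Writes the stamp at byte offset `off`; memcpy avoids unaligned stores.
 * Returns -1 if it would not fit inside the packet. */
static int stamp_packet(uint8_t *buf, size_t len, size_t off,
                        const struct bridge_stamp *s)
{
    assert(buf != NULL && s != NULL);
    if (off > len || len - off < sizeof *s)
        return -1;
    memcpy(buf + off, s, sizeof *s);
    return 0;
}

/* Receiver side: reads the stamp back from the same offset. */
static int read_stamp(const uint8_t *buf, size_t len, size_t off,
                      struct bridge_stamp *s)
{
    assert(buf != NULL && s != NULL);
    if (off > len || len - off < sizeof *s)
        return -1;
    memcpy(s, buf + off, sizeof *s);
    return 0;
}
```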
> >> >> >>
> >> >> >> cheers
> >> >> >> luigi
> >> >> >>
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >>
> >> >> -----------------------------------------+-------------------------------
> >> >>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
> >> >>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
> >> >>  TEL      +39-050-2217533               . via Diotisalvi 2
> >> >>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
> >> >> -----------------------------------------+-------------------------------
> >> >>
> >> >
> >>
> >>
> >>
> >>
> >
>
>
>
>
>


