Date: Thu, 4 Feb 2016 18:04:13 -0600
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
From: Xiaoye Sun
To: Victor Detoni
Cc: Luigi Rizzo, Pavel Odintsov, "freebsd-net@freebsd.org"

Yes. all the interfaces are up. Are you able to get ARP request when the
interfaces are down?

On Thursday, February 4, 2016, Victor Detoni wrote:

> Both interfaces are up? Like ifconfig...
> up
>
> I had the same problem and I solved it with the commands above.
>
> On Thursday, February 4, 2016, Xiaoye Sun wrote:
>
>> Hi Luigi,
>>
>> Thanks for your explanation.
>>
>> I used three machines to do this experiment. They are directly connected.
>>
>> [(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].
>>
>> First, I tried to run bridge.c on machine2 using the command
>> *bridge -i netmap:eth2 -i netmap:eth3*. (The sender, receiver, and XYZ
>> were not running on machine 1 or 3.)
>>
>> As I understand it, in this setup machine2 is transparent to machine1
>> and machine3, since it forwards packets from its eth2 to eth3 and vice
>> versa without any modification to the packets.
>>
>> I tried to ping machine3 from machine1 using a command like
>> *ping 10.11.10.3*. However, it still does not succeed.
>> This is because, before machine1 sends the ping message to machine3, it
>> first sends an ARP request to get the MAC address of machine3.
>> machine3 gets that ARP request and sends the reply back (I used tcpdump
>> to verify that machine3 gets the ARP request and sends out the ARP
>> reply). However, machine1 does not get the ARP reply.
>>
>> I checked, and the bridge forwards packets in only one direction at a
>> time: it gets the ARP request but doesn't see the ARP reply
>> (*pkt_queued* always returns 0 for one NIC...).
>>
>> This behavior looks very weird to me. Do you think there is a
>> compatibility issue between netmap and the OS I am using? Is there a
>> verified Linux distribution (and version) that works well with netmap?
>>
>> The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
>> x86_64 GNU/Linux.
>> The Linux kernel version is *3.16.0-4-amd64*.
>>
>>
>> Thanks!
>> Xiaoye
>>
>> On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo wrote:
>>
>> > On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun wrote:
>> > >
>> > > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo wrote:
>> > >>
>> > >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun wrote:
>> > >> > Hi Luigi,
>> > >> >
>> > >> > I have to clarify the *jumping issue* with the slot indexes.
>> > >> > In the bridge.c program, the slot index never jumps; it increases
>> > >> > sequentially.
>> > >> > In the receiver.c program, the UDP packet seq jumps, and I showed
>> > >> > the slot index that each UDP packet uses. So the slot index jumps
>> > >> > together with the UDP seq (at the receiver program only).
>> > >>
>> > >> So let me understand, is the "slot" some information written
>> > >> in the packet by bridge.c (referring to the rx or tx slot,
>> > >> I am not sure) and then read and printed by receiver.c
>> > >> (which gets the packet through recvfrom so there isn't
>> > >> really any slot index) ?
>> > >>
>> > > It works the other way around:
>> > > bridge.c checks the seq numbers of the UDP packets in the netmap
>> > > slots (in the NIC rx ring) before the swap; then it records the seq
>> > > number, the slot number (both rx and tx; the tx indexes were not
>> > > shown in the previous email since they all look correct), and the
>> > > buf_idx (rx and tx). bridge.c does not change anything in the buffer,
>> > > and it knows the slot and buf_idx that a packet uses. Please refer to
>> > > the code added to the *process_rings* function:
>> > > http://www.owlnet.rice.edu/~xs6/bridge.c
>> > > receiver.c checks only the seq numbers and prints them out in the
>> > > order it receives them.
>> > > With this information, I manually matched the seq numbers I got from
>> > > receiver.c against the seq numbers I got from bridge.c.
>> > > So we know the seq order the receiver sees and which slot a packet
>> > > uses when bridge.c swaps the buf_idxs.
>> > >
>> > >> Do you see any ordering inversion when the receiver
>> > >> gets packets through the NETMAP API (e.g. using bridge.c
>> > >> instead of receiver.c) ?
>> > >>
>> > > There is no ordering inversion seen by bridge.c. As I said in the
>> > > previous paragraph, bridge.c checks the seq numbers, and I did not
>> > > see any order inversion in THIS simple experiment. (In my multicast
>> > > protocol, mentioned in the first email, there is ordering inversion.
>> > > But let us solve the simple bridge.c problem first. I think they are
>> > > two relatively independent issues.)
>> >
>> > Sorry there was a misunderstanding.
>> > I wanted you to check the following setup:
>> >
>> > [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
>> >
>> > where in XYZ you replace your receiver.c with some
>> > netmap-based receiver (it could be pkt-gen in rx mode,
>> > or possibly even another instance of bridge.c where
>> > you connect the output port to a vale switch so
>> > traffic is dropped), and then in XYZ print the content
>> > of the packets.
>> >
>> > From your previous report we know that node 2: sees packets
>> > in order, and node 3: sees packets out of order.
>> > However, if the problem were due to bridge.c sending
>> > the old buffer and not the new one, you'd see not only
>> > reordering but also replication of packets.
>> >
>> > The fact that you see only the reordering in 3: makes
>> > me think that the problem is in that node, and it could
>> > be the network stack in 3: that does something strange.
>> > So if you can run something netmap based in 3: and make
>> > sure there is only one queue to read from, we could
>> > at least figure out what is going on.
>> >
>> > cheers
>> > luigi
>> >
>> > is that
>> > >
>> > >> Are you using native netmap drivers or the emulated mode ?
>> > >> You can check that by playing with the "admode" sysctl entry
>> > >> (or sysfs on linux) - try setting to 1 and 2 and see if
>> > >> the behaviour changes.
>> > >>
>> > >> dev.netmap.admode: 0
>> > >>     Controls the use of native or emulated adapter mode.
>> > >>     0 uses the best available option,
>> > >>     1 forces native and fails if not available,
>> > >>     2 forces emulated hence never fails.
>> > >>
>> > > I was using admode 0. I changed admode to 1 and to 2 using a command
>> > > like *echo 1 > /sys/module/netmap/parameters/admode* and restarted
>> > > the bridge program. The behavior stays the same.
>> > >
>> > >> cheers
>> > >> luigi
>> > >>
>> > >> >
>> > >> > There is really one ring (tx and rx) for the NIC and one ring (tx
>> > >> > and rx) for the host.
>> > >> > I also suspected that there might be multiple tx rings for the
>> > >> > host: it would look as if the bridge program swaps packets into
>> > >> > multiple host rings and the UDP recv program drains packets from
>> > >> > these rings. But this is not the case here.
>> > >> >
>> > >> > The bridge program prints a line like this:
>> > >> > *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
>> > >> > This is printed by the following line in the original program:
>> > >> > *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
>> > >> > pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name,
>> > >> > pb->first_rx_ring, pb->req.nr_rx_rings);*
>> > >> >
>> > >> > I think this shows that there is really one NIC ring and one HOST
>> > >> > ring.
>> > >> >
>> > >> > Is there another way to verify the number of rings that netmap has?
>> > >> >
>> > >> > Thanks!
>> > >> > Xiaoye
>> > >> >
>> > >> > On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo wrote:
>> > >> >>
>> > >> >> Hi,
>> > >> >> there must be something wrong with your setting because
>> > >> >> slot indexes must be sequential and in your case they
>> > >> >> are not (see the jump from 295 to 474 and then
>> > >> >> back from 485 to 296, and the numerous interleavings
>> > >> >> that you are seeing later).
>> > >> >>
>> > >> >> I have no idea of the cause but typically this pattern
>> > >> >> is what you see when there are multiple input rings and
>> > >> >> not just one.
>> > >> >>
>> > >> >> Cheers
>> > >> >> Luigi
>> > >> >>
>> > >> >> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun wrote:
>> > >> >> > Hi Luigi,
>> > >> >> >
>> > >> >> > Thanks for the detailed advice.
>> > >> >> >
>> > >> >> > With more detailed experiments, I found that the UDP
>> > >> >> > sender/receiver packet reorder issue *might* be unrelated to
>> > >> >> > the original issue I posted. However, I think we should solve
>> > >> >> > the UDP sender/receiver issue first.
>> > >> >> > I ran the experiment with a more detailed log. Here are my
>> > >> >> > findings.
>> > >> >> >
>> > >> >> > 1. I am running a netmap version that has been available on
>> > >> >> > GitHub since about Oct 13
>> > >> >> > (https://github.com/luigirizzo/netmap). So I think this is not
>> > >> >> > the one related to the buffer allocation issue. I tried running
>> > >> >> > the newest version; however, that version causes a problem when
>> > >> >> > I exit the bridge program (something like a kernel error that
>> > >> >> > makes the OS crash).
>> > >> >> >
>> > >> >> > 2 & 3. I changed receiver.c and bridge.c so that I can get more
>> > >> >> > information (a more detailed log).
>> > >> >> > The reorder happens multiple times (about 10 times) within a
>> > >> >> > second.
>> > >> >> > Here is one example trace collected from the above two
>> > >> >> > programs. (Remember that the UDP sender runs on one machine;
>> > >> >> > the netmap bridge and the UDP receiver run on another machine.)
>> > >> >> > There is only one pair of rings, each with 512 slots (511 slots
>> > >> >> > usable), on the receiver machine.
>> > >> >> >
>> > >> >> > =================== packet trace collected from receiver.c ===================
>> > >> >> > ===== together with the slot and buf_idx of the corresponding netmap ring slots ======
>> > >> >> > [seq]  [slot]  [buf_idx]
>> > >> >> > 8208   294     1833
>> > >> >> > 8209   295     1834
>> > >> >> > 8388   474     2013
>> > >> >> > ... (packet received in order)
>> > >> >> > 8398   484     2023
>> > >> >> > 8399   485     2024
>> > >> >> > 8210   296     1835
>> > >> >> > 8211   297     1836
>> > >> >> > ... (packet received in order)
>> > >> >> > ...
>> > >> >> > 8222   308     1847
>> > >> >> > 8400   486     2025
>> > >> >> > 8223   309     1848
>> > >> >> > 8401   487     2026
>> > >> >> > 8224   310     1849
>> > >> >> > 8402   488     2027
>> > >> >> > 8225   311     1850
>> > >> >> > 8403   489     2028
>> > >> >> > 8226   312     1851
>> > >> >> > 8404   450     2029
>> > >> >> > 8227   313     1852
>> > >> >> > 8228   314     1853
>> > >> >> > ===================================================================
>> > >> >> > As we can see, the UDP receiver got packet 8210 after it got
>> > >> >> > 8399, which is the first reordering. Then the receiver got 8211
>> > >> >> > to 8222 sequentially. Then it got packets 8223-8227 and
>> > >> >> > 8400-8404 interleaved.
>> > >> >> >
>> > >> >> >
>> > >> >> > ==================== event order seen by netmap bridge ==================
>> > >> >> > get 8209
>> > >> >> > poll called
>> > >> >> > get 8210
>> > >> >> > ...
>> > >> >> > ...
>> > >> >> > get 8228
>> > >> >> > poll called
>> > >> >> > get 8229
>> > >> >> > ...
>> > >> >> > ...
>> > >> >> > get 8383
>> > >> >> > poll called
>> > >> >> > get 8384
>> > >> >> > ...
>> > >> >> > get 8387
>> > >> >> > poll called
>> > >> >> > get 8388
>> > >> >> > ...
>> > >> >> > get 8393
>> > >> >> > poll called
>> > >> >> > get 8394
>> > >> >> > ...
>> > >> >> > get 8399
>> > >> >> > poll called
>> > >> >> > get 8400
>> > >> >> > ...
>> > >> >> > get 8404
>> > >> >> > poll called
>> > >> >> > get 8405
>> > >> >> > ===================================================================
>> > >> >> > As we can see from the event ordering seen by bridge.c, all the
>> > >> >> > packets are received in order, which means the reordering
>> > >> >> > happens when the bridge code swaps the buf_idx between the NIC
>> > >> >> > ring (slot) and the host ring (slot).
>> > >> >> > The reordered seq is usually right before or after the poll
>> > >> >> > function call.
>> > >> >> >
>> > >> >> > Best,
>> > >> >> > Xiaoye
>> > >> >> >
>> > >> >> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo <rizzo@iet.unipi.it>
>> > >> >> > wrote:
>> > >> >> >>
>> > >> >> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun <Xiaoye.Sun@rice.edu>
>> > >> >> >> wrote:
>> > >> >> >> > Hi Luigi,
>> > >> >> >> >
>> > >> >> >> > Thanks for your advice.
>> > >> >> >> > I forgot to mention that I use the command
>> > >> >> >> > "ethtool -L eth1 combined 1" to set the number of rings of
>> > >> >> >> > the NIC to 1. The host also has only one ring.
>> > >> >> >> > I understand the situation where the first tx ring is full,
>> > >> >> >> > so the bridge swaps the packets to the second tx ring and
>> > >> >> >> > then the host/NIC might drain either ring.
>> > >> >> >> > But this is not the case in the experiment.
>> > >> >> >>
>> > >> >> >> ok good to know that.
>> > >> >> >>
>> > >> >> >> So if we have ruled out multiqueue and iommu, let's look at
>> > >> >> >> the internal allocator and at bridge.c
>> > >> >> >>
>> > >> >> >> 1. are you running the most recent version of netmap ?
>> > >> >> >>    Some older version (probably 1-2 years ago) had a bug
>> > >> >> >>    in the buffer allocator and some buffers were allocated
>> > >> >> >>    twice.
>> > >> >> >>
>> > >> >> >> 2. can you tweak your receiver.c to report some more info
>> > >> >> >>    on how often you get out of sequence packets, how much
>> > >> >> >>    out of sequence they are ?
>> > >> >> >>    Also it would be useful to report gaps on the increasing side
>> > >> >> >>    (i.e. new_seq != old_seq +1 )
>> > >> >> >>
>> > >> >> >> 3. can you tweak bridge.c so that it writes into the packet
>> > >> >> >>    the netmap buffer indexes and slots on the rx and tx side,
>> > >> >> >>    so when you detect a sequence error we can figure out
>> > >> >> >>    where it is happening.
>> > >> >> >>    Ideally you could also add the sequence number detection
>> > >> >> >>    code in bridge.c so we can check whether the errors appear
>> > >> >> >>    on the input or output sides.
>> > >> >> >>
>> > >> >> >> cheers
>> > >> >> >> luigi
>> > >> >> >>
>> > >> >> --
>> > >> >> -----------------------------------------+-------------------------------
>> > >> >> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione
>> > >> >> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>> > >> >> TEL +39-050-2217533 . via Diotisalvi 2
>> > >> >> Mobile +39-338-6809875 .
>> > >> >> 56122 PISA (Italy)
>> > >> >> -----------------------------------------+-------------------------------
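
The receiver-side check asked for in item 2 of the Jan 29 message (how often
packets arrive out of sequence, how far out of sequence they are, and gaps on
the increasing side where new_seq != old_seq + 1) could be sketched roughly as
below. This is only an illustration under stated assumptions: the names
seq_stats, seq_init, seq_observe and the SEQ_* values are hypothetical and are
not taken from receiver.c or bridge.c, and sequence numbers are assumed to be
32-bit and not to wrap around during a run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this comes from receiver.c. */
enum seq_event { SEQ_FIRST, SEQ_IN_ORDER, SEQ_GAP, SEQ_BACKWARD };

struct seq_stats {
    int      started;       /* 0 until the first packet has been seen */
    uint32_t last_seq;      /* sequence number of the previous packet */
    uint64_t in_order;      /* packets with new_seq == old_seq + 1 */
    uint64_t forward_gaps;  /* packets with new_seq >  old_seq + 1 */
    uint64_t backward;      /* packets with new_seq <= old_seq */
    uint32_t max_backward;  /* largest backward distance seen so far */
};

/* Reset all counters before the first packet. */
static void seq_init(struct seq_stats *st)
{
    assert(st != NULL);
    *st = (struct seq_stats){0};
}

/*
 * Classify one received sequence number against the previous one and
 * update the counters. Wraparound of the 32-bit counter is not handled.
 */
static enum seq_event seq_observe(struct seq_stats *st, uint32_t seq)
{
    int64_t delta;

    assert(st != NULL);
    if (!st->started) {
        st->started = 1;
        st->last_seq = seq;
        return SEQ_FIRST;
    }

    /* Widen before subtracting so a backward step gives a negative delta. */
    delta = (int64_t)seq - (int64_t)st->last_seq;
    st->last_seq = seq;

    if (delta == 1) {
        st->in_order++;
        return SEQ_IN_ORDER;
    }
    if (delta > 1) {
        st->forward_gaps++;
        return SEQ_GAP;
    }

    /* delta <= 0: a late (reordered) or duplicated packet. */
    st->backward++;
    if ((uint32_t)(-delta) > st->max_backward)
        st->max_backward = (uint32_t)(-delta);
    return SEQ_BACKWARD;
}
```

In a receiver, seq_observe() would be called once per datagram with the
sequence number parsed from the payload, and the counters printed
periodically. The same function could be called from the bridge's forwarding
loop to compare what the input and output sides see.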