Date: Tue, 2 Feb 2016 06:34:48 +0100
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
From: Luigi Rizzo
To: Xiaoye Sun
Cc: Pavel Odintsov, "freebsd-net@freebsd.org"

On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun wrote:
> Hi Luigi,
>
> I have to clarify the *jumping issue* with the slot indexes.
> In the bridge.c program, the slot index never jumps; it increases
> sequentially.
> In the receiver.c program, the udp packet seq jumps, and I showed the slot
> index that each udp packet uses. So the slot index jumps together with the
> udp seq (at the receiver program only).

So let me understand: is the "slot" some information written into the
packet by bridge.c (referring to the rx or tx slot, I am not sure) and
then read and printed by receiver.c (which gets the packet through
recvfrom, so there isn't really any slot index)?

Do you see any ordering inversion when the receiver gets packets
through the NETMAP API (e.g. using bridge.c instead of receiver.c)?

Are you using native netmap drivers or the emulated mode?
You can check that by playing with the "admode" sysctl entry (or sysfs
on linux): try setting it to 1 and then to 2 and see if the behaviour changes.

  dev.netmap.admode: 0
    Controls the use of native or emulated adapter mode. 0 uses
    the best available option, 1 forces native and fails if not
    available, 2 forces emulated hence never fails.

cheers
luigi

>
> There is really one ring (tx and rx) for the NIC and one ring (tx and rx) for
> the host.
> I also suspected that there might be multiple tx rings for the host; it
> would then look like the bridge program swaps packets to multiple host rings
> and the udp recv program drains packets from these rings. But this is not
> the case here.
>
> The bridge program prints a line like this
> *515.277263 main [277] Ready to go, eth3 0x1/1 <-> eth3 0x0/1.*
> this is printed by the following line in the original program
> *D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name,
> pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name, pb->first_rx_ring,
> pb->req.nr_rx_rings);*
>
> I think this shows that there is really one NIC ring and one HOST ring.
>
> Is there another way to verify the number of rings that netmap has?
>
> Thanks!
> Xiaoye
>
> On Mon, Feb 1, 2016 at 10:48 PM, Luigi Rizzo wrote:
>>
>> Hi,
>> there must be something wrong with your setup, because
>> slot indexes must be sequential and in your case they
>> are not (see the jump from 295 to 474 and then
>> back from 485 to 296, and the numerous interleavings
>> that you are seeing later).
>>
>> I have no idea of the cause, but typically this pattern
>> is what you see when there are multiple input rings and
>> not just one.
>>
>> Cheers
>> Luigi
>>
>> On Tue, Feb 2, 2016 at 12:24 AM, Xiaoye Sun wrote:
>> > Hi Luigi,
>> >
>> > Thanks for the detailed advice.
>> >
>> > With more detailed experiments, I actually found that the udp
>> > sender/receiver packet reorder issue *might* be unrelated to the
>> > original issue I posted.
>> > However, I think we should solve the udp sender/receiver
>> > issue first.
>> > I ran the experiment with more detailed logging. Here are my findings.
>> >
>> > 1. I am running a netmap version available since about Oct 13th from
>> > github (https://github.com/luigirizzo/netmap), so I think this is not
>> > the one affected by the buffer allocation issue. I tried running the
>> > newest version; however, that version causes a problem when I exit the
>> > bridge program (something like a kernel error which makes the OS crash).
>> >
>> > 2 & 3. I changed receiver.c & bridge.c so that I can get more
>> > information (a more detailed log).
>> > The reordering happens multiple times (about 10 times) within a second.
>> > Here is one example trace collected from the above two programs
>> > (remember that the udp sender runs on one machine; the netmap bridge
>> > and the udp receiver run on another machine).
>> > There is only one pair of rings, each with 512 slots (511 slots usable),
>> > on the receiver machine.
>> >
>> > =================== packet trace collected from receiver.c ===================
>> > ===== together with the slot and buf_idx of the corresponding netmap ring slots ======
>> > [seq] [slot] [buf_idx]
>> > 8208 294 1833
>> > 8209 295 1834
>> > 8388 474 2013
>> > ... (packet received in order)
>> > 8398 484 2023
>> > 8399 485 2024
>> > 8210 296 1835
>> > 8211 297 1836
>> > ... (packet received in order)
>> > ...
>> > 8222 308 1847
>> > 8400 486 2025
>> > 8223 309 1848
>> > 8401 487 2026
>> > 8224 310 1849
>> > 8402 488 2027
>> > 8225 311 1850
>> > 8403 489 2028
>> > 8226 312 1851
>> > 8404 450 2029
>> > 8227 313 1852
>> > 8228 314 1853
>> > ===================================================================
>> > As we can see, the udp receiver got packet 8210 after it got 8399,
>> > which is the first reorder. Then the receiver got 8211 to 8222 sequentially.
>> > Then it got packets 8223-8227 and 8400-8404 interleaved.
>> >
>> >
>> > ==================== event order seen by netmap bridge ==================
>> > get 8209
>> > poll called
>> > get 8210
>> > ...
>> > ...
>> > get 8228
>> > poll called
>> > get 8229
>> > ...
>> > ...
>> > get 8383
>> > poll called
>> > get 8384
>> > ...
>> > get 8387
>> > poll called
>> > get 8388
>> > ...
>> > get 8393
>> > poll called
>> > get 8394
>> > ...
>> > get 8399
>> > poll called
>> > get 8400
>> > ...
>> > get 8404
>> > poll called
>> > get 8405
>> > ===================================================================
>> > As we can see from the event ordering seen by bridge.c, all the
>> > packets are received in order, which means that the reordering happens
>> > when the bridge code swaps the buf_idx between the nic ring (slot) and
>> > the host ring (slot).
>> > The reordered seq is usually right before or after the poll function call.
>> >
>> > Best,
>> > Xiaoye
>> >
>> > On Fri, Jan 29, 2016 at 4:27 PM, Luigi Rizzo wrote:
>> >>
>> >> On Fri, Jan 29, 2016 at 2:12 PM, Xiaoye Sun
>> >> wrote:
>> >> > Hi Luigi,
>> >> >
>> >> > Thanks for your advice.
>> >> > I forgot to mention that I use the command "ethtool -L eth1 combined 1"
>> >> > to set the number of rings of the nic to 1. The host also only has one
>> >> > ring.
>> >> > I understand the situation where the first tx ring is full, so the
>> >> > bridge will swap the packets to the second tx ring and then the
>> >> > host/nic might drain either ring. But this is not the case in the
>> >> > experiment.
>> >>
>> >> ok good to know that.
>> >>
>> >> So if we have ruled out multiqueue and iommu, let's look at
>> >> the internal allocator and at bridge.c
>> >>
>> >> 1. are you running the most recent version of netmap ?
>> >> Some older version (probably 1-2 years ago) had a bug
>> >> in the buffer allocator and some buffers were allocated
>> >> twice.
>> >>
>> >> 2. can you tweak your receiver.c to report some more info
>> >> on how often you get out of sequence packets, how much
>> >> out of sequence they are ?
>> >> Also it would be useful to report gaps on the increasing side
>> >> (i.e. new_seq != old_seq +1 )
>> >>
>> >> 3. can you tweak bridge.c so that it writes into the packet
>> >> the netmap buffer indexes and slots on the rx and tx side,
>> >> so when you detect a sequence error we can figure out
>> >> where it is happening.
>> >> Ideally you could also add the sequence number detection
>> >> code in bridge.c so we can check whether the errors appear
>> >> on the input or output sides.
>> >>
>> >> cheers
>> >> luigi
>> >>
>> >
>>
>> --
>> -----------------------------------------+-------------------------------
>> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione
>> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
>> TEL +39-050-2217533 . via Diotisalvi 2
>> Mobile +39-338-6809875 . 56122 PISA (Italy)
>> -----------------------------------------+-------------------------------
>>
>
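
Point 2 of the Jan 29 message quoted in this thread asks for receiver.c to
report how often packets arrive out of sequence, how far out of sequence
they are, and any gaps on the increasing side (new_seq != old_seq + 1).
A minimal sketch of that bookkeeping in C might look like the block below.
The struct seq_stats type and the seq_stats_update() / seq_stats_report()
helpers are hypothetical names made up for this illustration; they are not
part of netmap or of the receiver.c used in the thread, and the sketch
assumes the sender places a 32-bit sequence number in each UDP payload.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters for a UDP receiver; not part of netmap. */
struct seq_stats {
    int      started;   /* 0 until the first packet has been seen */
    uint32_t prev;      /* sequence number of the previous packet */
    uint64_t total;     /* packets seen */
    uint64_t in_order;  /* packets with seq == prev + 1 */
    uint64_t fwd_gaps;  /* packets that jumped ahead (seq > prev + 1) */
    uint64_t backward;  /* packets with seq <= prev (late or duplicate) */
    uint32_t max_fwd;   /* largest count of skipped sequence numbers */
    uint32_t max_back;  /* largest backward distance, prev - seq */
};

/* Classify one received sequence number against the previous one. */
static void
seq_stats_update(struct seq_stats *st, uint32_t seq)
{
    assert(st != NULL);
    st->total++;
    if (!st->started) {
        st->started = 1;            /* first packet: nothing to compare */
    } else if (seq == st->prev + 1) {
        st->in_order++;
    } else if (seq > st->prev) {
        uint32_t missing = seq - st->prev - 1;
        st->fwd_gaps++;
        if (missing > st->max_fwd)
            st->max_fwd = missing;
    } else {
        uint32_t dist = st->prev - seq; /* 0 means a duplicate */
        st->backward++;
        if (dist > st->max_back)
            st->max_back = dist;
    }
    st->prev = seq;
}

/* Print a one-line summary of the counters. */
static void
seq_stats_report(const struct seq_stats *st, FILE *out)
{
    assert(st != NULL && out != NULL);
    fprintf(out,
        "total %llu in_order %llu fwd_gaps %llu (max %u) "
        "backward %llu (max %u)\n",
        (unsigned long long)st->total,
        (unsigned long long)st->in_order,
        (unsigned long long)st->fwd_gaps, (unsigned)st->max_fwd,
        (unsigned long long)st->backward, (unsigned)st->max_back);
}
```

In a receive loop one would zero-initialize a struct seq_stats, call
seq_stats_update(&st, seq) after each recvfrom(), and call
seq_stats_report(&st, stderr) periodically. Fed the tail of the trace
quoted in this thread (8222, 8400, 8223, 8401, 8224, 8402, 8225, 8403,
8226, 8404, 8227, 8228), it counts five forward gaps and five backward
jumps, each of distance 177.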