From: Vincenzo Maffione
Date: Wed, 22 Nov 2017 14:39:55 +0100
Subject: Re: swaping ring slots between NIC ring and Host ring does not always success
To: Xiaoye Sun
Cc: Luigi Rizzo, "freebsd-net@freebsd.org", Victor Detoni, Pavel Odintsov

Hi,

2017-11-21 7:51 GMT+01:00 Xiaoye Sun:

> Hi,
>
> Recently I found another problem with netmap. I think this new problem
> could be related to the problems in this thread, so I am posting it
> here.
>
> In my setup, I have a sender program that has a netmap ring (a pair of
> RX/TX rings) for the NIC and a ring for the host stack. The sender program
> puts customized packets (each packet has a unique sequence number and the
> sender sends the packets in increasing sequence-number order) to the NIC
> TX ring directly and also forwards the packets from the host RX ring to the
> NIC TX ring using "zerocopy" by swapping the buffer indices.
> However, the receiver sees duplicated customized packets. For example, in
> the case where the ring size is 32 (32 slots in a ring) the order of the
> sequence numbers the receiver sees is
> 1,2,3,4,5,...,68,69,*70*,71,72,73,...,99,100,*70*,101,102,103,... .
> An interesting thing I found is
> that the "gaps" between these two duplicated packets (70 in the example)
> are always a number very close to the ring size, 32 in this example. In my
> experiment, I use a ring with 4096 slots and the gap is always more than
> 4090 and close to 4096. I verified that this duplication happens due to the
> sender, not the receiver. Assuming my sender's implementation is correct,
> then this duplication may happen in netmap and the NIC driver (ixgbe).
>

Netmap itself neither duplicates packets nor looks at their contents. It
just passes ring->cur/ring->head down to the ixgbe driver (after
validation). The ixgbe driver datapath is bypassed and replaced with a
netmap-enabled datapath (see
https://github.com/luigirizzo/netmap/blob/master/LINUX/ixgbe_netmap_linux.h#L294-L461 );
no duplication should happen there, as each netmap slot (1 TX packet) is
used only once.

>
>
> Thinking back to the original problem in this post, I think these problems
> may be related. It seems to me that there could be multiple threads pulling
> the packets from the NIC TX ring (or the thread moved to other CPUs when
> the problem occurs) and these threads may run on different cores so that
> the outdated content in the buffer may be sent out when new content is
> written to the buffer.
>

There are no such threads pulling from the NIC TX ring. Your application
directly puts new packets to be transmitted in the netmap buffers
referenced in the netmap TX ring. When you then call NIOCTXSYNC or poll(),
all the new TX buffers (i.e. all the ones from the previous value of
ring->head (included) to the new value of ring->head (excluded)) are moved
to the NIC TX ring. This happens in the context of your application
thread; no worker threads are used. Then the NIC hardware starts the
transmission.

> I am wondering if there is a way to pin the NIC driver of the netmap module
> to a specific core. Or is there a way to know the root of such a problem?
>

The only threads are those of your application. Maybe your problem comes
from concurrent accesses to the netmap TX ring from different threads?
Only one thread at a time should update a given netmap TX/RX ring;
otherwise the behaviour is unspecified.

Cheers,
  Vincenzo

>
> Best,
> Xiaoye
>

-- 
Vincenzo Maffione
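
Illustrative note: the two rules in the reply (each sync hands over the slots from the previous head up to, but excluding, the new head, each exactly once; and only one thread at a time may update a ring) can be sketched with a toy model. This is not the netmap API. The class ToyTxRing, its fields and its methods are hypothetical names invented for this sketch, and the pretend NIC is assumed to transmit instantly, so every synced slot becomes free again right away.

```python
import threading


class ToyTxRing:
    """Toy model of a TX ring (hypothetical; not the real netmap API)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = [None] * num_slots  # payload stored in each slot
        self.head = 0                    # next slot the application will fill
        self.synced = 0                  # value of head at the previous sync
        self.wire = []                   # packets handed to the pretend NIC
        self.lock = threading.Lock()     # enforces one writer at a time

    def space(self):
        """Free slots; one slot stays unused so full and empty differ."""
        return (self.synced - self.head - 1) % self.num_slots

    def put(self, payload):
        """Fill the slot at head and advance head, wrapping around."""
        self.slots[self.head] = payload
        self.head = (self.head + 1) % self.num_slots

    def txsync(self):
        """Hand slots [synced, head) to the NIC, each exactly once."""
        i = self.synced
        while i != self.head:
            self.wire.append(self.slots[i])
            i = (i + 1) % self.num_slots
        self.synced = self.head

    def send(self, payloads):
        """Fill and sync under the lock so that threads never interleave."""
        with self.lock:
            for p in payloads:
                if self.space() == 0:
                    self.txsync()  # ring full: flush before reusing slots
                self.put(p)
            self.txsync()


# Four threads share one ring; the lock serializes every update to it.
ring = ToyTxRing(num_slots=32)
workers = [
    threading.Thread(target=ring.send, args=(range(k * 100, k * 100 + 50),))
    for k in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

In this model a slot is only rewritten after the sync that handed it over, so no sequence number can reach the wire twice. Removing the lock lets two threads move head at the same time, which is the unspecified situation described in the reply.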