Date:      Mon, 20 Nov 2017 22:26:13 +0100
From:      "Kristof Provost" <kristof@sigsegv.be>
To:        "Catalin Salgau" <csalgau@users.sourceforge.net>
Cc:        freebsd-net@freebsd.org
Subject:   Re: BPF packet pagesize limit
Message-ID:  <A2A39E3C-8A17-4C17-A52D-0EF72F809F99@sigsegv.be>
In-Reply-To: <966f384c-10b4-d018-efb1-68a7064c9521@users.sourceforge.net>
References:  <966f384c-10b4-d018-efb1-68a7064c9521@users.sourceforge.net>

On 19 Nov 2017, at 19:49, Catalin Salgau wrote:
> I'm trying to address the limitation in (upstream) net/vblade that was
> brought up in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205164
> This is related to writes larger than hw.pagesize but smaller than the
> configured MTU with BPF.
> I traced this to sys/net/bpf.c where calls to bpfwrite() will use
> bpf_movein() which in turn uses m_get2() to allocate a single mbuf. This
> will fail if the requested mbuf size is larger than MJUMPAGESIZE
> (defined as PAGE_SIZE on x86). I believe this should use m_getm2() and
> populate multiple mbufs.
> Code in NetBSD explicitly notes that they omit mbuf chaining, but this
> is not documented behaviour in the man page.
>
Your analysis looks correct.
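
To make the failure mode concrete, here is a rough userland model of the two
size checks discussed above. It is only a sketch, not kernel code: the function
name bpf_write_model() is made up for illustration, and the page size is passed
in as a parameter on the assumption that MJUMPAGESIZE equals PAGE_SIZE, as on
x86.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userland model (not kernel code) of the pre-patch behaviour.
 * All limits are supplied by the caller; nothing is read from the system.
 */
static int
bpf_write_model(size_t len, size_t hlen, size_t mtu, size_t pagesize)
{
	/* Same test as in bpf_movein(): too short, or payload above the MTU. */
	if (len < hlen || len - hlen > mtu)
		return (EMSGSIZE);
	/*
	 * m_get2() returns NULL for sizes above MJUMPAGESIZE (assumed here to
	 * equal the page size), and bpf_movein() turns that into EIO.
	 */
	if (len > pagesize)
		return (EIO);
	return (0);
}
```

With a 9000 byte MTU, a 14 byte link header and 4096 byte pages, a 4096 byte
write passes both checks in this model, while a 4097 byte write passes the MTU
check and then fails with EIO, which matches the symptom in the bug report.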

> Any chance of having this fixed in a supported release, or getting a
> usable/documented workaround?

Can you see if this works for you?

	diff --git a/sys/net/bpf.c b/sys/net/bpf.c
	index b176856cf35..b9ff40699bb 100644
	--- a/sys/net/bpf.c
	+++ b/sys/net/bpf.c
	@@ -547,9 +547,11 @@ bpf_movein(struct uio *uio, int linktype, struct ifnet *ifp, struct mbuf **mp,
	        if (len < hlen || len - hlen > ifp->if_mtu)
	                return (EMSGSIZE);

	-       m = m_get2(len, M_WAITOK, MT_DATA, M_PKTHDR);
	+       m = m_getm2(NULL, len, M_WAITOK, MT_DATA, M_PKTHDR);
	        if (m == NULL)
	                return (EIO);
	+       KASSERT(m->m_next == NULL, ("mbuf chains not supported here"));
	+
	        m->m_pkthdr.len = m->m_len = len;
	        *mp = m;

It’s a little icky to trust that this will produce a single mbuf 
rather than a chain, but it appears to be the case. Sadly the rest of 
the bpf code (and especially bpf_filter()) really needs the mbuf to have 
a single contiguous buffer.

Regards,
Kristof

Date:      Mon, 20 Nov 2017 17:35:53 -0600
From:      Xiaoye Sun <Xiaoye.Sun@rice.edu>
To:        Vincenzo Maffione <v.maffione@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: [netmap] when does a packet in the netmap ring send out exactly
Message-ID:  <CAJnByzhv27D_V=kyJjzQPQ28GM8kACKsH87MB5uDKVkQ-aka0g@mail.gmail.com>
In-Reply-To: <CA+_eA9i5WOiA8j3y8fX65rzDLXEyt2B2wo8pK12jM2ZvEBURYg@mail.gmail.com>
References:  <CAJnByzh4Kzp6-DXXcB06QHSBJpHBKhtDnKUn7R+K0A_5VUThyw@mail.gmail.com> <CA+_eA9i5WOiA8j3y8fX65rzDLXEyt2B2wo8pK12jM2ZvEBURYg@mail.gmail.com>

Hi,

I found that the tail pointer only moves when fewer than half of the slots
in the ring are available. This prevents me from knowing precisely when the
packet in a slot has been processed. Is there a way to make the tail pointer
advance as soon as the packet in a slot is processed? Is this behaviour
configurable?

Best,
Xiaoye

On Fri, Oct 27, 2017 at 11:52 AM, Vincenzo Maffione <v.maffione@gmail.com>
wrote:

> Hi,
>   This is actually a limitation of the netmap API: ring->tail is exposed
> to the user so that it knows it can use the slots in the range
> "[ring->head..ring->tail[" for new transmissions (note that head is
> included, tail excluded, to prevent wraparound). However, there is no
> explicit indication of "up to what slots packets were transmitted".
> For hw NICs, however, ring->tail is an indication of where transmission
> was completed.
> Example:
> 1) at the beginning ring->tail = ring->head = ring->cur = 0
> 2) then your program moves head/cur forward: head = cur = 10
> 3) you call TXSYNC, to submit the packets to the NIC.
> 4) after the TXSYNC call, it is very likely that tail is still 0, because
> no transmission has been completed by the NIC yet (and no interrupt has
> been generated).
> 5) say after 20 us you issue another TXSYNC, and in the meantime 6
> packets have completed. In this case after TXSYNC you will find tail==5,
> meaning that the packets in slots 0,1,2,3,4 and 5 have been completed. Note
> that the slot pointed to by tail has also been completed.
>
> But you are right that there is no way to receive a completion notification
> if the queue is not full. You must use TXSYNC to check (by sleeping or busy
> waiting) when tail moves forward.
>
> Cheers,
>   Vincenzo
>
>
> 2017-10-27 3:06 GMT+02:00 Xiaoye Sun <Xiaoye.Sun@rice.edu>:
>
>> Hi
>>
>> I wrote a netmap program that sends packets to the network. My program
>> uses one netmap ring and fills the ring slots with packets.
>> My program needs to do something (action A) after a particular packet
>> (packet P) in a ring slot is sent to the network, so the program tracks
>> the position of the tail pointer and checks whether it has moved
>> across the slot that holds packet P.
>> However, I found that the tail pointer may not move forward even seconds
>> after the receiver side got packet P.
>> Sometimes the tail pointer never moves forward until the TX ring is full.
>> I tried ioctl(NIOCTXSYNC), but it does not fully solve the problem.
>>
>> My question is: is there a way to make the TX ring empty as early as
>> possible so that I can know when my packet is sent out? Or is there
>> another way to know when the packet in the slot is sent to the
>> network/NIC physical queue?
>>
>> I am using Linux 3.16.0-4-amd64.
>>
>> Thanks!
>>
>> Best,
>> Xiaoye
>
>
>
> --
> Vincenzo Maffione
>
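
For reference, the "has tail moved across my slot" check from the quoted
messages can be sketched as below. This is only an illustration under stated
assumptions: tail_crossed_slot() is a made-up helper and not part of the
netmap API, the slot pointed to by tail is treated as completed (as in the
quoted example), and fewer than num_slots packets are assumed to complete
between the two samples of ring->tail. The caller would sample ring->tail
before and after ioctl(NIOCTXSYNC) and pass both values in.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the netmap API): returns true if slot p
 * lies in the circular range (old_tail, new_tail], i.e. tail moved across it
 * between two TXSYNC calls. All indices must be smaller than num_slots.
 */
static bool
tail_crossed_slot(uint32_t p, uint32_t old_tail, uint32_t new_tail,
    uint32_t num_slots)
{
	/* How far tail advanced, modulo the ring size. */
	uint32_t advanced = (new_tail + num_slots - old_tail) % num_slots;
	/* Distance from the slot just after old_tail to p, modulo the ring size. */
	uint32_t offset = (p + num_slots - old_tail - 1) % num_slots;

	return (offset < advanced);
}
```

The modular arithmetic keeps the comparison valid when tail wraps around the
end of the ring, which a plain "tail > p" test would get wrong.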


