From owner-freebsd-net@freebsd.org Fri Sep 11 22:19:42 2020
Subject: Re: Socket option to configure Ethernet PCP / CoS per-flow
To: freebsd-net@freebsd.org
References: <20200911.185432.122001633.sthaug@nethelp.no>
From: Matthew Grooms
Message-ID: <5db84c2c-701a-c93a-34d1-f326525a1f8e@shrew.net>
Date: Fri, 11 Sep 2020 17:19:36 -0500
List-Id: Networking and TCP/IP with FreeBSD

On 9/11/2020 12:15 PM, Scheffenegger, Richard wrote:
> Thank you for the quick feedback.
>
> On a related note - it just occurred to me, that the PCP functionality could be extended to make more effective use of PFC (priority flow control) without explicitly managing it on an application level directly.
>
> Right now, PFC typically degenerates to good-old Flow control, as all traffic is handled just in the default class (0, or whatever is set up using the IOCTL interface API).
>
> Typically, the different Ethernet classes come with a notion of prioritization between them - traffic in a "higher" class may be forwarded prior to traffic in a lower class. But that is not a strong requirement - using WRR with 1/8th bandwidth "reserved" for each class in a switch, assigning flows to a random PCP value, PFC could work in a more scalable fashion - only blocking a fraction of traffic, that is actually queue building (has to go over a lower bandwidth link, or a NIC excessively pausing its ingress), thus reducing the chance of the formation of congestion trees...
>
> E.g.
> PCP runs from 0 (default) to 7;
>
> Adding a socket option to explicitly assign traffic to one of these flows would allow testing and configuring applications to make use of "real" prioritization capabilities of modern switches.
>
> And what I was just pondering was a special interface level setting (e.g. 8), which results in a socket to pick a "random" value when created, to distribute packets across all the queues available in hardware, allowing PFC to no longer collapse in effect to old FC style "on"/"off" for all traffic...
>
> Perhaps someone here has experience with congestion tree formation in multi-hop switching environments, and can comment if the above approach would be feasible to address that FC issue?
>
>
> Richard Scheffenegger

Hey There Richard,

I live in Austin where we are fortunate enough to have Google Fiber. And while I love the service, I hate the idea of being forced to use the Google Fiber black box as my edge device. But to get full use of the service, you have to set VLAN + PCP values appropriately or you hit a Google-imposed traffic shaping bottleneck.

In any case, I was able to do this using pf as the packet classifier. You simply write a rule to match the traffic and assign the desired value. Perhaps this may be a way to accomplish what you're trying to do without having to add a new socket option. Have a look at the pf.conf man page and search for 'set prio'. I assume ipfw has an equivalent feature as well ...

     set prio priority | (priority, priority)
           Packets matching this rule will be assigned a specific queueing
           priority.  Priorities are assigned as integers 0 through 7.  If the
           packet is transmitted on a vlan(4) interface, the queueing priority
           will be written as the priority code point in the 802.1Q VLAN
           header.  If two priorities are given, packets which have a TOS of
           lowdelay and TCP ACKs with no data payload will be assigned to the
           second one.
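
A minimal sketch of what such rules might look like is below. The interface name (vlan2), the port, and the priority values are placeholders chosen for illustration, not the values any particular provider requires; check pf.conf(5) on your release for the exact syntax.

```
# Hypothetical vlan(4) interface facing the provider (assumption).
ext_if = "vlan2"

# Tag all outbound traffic with PCP 3 (placeholder value).
pass out on $ext_if set prio 3

# For outbound SSH, use PCP 3 for bulk packets and PCP 6 for
# lowdelay-TOS packets and payload-free TCP ACKs (placeholder values).
pass out on $ext_if proto tcp to port 22 set prio (3, 6)
```

Because pf uses the last matching rule by default, the more specific SSH rule is listed after the general one.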

Hope this helps,

-Matthew