Date:      Fri, 11 Sep 2020 17:15:42 +0000
From:      "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To:        "sthaug@nethelp.no" <sthaug@nethelp.no>
Cc:        "net@FreeBSD.org" <net@FreeBSD.org>, "transport@freebsd.org" <transport@freebsd.org>
Subject:   RE: Socket option to configure Ethernet PCP / CoS per-flow
Message-ID:  <SN4PR0601MB37283BD2AFBC97768D92D90886240@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <20200911.185432.122001633.sthaug@nethelp.no>
References:  <SN4PR0601MB372898DF9D2838392B22C7AF86240@SN4PR0601MB3728.namprd06.prod.outlook.com> <20200911.185432.122001633.sthaug@nethelp.no>

Thank you for the quick feedback.

On a related note: it just occurred to me that the PCP functionality could be extended to make more effective use of PFC (priority flow control) without managing it explicitly at the application level.

Right now, PFC typically degenerates to plain old flow control, as all traffic is handled in the default class (0, or whatever is set up using the IOCTL interface API).

Typically, the different Ethernet classes come with a notion of prioritization between them: traffic in a "higher" class may be forwarded ahead of traffic in a lower class. But that is not a strong requirement. Using WRR (weighted round-robin) with 1/8th of the bandwidth "reserved" for each class in a switch, and assigning flows to a random PCP value, PFC could work in a more scalable fashion: it would block only the fraction of traffic that is actually building queues (because it has to go over a lower-bandwidth link, or because a NIC is excessively pausing its ingress), thus reducing the chance of congestion trees forming...

PCP runs from 0 (the default) to 7.

Adding a socket option to explicitly assign traffic to one of these PCP values would allow testing and configuring applications to make use of the "real" prioritization capabilities of modern switches.
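
To make the idea concrete, here is a rough sketch of how an application might use such an option. This is only an illustration under assumptions: SO_VLAN_PCP is the name of our internal option (see the quoted text further down), but the option number, the SOL_SOCKET level, and the int argument used here are placeholders, and set_socket_pcp() is a hypothetical helper. On a kernel without the option, the setsockopt() call simply fails.

```c
#include <errno.h>
#include <sys/socket.h>

/* Placeholder value: the real option number is not given in this thread. */
#ifndef SO_VLAN_PCP
#define SO_VLAN_PCP 0x7fff
#endif

#define PCP_MAX 7	/* PCP is a 3-bit field: valid values are 0..7 */

/*
 * Hypothetical helper: mark all traffic of one socket with the given PCP.
 * Returns 0 on success, -1 with errno set on failure.
 * Assumes the option lives at SOL_SOCKET and takes an int.
 */
int
set_socket_pcp(int fd, int pcp)
{
	if (pcp < 0 || pcp > PCP_MAX) {
		errno = EINVAL;		/* reject before touching the socket */
		return -1;
	}
	return setsockopt(fd, SOL_SOCKET, SO_VLAN_PCP, &pcp, sizeof(pcp));
}
```

A tool like iperf3 could call something like set_socket_pcp(fd, 5) right after socket() and before connect(), leaving the interface default untouched for every other socket.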

And what I was just pondering was a special interface-level setting (e.g. 8), which would cause a socket to pick a "random" value when created, to distribute packets across all the queues available in hardware, so that PFC no longer collapses in effect to old FC-style "on"/"off" for all traffic...
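
A minimal sketch of that selection logic, purely for illustration: the names pcp_for_new_socket() and PCP_RANDOM are made up, the value 8 is just the example from above, and rand() stands in for whatever random source the kernel would actually use.

```c
#include <assert.h>
#include <stdlib.h>

#define PCP_MAX    7	/* highest valid PCP value */
#define PCP_RANDOM 8	/* hypothetical "pick a random class" setting */

/*
 * Resolve the interface-level setting into the PCP a newly created
 * socket would use. Returns 0..7, or -1 if the setting is invalid.
 */
int
pcp_for_new_socket(unsigned int if_setting)
{
	int pcp;

	if (if_setting <= PCP_MAX)
		return (int)if_setting;	/* fixed class, as today */
	if (if_setting != PCP_RANDOM)
		return -1;		/* anything above 8 is undefined here */

	/* Chosen once per socket, so a flow stays in a single queue. */
	pcp = rand() % (PCP_MAX + 1);
	assert(pcp >= 0 && pcp <= PCP_MAX);
	return pcp;
}
```

Picking the value once at socket creation, rather than per packet, keeps all packets of a flow in the same class, so a PFC pause on one class stalls only the flows that landed there.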

Perhaps someone here has experience with congestion tree formation in multi-hop switching environments, and can comment on whether the above approach would be feasible to address that FC issue?


Richard Scheffenegger


-----Original Message-----
From: sthaug@nethelp.no <sthaug@nethelp.no>
Sent: Freitag, 11. September 2020 18:55
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Cc: net@FreeBSD.org; transport@freebsd.org
Subject: Re: Socket option to configure Ethernet PCP / CoS per-flow


> However, while this allows all traffic sent via a specific interface to be marked with a PCP (priority code point), it defeats the purpose of PFC (priority flow control) which works by individually pausing different queues of an interface, provided there is an actual differentiation of traffic into those various classes.
>
> Internally, we have added a socket option (SO_VLAN_PCP) to change the PCP specifically for traffic associated with that socket, to be marked differently from whatever the interface default is (unmarked, or the default PCP).
>
> Does the community see value in having such a socket option widely available? (Linux currently doesn't seem to have a per-socket option either, only a per-interface IOCTL API).

I've been doing quite a bit of network testing using iperf3 and similar tools, and have wanted this type of functionality since the interface option became available. Having this on a socket level would make it possible to teach iperf3, ping and other tools to set PCP and facilitate/simplify testing of L2 networks.

So the answer is a definite yes! This would be valuable.

Steinar Haug, Nethelp consulting, sthaug@nethelp.no


