From owner-freebsd-net@freebsd.org Tue Aug 27 20:59:33 2019
Subject: Re: finding optimal ipfw strategy
To: Eugene Grosbein, "Andrey V. Elsukov", freebsd-net@freebsd.org
From: Victor Gamov
Organization: OTCnet
Date: Tue, 27 Aug 2019 23:59:29 +0300
On 27/08/2019 23:30, Eugene Grosbein wrote:
> 28.08.2019 2:20, Victor Gamov wrote:
>
>> sysctl.conf
>> =====
>> net.link.ether.ipfw=1
>> net.link.bridge.ipfw=1
>> net.link.bridge.ipfw_arp=1
>> net.link.bridge.pfil_member=1
>>
>> net.inet.ip.fw.verbose_limit=100
>> net.inet.ip.fw.verbose=1
>> =====
>
> You should avoid passing the same packet multiple times through the ruleset.
> Fewer checks, better performance.

Yes, I feel it :-)

> Do you really use ipfw filtering based on layer2 parameters like MAC addresses?
> If not, you should disable net.link.ether.ipfw. If yes, you should use the "layer2" keyword
> explicitly in rules that filter by ethernet headers, place these rules above the others,
> and use "allow ip from any to any layer2" after L2 filtering is done,
> so L2 packets do not go through the other rules an extra time.
>
> Do you really need to filter each bridged L3 packet twice? Once as "out xmit $bridge"
> and once as "out xmit $bridge_member"? If not, you should disable
> net.link.bridge.ipfw and keep net.link.bridge.pfil_member=1 only.

Packets must be filtered on input VLANs (bridge members) and on output VLANs. So net.link.bridge.pfil_member=1.

> Perhaps you are ruining the performance with such settings, making the same work happen 3 times without real need.
>
> Do you really need to filter ARP? Disable net.link.bridge.ipfw_arp if not.

I need to drop ARP moving via the bridge. As I use many VLANs, all VLANs must be isolated and only multicast must be bridged from one VLAN to the others.
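Eugene's "layer2 rules first" advice could be sketched roughly like this (rule numbers are illustrative, and bridge1202 is taken from the rule quoted later in this thread; not verified on this system):

```shell
# Sketch: do all layer2 checks first, then pass remaining layer2 frames
# so they never traverse the L3 part of the ruleset a second time.

# Drop ARP (ethertype 0x0806) crossing the bridge, matched at layer2
ipfw add 100 deny ip from any to any mac-type 0x0806 layer2 via bridge1202

# Once layer2 filtering is done, accept all other layer2 frames here,
# as Eugene suggests, so they skip the L3 rules below
ipfw add 200 allow ip from any to any layer2

# ... L3 rules (numbered 300 and up) follow and only ever see each
# packet once, on its L3 pass
```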
To block ARP the following rule is used:

deny ip from any to any mac-type 0x0806 via bridge1202

If I understand correctly, I need net.link.bridge.ipfw_arp and net.link.bridge.ipfw to do this. I'm not sure about net.link.ether.ipfw.

>> `sysctl net.isr`
>> =====
>> net.isr.numthreads: 1
>> net.isr.maxprot: 16
>> net.isr.defaultqlimit: 256
>> net.isr.maxqlimit: 10240
>> net.isr.bindthreads: 0
>> net.isr.maxthreads: 1
>> net.isr.dispatch: direct
>> =====
>>
>> I don't know about the internals, but I think high interrupt load is bad, perhaps because the NIC does not support per-CPU queues, for example.
>
> All decent igb(4) NICs support at least 8 hardware input queues unless disabled by the driver/kernel.
> However, the net.isr settings are not about such queues.
>
> A high interrupt count is definitely better than the NIC chip dropping input frames
> due to overflow of its internal buffers, just because the CPU was not notified that it's time to get traffic
> out of those buffers. The driver tries not to overload the CPU with interrupts, and that's fine,
> but the default limit of 8000 is not adequate for modern CPUs and has not been for many years.
> Raise the limit to 32000.

I see. Thanks! I'll tune net.isr ASAP.

>>> If not, you should try something like this. For loader.conf:
>>
>> Sorry, it's a production system and I can reboot it only in the middle of October.
>>
>>> # substitute the total number of CPU cores in the system here
>>> net.isr.maxthreads=4
>>> # EOF
>>
>> Is it OK for multicast? It's UDP traffic which must stay ordered. I read that 'maxthreads=1' is used to keep TCP traffic ordered.
>
> It's the uplink's job to feed your bridge with ordered UDP flows. If you use the igb(4) driver,
> the FreeBSD kernel will keep flows ordered automatically. There is no place in the code
> where they could be reordered unless you use lagg(4) without LACP.

Thanks again. I'll set maxthreads=4 at the next reboot.
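Pulling the reboot-time tuning from this thread together, a /boot/loader.conf fragment might look like this (a sketch only: the 32000 interrupt limit and maxthreads=4 are the values suggested above, and hw.igb.max_interrupt_rate is assumed to be the relevant tunable for the stock igb(4) driver on this FreeBSD version):

```shell
# /boot/loader.conf fragment -- takes effect at next reboot

# raise the igb(4) interrupt rate limit from the old 8000 default,
# per Eugene's advice, so frames are not dropped in the NIC buffers
hw.igb.max_interrupt_rate=32000

# one netisr thread per CPU core (4 in this thread's example)
net.isr.maxthreads=4
```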
>>> And if you haven't already seen it, you may find my blog post useful
>>> (in Russian): https://dadv.livejournal.com/139170.html
>>> It's a bit old but can still shed some light.
>>
>> Yes, I have read it already :-) Also some Calomel articles. I'll try to tune the system at the next reboot.
>> The main questions for me now are "how correct is this architecture" and "how much traffic is it possible to process".
>
> You have read the numbers from my posts. ipfw+dummynet+PPPoE+routing+LACP+vlan tagging takes much more CPU power
> than just ipfw+bridging, and my system still processed much more traffic.
>
> Make sure you don't pass the same packets multiple times through the ipfw rules.
> ipfw has its own per-rule counters and you should use them to sum up octets carefully
> and compare with the numbers shown by netstat or systat (they both have the same in-kernel source)
> to verify whether or not packets go through ipfw extra times.

It's not too easy, but I'll try to build a test system and check on it. If 'bridge + drop on outgoing' is not the bottleneck, I'll tune the system and use this approach while it's possible.

--
CU,
Victor Gamov