From: "Rodney W. Grimes"
Message-Id: <201810301646.w9UGkBce062560@pdx.rh.CN85.dnsmgr.net>
Subject: Re: 12.0-BETA1 vnet with pf firewall
In-Reply-To: <9D50D781-73BA-45B0-ADBB-CF01DE587BC5@lists.zabbadoz.net>
To: "Bjoern A. Zeeb"
Date: Tue, 30 Oct 2018 09:46:11 -0700 (PDT)
CC: Kristof Provost, Ernie Luzar, FreeBSD current

> On 30 Oct 2018, at 14:14, Rodney W. Grimes wrote:
>
> >> On 30 Oct 2018, at 14:29, Bjoern A. Zeeb wrote:
> >>> On 30 Oct 2018, at 12:23, Kristof Provost wrote:
> >>>> I'm not too familiar with this part of the vnet code, but it looks
> >>>> to me like we've got more per-vnet variables than was originally
> >>>> anticipated, so we may need to just increase the allocated space.
> >>>
> >>> Can you elfdump -a the two modules and see how big their set_vnet
> >>> section sizes are?  I see:
> >>>
> >>> pf.ko:  sh_size: 6664
> >>> ipl.ko: sh_size: 2992
> >>>
> >> I see exactly the same numbers.
> >>
> >>> VNET_MODMIN is two pages (8k).  So yes, the two together
> >>> (6664 + 2992 = 9656 bytes) would exceed the module space.
> >>> Having 6.6k of global variable space is a bit excessive; where does
> >>> that come from?  multicast used to have a similar problem in the
> >>> past, in that it could not be loaded as a module because it had a
> >>> massive array there; we changed it to be malloced, which reduced it
> >>> to a pointer.
> >>>
> >>> 0000000000000f38 l O set_vnet 0000000000000428 vnet_entry_pfr_nulltable
> >> That's a default table.  It's large because it uses MAXPATHLEN for
> >> the pfrt_anchor string.
> >>
> >>> 0000000000000b10 l O set_vnet 00000000000003d0 vnet_entry_pf_default_rule
> >> Default rule.  Rules potentially contain names, tag names, interface
> >> names, ... so it's a large structure.
> >>
> >>> 0000000000001370 l O set_vnet 0000000000000690 vnet_entry_pf_main_anchor
> >> Anchors use MAXPATHLEN for the anchor path, so that's 1024 bytes
> >> right away.
> >>
> >>> 0000000000000000 l O set_vnet 0000000000000120 vnet_entry_pf_status
> >>>
> >> pf status.  Mostly counters.
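For reference, the malloc-and-pointer rework described above for
multicast follows roughly the sketch below.  Every name in it (struct
big_pervnet, M_VNETBIG, V_big, the SYSINIT idents) is invented for
illustration; this is neither the multicast nor the pf code, only the
general pattern for shrinking a set_vnet entry down to pointer size.

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <net/vnet.h>

static MALLOC_DEFINE(M_VNETBIG, "vnetbig", "heap-allocated per-vnet state");

/* Hypothetical large object; big for the same reason as the pf anchors:
 * an embedded MAXPATHLEN-sized string. */
struct big_pervnet {
        char    path[MAXPATHLEN];
        /* ... counters, lists, ... */
};

/*
 * Before: VNET_DEFINE(struct big_pervnet, big) would place the whole
 * object in the module's set_vnet section.  After: only a pointer lives
 * there, and the object is allocated when each vnet is created.
 */
VNET_DEFINE(struct big_pervnet *, big);
#define V_big   VNET(big)

static void
big_vnet_init(const void *unused __unused)
{
        V_big = malloc(sizeof(*V_big), M_VNETBIG, M_WAITOK | M_ZERO);
}
VNET_SYSINIT(big_vnet_init, SI_SUB_PROTO_FIREWALL, SI_ORDER_ANY,
    big_vnet_init, NULL);

static void
big_vnet_uninit(const void *unused __unused)
{
        free(V_big, M_VNETBIG);
}
VNET_SYSUNINIT(big_vnet_uninit, SI_SUB_PROTO_FIREWALL, SI_ORDER_ANY,
    big_vnet_uninit, NULL);

That turns a multi-KB set_vnet entry into 8 bytes, at the cost of one
allocation per vnet and a pointer dereference on each access.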
> >>
> >> I'll see about moving those into the heap on my todo list.
> >
> > Though that resolves the current situation, it is a partial fix;
> > doesn't this statically sized 2-page VNET_MODMIN need to be fixed in
> > the longer term?
>
> I think about it the other way round: we might want to bump it to 4
> pages in the short term for 12.0, maybe?

81 MB in your 10,000-vnet case; not horrible, really, as a quick and
slightly dirty fix.

> The problem is that whether or not you use modules, these 2/4 pages
> will be allocated per-vnet, so if you run 50 vnet jails that's 100/200
> pages.  And while people might say memory is cheap, I've run 10,000
> vnet jails before on a single machine ... it adds up.  I wonder if we
> could make it a tunable though.  Let me quickly think about it and
> come up with a patch.

A boot-time tunable that defaulted to 2 pages would be good too.  We
are faced with similar issues in bhyve, where each VM created gets an
allocation based on MAXCPU; I have WIP in progress to alter this so
that the 4th (and currently hidden) CPU topology option controls the
size of these allocations.

> I'll also go and see about getting better error reporting into the
> link_elf*.c files for this case.

Great!

-- 
Rod Grimes                                                 rgrimes@freebsd.org
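P.S.: For completeness, a boot-time tunable along the lines discussed
could look like the sketch below, presumably living in sys/net/vnet.c
next to the VNET_MODMIN definition.  The knob name
net.vnet_data_extra_pages and the helper function are invented for
illustration; no such tunable exists today.

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

#ifndef VNET_MODMIN
#define VNET_MODMIN     (2 * PAGE_SIZE) /* the 8k floor discussed above */
#endif

/*
 * Hypothetical loader tunable: extra pages of per-vnet module data on
 * top of the VNET_MODMIN floor.  CTLFLAG_RDTUN makes it settable from
 * loader.conf (net.vnet_data_extra_pages="2") and read-only at run
 * time, which matters because the per-vnet data size cannot change
 * once the first vnet has been allocated.
 */
static int vnet_data_extra_pages = 0;
SYSCTL_INT(_net, OID_AUTO, vnet_data_extra_pages, CTLFLAG_RDTUN,
    &vnet_data_extra_pages, 0,
    "Extra pages reserved per vnet for kernel module data");

/* Would replace the places that hard-code VNET_MODMIN today. */
static size_t
vnet_modspace_size(void)
{
        return (VNET_MODMIN + (size_t)vnet_data_extra_pages * PAGE_SIZE);
}

With the default of 0 extra pages the footprint in the 10,000-jail
case stays where it is today; someone loading both pf and ipfilter
(9656 bytes of set_vnet between them) would set it to 1.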