Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 23 Feb 2023 19:08:30 -0800
From:      Gleb Smirnoff <glebius@freebsd.org>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        Rick Macklem <rmacklem@freebsd.org>, bz@freebsd.org, jamie@freebsd.org, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: ef6fcc5e2b07 - main - nfsd: Add VNET_SYSUNINIT() macros for vnet cleanup
Message-ID:  <Y/gqLrIPQGcuWhTr@FreeBSD.org>
In-Reply-To: <CAM5tNy6raj72UdmN%2BmFAzWOR%2BzFEV-eMWU6%2BXcSrjKXycu1BBw@mail.gmail.com>
References:  <202302202112.31KLCfQB080359@gitrepo.freebsd.org> <Y/Uz5B6ywWjaa1TB@FreeBSD.org> <CAM5tNy6KFzCarKWze4mjbnqsERUoshHFfZeXzZMF3RurVKf7TA@mail.gmail.com> <Y/cHpcn/1S3Q0Swa@FreeBSD.org> <CAM5tNy6raj72UdmN%2BmFAzWOR%2BzFEV-eMWU6%2BXcSrjKXycu1BBw@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
  Rick,

On Thu, Feb 23, 2023 at 05:56:06AM -0800, Rick Macklem wrote:
R> > This one actually doesn't look correct to me. What happens here is that the sysctls
R> > will affect only the default VNET.
R> >
R> Yes, but the sysctls are mostly useless anyhow. I don't know how to make them
R> work in a prison. (I know how to use SYSCTL_FLAG_VNET, but that does not
R> work in this case.)

Doesn't they work as intended in my patch D38742?

R> > I think the VNET-itezation of this file went a bit wrong. You don't need to convert
R> > static structures to malloced when you VNET-ize a module. The infrastructure should
R> > take care of memory management.
R> >
R> Not if you want to keep the vnet footprint small. This was necessary
R> so that nfsd.ko
R> (and friends) will load dynamically. Without the conversion to
R> mallocs, it would complain
R> the vnet was out of memory when nfsd.ko tried to load.
R> (I'm sure I didn't need to do all of them, but it made sense to keep
R> the vnet footprint
R> as small as possible.)
R> > The dynamic sysctl context seems to be unneeded from the very beginning. It was
R> > always attached to the static softc. Suggested patch:
R> >
R> > https://reviews.freebsd.org/D38742
R> >
R> I'll look at it, but if it stops malloc'ng softc, then I will be
R> worried w.r.t. vnet footprint and
R> dynamic loading. (Note that the structure has an array of hash list
R> pointers in it, so it
R> is rather large.

Replying to you and Bjoern's email too here. Well, if we want a fully blown
virtualized network stack, and this is what VIMAGE is, then, well, we need a
full chunk of memory to keep a network stack data. So, if there is a limit
there, (Bjoern mentioned 8k) then this limit needs to be increased as more
and more subsystems are virtualized. I also don't see how we actually save
any memory using malloc(9) instead of using memory provided by VIMAGE? The
kmem use would be roughly the same if not worse. What exactly are we saving
here?

R> Another reason (along with dynamic loading of the modules) for keeping
R> the foot print
R> small is to try and keep the jails that do not use nfsd(8) and friends
R> lightweight.
R> When I originally coded it, I put it under a kernel option called
R> VNET_NFSD (still in
R> kern_jail.c at this time), but others felt that a new kernel option
R> wasn't desirable,
R> (There is a lot of discussion under D37519. The downside of discussions that
R> happen in phabricator is that not as many people see them. I started an email
R> thread on freebsd-current@, but it quickly migrated to D37519. Just
R> the way things
R> currently happen.)

I briefly looked into D37519 and didn't find any discussion over memory
footprint and savings. I agree that a kernel option of VNET_NFSD is undesirable.
The global option VIMAGE shall control that and nothing else.

R> > For example nfsstatsv1, which is now malloced for non-default vnets and static for
R> > the default. Lots of modules still incorrectly update the global one.
R> >
R> Nope. "struct nfsstatsv1" is a structure that is shared with the NFS
R> client, which is
R> not vnet'd. As such, a static "struct nfsstatsv1" needs to exist for prison0.
R> 
R> My original solution was to create a separate structure for the server
R> side stuff
R> (vnet it), but then the result was a messy copying exercise for the system call,
R> which returns the entire structure (with client and server info).
R> 
R> Once the client side is vnet'd (I've already had a couple of emails asking me
R> to do it, so I plan on starting to work on it) then, yes, the
R> structure can become
R> a malloc'd vnet'd one and the IS_DEFAULT_VNET can go away.

I see the problem. So, we got several things that need to be done to
nfsstatsv1: virtualize, separate server and client, possibly make them use
counter(9) to avoid races and performance issues. And I'm afraid there is
API/ABI that needs to be preserved? Depending on order of doing changes to
nfsstatsv1 the amount of work is going to be different.

So, if we start with my D38743 the only problem observed is that the client
code will use virtual version of the stats. So if somebody decides to run
a client from a vnet jail, it will fill this jail stats instead of global.
Is it a big deal if the future plan is actually to make client virtualized
too?

R> I am not sure why you think IS_DEFAULT_VNET() is such a big deal in
R> the initialization code? It work. I can see an argument for using both
R> SYSINIT() and VNET_SYSINIT() for cases where you have both non-vnet;d
R> and vnet'd data to initialize. However, I don't see any problem with using
R> IS_DEFAULT_VNET() when needed.

My main concern is complexity and its unforseen consequences. The original design
of VIMAGE is simple - just wrap every global variable into VNET() macro and you
are done. Don't make non-default vnet in any sense different to the default one!
It actually works pretty good if the recipe is being followed. Sometimes this leads
to pretty large mechanical changes, and people try to avoid that. For example in
pf(4) we had quite a long story with getting it stable with VIMAGE, and I think
that was because we just didn't do it right from the beginning.

So with NFS we started to create complexity and we already got problems. The
memory definitely is used after free, see:

https://ci.freebsd.org/job/FreeBSD-main-amd64-test/23034/console

-- 
Gleb Smirnoff



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Y/gqLrIPQGcuWhTr>