Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 24 Feb 2023 06:42:05 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        Rick Macklem <rmacklem@freebsd.org>, bz@freebsd.org, jamie@freebsd.org,  src-committers@freebsd.org, dev-commits-src-all@freebsd.org,  dev-commits-src-main@freebsd.org
Subject:   Re: git: ef6fcc5e2b07 - main - nfsd: Add VNET_SYSUNINIT() macros for vnet cleanup
Message-ID:  <CAM5tNy4yg0PUdWmPg2qtWew8kFzzL8mwe3UUZpAwCDX8NOtKAQ@mail.gmail.com>
In-Reply-To: <Y/gqLrIPQGcuWhTr@FreeBSD.org>
References:  <202302202112.31KLCfQB080359@gitrepo.freebsd.org> <Y/Uz5B6ywWjaa1TB@FreeBSD.org> <CAM5tNy6KFzCarKWze4mjbnqsERUoshHFfZeXzZMF3RurVKf7TA@mail.gmail.com> <Y/cHpcn/1S3Q0Swa@FreeBSD.org> <CAM5tNy6raj72UdmN%2BmFAzWOR%2BzFEV-eMWU6%2BXcSrjKXycu1BBw@mail.gmail.com> <Y/gqLrIPQGcuWhTr@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Feb 23, 2023 at 7:08 PM Gleb Smirnoff <glebius@freebsd.org> wrote:
>
>   Rick,
>
> On Thu, Feb 23, 2023 at 05:56:06AM -0800, Rick Macklem wrote:
> R> > This one actually doesn't look correct to me. What happens here is that the sysctls
> R> > will affect only the default VNET.
> R> >
> R> Yes, but the sysctls are mostly useless anyhow. I don't know how to make them
> R> work in a prison. (I know how to use SYSCTL_FLAG_VNET, but that does not
> R> work in this case.)
>
> Doesn't they work as intended in my patch D38742?
Yes, and my pretty trivial D38748 does as well. What I was trying to
say above was
"I don't know how to vnet sysctls done under a SYSCTL_ADD_NODE()".. One of
those also exists in the krpc.

>
> R> > I think the VNET-itezation of this file went a bit wrong. You don't need to convert
> R> > static structures to malloced when you VNET-ize a module. The infrastructure should
> R> > take care of memory management.
> R> >
> R> Not if you want to keep the vnet footprint small. This was necessary
> R> so that nfsd.ko
> R> (and friends) will load dynamically. Without the conversion to
> R> mallocs, it would complain
> R> the vnet was out of memory when nfsd.ko tried to load.
> R> (I'm sure I didn't need to do all of them, but it made sense to keep
> R> the vnet footprint
> R> as small as possible.)
> R> > The dynamic sysctl context seems to be unneeded from the very beginning. It was
> R> > always attached to the static softc. Suggested patch:
> R> >
> R> > https://reviews.freebsd.org/D38742
> R> >
> R> I'll look at it, but if it stops malloc'ng softc, then I will be
> R> worried w.r.t. vnet footprint and
> R> dynamic loading. (Note that the structure has an array of hash list
> R> pointers in it, so it
> R> is rather large.
>
> Replying to you and Bjoern's email too here. Well, if we want a fully blown
> virtualized network stack, and this is what VIMAGE is, then, well, we need a
> full chunk of memory to keep a network stack data. So, if there is a limit
> there, (Bjoern mentioned 8k) then this limit needs to be increased as more
> and more subsystems are virtualized. I also don't see how we actually save
> any memory using malloc(9) instead of using memory provided by VIMAGE? The
> kmem use would be roughly the same if not worse. What exactly are we saving
> here?
>
I know nothing about the internals of VIMAGE, so I'll leave that to
others to discuss.
I will point out that I do not see any disadvantage to using malloc().


> R> Another reason (along with dynamic loading of the modules) for keeping
> R> the foot print
> R> small is to try and keep the jails that do not use nfsd(8) and friends
> R> lightweight.
> R> When I originally coded it, I put it under a kernel option called
> R> VNET_NFSD (still in
> R> kern_jail.c at this time), but others felt that a new kernel option
> R> wasn't desirable,
> R> (There is a lot of discussion under D37519. The downside of discussions that
> R> happen in phabricator is that not as many people see them. I started an email
> R> thread on freebsd-current@, but it quickly migrated to D37519. Just
> R> the way things
> R> currently happen.)
>
> I briefly looked into D37519 and didn't find any discussion over memory
> footprint and savings. I agree that a kernel option of VNET_NFSD is undesirable.
> The global option VIMAGE shall control that and nothing else.
>
Well, there was a discussion of how to gte nfsd.ko to load dynamically (after
I found out it would not) somewhere. The answer was basically "malloc arrays and
big data structures", which worked and I am fine with.

> R> > For example nfsstatsv1, which is now malloced for non-default vnets and static for
> R> > the default. Lots of modules still incorrectly update the global one.
> R> >
> R> Nope. "struct nfsstatsv1" is a structure that is shared with the NFS
> R> client, which is
> R> not vnet'd. As such, a static "struct nfsstatsv1" needs to exist for prison0.
> R>
> R> My original solution was to create a separate structure for the server
> R> side stuff
> R> (vnet it), but then the result was a messy copying exercise for the system call,
> R> which returns the entire structure (with client and server info).
> R>
> R> Once the client side is vnet'd (I've already had a couple of emails asking me
> R> to do it, so I plan on starting to work on it) then, yes, the
> R> structure can become
> R> a malloc'd vnet'd one and the IS_DEFAULT_VNET can go away.
>
> I see the problem. So, we got several things that need to be done to
> nfsstatsv1: virtualize, separate server and client, possibly make them use
> counter(9) to avoid races and performance issues. And I'm afraid there is
> API/ABI that needs to be preserved? Depending on order of doing changes to
> nfsstatsv1 the amount of work is going to be different.
>
> So, if we start with my D38743 the only problem observed is that the client
> code will use virtual version of the stats. So if somebody decides to run
> a client from a vnet jail, it will fill this jail stats instead of global.
> Is it a big deal if the future plan is actually to make client virtualized
> too?
Your patch is fine. It just requires the rest of the work of vnet'ng the client
to be done and I have not done that yet.

Until then, your patch would just break client mounts. I'm pretty sure they
would crash almost instantly, although I have not tried it.

In the meantime, the code needs to stay in a working form.
(I also won't guarantee I'll get the client vnet'ng done in time for FreeBSD14,
although I think it will happen.)

>
> R> I am not sure why you think IS_DEFAULT_VNET() is such a big deal in
> R> the initialization code? It work. I can see an argument for using both
> R> SYSINIT() and VNET_SYSINIT() for cases where you have both non-vnet;d
> R> and vnet'd data to initialize. However, I don't see any problem with using
> R> IS_DEFAULT_VNET() when needed.
>
> My main concern is complexity and its unforseen consequences. The original design
> of VIMAGE is simple - just wrap every global variable into VNET() macro and you
> are done. Don't make non-default vnet in any sense different to the default one!
> It actually works pretty good if the recipe is being followed. Sometimes this leads
> to pretty large mechanical changes, and people try to avoid that. For example in
> pf(4) we had quite a long story with getting it stable with VIMAGE, and I think
> that was because we just didn't do it right from the beginning.
>
Yea, as you noted, the NFS code can get complex and the nfsstatsv1 part is a bit
messy (mostly maintaining backwards compatibility for old versions of
the structure).

This simple use of IS_DEFAULT_VNET() fixes the problem until the client side is
vnet'd. Then it can go away.

> So with NFS we started to create complexity and we already got problems. The
> memory definitely is used after free, see:
>
> https://ci.freebsd.org/job/FreeBSD-main-amd64-test/23034/console
Yes, I think D38750 fixes this. For my test setup (a main GENERIC kernel),
VNET_SYSUNINIT() functions never get called. After "jail -r" they are stuck in
dying state. This happens even for non-vnet jails. jamie@ has reproduced the
problem on a setup he did.

As such, I can't test the VNET_SYSUNINIT() stuff, so I only saw this problem
when the KASAN testing found it.
It would be interesting to know how the kernel used in the testing setup differs
from GENERIC, because the kernel in the testing environment does appear to
call the VNET_SYSUNINIT() functions?

rick

>
> --
> Gleb Smirnoff



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAM5tNy4yg0PUdWmPg2qtWew8kFzzL8mwe3UUZpAwCDX8NOtKAQ>