Date:      Sun, 18 Dec 2022 16:20:45 +0000 (UTC)
From:      "Bjoern A. Zeeb" <bz@freebsd.org>
To:        Zhenlei Huang <zlei.huang@gmail.com>
Cc:        Gleb Smirnoff <glebius@freebsd.org>,  "freebsd-jail@freebsd.org" <freebsd-jail@freebsd.org>
Subject:   Re: What's going on with vnets and epairs w/ addresses?
Message-ID:  <4r8p3sn4-7no8-n2p2-9r16-n8sq3qs4p528@serrofq.bet>
In-Reply-To: <6B201617-68BC-4CC8-A2AE-908E96D69B67@FreeBSD.org>
References:  <5r22os7n-ro15-27q-r356-rps331o06so5@mnoonqbm.arg> <B6C70A88-11F8-40D7-85E4-63BBA0F7931D@FreeBSD.org> <150A60D6-6757-46DD-988F-05A9FFA36821@FreeBSD.org> <Y534qgEG1nX5i1iB@FreeBSD.org> <9p9919q1-n639-p581-6q1o-so48o5ns6717@serrofq.bet> <6B201617-68BC-4CC8-A2AE-908E96D69B67@FreeBSD.org>

On Sun, 18 Dec 2022, Zhenlei Huang wrote:

>
>> On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb <bz@freebsd.org> wrote:
>>
>> On Sat, 17 Dec 2022, Gleb Smirnoff wrote:
>>
>>> Zhenlei,
>>>
>>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>>> Z>
>>> Z> -------------------------------------------
>>> Z> #!/bin/sh
>>> Z>
>>> Z> # test jail name
>>> Z> n="test_ref_leak"
>>> Z>
>>> Z> jail -c name=$n path=/ vnet persist
>>> Z> # The following line trigger jail pr_ref leak
>>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>>> Z>
>>> Z> jail -R $n
>>> Z>
>>> Z> # wait a moment
>>> Z> sleep 1
>>> Z>
>>> Z> jls -j $n
>>> Z>
>>> Z> After DDB debugging and tracing , it seems that is triggered by a combine of [1] and [2]
>>> Z>
>>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
>
> I can confirm [2] also affects Non-VNET jails.
> Prison pr_ref leak cause jail stuck in dying state.

Usually a TCP connection in TIME_WAIT would do this in the old days,
and things would resolve themselves after a while.  This was always the
case, even long before vnet or multi-IP jails.


>>> Z>
>>> Z>
>>> Z> In [1] the per-VNET uma zone is shared with the global one.
>>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>>> Z>
>>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by uma_zfree_smr() .
>>> Z>
>>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately ,
>>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>>> Z>
>>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT tcp_destroy / udp_destroy / rip_destroy.
>>>
>>> This is known issue and I'd prefer not to call it a problem. The "leak" of a jail
>>> happens only if machine is idle wrt the networking activity.
>>>
>>> Getting back to the problem that started this thread - the epair(4)s not immediately
>>> popping back to prison0. IMHO, the problem again lies in the design of if_vmove and
>>> epair(4) in particular. The if_vmove shall not exist, instead we should do a full
>>> if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove doesn't
>>> carry any useful information. With Alexander melifaro@ we discussed better options
>>> for creating or attaching interfaces to jails that if_vmove. Until they are ready
>>> the most easy workaround to deal with annoying epair(4) come back problem is to
>>> remove it manually before destroying a jail, like I did in 80fc25025ff.
>>
>> Ok, move an em0 or cxl0 into the jail;  the problem will be the same I
>> bet and you need the physical interface to not disappear as then you
>> cannot re-create a new jail with it.
>
> Re-read sys/kern/kern_jail.c, if pr_ref leaks, vnet_destroy() has no chance to be called, thus
> if_vmove is not called and epair(4)s or em0, exl0 are not returned to home vnet.
>
> That can be confirmed by setting debug point on vnet_destroy by DDB, and then create and destroy vnet jails.
>
> So before the problem prison pr_ref count leaks is resolved, it will cover other potential problems such as @glebius
> pointed out.
>
> I think the problem that prison ref count leaks should be resolved first.
>
> I'm also reviewing the life cycles of prison / vnet and it seems they could still be improved.

But that's not the problem here, as your own test case pointed out.

The point is that if you start a plain vnet jail, put an interface in,
and destroy the jail, that works instantly.
The moment you put an address on any interface (incl. loopback, as your
test showed, which will not do ARP/NDP things compared to an ethernet
interface), the jail will no longer die immediately.
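
To make the comparison concrete, here is a minimal sketch based on the
test script quoted above.  The try_jail and run helpers, the jail names
and the DRY_RUN switch are made up for illustration; jls -d is used so
that dying jails are listed as well.  With DRY_RUN=1 (the default here)
it only prints the commands; running it for real assumes root on a
VIMAGE-enabled FreeBSD host.

```shell
#!/bin/sh
# Sketch only: compare jail teardown without and with an address on lo0.
# DRY_RUN=1 (default) prints the commands instead of executing them.
: "${DRY_RUN:=1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# try_jail NAME [ADDRESS]: create a persistent vnet jail, optionally put
# ADDRESS on its lo0, remove the jail, wait, then look for it again.
try_jail() {
    name=$1
    addr=${2:-}
    run jail -c name="$name" path=/ vnet persist
    if [ -n "$addr" ]; then
        run jexec "$name" ifconfig lo0 inet "$addr"
    fi
    run jail -R "$name"
    run sleep 1
    # -d includes dying jails; jls fails if the jail is not found at all.
    run jls -d -j "$name" || echo "$name: no longer listed"
}

try_jail test_noaddr                  # no address configured
try_jail test_addr 127.0.0.1/8        # address on lo0, as in the report
```

If the behaviour described above holds, the first jail should no longer
be listed after the sleep, while the second one should still show up.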

Simply putting an address on an interface should not defer things.
So indeed something holds onto things there and is no longer cleaned
up.  Finding that "something", and being able to clean it up, is the
important bit.

I always say, if you have a machine in shutdown -r you don't want it
hanging for hours either (now if you toggle the power switch you can do
a lot more without panicking the rest of the system, but with jails we
cannot do that).  And we did have vnet jails shutting down properly and
cleaning up for years.  People spent a lot of time on that.  So it is
possible and we need to get back to that state.

/bz

-- 
Bjoern A. Zeeb                                                     r15:7


