Date:      Sun, 18 Dec 2022 14:10:03 +0800
From:      Zhenlei Huang <zlei.huang@gmail.com>
To:        "Bjoern A. Zeeb" <bz@freebsd.org>
Cc:        Gleb Smirnoff <glebius@freebsd.org>, "freebsd-jail@freebsd.org" <freebsd-jail@freebsd.org>
Subject:   Re: What's going on with vnets and epairs w/ addresses?
Message-ID:  <6B201617-68BC-4CC8-A2AE-908E96D69B67@FreeBSD.org>
In-Reply-To: <9p9919q1-n639-p581-6q1o-so48o5ns6717@serrofq.bet>
References:  <5r22os7n-ro15-27q-r356-rps331o06so5@mnoonqbm.arg> <B6C70A88-11F8-40D7-85E4-63BBA0F7931D@FreeBSD.org> <150A60D6-6757-46DD-988F-05A9FFA36821@FreeBSD.org> <Y534qgEG1nX5i1iB@FreeBSD.org> <9p9919q1-n639-p581-6q1o-so48o5ns6717@serrofq.bet>


> On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb <bz@freebsd.org> wrote:
>
> On Sat, 17 Dec 2022, Gleb Smirnoff wrote:
>
>> Zhenlei,
>>
>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>> Z>
>> Z> -------------------------------------------
>> Z> #!/bin/sh
>> Z>
>> Z> # test jail name
>> Z> n="test_ref_leak"
>> Z>
>> Z> jail -c name=$n path=/ vnet persist
>> Z> # The following line triggers the jail pr_ref leak
>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>> Z>
>> Z> jail -R $n
>> Z>
>> Z> # wait a moment
>> Z> sleep 1
>> Z>
>> Z> jls -j $n
>> Z>
>> Z> After DDB debugging and tracing, it seems that this is triggered by a
>> Z> combination of [1] and [2]
>> Z>
>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b

I can confirm that [2] also affects non-VNET jails.
The prison pr_ref leak causes the jail to get stuck in the dying state.
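
A minimal sketch for telling a dying jail apart from one that is really gone, assuming jls(8) skips dying jails unless -d is given and exits non-zero when the named jail is not found. jail_state is a hypothetical helper name, not part of any base utility.

```sh
#!/bin/sh
# Sketch only: jail_state is a hypothetical helper, not part of jail(8) or jls(8).
# Prints "active", "dying", or "gone" for the jail named in $1.
jail_state() {
	if jls -j "$1" jid >/dev/null 2>&1; then
		# Visible without -d: the jail is still alive.
		echo active
	elif jls -d -j "$1" jid >/dev/null 2>&1; then
		# Only visible with -d: the jail is in the dying state.
		echo dying
	else
		echo gone
	fi
}
```

After `jail -R test_ref_leak; sleep 1`, running `jail_state test_ref_leak` would be expected to print "dying" on an affected system, if the assumptions above hold.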

>> Z>
>> Z>
>> Z> In [1] the per-VNET uma zone is shared with the global one.
>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>> Z>
>> Z> In [2] the unref of `inp->inp_cred` is deferred to inpcb_dtor(), which is
>> Z> called by uma_zfree_smr().
>> Z>
>> Z> Unfortunately, inps freed by uma_zfree_smr() are cached and inpcb_dtor()
>> Z> is not called immediately, thus leaking the `inp->inp_cred` ref and hence
>> Z> `prison->pr_ref`.
>> Z>
>> Z> And it is also not possible to free up the cache by the per-VNET
>> Z> SYSUNINIT tcp_destroy / udp_destroy / rip_destroy.
>>
>> This is a known issue and I'd prefer not to call it a problem. The "leak"
>> of a jail happens only if the machine is idle wrt the networking activity.
>>
>> Getting back to the problem that started this thread - the epair(4)s not
>> immediately popping back to prison0. IMHO, the problem again lies in the
>> design of if_vmove and epair(4) in particular. The if_vmove shall not
>> exist, instead we should do a full if_attach() and if_detach(). The state
>> of an ifnet when it undergoes if_vmove doesn't carry any useful
>> information. With Alexander melifaro@ we discussed better options for
>> creating or attaching interfaces to jails than if_vmove. Until they are
>> ready, the easiest workaround to deal with the annoying epair(4) come back
>> problem is to remove it manually before destroying a jail, like I did in
>> 80fc25025ff.
>
> Ok, move an em0 or cxl0 into the jail;  the problem will be the same I
> bet and you need the physical interface to not disappear as then you
> cannot re-create a new jail with it.

Re-reading sys/kern/kern_jail.c: if pr_ref leaks, vnet_destroy() never gets a
chance to be called, so if_vmove is not called and the epair(4)s (or em0,
cxl0) are not returned to their home vnet.

That can be confirmed by setting a breakpoint on vnet_destroy in DDB and then
creating and destroying vnet jails.
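
The visible symptom can also be checked from userland. A minimal sketch, assuming `ifconfig -l` run on the host lists only the interfaces currently in the host's vnet; iface_in_host is a hypothetical helper and epair0b below is only an example interface name.

```sh
#!/bin/sh
# Sketch only: iface_in_host is a hypothetical helper.
# Succeeds if interface $1 is currently visible in the vnet we run in.
iface_in_host() {
	for i in $(ifconfig -l); do
		[ "$i" = "$1" ] && return 0
	done
	return 1
}
```

For example, `iface_in_host epair0b || echo "epair0b not returned"` after removing the jail would be expected to print the message for as long as the jail's vnet has not been destroyed.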

So until the prison pr_ref leak is resolved, it will mask other potential
problems, such as the ones @glebius pointed out.

I think the prison ref count leak should be resolved first.

I'm also reviewing the life cycles of prison / vnet and it seems they could
still be improved.

>
> /bz
>
> -- 
> Bjoern A. Zeeb                                                     r15:7



