Date: Tue, 17 Jan 2023 16:42:25 -0500
From: Mark Johnston <markj@freebsd.org>
To: "Bjoern A. Zeeb" <bz@freebsd.org>
Cc: Kyle Evans <kevans@freebsd.org>, Gleb Smirnoff <glebius@freebsd.org>, Zhenlei Huang <zlei.huang@gmail.com>, "freebsd-jail@freebsd.org" <freebsd-jail@freebsd.org>
Subject: Re: What's going on with vnets and epairs w/ addresses?
Message-ID: <Y8cWQTB3EbUegFZD@nuc>
In-Reply-To: <s37sp986-os88-nq69-s6oo-48597r758n8@serrofq.bet>
References: <5r22os7n-ro15-27q-r356-rps331o06so5@mnoonqbm.arg>
 <B6C70A88-11F8-40D7-85E4-63BBA0F7931D@FreeBSD.org>
 <150A60D6-6757-46DD-988F-05A9FFA36821@FreeBSD.org>
 <Y534qgEG1nX5i1iB@FreeBSD.org>
 <CACNAnaE6UQB=zNBjVNrF%2Bpd%2Bmh=6H0%2BROYf1%2BD=HKBTp8aX27g@mail.gmail.com>
 <Y6He6OD6PA0ntoK9@nuc>
 <s37sp986-os88-nq69-s6oo-48597r758n8@serrofq.bet>

On Tue, Dec 20, 2022 at 08:50:09PM +0000, Bjoern A. Zeeb wrote:
> On Tue, 20 Dec 2022, Mark Johnston wrote:
>
> > On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:
> >> On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff <glebius@freebsd.org> wrote:
> >>>
> >>> Zhenlei,
> >>>
> >>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
> >>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
> >>> Z>
> >>> Z> -------------------------------------------
> >>> Z> #!/bin/sh
> >>> Z>
> >>> Z> # test jail name
> >>> Z> n="test_ref_leak"
> >>> Z>
> >>> Z> jail -c name=$n path=/ vnet persist
> >>> Z> # The following line trigger jail pr_ref leak
> >>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
> >>> Z>
> >>> Z> jail -R $n
> >>> Z>
> >>> Z> # wait a moment
> >>> Z> sleep 1
> >>> Z>
> >>> Z> jls -j $n
> >>> Z>
> >>> Z> After DDB debugging and tracing , it seems that is triggered by a combine of [1] and [2]
> >>> Z>
> >>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 <https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915>
> >>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b <https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b>
> >>> Z>
> >>> Z>
> >>> Z> In [1] the per-VNET uma zone is shared with the global one.
> >>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
> >>> Z>
> >>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by uma_zfree_smr() .
> >>> Z>
> >>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately ,
> >>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
> >>> Z>
> >>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT tcp_destroy / udp_destroy / rip_destroy.
> >>>
> >>> This is known issue and I'd prefer not to call it a problem. The "leak" of a jail
> >>> happens only if machine is idle wrt the networking activity.
> >>>
> >>> Getting back to the problem that started this thread - the epair(4)s not immediately
> >>> popping back to prison0. IMHO, the problem again lies in the design of if_vmove and
> >>> epair(4) in particular. The if_vmove shall not exist, instead we should do a full
> >>> if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove doesn't
> >>> carry any useful information. With Alexander melifaro@ we discussed better options
> >>> for creating or attaching interfaces to jails that if_vmove. Until they are ready
> >>> the most easy workaround to deal with annoying epair(4) come back problem is to
> >>> remove it manually before destroying a jail, like I did in 80fc25025ff.
> >>>
> >>
> >> It still behaved much better prior to eb93b99d6986, which you and Mark
> >> were going to work on a solution for to allow the cred "leak" to close
> >> up much more quickly. CC markj@, since I think it's been six months
> >> since the last time I inquired about it, making this a good time to do
> >> it again...
> >
> > I spent some time trying to see if we could fix this in UMA/SMR and
> > talked to Jeff about it a bit. At this point I don't think it's the
> > right approach, at least for now. Really we have a composability
> > problem where different layers are using different techniques to signal
> > that they're done with a particular piece of memory, and they just
> > aren't compatible.
> >
> > One thing I tried is to implement a UMA function which walks over all
> > SMR zones and synchronizes all cached items (so that their destructors
> > are called). This is really expensive, at minimum it has to bind to all
> A semi-unrelated question -- do we have any documentation around SMR
> in the tree which is not in subr_smr.c?
>
> (I have to admit I find it highly confusing that the acronym is more
> easily found as "Shingled Magnetic Recording (SMR)" in a different
> header file).

Sorry for the delayed reply, I was travelling for a few weeks and still haven't caught up. I did at least write a man page which notes the multiple meanings of that acronym. :) Comments and feedback are welcome: https://reviews.freebsd.org/D38108