From: Zhenlei Huang
To: Bjoern A. Zeeb
Cc: freebsd-jail@freebsd.org, Gleb Smirnoff
Subject: Re: What's going on with vnets and epairs w/ addresses?
Date: Fri, 16 Dec 2022 18:30:57 +0800
Message-Id: <150A60D6-6757-46DD-988F-05A9FFA36821@FreeBSD.org>
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

Hi,

I managed to reproduce this issue on CURRENT/14 with this small snippet:

-------------------------------------------
#!/bin/sh

# test jail name
n="test_ref_leak"

jail -c name=$n path=/ vnet persist
# The following line triggers the jail pr_ref leak
jexec $n ifconfig lo0 inet 127.0.0.1/8

jail -R $n

# wait a moment
sleep 1

jls -j $n


-------------------------------------------
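
The fixed `sleep 1` in the script above can race with the teardown, so a bounded poll may make the check less timing-dependent. This is only a sketch: `wait_jail_gone` is a hypothetical helper name, and it assumes that `jls -d -j NAME` (with `-d` including dying jails) exits non-zero once the jail is fully gone.

```shell
#!/bin/sh
# Hypothetical helper: poll until the named jail is no longer listed,
# not even in dying state (jls -d), or give up after a number of tries.
# Assumption: `jls -d -j NAME` exits non-zero once the jail is fully gone.
wait_jail_gone() {
	name=$1
	tries=${2:-10}		# default: about 10 seconds
	while [ "$tries" -gt 0 ]; do
		if ! jls -d -j "$name" >/dev/null 2>&1; then
			return 0	# jail fully released
		fi
		tries=$((tries - 1))
		sleep 1
	done
	return 1		# still listed: a reference is probably still held
}
```

It could replace the `sleep 1` / `jls -j $n` pair, for example as `wait_jail_gone "$n" 30 || echo "jail $n still present"`.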


After some DDB debugging and tracing, it seems this is triggered by a combination of [1] and [2].

[1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
[2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b


In [1] the per-VNET uma zone is shared with the global one.
`pcbinfo->ipi_zone = pcbstor->ips_zone;`

In [2] the unref of `inp->inp_cred` is deferred: it is now done in inpcb_dtor(), which is invoked through uma_zfree_smr().

Unfortunately, inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately,
thus leaking the `inp->inp_cred` ref and hence `prison->pr_ref`.

It is also not possible to free up the cache from the per-VNET SYSUNINITs tcp_destroy / udp_destroy / rip_destroy.
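
If the analysis above is right, the jail should linger in dying state rather than disappear. A rough way to check that from userland is sketched below. `jail_state` is a hypothetical helper name, and the sketch assumes that jls(8) accepts `-d` to include dying jails and prints the read-only `dying` parameter as `true` or `false`.

```shell
#!/bin/sh
# Hypothetical helper: classify a jail as "gone", "dying" or "alive".
# Assumptions: `jls -d` includes dying jails, a missing jail makes jls
# exit non-zero, and the `dying` parameter is printed as true/false.
jail_state() {
	if ! d=$(jls -d -j "$1" dying 2>/dev/null); then
		echo gone	# not listed at all
	elif [ "$d" = "true" ]; then
		echo dying	# removed, but something still holds a reference
	else
		echo alive
	fi
}
```

After `jail -R $n`, an output of `dying` from `jail_state "$n"` that never changes to `gone` would be consistent with a leaked `prison->pr_ref`.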



Best regards,
Zhenlei

> On Dec 14, 2022, at 9:56 AM, Zhenlei Huang <zlei@FreeBSD.org> wrote:
>
> Hi,
>
> I also encountered this problem while testing gif tunnels between jails.
>
> My script is similar but with additional gif tunnels.
>
> There are reports on the mailing lists [1], [2], and another one in the forums [3].
>
> This seems to be a long-standing issue.
>
> [1] https://lists.freebsd.org/pipermail/freebsd-stable/2016-October/086126.html
> [2] https://lists.freebsd.org/pipermail/freebsd-jail/2017-March/003357.html
> [3] https://forums.freebsd.org/threads/jails-stopping-prolonged-deaths-starting-networking-et-cetera.84200/
>
> Best regards,
> Zhenlei
>
>> On Dec 14, 2022, at 7:03 AM, Bjoern A. Zeeb <bz@FreeBSD.org> wrote:
>>
>> Hi,
>>
>> I have used scripts like the below for almost a decade and a half
>> (obviously doing more than that in the middle).  I haven't used them
>> much lately but given other questions I just wanted to fire up a test.
>>
>> I have an end-November kernel; doing the below, my epairs do not come back
>> to be destroyed (immediately).
>> I have to start polling for the jid to be no longer alive and not in
>> dying state (hence added the jls/ifconfig -l lines and removed the
>> error checking from ifconfig destroy).  That sometimes seems rather
>> unreasonably long (to the point I give up).
>>
>> If I don't configure the addresses below this isn't a problem.
>>
>> Sorry I am confused by too many incarnations of the code; I know I once
>> had a version with an async shutdown path but I believe that never made
>> it into mainline, so why are we holding onto the epairs now and not
>> nuking the addresses, returning them, and being clean?
>>
>> It's a bit more funny; I added a twiddle loop at the end and nothing
>> happened.  So I stop the script and start it again and suddenly another
>> jail or two have cleaned up and their epairs are back.  Something feels
>> very very wonky.  Play around with this and see ... and let me know if
>> you can reproduce this...  I quite wonder why some test cases haven't
>> gone crazy ...
>>
>> /bz
>>
>> ------------------------------------------------------------------------
>> #!/bin/sh
>>
>> set -e
>> set -x
>>
>> js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
>> jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`
>>
>> # Create an epair connecting the two machines (vnet jails).
>> ep=`ifconfig epair create | sed -e 's/a$//'`
>>
>> # Add one end to each vnet jail.
>> ifconfig ${ep}a vnet ${js}
>> ifconfig ${ep}b vnet ${jb}
>>
>> # Add an IP address on the epairs in each vnet jail.
>> # XXX Leave these out and the cleanup seems to work fine.
>> jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
>> jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24
>>
>> # Clean up.
>> jail -r ${jb}
>> jail -r ${js}
>>
>> # You want to be able to remove this line ...
>> set +e
>>
>> # No epairs to destroy with addresses configured; fine otherwise.
>> ifconfig ${ep}a destroy
>> # echo $?
>>
>> # This is here only as things are funny ...
>> # jls -av jid dying
>> # ifconfig -l
>>
>> # end
>> ------------------------------------------------------------------------
>>
>> --
>> Bjoern A. Zeeb                                                     r15:7

