Date: Tue, 17 Jan 2023 16:42:25 -0500
From: Mark Johnston
To: "Bjoern A. Zeeb"
Cc: Kyle Evans, Gleb Smirnoff, Zhenlei Huang, "freebsd-jail@freebsd.org"
Subject: Re: What's going on with vnets and epairs w/ addresses?

On Tue, Dec 20, 2022 at 08:50:09PM +0000, Bjoern A. Zeeb wrote:
> On Tue, 20 Dec 2022, Mark Johnston wrote:
>
> > On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:
> >> On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff wrote:
> >>>
> >>> Zhenlei,
> >>>
> >>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
> >>> Z> I managed to reproduce this issue on CURRENT/14 with this small snippet:
> >>> Z>
> >>> Z> -------------------------------------------
> >>> Z> #!/bin/sh
> >>> Z>
> >>> Z> # test jail name
> >>> Z> n="test_ref_leak"
> >>> Z>
> >>> Z> jail -c name=$n path=/ vnet persist
> >>> Z> # The following line triggers the jail pr_ref leak
> >>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
> >>> Z>
> >>> Z> jail -R $n
> >>> Z>
> >>> Z> # wait a moment
> >>> Z> sleep 1
> >>> Z>
> >>> Z> jls -j $n
> >>> Z>
> >>> Z> After debugging and tracing with DDB, it seems that this is
> >>> Z> triggered by a combination of [1] and [2]:
> >>> Z>
> >>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
> >>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
> >>> Z>
> >>> Z>
> >>> Z> In [1], the per-VNET UMA zone is shared with the global one:
> >>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
> >>> Z>
> >>> Z> In [2], the unref of `inp->inp_cred` is deferred to inpcb_dtor(),
> >>> Z> called by uma_zfree_smr().
> >>> Z>
> >>> Z> Unfortunately, inps freed by uma_zfree_smr() are cached and
> >>> Z> inpcb_dtor() is not called immediately, thus leaking the
> >>> Z> `inp->inp_cred` reference and hence `prison->pr_ref`.
> >>> Z>
> >>> Z> It is also not possible to free up the cache from the per-VNET
> >>> Z> SYSUNINITs tcp_destroy / udp_destroy / rip_destroy.
> >>>
> >>> This is a known issue, and I'd prefer not to call it a problem. The
> >>> "leak" of a jail happens only if the machine is idle with respect to
> >>> networking activity.
> >>>
> >>> Getting back to the problem that started this thread, the epair(4)s
> >>> not immediately popping back to prison0: IMHO, the problem again lies
> >>> in the design of if_vmove and epair(4) in particular.
> >>> if_vmove should not exist; instead we should do a full if_attach()
> >>> and if_detach(). The state of an ifnet when it undergoes if_vmove
> >>> doesn't carry any useful information. With Alexander (melifaro@) we
> >>> discussed better options than if_vmove for creating or attaching
> >>> interfaces to jails. Until they are ready, the easiest workaround for
> >>> the annoying epair(4) come-back problem is to remove the epair
> >>> manually before destroying a jail, as I did in 80fc25025ff.
> >>>
> >>
> >> It still behaved much better prior to eb93b99d6986; you and Mark were
> >> going to work on a solution that would allow the cred "leak" to close
> >> up much more quickly. CC markj@, since I think it's been six months
> >> since I last inquired about it, which makes this a good time to do it
> >> again...
> >
> > I spent some time trying to see if we could fix this in UMA/SMR and
> > talked to Jeff about it a bit. At this point I don't think it's the
> > right approach, at least for now. Really, we have a composability
> > problem: different layers use different techniques to signal that
> > they're done with a particular piece of memory, and those techniques
> > just aren't compatible.
> >
> > One thing I tried is to implement a UMA function which walks over all
> > SMR zones and synchronizes all cached items (so that their destructors
> > are called). This is really expensive; at minimum it has to bind to all
>
> A semi-unrelated question: do we have any documentation around SMR
> in the tree which is not in subr_smr.c?
>
> (I have to admit I find it highly confusing that the acronym is more
> easily found as "Shingled Magnetic Recording (SMR)" in a different
> header file).

Sorry for the delayed reply; I was travelling for a few weeks and still
haven't caught up. I did at least write a man page, which notes the
multiple meanings of that acronym. :) Comments and feedback are welcome:
https://reviews.freebsd.org/D38108