From: Rick Macklem <rick.macklem@gmail.com>
Date: Fri, 24 Feb 2023 06:42:05 -0800
Subject: Re: git: ef6fcc5e2b07 - main - nfsd: Add VNET_SYSUNINIT() macros for vnet cleanup
To: Gleb Smirnoff
Cc: Rick Macklem, bz@freebsd.org, jamie@freebsd.org, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
References: <202302202112.31KLCfQB080359@gitrepo.freebsd.org>

On Thu, Feb 23, 2023 at 7:08 PM Gleb Smirnoff wrote:
>
> Rick,
>
> On Thu, Feb 23, 2023 at 05:56:06AM -0800, Rick Macklem wrote:
> R> > This one actually doesn't look correct to me. What happens here is that the sysctls
> R> > will affect only the default VNET.
> R> >
> R> Yes, but the sysctls are mostly useless anyhow. I don't know how to make them
> R> work in a prison. (I know how to use SYSCTL_FLAG_VNET, but that does not
> R> work in this case.)
>
> Don't they work as intended in my patch D38742?
Yes, and my pretty trivial D38748 does as well. What I was trying to say
above was "I don't know how to vnet sysctls that are created under a
SYSCTL_ADD_NODE()". One of those also exists in the krpc.

>
> R> > I think the VNET-ization of this file went a bit wrong. You don't need to convert
> R> > static structures to malloced when you VNET-ize a module. The infrastructure should
> R> > take care of memory management.
> R> >
> R> Not if you want to keep the vnet footprint small. This was necessary
> R> so that nfsd.ko (and friends) will load dynamically. Without the
> R> conversion to mallocs, it would complain the vnet was out of memory
> R> when nfsd.ko tried to load.
> R> (I'm sure I didn't need to do all of them, but it made sense to keep
> R> the vnet footprint as small as possible.)
> R> > The dynamic sysctl context seems to be unneeded from the very beginning. It was
> R> > always attached to the static softc. Suggested patch:
> R> >
> R> > https://reviews.freebsd.org/D38742
> R> >
> R> I'll look at it, but if it stops malloc'ng softc, then I will be
> R> worried w.r.t. vnet footprint and dynamic loading. (Note that the
> R> structure has an array of hash list pointers in it, so it is rather
> R> large.)
>
> Replying to you and Bjoern's email too here. Well, if we want a fully blown
> virtualized network stack, and this is what VIMAGE is, then, well, we need a
> full chunk of memory to keep a network stack data. So, if there is a limit
> there, (Bjoern mentioned 8k) then this limit needs to be increased as more
> and more subsystems are virtualized. I also don't see how we actually save
> any memory using malloc(9) instead of using memory provided by VIMAGE? The
> kmem use would be roughly the same if not worse.
> What exactly are we saving here?
>
I know nothing about the internals of VIMAGE, so I'll leave that to others
to discuss. I will point out that I do not see any disadvantage to using
malloc().

> R> Another reason (along with dynamic loading of the modules) for keeping
> R> the footprint small is to try and keep the jails that do not use
> R> nfsd(8) and friends lightweight.
> R> When I originally coded it, I put it under a kernel option called
> R> VNET_NFSD (still in kern_jail.c at this time), but others felt that
> R> a new kernel option wasn't desirable.
> R> (There is a lot of discussion under D37519. The downside of discussions that
> R> happen in Phabricator is that not as many people see them. I started an email
> R> thread on freebsd-current@, but it quickly migrated to D37519. Just
> R> the way things currently happen.)
>
> I briefly looked into D37519 and didn't find any discussion over memory
> footprint and savings. I agree that a kernel option of VNET_NFSD is undesirable.
> The global option VIMAGE shall control that and nothing else.
>
Well, there was a discussion somewhere of how to get nfsd.ko to load
dynamically (after I found out it would not). The answer was basically
"malloc arrays and big data structures", which worked and which I am
fine with.

> R> > For example nfsstatsv1, which is now malloced for non-default vnets and static for
> R> > the default. Lots of modules still incorrectly update the global one.
> R> >
> R> Nope. "struct nfsstatsv1" is a structure that is shared with the NFS
> R> client, which is not vnet'd. As such, a static "struct nfsstatsv1"
> R> needs to exist for prison0.
> R>
> R> My original solution was to create a separate structure for the server
> R> side stuff (vnet it), but then the result was a messy copying exercise
> R> for the system call, which returns the entire structure (with client
> R> and server info).
> R>
> R> Once the client side is vnet'd (I've already had a couple of emails asking me
> R> to do it, so I plan on starting to work on it) then, yes, the
> R> structure can become a malloc'd vnet'd one and the IS_DEFAULT_VNET
> R> can go away.
>
> I see the problem. So, we got several things that need to be done to
> nfsstatsv1: virtualize, separate server and client, possibly make them use
> counter(9) to avoid races and performance issues. And I'm afraid there is
> API/ABI that needs to be preserved? Depending on order of doing changes to
> nfsstatsv1 the amount of work is going to be different.
>
> So, if we start with my D38743 the only problem observed is that the client
> code will use virtual version of the stats. So if somebody decides to run
> a client from a vnet jail, it will fill this jail stats instead of global.
> Is it a big deal if the future plan is actually to make client virtualized
> too?
Your patch is fine. It just requires the rest of the work of vnet'ng the
client to be done, and I have not done that yet. Until then, your patch
would just break client mounts. I'm pretty sure they would crash almost
instantly, although I have not tried it. In the meantime, the code needs
to stay in a working form.
(I also won't guarantee I'll get the client vnet'ng done in time for
FreeBSD 14, although I think it will happen.)

>
> R> I am not sure why you think IS_DEFAULT_VNET() is such a big deal in
> R> the initialization code? It works. I can see an argument for using both
> R> SYSINIT() and VNET_SYSINIT() for cases where you have both non-vnet'd
> R> and vnet'd data to initialize. However, I don't see any problem with using
> R> IS_DEFAULT_VNET() when needed.
>
> My main concern is complexity and its unforeseen consequences. The original design
> of VIMAGE is simple - just wrap every global variable into VNET() macro and you
> are done. Don't make non-default vnet in any sense different to the default one!
> It actually works pretty good if the recipe is being followed. Sometimes this leads
> to pretty large mechanical changes, and people try to avoid that. For example in
> pf(4) we had quite a long story with getting it stable with VIMAGE, and I think
> that was because we just didn't do it right from the beginning.
>
Yea, as you noted, the NFS code can get complex and the nfsstatsv1 part is
a bit messy (mostly maintaining backwards compatibility for old versions of
the structure). This simple use of IS_DEFAULT_VNET() fixes the problem
until the client side is vnet'd. Then it can go away.

> So with NFS we started to create complexity and we already got problems. The
> memory definitely is used after free, see:
>
> https://ci.freebsd.org/job/FreeBSD-main-amd64-test/23034/console
Yes, I think D38750 fixes this.

For my test setup (a main GENERIC kernel), the VNET_SYSUNINIT() functions
never get called. After "jail -r", the jails are stuck in the dying state.
This happens even for non-vnet jails. jamie@ has reproduced the problem on
his own setup.

As such, I can't test the VNET_SYSUNINIT() stuff, so I only saw this
problem when the KASAN testing found it. It would be interesting to know
how the kernel used in the testing setup differs from GENERIC, because the
kernel in the testing environment does appear to call the VNET_SYSUNINIT()
functions.

rick

>
> --
> Gleb Smirnoff
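
For illustration only, here is a rough userspace sketch of the nfsstatsv1
arrangement discussed in this thread: a static structure for the default
vnet (because the non-vnet'd client shares it) and a malloc'd one for
every other vnet. It is a model under stated assumptions, not the nfsd
code. Every name in it (stats_model, vnet_model, vnet_model_init,
vnet_model_uninit) is hypothetical, the is_default flag merely stands in
for what IS_DEFAULT_VNET() reports in the kernel, and calloc()/free()
stand in for the kernel allocator.

```c
/*
 * Userspace model only. None of these names are FreeBSD kernel APIs;
 * they are hypothetical stand-ins for the pattern described in the thread.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a stats structure shared by client and server code. */
struct stats_model {
	unsigned long srv_rpcs;
	unsigned long cl_rpcs;
};

/* Stand-in for per-vnet state; is_default models IS_DEFAULT_VNET(). */
struct vnet_model {
	bool is_default;
	struct stats_model *stats;
};

/* The static copy that the non-virtualized client keeps using. */
static struct stats_model global_stats;

/* Models an init handler: static stats for the default vnet, malloc'd otherwise. */
static int
vnet_model_init(struct vnet_model *vn, bool is_default)
{
	vn->is_default = is_default;
	if (is_default) {
		vn->stats = &global_stats;
		return (0);
	}
	vn->stats = calloc(1, sizeof(*vn->stats));
	return (vn->stats == NULL ? -1 : 0);
}

/* Models an uninit handler: only the malloc'd copy may be freed. */
static void
vnet_model_uninit(struct vnet_model *vn)
{
	assert(vn->stats != NULL);
	if (!vn->is_default)
		free(vn->stats);
	/* Clear the pointer so a late user faults instead of touching freed memory. */
	vn->stats = NULL;
}
```

In this model, initializing one default and one non-default instance gives
the former a pointer to global_stats and the latter its own zeroed copy, so
counters bumped in the non-default instance never show up in the global
one. The special case exists only in the init and uninit paths, which is
the part that could go away once the client side is virtualized too.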