From: Alan Somers
Date: Wed, 4 May 2022 17:53:27 -0600
Subject: Re: nfs client's OpenOwner count increases without bounds
To: Rick Macklem
Cc: FreeBSD Stable ML
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On Wed, May 4, 2022 at 5:23 PM Rick Macklem wrote:
>
> Alan Somers wrote:
> > I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) desktop
> > mounting /usr/home over NFS 4.2 from a 13.0-RELEASE server. It
> > worked fine until a few weeks ago. Now, the desktop's performance
> > slowly degrades. It becomes less and less responsive until I restart
> > X after 2-3 days.
> > /var/log/Xorg.0.log shows plenty of entries like
> > "AT keyboard: client bug: event processing lagging behind by 112ms,
> > your system is too slow". "top -S" shows that the busiest process is
> > nfscl. A dtrace profile shows that nfscl is spending most of its time
> > in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> > Running "nfsdumpstate" on the server shows thousands of OpenOwners for
> > that client, and < 10 for any other NFS client. The number of
> > OpenOwners increases by about 3000 per day. And yet, "fstat" shows only
> > a couple hundred open files on the NFS file system. Why are OpenOwners
> > so high? Killing most of my desktop processes doesn't seem to make a
> > difference. Restarting X does improve the perceived responsiveness,
> > though it does not change the number of OpenOwners.
> >
> > How can I figure out which process(es) are responsible for the
> > excessive OpenOwners?
> An OpenOwner represents a process on the client. The OpenOwner
> name is an encoding of pid + process startup time.
> However, I can't think of an easy way to get at the OpenOwner name.
>
> Now, why aren't they going away, hmm..
>
> I'm assuming the # of Opens is not large?
> (Openowners cannot go away until all associated opens
> are closed.)

Oh, I didn't mention that, yes, the number of Opens is large. Right
now, for example, I have 7950 OpenOwners and 8277 Opens.

>
> Commit 1cedb4ea1a79 in main changed the semantics of this
> a little, to avoid a use-after-free bug. However, it is dated
> Feb. 25, 2022 and is not in 13.0, so I don't think it could
> be the culprit.
>
> Essentially, the function called nfscl_cleanupkext() should call
> nfscl_procdoesntexist(), which returns true after the process has
> exited and when that is the case, calls nfscl_cleanup_common().
> --> nfscl_cleanup_common() will either get rid of the openowner or,
> if there are still children with open file descriptors, mark it "defunct"
> so it can be free'd once the children close the file.
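
If I'm reading that description right, the cleanup logic is roughly
the following. This is only a toy Python model to check my
understanding, not the kernel code: the class, the function names, and
the "live_pids" set are made up for illustration, and the only
behaviour taken from the description above is "free the openowner, or
mark it defunct if opens remain".

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare by identity, like distinct kernel objects
class OpenOwner:
    pid: int
    open_count: int = 0   # opens still held, possibly by child processes
    defunct: bool = False


def cleanup_common(owners, owner):
    """Free the owner if nothing is open; otherwise mark it defunct."""
    if owner.open_count == 0:
        owners.remove(owner)
    else:
        owner.defunct = True


def cleanup_kext(owners, live_pids):
    """One periodic pass over every openowner (a linear scan)."""
    for owner in list(owners):  # copy, since cleanup_common may remove
        if owner.pid not in live_pids:  # stand-in for "process doesn't exist"
            cleanup_common(owners, owner)


def close_one(owners, owner):
    """Close one file; a defunct owner is freed when its last open goes."""
    owner.open_count -= 1
    if owner.defunct and owner.open_count == 0:
        owners.remove(owner)
```

In this model an openowner whose process has exited only lingers while
some open is still outstanding, which is why the large Open count
above looks relevant.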
>
> It could be that X is now somehow creating a long chain of processes
> where the children inherit a file descriptor and that delays the cleanup
> indefinitely?
> Even then, everything should get cleaned up once you kill off X?
> (It might take a couple of seconds after killing all the processes off.)
>
> Another possibility is that the "nfscl" thread is wedged somehow.
> It is the one that will call nfscl_cleanupkext() once/sec. If it never
> gets called, the openowners will never go away.
>
> Being old fashioned, I'd probably try to figure this out by adding
> some printf()s to nfscl_cleanupkext() and nfscl_cleanup_common().

dtrace shows that nfscl_cleanupkext() is getting called at about 0.6 Hz.

>
> To avoid the problem, you can probably just use the "oneopenown"
> mount option. With that option, only one openowner is used for
> all opens. (Having separate openowners for each process was needed
> for NFSv4.0, but not NFSv4.1/4.2.)
>
> > Or is it just a red herring and I shouldn't
> > worry?
> Well, you can probably avoid the problem by using the "oneopenown"
> mount option.

Ok, I'm trying that now. After unmounting and remounting NFS,
"nfsstat -cE" reports 1 OpenOwner and 11 Opens. But on the server,
"nfsdumpstate" still reports thousands. Will those go away
eventually?

>
> Thanks for reporting this, rick
> ps: And, yes, large numbers of openowners will slow things down,
> since the code ends up doing linear scans of them all in a linked
> list in various places.
> > -Alan
>
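
For a rough sense of the linear-scan cost mentioned in the ps, here is
a small Python sketch. It is an illustration only: "find_owner" and
the owner names are hypothetical, and a plain Python list stands in
for the kernel's linked list. It just counts how many entries a lookup
has to walk.

```python
def find_owner(owners, name):
    """Walk the list front to back, counting entries visited."""
    steps = 0
    for owner in owners:
        steps += 1
        if owner == name:
            return owner, steps
    return None, steps


# 7950 is the OpenOwner count reported earlier in this message.
many = [f"owner{i}" for i in range(7950)]
_, miss_cost = find_owner(many, "not-present")

# With "oneopenown" there is a single openowner to look at.
_, single_cost = find_owner(["owner0"], "owner0")
```

A lookup that misses has to visit all 7950 entries, versus 1 with a
single openowner, and the same cost is paid on every pass of any loop
that walks the whole list.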