From: Rick Macklem
Date: Tue, 7 Jan 2025 15:45:37 -0800
Subject: Re: system stalled, no I/O but 100% CPU from nfs
To: "Peter 'PMc' Much"
Cc: freebsd-fs@freebsd.org
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

On Mon, Jan 6, 2025 at 8:45 AM Peter 'PMc' Much wrote:
>
> On Mon, Jan 06, 2025 at 05:53:38AM -0800, Rick Macklem wrote:
> ! On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much wrote:
>
> ! > This doesn't look good. It goes on for hours. What can be done about it?
> ! > (13.4 client & server)
> ! >
> ! >
> ! > 44 processes:  4 running, 39 sleeping, 1 waiting
> ! > CPU:  0.4% user,  0.0% nice, 99.6% system,  0.0% interrupt,  0.0% idle
> ! > Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
> ! > ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
> ! >      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
> ! > Swap: 15G Total, 15G Free
> ! >
> ! >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> ! >   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd
> ! Do you have delegations enabled on your server
> ! (vfs.nfsd.issue_delegations not 0)?
>
> Not knowingly:
>
> # sysctl vfs.nfsd.issue_delegations
> vfs.nfsd.issue_delegations: 0
>
> ! (If you do not, I have no idea why the server would be doing
> ! callbacks, which is what nfscbd
> ! handles.)
>
> Me neither. ;)
The cpu being associated with nfscbd might just be a glitch.
NFS uses kernel threads and it is hard to know what process they might
get associated with for these stats.
When you do:
# ps axHl
it will show the kernel threads.
If it happens again, it might turn out that the thread(s) racking
up CPU aren't actually doing callbacks.

rick

>
> The good news at this point is, it is a single event. At first I
> thought the whole cluster got slow (it is always too slow ;) ), but
> it was only this node - the others have no cpu consumption on
> nfscbd.
>
> The bad thing is, I cannot remember why I did switch that thing
> on.
>
> ! Also, "nfsstat -m" on the client shows you/us what your mount
> ! options are.
>
> It had to be destroyed, as effects got worse.
>
> What I figured is: it didn't issue any syscalls, and it didn't
> act on kill -9.
> Which means: most likely it found an infinite loop inside the
> kernel, aka a never-returning syscall.
>
> ! The above suggests that there is still some activity on the client, but the
> ! info. is limited.
>
> Yes, it got ever slower. The NFS mount is for /usr/ports, and I did
> fix some ports there. At some point a "make clean" would start to
> take minutes to complete, and there I noticed something is wrong.
> Finally it didn't even echo on the console (I had only one cpu
> available, and then when something is stuck within the kernel, all
> depends on preemption).
>
> ! If the client is still in this state, you can collect more info via:
> ! # tcpdump -s 0 -w out.pcap host
> ! run for a little while.
>
> I had to destroy it. I tried to run dtrace to pinpoint exactly where
> that thing does execute, but it didn't startup. At that point I didn't
> consider it feasible to try further investigation.
> These are temporary building guests, they get destroyed after
> completion anyway.
>
> So, as apparently it was a single event, I might suggest we just
> remember that nfscbd /can do this/ (under yet unclear circumstances)
> and otherwise hope for the best.
>
> And probably I should get rid of that daemon altogether. I think I
> read something about these delegations, and it looked suitable for the
> usecase, but I didn't realize that it would need to be activated
> on the server also.
> (The usecase is, a snapshot + clone is created from the ports repo,
> then switched to a desired tag/branch, and that filetree is then
> used by a single guest, exclusively.)
>
>
> Thanks for Your help!
>
> cheerio,
> PMc
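
If the stall shows up again, a per-thread listing saved to a file can be sorted to see which thread is actually burning the CPU. What follows is only a minimal sketch, not anything from the NFS code. It assumes the listing was produced with a column selection along the lines of "ps -axH -o pid,lwp,pcpu,comm,tdname" (check ps(1) for the keywords on your release). The function name busiest_threads, the 50% threshold and the sample rows (including the thread names) are all made up for illustration; they are not real output from the affected machine.

```python
# Sketch: pick out the threads using the most CPU from a saved per-thread
# ps listing. Column layout (PID LWP %CPU COMMAND TDNAME) is an assumption.

def busiest_threads(ps_text, min_cpu=50.0):
    """Return (cpu, pid, lwp, command, thread_name) tuples, busiest first."""
    rows = []
    lines = ps_text.strip().splitlines()
    for line in lines[1:]:                 # skip the header line
        parts = line.split(None, 4)        # thread name may contain spaces
        if len(parts) < 5:
            continue                       # ignore malformed/short lines
        pid, lwp, pcpu, comm, tdname = parts
        try:
            cpu = float(pcpu)
        except ValueError:
            continue                       # ignore rows without a numeric %CPU
        if cpu >= min_cpu:
            rows.append((cpu, int(pid), int(lwp), comm, tdname))
    rows.sort(reverse=True)                # highest CPU first
    return rows


# Invented sample data, for illustration only.
SAMPLE = """\
  PID    LWP  %CPU COMMAND TDNAME
  417 100321  99.1 nfscbd  example-td-1
  417 100322   0.0 nfscbd  example-td-2
  502 100400   0.3 sshd    -
"""

if __name__ == "__main__":
    for cpu, pid, lwp, comm, tdname in busiest_threads(SAMPLE):
        print(f"{cpu:5.1f}% pid={pid} lwp={lwp} {comm} ({tdname})")
```

The thread name column is the useful part: it may show that the CPU is charged to the nfscbd process while the thread doing the work is something other than a callback thread.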