Date:      Tue, 7 Jan 2025 15:45:37 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: system stalled, no I/O but 100% CPU from nfs
Message-ID:  <CAM5tNy5jfjMYj8hEyXO4sWqjiV=GM4T2tXEaf69nN5J02%2BvU=Q@mail.gmail.com>
In-Reply-To: <Z3wG3fEYjeE9f4nF@disp.intra.daemon.contact>
References:  <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact> <CAM5tNy5AzL9%2BWpjRV9N1Wzy94RpA2L93NqnYFjFvx38iAo1iyg@mail.gmail.com> <Z3wG3fEYjeE9f4nF@disp.intra.daemon.contact>

On Mon, Jan 6, 2025 at 8:45 AM Peter 'PMc' Much
<pmc@citylink.dinoex.sub.org> wrote:
>
> On Mon, Jan 06, 2025 at 05:53:38AM -0800, Rick Macklem wrote:
> ! On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much
> ! <pmc@citylink.dinoex.sub.org> wrote:
>
> ! >  This doesn't look good. It goes on for hours. What can be done about it?
> ! > (13.4 client & server)
> ! >
> ! >
> ! > 44 processes:  4 running, 39 sleeping, 1 waiting
> ! > CPU:  0.4% user,  0.0% nice, 99.6% system,  0.0% interrupt,  0.0% idle
> ! > Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
> ! > ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
> ! >      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
> ! > Swap: 15G Total, 15G Free
> ! >
> ! >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
> ! >   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd
> ! Do you have delegations enabled on your server
> ! (vfs.nfsd.issue_delegations not 0)?
>
> Not knowingly:
>
> # sysctl vfs.nfsd.issue_delegations
> vfs.nfsd.issue_delegations: 0
>
> ! (If you do not, I have no idea why the server would be doing
> ! callbacks, which is what nfscbd
> ! handles.)
>
> Me neither. ;)
The CPU time being attributed to nfscbd might just be a glitch.
NFS uses kernel threads, and it is hard to know which process
they will be associated with in these stats.

When you do:
# ps axHl
it will show the kernel threads. If it happens again, it might turn
out that the thread(s) racking up CPU aren't actually doing callbacks.
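
As a rough sketch of narrowing that down (not something taken from the
original report): the helper below, whose name busiest_threads is made up
for illustration, reads a per-thread listing on stdin and prints the rows
with the highest CPU percentage. It assumes the first line is a header and
that the CPU percentage is in the third column.

```shell
# busiest_threads [N]: hypothetical helper, not part of FreeBSD.
# Reads a per-thread listing on stdin (header line first, CPU% assumed
# to be in column 3) and prints the N busiest rows, default 5.
busiest_threads() {
    n="${1:-5}"
    # Drop the header, sort numerically on column 3 in descending order,
    # then keep the first N rows. LC_ALL=C keeps "." as the decimal point.
    tail -n +2 | LC_ALL=C sort -k3,3nr | head -n "$n"
}
```

On FreeBSD, something like `ps axH -o pid,lwp,%cpu,comm | busiest_threads 5`
should feed it suitable input, and `procstat -kk <pid>` can then show the
kernel stacks of the threads in that process.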

rick

>
> The good news at this point is, it is a single event. At first I
> thought the whole cluster got slow (it is always too slow ;) ), but
> it was only this node - the others have no cpu consumption on
> nfscbd.
>
> The bad thing is, I cannot remember why I did switch that thing
> on.
>
> ! Also, "nfsstat -m" on the client shows you/us what your mount
> ! options are.
>
> It had to be destroyed, as effects got worse.
>
> What I figured is: it didn't issue any syscalls, and it didn't
> act on kill -9.
> Which means: most likely it found an infinite loop inside the
> kernel, aka a never-returning syscall.
>
> ! The above suggests that there is still some activity on the client, but the
> ! info. is limited.
>
> Yes, it got ever slower. The NFS mount is for /usr/ports, and I did
> fix some ports there. At some point a "make clean" would start to
> take minutes to complete, and there I noticed something is wrong.
> Finally it didn't even echo on the console (I had only one cpu
> available, and then when something is stuck within the kernel, all
> depends on preemption).
>
> ! If the client is still in this state, you can collect more info via:
> ! # tcpdump -s 0 -w out.pcap host <nfs-server>
> ! run for a little while.
>
> I had to destroy it. I tried to run dtrace to pinpoint exactly where
> that thing does execute, but it didn't start up. At that point I didn't
> consider it feasible to try further investigation.
> These are temporary building guests, they get destroyed after
> completion anyway.
>
> So, as apparently it was a single event, I might suggest we just
> remember that nfscbd /can do this/ (under yet unclear circumstances)
> and otherwise hope for the best.
>
> And probably I should get rid of that daemon altogether. I think I
> read something about these delegations, and it looked suitable for the
> usecase, but I didn't realize that it would need to be activated
> on the server also.
> (The usecase is, a snapshot + clone is created from the ports repo,
> then switched to a desired tag/branch, and that filetree is then
> used by a single guest, exclusively.)
>
>
> Thanks for Your help!
>
> cheerio,
> PMc


