Date:      Mon, 6 Jan 2025 05:53:38 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        "Peter 'PMc' Much" <pmc@citylink.dinoex.sub.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: system stalled, no I/O but 100% CPU from nfs
Message-ID:  <CAM5tNy5AzL9%2BWpjRV9N1Wzy94RpA2L93NqnYFjFvx38iAo1iyg@mail.gmail.com>
In-Reply-To: <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact>
References:  <Z3tdPjxTE6GZmzwW@disp.intra.daemon.contact>

On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much
<pmc@citylink.dinoex.sub.org> wrote:
>
> Cheers,
>
>  This doesn't look good. It goes on for hours. What can be done about it?
> (13.4 client & server)
>
>
> 44 processes:  4 running, 39 sleeping, 1 waiting
> CPU:  0.4% user,  0.0% nice, 99.6% system,  0.0% interrupt,  0.0% idle
> Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
> ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
>      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
> Swap: 15G Total, 15G Free
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd
Do you have delegations enabled on your server
(vfs.nfsd.issue_delegations not 0)?
(If you do not, I have no idea why the server would be doing
callbacks, which is what nfscbd
handles.)

Also, "nfsstat -m" on the client shows you/us what your mount options are.
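
A minimal sketch of the server-side check, assuming a POSIX sh on the NFS
server. The helper name check_delegations is hypothetical; only sysctl(8)
and the vfs.nfsd.issue_delegations variable come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the NFS server is set to issue
# delegations, based on the vfs.nfsd.issue_delegations sysctl.
check_delegations() {
    # "sysctl -n" prints only the value, without the variable name.
    val=$(sysctl -n vfs.nfsd.issue_delegations 2>/dev/null) || {
        # The variable could not be read (for example, nfsd is not loaded).
        echo "unknown"
        return 1
    }
    if [ "$val" -ne 0 ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Calling check_delegations on the server prints "enabled", "disabled" or
"unknown". The mount options still have to be read on the client with
"nfsstat -m".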

>     0 root         65 -16    -     0B  1040K swapin   0:17   0.64% kernel
> 11054 root          1  52    0    18M  7664K RUN      0:04   0.10% bsdtar
>    11 root         15 -56    -     0B   240K WAIT     0:15   0.05% intr
>    16 root          1 -16    -     0B    16K -        0:01   0.03% racctd
> 11062 root          1  20    0    14M  3804K RUN      0:00   0.03% top
>     7 root          3 -16    -     0B    48K psleep   0:00   0.01% pagedaemon
> 11056 root          1  20    0    21M    10M select   0:00   0.01% sshd
>     6 root          1 -16    -     0B    16K -        0:00   0.01% rand_harvest
>
>
>       Interface           Traffic               Peak                Total
>          vtnet0  in      5.380 KB/s          9.113 KB/s          781.439 MB
>                  out     4.012 KB/s          8.002 KB/s          674.294 MB
>
>
> # nfsstat -zc > /dev/null ; sleep 1 ; nfsstat -c
Adding -E makes it show all RPC counts. (Without -E you just get the
"old Sun compatible" output.)

> Rpc Counts:
>       Getattr      Setattr       Lookup     Readlink         Read        Write       Create       Remove
>             1            2            5            0            0            0            0            0
>        Rename         Link      Symlink        Mkdir        Rmdir      Readdir     RdirPlus       Access
>             0            0            0            0            0            1            0            1
>         Mknod       Fsstat       Fsinfo     PathConf       Commit
>             0            0            0            0            0
> Rpc Info:
>      TimedOut      Invalid    X Replies      Retries     Requests
>             0            0            0            0           11
> Cache Info:
>     Attr Hits  Attr Misses    Lkup Hits  Lkup Misses    BioR Hits  BioR Misses    BioW Hits  BioW Misses
>            11            1            2            5            0            0            0            0
>    BioRL Hits BioRL Misses    BioD Hits  BioD Misses    DirE Hits  DirE Misses    Accs Hits  Accs Misses
>             0            0            1            1            1            0            8            1
>
>
The above suggests that there is still some activity on the client, but the
info. is limited.

If the client is still in this state, you can collect more info via:
# tcpdump -s 0 -w out.pcap host <nfs-server>
run for a little while.
The out.pcap file needs to be looked at in wireshark (tcpdump is useless
at decoding NFS). If there is nothing secret in it, you can email it to
me as an attachment, so I can take a look.
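
"Run for a little while" can be made explicit by bounding the capture.
This is only a sketch: the function name capture_nfs, its arguments and
the 60 second default are assumptions, and it relies on timeout(1) being
installed. The tcpdump options are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: capture traffic to/from the NFS server for a
# fixed number of seconds, then stop tcpdump.
capture_nfs() {
    server=$1            # NFS server host name or address
    seconds=${2:-60}     # capture length in seconds (assumed default)
    out=${3:-out.pcap}   # capture file to open in wireshark later
    if [ -z "$server" ]; then
        echo "usage: capture_nfs <nfs-server> [seconds] [file]" >&2
        return 2
    fi
    rc=0
    # timeout(1) signals tcpdump once the time is up.
    timeout "$seconds" tcpdump -s 0 -w "$out" host "$server" || rc=$?
    # Exit status 124 means timeout had to stop the command, which is
    # the expected outcome here, so it is not treated as an error.
    if [ "$rc" -eq 124 ]; then
        rc=0
    fi
    return "$rc"
}
```

For example, "capture_nfs <nfs-server> 120" would leave two minutes of
traffic in out.pcap. The filter is kept as "host" only, as in the
command above, so nothing between the two machines is excluded.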

# ps axHl
done repeatedly gets a lot more info about the NFS-related threads.
(I'll admit I doubt the info is useful for this case?)

# nfsstat -E -c -z repeatedly as above.
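
The repeated sampling can be scripted. A rough sketch, assuming a POSIX
sh on the client; collect_nfs_samples, its arguments and the output file
names are hypothetical, while the two commands are the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: take COUNT samples of the client RPC counters and
# the thread list, INTERVAL seconds apart, into OUTDIR.
collect_nfs_samples() {
    count=${1:-5}
    interval=${2:-1}
    outdir=${3:-./nfs-samples}
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # -z is used as in the command above, so that each sample is
        # expected to cover only the activity since the previous one.
        nfsstat -E -c -z > "$outdir/nfsstat.$i.txt" 2>&1
        # One line per thread, long format.
        ps axHl > "$outdir/ps.$i.txt" 2>&1
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

"collect_nfs_samples 10 1 /tmp/nfs-samples" would leave ten numbered
nfsstat and ps files that can be compared side by side.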

If you just want to get rid of the mount
# umount -N <mnt-path>
should work, although it can take a couple of minutes.

Either not running "nfscbd" on the client or disabling delegations by
setting vfs.nfsd.issue_delegations=0 on the server (assuming you
have them enabled) might/should avoid the problem.

rick


