Date:      Wed, 19 Jun 2024 07:21:25 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        fs@freebsd.org
Subject:   Re: NFS, intermittent 'RPC struct is bad' errors
Message-ID:  <CAM5tNy42P34s-mTWmOmaYiNUEtY9uFpfpO6copVJ7OfDZ1oKbw@mail.gmail.com>
In-Reply-To: <ZnJ7ZMWfCEQA0rLg@ilythia.eden.le-fay.org>
References:  <ZnJ7ZMWfCEQA0rLg@ilythia.eden.le-fay.org>

On Tue, Jun 18, 2024 at 11:32 PM Lexi Winter <lexi@le-fay.org> wrote:
>
> hi,
>
> i have a few systems running NFSv4 on FreeBSD, using Kerberos (MIT
> Kerberos KDC), with the server exporting ZFS filesystems.
>
> recently i've noticed intermittent errors of 'RPC struct is bad' when
> writing to the NFS server, which usually resolves itself after retrying.
> for example:
>
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f++++++++++ Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>          32,768   0%    0.00kB/s    0:00:00  rsync: [receiver] write failed on "/data/public/TV/Star Trek Prodigy/Season 01/Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv": RPC struct is bad (72)
> rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.3.0]
>
> rsync: [sender] write error: Broken pipe (32)
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f.st....... Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>     912,704,431 100%   96.51MB/s    0:00:09 (xfr#1, to-chk=18/19)
> >f++++++++++ Star.Trek.Prodigy.S01E03.1080p.WEBRip.x265-KONTRAST.mkv
>     477,408,567 100%  100.06MB/s    0:00:04 (xfr#2, to-chk=17/19)
> [...]
>
> the client is running FreeBSD 15.0-CURRENT from around May 24, and the
> server is running a slightly older 15.0-CURRENT from around May 23.
>
> /etc/exports on the server is pretty standard:
>
> /data/public                    -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Books              -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/CalibreLibrary     -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Comics             -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Films              -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Miscellaneous      -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> V4: /data                       -sec=sys:krb5:krb5i:krb5p       -network 2001:8b0:aab5::/48
>
> client mount options:
>
> hemlock.eden.le-fay.org:/public /data/public    nfs     rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr      0 0
>
> is there anything more i can do to investigate this?  would a tcpdump
> capture of the error be useful (considering all the RPC traffic is
> Kerberos-encrypted)?
If you can safely reproduce these failures without on-the-wire encryption,
you could switch the mount to "krb5i", which keeps integrity checking but
does not encrypt the RPC contents. Then capture the traffic with something
like:
# tcpdump -s 0 -w out.pcap host <other-system>
and pull out.pcap into wireshark, where you could maybe see where the
failure is occurring. (Unlike tcpdump, wireshark decodes NFS traffic
quite nicely.)
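
The steps above could be sketched roughly as follows. This is only a dry run that prints the commands rather than executing them. The hostname, mount point, and mount options are copied from the original post; the variable names, the use of a manual mount instead of editing fstab, and leaving out "bgnow" are assumptions made for illustration. The server's exports already allow krb5i, so no server-side change should be needed.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands to run as root, does not execute them.
# Values below are taken from the original post; variable names are illustrative.

server="hemlock.eden.le-fay.org"
mountpoint="/data/public"
# Mount options from the fstab line, minus bgnow (a background-mount option,
# left out here on the assumption that the debug mount is done by hand).
opts="rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr"

# Swap krb5p (privacy) for krb5i (integrity only) so the capture is readable.
debug_opts=$(printf '%s\n' "$opts" | sed 's/sec=krb5p/sec=krb5i/')

echo "umount $mountpoint"
echo "mount -t nfs -o $debug_opts $server:/public $mountpoint"
# Start the capture before re-running the failing rsync; stop it with Ctrl-C.
echo "tcpdump -s 0 -w out.pcap host $server"
# Afterwards, open the capture in wireshark to look at the decoded NFS traffic.
echo "wireshark out.pcap"
```

Once the capture is done, remounting with the original sec=krb5p options restores encryption.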

rick


