Date:      Sat, 22 May 2021 00:56:07 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Message-ID:  <YQXPR0101MB09683A5BE725EF50E590A391DD289@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com>
References:  <623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com> <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com> <YQXPR0101MB0968B29934D7BD73FCA73907DD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YTOPR0101MB0970A1257E4DD37335D5B52EDD299@YTOPR0101MB0970.CANPRD01.PROD.OUTLOOK.COM> <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com> <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com> <E938DB30-22C9-4765-9E01-601D80B36910@yahoo.com> <YQXPR0101MB0968EA2F32C1EEB8CC8CAD9FDD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <D6842A56-95EC-4A2D-99E3-3DCF95C50F68@yahoo.com> <YQXPR0101MB096874849E9749010F4BDD5CDD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com>

Mark Millard wrote:
[stuff snipped]
>Well, why is it that ls -R, find, and diff -r all get file
>name problems via genet0 but diff -r gets no problems
>comparing the content of files that it does match up (the
>vast majority)? Any clue how could the problems possibly
>be unique to the handling of file names/paths? Does it
>suggest anything else to look into for getting some more
>potentially useful evidence?
Well, all I can do is describe the most common TSO related
failure:
- When a read RPC reply (including NFS/RPC/TCP/IP headers)
  is slightly less than 64K bytes (many TSO implementations are
  limited to 64K or 32 discontiguous segments, think 32 2K
  mbuf clusters), the driver decides it is ok, but when the MAC
  header is added it exceeds what the hardware can handle correctly...
--> This will happen when reading a regular file that is slightly less
      than a multiple of 64K in size.
or
--> This will happen when reading just about any large directory,
      since the directory reply for a 64K request is converted to Sun XDR
      format and clipped at the last full directory entry that will fit
      within 64K.
For ports, where most files are small, I think you can tell which is more
likely to happen.
--> If TSO is disabled, I have no idea how this might matter, but??
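
To make the size arithmetic above concrete, here is a rough sketch. It is not taken from any driver. The 14-byte figure is the untagged Ethernet header length; treating the hardware limit as exactly 64 * 1024 bytes is an assumption, and the function names are made up for illustration.

```python
ETHER_HDR_LEN = 14         # untagged Ethernet (MAC) header, in bytes
TSO_LIMIT = 64 * 1024      # assumed per-request hardware limit, in bytes


def driver_accepts(reply_len: int) -> bool:
    """Size check done before the MAC header is prepended (hypothetical)."""
    return reply_len <= TSO_LIMIT


def hardware_copes(reply_len: int) -> bool:
    """Same check once the MAC header has been added (hypothetical)."""
    return reply_len + ETHER_HDR_LEN <= TSO_LIMIT


def in_danger_zone(reply_len: int) -> bool:
    """True when the driver says ok but the full frame exceeds the limit."""
    return driver_accepts(reply_len) and not hardware_copes(reply_len)


if __name__ == "__main__":
    # reply_len counts the NFS/RPC/TCP/IP headers plus data, as in the text.
    for reply_len in (1500, 65522, 65523, 65536):
        print(reply_len, in_danger_zone(reply_len))
```

Under these assumptions only replies within the last 14 bytes below the limit are affected, which is why the failure would show up for replies that are "slightly less than 64K" and not for small ones.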

>I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>do not report changes in Ierrs or Idrop in a before vs.
>after failures comparison. (There may be better figures
>to look at for all I know.)
>
>I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>and got no obvious change in behavior.
All we know is that the data is getting corrupted somehow.

NFS traffic looks very different than typical TCP traffic. It is
mostly small messages travelling in both directions concurrently,
with some large messages thrown in the mix.
All I'm saying is that testing a net interface with something like
bulk data transfer in one direction doesn't verify it works for NFS
traffic.

Also, the large RPC messages are a chain of about 33 mbufs of
various lengths, including a mix of partial clusters and regular
data mbufs, whereas a bulk send on a socket will typically
result in an mbuf chain of a lot of full 2K clusters.
--> As such, NFS can be good at tickling subtle bugs in the
      net driver related to mbuf handling.
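
A small sketch of that difference, counting segments only. The 2K cluster size and the 32-segment limit come from the description earlier in this message; the individual lengths in the NFS-like chain are invented for illustration and do not come from a real trace.

```python
MCLBYTES = 2048        # 2K mbuf cluster
TSO_MAX_SEGS = 32      # assumed hardware limit on discontiguous segments


def bulk_chain(nbytes: int) -> list[int]:
    """Segment lengths for a bulk send: full clusters plus any remainder."""
    full, rest = divmod(nbytes, MCLBYTES)
    return [MCLBYTES] * full + ([rest] if rest else [])


def segs_ok(chain: list[int]) -> bool:
    """True when the chain fits within the assumed segment limit."""
    return len(chain) <= TSO_MAX_SEGS


# Hypothetical NFS reply: a small header mbuf, 31 full clusters and a
# partial cluster at the end. That is 33 segments in under 64K bytes.
nfs_like_chain = [160] + [MCLBYTES] * 31 + [700]

if __name__ == "__main__":
    bulk = bulk_chain(64 * 1024)
    print(len(bulk), sum(bulk), segs_ok(bulk))
    print(len(nfs_like_chain), sum(nfs_like_chain), segs_ok(nfs_like_chain))
```

The point is that the NFS-like chain is smaller in bytes than the bulk one and still has one segment too many, so a driver that only checks the byte count would pass it.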

rick

> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?

That update is reported to be causing "rack" related panics:

https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html

reports (via links):

panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632

Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.

> Thanks for isolating this, rick
> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.

I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)