Date: Sat, 22 May 2021 00:56:07 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Mark Millard <marklmi@yahoo.com>
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Message-ID: <YQXPR0101MB09683A5BE725EF50E590A391DD289@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com>
References: <623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com>
 <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com>
 <YQXPR0101MB0968B29934D7BD73FCA73907DD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <YTOPR0101MB0970A1257E4DD37335D5B52EDD299@YTOPR0101MB0970.CANPRD01.PROD.OUTLOOK.COM>
 <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com>
 <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com>
 <E938DB30-22C9-4765-9E01-601D80B36910@yahoo.com>
 <YQXPR0101MB0968EA2F32C1EEB8CC8CAD9FDD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <D6842A56-95EC-4A2D-99E3-3DCF95C50F68@yahoo.com>
 <YQXPR0101MB096874849E9749010F4BDD5CDD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <508C3B05-79E5-49ED-8032-DA7DF249E154@yahoo.com>
Mark Millard wrote:
[stuff snipped]
>Well, why is it that ls -R, find, and diff -r all get file
>name problems via genet0 but diff -r gets no problems
>comparing the content of files that it does match up (the
>vast majority)? Any clue how could the problems possibly
>be unique to the handling of file names/paths? Does it
>suggest anything else to look into for getting some more
>potentially useful evidence?
Well, all I can do is describe the most common TSO related
failure:
- When a read RPC reply (including NFS/RPC/TCP/IP headers)
  is slightly less than 64K bytes (many TSO implementations are
  limited to 64K or 32 discontiguous segments, think 32 2K
  mbuf clusters), the driver decides it is ok, but when the MAC
  header is added it exceeds what the hardware can handle correctly...
--> This will happen when reading a regular file that is slightly less
      than a multiple of 64K in size.
or
--> This will happen when reading just about any large directory,
      since the directory reply for a 64K request is converted to Sun XDR
      format and clipped at the last full directory entry that will fit within 64K.
For ports, where most files are small, I think you can tell which is more
likely to happen.
--> If TSO is disabled, I have no idea how this might matter, but??

>I'll note that netstat -I ue0 -d and netstat -I genet0 -d
>do not report changes in Ierrs or Idrop in a before vs.
>after failures comparison. (There may be better figures
>to look at for all I know.)
>
>I tried "ifconfig genet0 -rxcsum -rxcsum -rxcsum6 -txcsum6"
>and got no obvious change in behavior.
All we know is that the data is getting corrupted somehow.

NFS traffic looks very different than typical TCP traffic. It is
mostly small messages travelling in both directions concurrently,
with some large messages thrown in the mix.
All I'm saying is that testing a net interface with something like
bulk data transfer in one direction doesn't verify it works for NFS
traffic.

Also, the large RPC messages are a chain of about 33 mbufs of
various lengths, including a mix of partial clusters and regular
data mbufs, whereas a bulk send on a socket will typically
result in an mbuf chain of a lot of full 2K clusters.
--> As such, NFS can be good at tickling subtle bugs in the
      net driver related to mbuf handling.

rick

> W.r.t. reverting r367492...the patch to replace r367492 was just
> committed to "main" by rscheff@ with a two week MFC, so it
> should be in stable/13 soon. Not sure if an errata can be done
> for it for releng13.0?

That update is reported to be causing "rack" related panics:

https://lists.freebsd.org/pipermail/dev-commits-src-main/2021-May/004440.html

reports (via links):

panic: _mtx_lock_sleep: recursed on non-recursive mutex so_snd @ /syzkaller/managers/i386/kernel/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:10632

Still, I have a non-debug update to main building and will
likely do a debug build as well. llvm is rebuilding, so
the builds will take a notable time.

> Thanks for isolating this, rick
> ps: Co-incidentally, I've been thinking of buying an RBPi4 as a toy.

I'll warn that the primary "small arm" development/support
folk(s) do not work on the RPi*'s these days, beyond
committing what others provide and the like.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
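
Illustrative note on the TSO failure mode described in the message: the
arithmetic can be sketched roughly as below. This is only a sketch, not
driver code. The 65535-byte and 32-segment limits are assumptions taken
from the "64K or 32 discontiguous segments" wording; real hardware limits
vary by NIC. The function name tso_check and the example mbuf lengths are
hypothetical. The 14-byte figure is the usual Ethernet header size without
a VLAN tag.

```python
ETHER_HDR_LEN = 14      # Ethernet (MAC) header, no VLAN tag
TSO_MAX_BYTES = 65535   # ASSUMED hardware byte limit ("64K" in the message)
TSO_MAX_SEGS = 32       # ASSUMED limit on discontiguous segments


def tso_check(mbuf_lens):
    """Return reasons a send could exceed the assumed TSO limits.

    mbuf_lens is a list of per-mbuf data lengths (hypothetical input),
    covering everything below the MAC header.
    """
    problems = []
    total = sum(mbuf_lens)
    if total <= TSO_MAX_BYTES < total + ETHER_HDR_LEN:
        # The case from the message: ok before the MAC header, not after.
        problems.append("fits before the MAC header, too long after it")
    elif total > TSO_MAX_BYTES:
        problems.append("too long even before the MAC header")
    if len(mbuf_lens) > TSO_MAX_SEGS:
        problems.append("too many discontiguous segments")
    return problems


# A bulk send made of 31 full 2K clusters stays inside both assumed limits.
bulk_send = [2048] * 31
# A reply just under 64K spread over 33 mbufs of mixed lengths does not.
nfs_reply = [2048] * 31 + [1000, 1042]

print(tso_check(bulk_send))
print(tso_check(nfs_reply))
```

With these made-up lengths, the bulk send passes both checks while the
33-mbuf reply trips both, which is the kind of case a one-directional
bulk transfer test would likely never exercise.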
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB09683A5BE725EF50E590A391DD289>