Date: Thu, 20 May 2021 22:15:13 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Message-ID: <EA3C446E-7076-49BB-8FFE-123841673DA1@yahoo.com>
In-Reply-To: <E938DB30-22C9-4765-9E01-601D80B36910@yahoo.com>
References: <623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com> <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com> <YQXPR0101MB0968B29934D7BD73FCA73907DD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YTOPR0101MB0970A1257E4DD37335D5B52EDD299@YTOPR0101MB0970.CANPRD01.PROD.OUTLOOK.COM> <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com> <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com> <E938DB30-22C9-4765-9E01-601D80B36910@yahoo.com>
[Direct drive connection to machine: no problem.]

On 2021-May-20, at 21:40, Mark Millard <marklmi at yahoo.com> wrote:

> [main test example and main/releng/13 mixed example]
>
> On 2021-May-20, at 20:36, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [stable/13 test: example ends up being odder. That might
>> allow eliminating some potential alternatives.]
>>
>> On 2021-May-20, at 19:38, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>> On 2021-May-20, at 18:09, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>> Oh, one additional thing that I'll dare to top post...
>>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>>> This has not yet been resolved in "main" etc and could explain
>>>> why an RPC could time out for a soft mount.
>>>
>>> See later notes that I added: soft mount is not required
>>> to see the problem.
>>>
>>>> You can revert the patch in r367492 to avoid the problem.
>>>
>>> If I understand right, you are indicating that this would
>>> not apply to the non-soft mount case that I got.
>>>
>>>> Disabling TSO, LRO are also de-facto standard things to do when
>>>> you observe weird NFS behaviour, because they are often broken
>>>> in various network device drivers.
>>>
>>> I'll have to figure out how to experiment with such. Things
>>> are at defaults rather generally on the systems. I'm not
>>> literate in the subject areas.
>>>
>>> I'm the only user of the machines and network. It is not
>>> outward facing. It is a rather small EtherNet network.
>>>
>>>> rick
>>>>
>>>> ________________________________________
>>>> From: owner-freebsd-stable@freebsd.org <owner-freebsd-stable@freebsd.org> on behalf of Rick Macklem <rmacklem@uoguelph.ca>
>>>> Sent: Thursday, May 20, 2021 8:55 PM
>>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
>>>>
>>>> Mark Millard wrote:
>>>>> [I warn that I'm a fairly minimal user of NFS
>>>>> mounts, not knowing all that much. I'm mostly
>>>>> reporting this in case it ends up as evidence
>>>>> via eventually matching up with others observing
>>>>> possibly related oddities.]
>>>>>
>>>>> I got the following odd sequence (that I've
>>>>> mixed notes into). It involved a diff -r over NFS
>>>>> showing differences (files missing) and then a
>>>>> later diff finding matches for the same files,
>>>>> no file system changes made on either machine.
>>>>> I'm unable to reproduce the oddity on demand.
>>>>>
>>>>> Note: A larger scope diff -r originally returned the
>>>>> below as well, but doing the narrower diff -r did
>>>>> repeat the result and that is what I show. (I
>>>>> make no use of devel/ice .)
>>>>>
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>>> . . .
>>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>>>
>>>>> Note: The above was not expected. So I tried:
>>>>>
>>>>> # ls -Tld /mnt/devel/ice/files/*
>>>>> -rw-r--r-- 1 root wheel 755 Apr 21 21:07:54 2021 /mnt/devel/ice/files/Make.rules.FreeBSD
>>> . . .
>>>>> -rw-r--r-- 1 root wheel 2588 Apr 21 21:07:54 2021 /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>>>
>>>>> Note: So that indicated that the files were there on the
>>>>> machine that /mnt references.
>>>>> So attempting the original
>>>>> diff -r again:
>>>>>
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> #
>>>>>
>>>>> (Empty difference.)
>>>>>
>>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>>> the odd result of the diff -r no longer happened: no
>>>>> differences reported.
>>>>>
>>>>>
>>>>>
>>>>> For reference (both machines reported):
>>>>>
>>>>> . . .
>>>>> The original mount command was on CA72_16Gp_ZFS:
>>>>>
>>>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>>>> The likely explanation for this is your use of a "soft" mount.
>>>> - If the NFS server is slow to respond or there is a temporary network issue,
>>>> the RPC request can time out and then the
>>>> syscall can fail with EINTR/ETIMEDOUT. Since almost nothing, including the
>>>> readdir(3) libc functions expect syscalls to fail this way...
>>>> Then the cached directory is messed up.
>>>> Doing the "ls" read the directory again and fixed the problem.
>>>>
>>>> Try to reproduce it for a mount without the "soft" option.
>>>> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
>>>> can usually get rid of it.)
>>>> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 1985
>>>> and I still feel that way.
>>>> --> If you can reproduce it without "soft" then I can't explain it.
>>>> To be honest, the directory reading/caching code in the NFSv3 client
>>>> hasn't changed significantly in literally decades, as far as I can remember.
>>>
>>> Well . . . trying an even wider scope diff than
>>> the original . . .
>>>
>>> # umount /mnt/
>>> # mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
>>> # diff -r /usr/ports/ /mnt/ | more
>>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>>> Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
>>> Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
>>> . . .
>>> Only in /usr/ports/devel/ice/files: patch-python-test-Slice-unicodePaths-run.py
>>> Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
>>> Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>
>>> So the devel/ice files showed up again.
>>>
>>> But 2 other lines show up, one finding a file supposedly only
>>> on /mnt/. . .
>>>
>>> QUOTE
>>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>>> END QUOTE
>>>
>>> That seems to be a truncated file name. Looking directly on the machine that
>>> /mnt/ references (hitting tab at the end of the partial name to show a
>>> list):
>>>
>>> # ls -Tld /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_gen-config.sh
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_js-confdefs.h
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src0.cpp
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src1.cpp
>>> . . .
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src9.cpp
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_include_js-config.h
>>>
>>> The other machine agrees (machine-local usage).
>>>
>>> The other of the 2 new names is one of the matches to the prefix:
>>>
>>> QUOTE
>>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>>> END QUOTE
>>>
>>> For reference: I've not gotten any console messages about
>>> anything during these.
>>>
>>>> One additional thing to note is that cached directory contents are invalidated
>>>> when the directory's ctime changes.
>>>
>>> I'm not aware of anything that should have been touching the
>>> /usr/ports file systems on either machine any time near my
>>> diff activities. (I'm the only system user.)
>>>
>>>> I am not sure how/if/when ZFS changes a
>>>> directory's ctime. However, if it was badly broken, I'd hear about this a lot.
>>>> (If the ZFS change to ZoL has changed its ctime handling, that might also explain it
>>>> and I'll be hearing a lot more soon as FreeBSD13 becomes adopted. I never use ZFS and,
>>>> as such, never test with it.)
>>>
>>> I recently decided to try using bectl, which led to my recent
>>> ZFS-based system experiments.
>>>
>>> This means I can boot the stable/13 or main [so: 14] that
>>> I last built and try the same experiments with the same
>>> /usr/ports file systems. releng/13 's release/13.0.0 ,
>>> stable/13 , and main are all non-debug builds as stands. I
>>> could add debug builds of any or all, but it would take
>>> a while. (aarch64 4-core Cortex-A72 contexts.)
>>>
>>>> --> For UFS, if you use mtime, directory caching does not work as well, which is
>>>> why the client directory caching code uses ctime and not mtime to detect that
>>>> a directory has changed and cached directory blocks need to be invalidated.
>>>>
>>>> Jason Bacon did report a directory reading issue some months ago that never
>>>> quite got resolved, although I recall he said he couldn't reproduce it after a
>>>> system update, so he thought it was related to some local change he had made.
>>>> (I can't remember his email or I'd add him to the cc so he could remind me what
>>>> his case was. I do recall it being somewhat reproducible and happened for both
>>>> UFS and ZFS.)
>>>>> The network is just a local EtherNet.
>>>>
>>>
>>
>>
>> stable/13 got similar "diff -r /usr/ports/ /mnt/ | more" results but
>> /mnt/devel/electron12/files indications instead of the /usr/ports/devel/ice/files
>> ones. It did again start with:
>>
>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>>
>> for this rather wide range diff -r . It continued with:
>>
>> Only in /mnt/devel/electron12/files:
>> Only in /mnt/devel/electron12/files: package.json
>> Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
>> Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
>> Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
>> . . .
>>
>> It finished with:
>>
>> Only in /mnt/devel/electron12/files: yarn.lock
>> Only in /mnt/devel/electron12/files: <A0><CE><C8>=D6=8F<DC>=DC=A62<B2><E2><AA>^H
>> Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
>> Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc
>>
>>
>> That last is the only /usr/ports/ prefixed path this time: the
>> only one where it was under /mnt/ that something appeared to
>> be missing.
>>
>> It appears that the file name on the line after the yarn.lock
>> line is garbage with no matching file present when using ls
>> on the system that /mnt/ references.
>>
>> Locally on each machine the devel/electron12/files/* files
>> are shown by ls as present ( through yarn.lock ).
>>
>> NOTE:
>> I find it odd that the local /usr/ports/ ended up being where
>> most of the files were reported as missing, instead of under
>> /mnt/ : Wrong side for a network/network-protocol issue?
>>
>>
>> For reference (David W.
>> indicated I should look at ifconfig
>> for figuring out controlling TSO and such so I figured I'd
>> show the default ifconfig output):
>>
>> # ifconfig
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>         inet6 ::1 prefixlen 128
>>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>>         inet 127.0.0.1 netmask 0xff000000
>>         groups: lo
>>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>> ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>         options=68009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>         ether REPLACED
>>         inet 192.168.1.148 netmask 0xffffff00 broadcast 192.168.1.255
>>         media: Ethernet autoselect (1000baseT <full-duplex>)
>>         status: active
>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> # ifconfig
>> genet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>         options=68000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>         ether REPLACED
>>         inet6 REPLACED%genet0 prefixlen 64 scopeid 0x1
>>         inet6 REPLACED prefixlen 64 autoconf
>>         inet 192.168.1.170 netmask 0xffffff00 broadcast 192.168.1.255
>>         media: Ethernet autoselect (1000baseT <full-duplex>)
>>         status: active
>>         nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>         inet6 ::1 prefixlen 128
>>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
>>         inet 127.0.0.1 netmask 0xff000000
>>         groups: lo
>>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>
>>
>> # uname -apKU
>> FreeBSD CA72_16Gp_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May 1 02:27:02 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300504 1300504
>>
>> # ~/fbsd-based-on-what-commit.sh
>> branch: stable/13
>> merge-base: fb34817c686cc130449325499870e36979899801
>> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
>> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
>> n245474 (--first-parent --count for merge-base)
>>
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May 1 02:27:02 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300504 1300504
>>
>> # ~/fbsd-based-on-what-commit.sh
>> branch: stable/13
>> merge-base: fb34817c686cc130449325499870e36979899801
>> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
>> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
>> n245474 (--first-parent --count for merge-base)
>
> Both systems running main:
>
> # diff -r /usr/ports/ /mnt/ | more
> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
> Only in /mnt/devel/electron12/files:
> Only in /mnt/devel/electron12/files:
> Only in /mnt/devel/electron12/files: patch-chrome2
> Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
> Only in /usr/ports/devel/electron12/files: patch-chrome_browser_ui_webui_settings_appearance__handler.h
> Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
> Only in /usr/ports/devel/electron12/files: patch-ui_compositor_compositor.cc
> Only in /mnt/devel/electron12/files: <A0><CE><C8>=D6=8F<DC>=DC=A62<B2><E2><AA>^H
>
> (That was all that was listed.)
>
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May 1 19:07:50 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400013 1400013
>
> # ~/fbsd-based-on-what-commit.sh
> branch: main
> merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
> merge-base: CommitDate: 2021-05-01 00:58:11 +0000
> a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
> n246411 (--first-parent --count for merge-base)
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May 1 19:07:50 PDT 2021 root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400013 1400013
>
> # ~/fbsd-based-on-what-commit.sh
> branch: main
> merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
> merge-base: CommitDate: 2021-05-01 00:58:11 +0000
> a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
> n246411 (--first-parent --count for merge-base)
>
>
>
> I tried main on the /usr/ side with releng/13 's release/13.0.0
> where /mnt/ references and got:
>
> # diff -r /usr/ports/ /mnt/ | more
> Only in /mnt/devel/electron12/files: package.json
> Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
> Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
> Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
> . . .
> Only in /mnt/devel/electron12/files: patch-weblayer_browser_system__network__context__manager.cc
> Only in /mnt/devel/electron12/files: patch-weblayer_common_weblayer__paths.cc
> Only in /mnt/devel/electron12/files: yarn.lock
> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
> Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
> Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
> . . .
> Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
> Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
> Only in /mnt/games: 0ad
> Only in /mnt/games: 0verkill
> Only in /mnt/games: 2048
> . . .
> Only in /mnt/games: zaz
> Only in /mnt/games: zhlt
> Only in /mnt/games: ztrack
>
> No obvious garbage or truncated names. Another mix of
> /mnt/ vs. /usr/ being the "missing" side.
>
> NOTE:
> So far I do not see an obvious reason to prefer any
> specific one of releng/13 vs. stable/13 vs. main
> at either end of the connection for the vintages that
> I happen to have in place for them.
>

Just to be sure, I shut down the machine that /mnt was
referencing and moved the drive to the other machine,
directly connected.

# zpool import -f -N -R /mnt -t zroot zptmp
# zfs mount zptmp/usr/ports
# diff -r /usr/ports/ /mnt/usr/ports/ | more
#

So: The diff -r works for this context. The remote
status is somehow involved in producing the type of
problem.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
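
The oddities reported in this thread are missing, truncated, or garbage names rather than differing file contents, so comparing only the two directory listings is a narrower check than a full diff -r. What follows is a minimal sketch of such a check, assuming an sh-compatible shell with mktemp, sort, and comm available. The function name names_only_in_one and the temporary file name pattern are hypothetical; nothing like this was used in the thread.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): print the entry names that
# appear in only one of two directories. Only names are compared, not contents.
names_only_in_one() {
    # $1 = first directory (e.g. the local tree)
    # $2 = second directory (e.g. the same tree seen through the NFS mount)
    list1=$(mktemp "${TMPDIR:-/tmp}/names1.XXXXXX") || return 1
    list2=$(mktemp "${TMPDIR:-/tmp}/names2.XXXXXX") || { rm -f "$list1"; return 1; }
    # Force the C locale so that sort and comm agree on the ordering.
    ls -A "$1" | LC_ALL=C sort > "$list1"
    ls -A "$2" | LC_ALL=C sort > "$list2"
    # comm -3 suppresses names present in both lists. Names only in $1 are
    # printed unindented; names only in $2 are printed after a leading tab.
    LC_ALL=C comm -3 "$list1" "$list2"
    rm -f "$list1" "$list2"
}
```

For example, "names_only_in_one /usr/ports/devel/ice/files /mnt/devel/ice/files" could be run several times in a row. If the output changes between runs while neither file system is being modified, that would point at the directory reading path rather than at the file contents.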