Date:      Thu, 20 May 2021 22:15:13 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
Message-ID:  <EA3C446E-7076-49BB-8FFE-123841673DA1@yahoo.com>
In-Reply-To: <E938DB30-22C9-4765-9E01-601D80B36910@yahoo.com>
References:  <623369D9-5EE5-4FEF-B9AD-56499E8F1C09.ref@yahoo.com> <623369D9-5EE5-4FEF-B9AD-56499E8F1C09@yahoo.com> <YQXPR0101MB0968B29934D7BD73FCA73907DD299@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <YTOPR0101MB0970A1257E4DD37335D5B52EDD299@YTOPR0101MB0970.CANPRD01.PROD.OUTLOOK.COM> <04D7264A-206B-4281-B452-779B01EA3327@yahoo.com> <34E915B3-30DF-408C-A931-C39188F3EB0F@yahoo.com> <E938DB30-22C9-4765-9E01-601D80B36910@yahoo.com>

[Direct drive connection to machine: no problem.]

On 2021-May-20, at 21:40, Mark Millard <marklmi at yahoo.com> wrote:

> [main test example and main/releng/13 mixed example]
>
> On 2021-May-20, at 20:36, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [stable/13 test: example ends up being odder. That might
>> allow eliminating some potential alternatives.]
>>
>> On 2021-May-20, at 19:38, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>> On 2021-May-20, at 18:09, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>> Oh, one additional thing that I'll dare to top post...
>>>> r367492 broke the TCP upcalls that the NFS server uses, such
>>>> that intermittent hangs of NFS mounts to FreeBSD13 servers can occur.
>>>> This has not yet been resolved in "main" etc and could explain
>>>> why an RPC could time out for a soft mount.
>>>
>>> See later notes that I added: soft mount is not required
>>> to see the problem.
>>>
>>>> You can revert the patch in r367492 to avoid the problem.
>>>
>>> If I understand right, you are indicating that this would
>>> not apply to the non-soft mount case that I got.
>>>
>>>> Disabling TSO, LRO are also de-facto standard things to do when
>>>> you observe weird NFS behaviour, because they are often broken
>>>> in various network device drivers.
>>>
>>> I'll have to figure out how to experiment with such. Things
>>> are at defaults rather generally on the systems. I'm not
>>> literate in the subject areas.
>>>
>>> I'm the only user of the machines and network. It is not
>>> outward facing. It is a rather small EtherNet network.
>>>
>>>> rick
>>>>
>>>> ________________________________________
>>>> From: owner-freebsd-stable@freebsd.org <owner-freebsd-stable@freebsd.org> on behalf of Rick Macklem <rmacklem@uoguelph.ca>
>>>> Sent: Thursday, May 20, 2021 8:55 PM
>>>> To: FreeBSD-STABLE Mailing List; Mark Millard
>>>> Subject: Re: releng/13 release/13.0.0 : odd/incorrect diff result over nfs (in a zfs file systems context)
>>>>
>>>> Mark Millard wrote:
>>>>> [I warn that I'm a fairly minimal user of NFS
>>>>> mounts, not knowing all that much. I'm mostly
>>>>> reporting this in case it ends up as evidence
>>>>> via eventually matching up with others observing
>>>>> possibly related oddities.]
>>>>>
>>>>> I got the following odd sequence (that I've
>>>>> mixed notes into). It involved a diff -r over NFS
>>>>> showing differences (files missing) and then a
>>>>> later diff finding matches for the same files,
>>>>> no file system changes made on either machine.
>>>>> I'm unable to reproduce the oddity on demand.
>>>>>
>>>>> Note: A larger scope diff -r originally returned the
>>>>> below as well, but doing the narrower diff -r did
>>>>> repeat the result and that is what I show. (I
>>>>> make no use of devel/ice .)
>>>>>
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>>> . . .
>>>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>>>
>>>>> Note: The above was not expected. So I tried:
>>>>>
>>>>> # ls -Tld /mnt/devel/ice/files/*
>>>>> -rw-r--r--  1 root  wheel   755 Apr 21 21:07:54 2021 /mnt/devel/ice/files/Make.rules.FreeBSD
>>> . . .
>>>>> -rw-r--r--  1 root  wheel  2588 Apr 21 21:07:54 2021 /mnt/devel/ice/files/patch-scripts-TestUtil.py
>>>>>
>>>>> Note: So that indicated that the files were there on the
>>>>> machine that /mnt references. So attempting the original
>>>>> diff -r again:
>>>>>
>>>>> # diff -r /usr/ports/devel/ice/files /mnt/devel/ice/files | more
>>>>> #
>>>>>
>>>>> (Empty difference.)
>>>>>
>>>>> Note: So after the explicit "ls -Tld /mnt/devel/ice/files/*"
>>>>> the odd result of the diff -r no longer happened: no
>>>>> differences reported.
>>>>>
>>>>>
>>>>>
>>>>> For reference (both machines reported):
>>>>>
>>>>> . . .
>>>>> The original mount command was on CA72_16Gp_ZFS:
>>>>>
>>>>> # mount -onoatime,soft 192.168.1.170:/usr/ports/ /mnt/
>>>> The likely explanation for this is your use of a "soft" mount.
>>>> - If the NFS server is slow to respond or there is a temporary network issue,
>>>> the RPC request can time out and then the
>>>> syscall can fail with EINTR/ETIMEDOUT. Since almost nothing, including the
>>>> readdir(3) libc functions, expect syscalls to fail this way...
>>>> Then the cached directory is messed up.
>>>> Doing the "ls" read the directory again and fixed the problem.
>>>>
>>>> Try to reproduce it for a mount without the "soft" option.
>>>> (If a mount point is hung, due to an unresponsive server "umount -N /mnt"
>>>> can usually get rid of it.)
>>>> Personally, I thought "soft" was a bad idea when Sun introduced it in NFS in 1985
>>>> and I still feel that way.
>>>> --> If you can reproduce it without "soft" then I can't explain it.
>>>>   To be honest, the directory reading/caching code in the NFSv3 client
>>>>   hasn't changed significantly in literally decades, as far as I can remember.
>>>
>>> Well . . . trying an even wider scope diff than
>>> the original . . .
>>>
>>> # umount /mnt/
>>> # mount -onoatime 192.168.1.170:/usr/ports/ /mnt/
>>> # diff -r /usr/ports/ /mnt/ | more
>>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>>> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
>>> Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
>>> Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
>>> . . .
>>> Only in /usr/ports/devel/ice/files: patch-python-test-Slice-unicodePaths-run.py
>>> Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
>>> Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
>>> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
>>>
>>> So the devel/ice files showed up again.
>>>
>>> But 2 other lines show up, one finding a file supposedly only
>>> on /mnt/. . .
>>>
>>> QUOTE
>>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>>> END QUOTE
>>>
>>> That seems to be a truncated file name. Looking directly on the machine that
>>> /mnt/ references (hitting tab at the end of the partial name to show a
>>> list):
>>>
>>> # ls -Tld /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_gen-config.sh
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_js-confdefs.h
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src0.cpp
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src1.cpp
>>> . . .
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src9.cpp
>>> /usr/ports/databases/mongodb42/files/aarch64/patch-src_third__party_mozjs-60_platform_aarch64_freebsd_include_js-config.h
>>>
>>> The other machine agrees (machine-local usage).
>>>
>>> The other of the 2 new names is one of the matches to the prefix:
>>>
>>> QUOTE
>>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>>> END QUOTE
>>>
>>> For reference: I've not gotten any console messages about
>>> anything during these.
>>>
>>>> One additional thing to note is that cached directory contents are invalidated
>>>> when the directory's ctime changes.
>>>
>>> I'm not aware of anything that should have been touching the
>>> /usr/ports file systems on either machine any time near my
>>> diff activities. (I'm the only system user.)
>>>
>>>> I am not sure how/if/when ZFS changes a
>>>> directory's ctime. However, if it was badly broken, I'd hear about this a lot.
>>>> (If the ZFS change to ZoL has changed its ctime handling, that might also explain it
>>>> and I'll be hearing a lot more soon as FreeBSD13 becomes adopted. I never use ZFS and,
>>>> as such, never test with it.)
>>>
>>> I recently decided to try using bectl, which led to my recent
>>> ZFS-based system experiments.
>>>
>>> This means I can boot the stable/13 or main [so: 14] that
>>> I last built and try the same experiments with the same
>>> /usr/ports file systems. releng/13 's release/13.0.0 ,
>>> stable/13 , and main are all non-debug builds as stands. I
>>> could add debug builds of any or all, but it would take
>>> a while. (aarch64 4-core Cortex-A72 contexts.)
>>>
>>>> --> For UFS, if you use mtime, directory caching does not work as well, which is
>>>>    why the client directory caching code uses ctime and not mtime to detect that
>>>>    a directory has changed and cached directory blocks need to be invalidated.
>>>>
>>>> Jason Bacon did report a directory reading issue some months ago that never
>>>> quite got resolved, although I recall he said he couldn't reproduce it after a
>>>> system update, so he thought it was related to some local change he had made.
>>>> (I can't remember his email or I'd add him to the cc so he could remind me what
>>>> his case was. I do recall it being somewhat reproducible and happened for both
>>>> UFS and ZFS.)
>>>>> The network is just a local EtherNet.
>>>>
>>>
>>
>>
>> stable/13 got similar "diff -r /usr/ports/ /mnt/ | more" results but
>> with /mnt/devel/electron12/files indications instead of the
>> /usr/ports/devel/ice/files ones. It did again start with:
>>
>> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
>> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
>>
>> for this rather wide range diff -r . It continued with:
>>
>> Only in /mnt/devel/electron12/files: 
>> Only in /mnt/devel/electron12/files: package.json
>> Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
>> Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
>> Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
>> . . .
>>
>> It finished with:
>>
>> Only in /mnt/devel/electron12/files: yarn.lock
>> Only in /mnt/devel/electron12/files: <A0><CE><C8>=D6=8F<DC>=DC=A62<B2><E2><AA>^H
>> Only in /mnt/www/chromium/files: patch-chrome_browser_chrome__browser
>> Only in /usr/ports/www/chromium/files: patch-chrome_browser_chrome__browser__main__posix.cc
>>
>>
>> That last is the only /usr/ports/ prefixed path this time: the
>> only one where it was under /mnt/ that something appeared to
>> be missing.
>>
>> It appears that the file name on the line after the yarn.lock
>> line is garbage with no matching file present when using ls
>> on the system that /mnt/ references.
>>
>> Locally on each machine the devel/electron12/files/* files
>> are shown by ls as present ( through yarn.lock ).
>>
>> NOTE:
>> I find it odd that the local /usr/ports/ ended up being where
>> most of the files were reported as missing, instead of under
>> /mnt/ : Wrong side for a network/network-protocol issue?
>>
>>
>> For reference (David W. indicated I should look at ifconfig
>> for figuring out controlling TSO and such so I figured I'd
>> show the default ifconfig output):
>>
>> # ifconfig
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>       options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>       inet6 ::1 prefixlen 128
>>       inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>>       inet 127.0.0.1 netmask 0xff000000
>>       groups: lo
>>       nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>> ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>       options=68009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>       ether REPLACED
>>       inet 192.168.1.148 netmask 0xffffff00 broadcast 192.168.1.255
>>       media: Ethernet autoselect (1000baseT <full-duplex>)
>>       status: active
>>       nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> # ifconfig
>> genet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>       options=68000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>       ether REPLACED
>>       inet6 REPLACED%genet0 prefixlen 64 scopeid 0x1
>>       inet6 REPLACED prefixlen 64 autoconf
>>       inet 192.168.1.170 netmask 0xffffff00 broadcast 192.168.1.255
>>       media: Ethernet autoselect (1000baseT <full-duplex>)
>>       status: active
>>       nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>       options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
>>       inet6 ::1 prefixlen 128
>>       inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
>>       inet 127.0.0.1 netmask 0xff000000
>>       groups: lo
>>       nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
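
The options lines in the ifconfig output above list no TSO4, TSO6, or LRO for ue0 or genet0. As an inference from that output only, this suggests those offloads are not enabled on either interface. The following is a minimal sketch for checking this mechanically. The helper name offload_flags is hypothetical, and the -tso and -lro parameters mentioned in the comments are the ones documented in ifconfig(8).

```shell
#!/bin/sh
# offload_flags (hypothetical helper, not from the thread):
# reads ifconfig(8) output on stdin and prints any TSO/LRO capability
# names that appear inside the <...> list of an "options=" line.
offload_flags() {
    # Keep only the options lines, split the capability list into one
    # name per line, then keep the names of interest.
    grep 'options=' | tr '<>,' '\n\n\n' | grep -E '^(TSO4|TSO6|LRO)$'
}

# Example use on one of the interfaces named in the thread:
#   ifconfig ue0 | offload_flags
# If anything is printed, the offloads could be turned off (as root) with:
#   ifconfig ue0 -tso -lro
```

Fed the ue0 or genet0 output quoted above, the helper prints nothing, so on these two machines TSO and LRO look like unlikely candidates.
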
>>
>>
>> # uname -apKU
>> FreeBSD CA72_16Gp_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May  1 02:27:02 PDT 2021     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1300504 1300504
>>
>> # ~/fbsd-based-on-what-commit.sh
>> branch: stable/13
>> merge-base: fb34817c686cc130449325499870e36979899801
>> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
>> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
>> n245474 (--first-parent --count for merge-base)
>>
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-n245474-fb34817c686c-dirty: Sat May  1 02:27:02 PDT 2021     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1300504 1300504
>>
>> # ~/fbsd-based-on-what-commit.sh
>> branch: stable/13
>> merge-base: fb34817c686cc130449325499870e36979899801
>> merge-base: CommitDate: 2021-05-01 00:56:57 +0000
>> fb34817c686c (HEAD -> stable/13, freebsd/stable/13) param.h: bump __FreeBSD_version for commits efe7f12cd37b and 9781105bea58
>> n245474 (--first-parent --count for merge-base)
>
> Both systems running main:
>
> # diff -r /usr/ports/ /mnt/ | more
> Only in /mnt/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_
> Only in /usr/ports/databases/mongodb42/files/aarch64: patch-src_third__party_mozjs-60_platform_aarch64_freebsd_build_Unified__cpp__js__src25.cpp
> Only in /mnt/devel/electron12/files: 
> Only in /mnt/devel/electron12/files: 
> Only in /mnt/devel/electron12/files: patch-chrome2
> Only in /usr/ports/devel/electron12/files: patch-chrome_browser_media_webrtc_webrtc__logging__controller.cc
> Only in /usr/ports/devel/electron12/files: patch-chrome_browser_ui_webui_settings_appearance__handler.h
> Only in /usr/ports/devel/electron12/files: patch-components_previews_core_previews__features.cc
> Only in /usr/ports/devel/electron12/files: patch-ui_compositor_compositor.cc
> Only in /mnt/devel/electron12/files: <A0><CE><C8>=D6=8F<DC>=DC=A62<B2><E2><AA>^H
>
> (That was all that was listed.)
>
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May  1 19:07:50 PDT 2021     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400013 1400013
>
> # ~/fbsd-based-on-what-commit.sh
> branch: main
> merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
> merge-base: CommitDate: 2021-05-01 00:58:11 +0000
> a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
> n246411 (--first-parent --count for merge-base)
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n246411-a6ca7519f89c-dirty: Sat May  1 19:07:50 PDT 2021     root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400013 1400013
>
> # ~/fbsd-based-on-what-commit.sh
> branch: main
> merge-base: a6ca7519f89c52e9fab205cded0f2bf32d914cd6
> merge-base: CommitDate: 2021-05-01 00:58:11 +0000
> a6ca7519f89c (HEAD -> main, freebsd/main, freebsd/HEAD) powerpc64: Optimize radix trap handling a little more
> n246411 (--first-parent --count for merge-base)
>
>
>
> I tried main on the /usr/ side with releng/13 's release/13.0.0
> where /mnt/ references and got:
>
> # diff -r /usr/ports/ /mnt/ | more
> Only in /mnt/devel/electron12/files: package.json
> Only in /mnt/devel/electron12/files: patch-apps_ui_views_app__window__frame__view.cc
> Only in /mnt/devel/electron12/files: patch-ash_display_mirror__window__controller.cc
> Only in /mnt/devel/electron12/files: patch-base_BUILD.gn
> . . .
> Only in /mnt/devel/electron12/files: patch-weblayer_browser_system__network__context__manager.cc
> Only in /mnt/devel/electron12/files: patch-weblayer_common_weblayer__paths.cc
> Only in /mnt/devel/electron12/files: yarn.lock
> Only in /usr/ports/devel/ice/files: Make.rules.FreeBSD
> Only in /usr/ports/devel/ice/files: patch-config-Make.common.rules
> Only in /usr/ports/devel/ice/files: patch-cpp-Makefile
> . . .
> Only in /usr/ports/devel/ice/files: patch-scripts-Expect.py
> Only in /usr/ports/devel/ice/files: patch-scripts-IceGridAdmin.py
> Only in /usr/ports/devel/ice/files: patch-scripts-TestUtil.py
> Only in /mnt/games: 0ad
> Only in /mnt/games: 0verkill
> Only in /mnt/games: 2048
> . . .
> Only in /mnt/games: zaz
> Only in /mnt/games: zhlt
> Only in /mnt/games: ztrack
>
> No obvious garbage or truncated names. Another mix of
> /mnt/ vs. /usr/ being the "missing" side.
>
> NOTE:
> So far I do not see an obvious reason to prefer any
> specific one of releng/13 vs. stable/13 vs. main
> at either end of the connection for the vintages that
> I happen to have in place for them.
>

Just to be sure, I shut down the machine that /mnt was
referencing and moved its drive to the other machine,
directly connected.

# zpool import -f -N -R /mnt -t zroot zptmp
# zfs mount zptmp/usr/ports
# diff -r /usr/ports/ /mnt/usr/ports/ | more
#

So: The diff -r works for this context. The drive being
remote (accessed over NFS) is somehow involved in producing
this type of problem.
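
One way to narrow this down further might be to compare only the directory entry names on the two sides, repeatedly, without reading any file contents. The following is a minimal sketch, not something used in the tests reported here. The helper name names_only_diff is hypothetical, and it assumes file names without embedded newlines, so a garbage name like the one reported after yarn.lock could confuse it.

```shell
#!/bin/sh
# names_only_diff DIR_A DIR_B  (hypothetical helper, not from the thread)
# Prints the names found in only one of the two directories. Only the
# directory entries are read; file contents are never opened.
names_only_diff() {
    ta=$(mktemp "${TMPDIR:-/tmp}/names_a.XXXXXX") || return 1
    tb=$(mktemp "${TMPDIR:-/tmp}/names_b.XXXXXX") || { rm -f "$ta"; return 1; }
    # ls -A lists every entry except "." and ".."; sort so comm(1) can compare.
    ls -A "$1" | LC_ALL=C sort > "$ta"
    ls -A "$2" | LC_ALL=C sort > "$tb"
    # comm -3 hides names common to both: column 1 is "only in DIR_A",
    # the tab-indented column 2 is "only in DIR_B".
    LC_ALL=C comm -3 "$ta" "$tb"
    rm -f "$ta" "$tb"
}

# Example use (paths taken from the report):
#   names_only_diff /usr/ports/devel/ice/files /mnt/devel/ice/files
```

Running it several times in a row against the same pair of directories would show whether the set of names returned over NFS changes between passes while nothing modifies either tree.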

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



