Date:      Mon, 13 Nov 2023 00:35:40 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 275047] Cross-device link error when copying files over NFSv4.2 on a stacked filesystem export
Message-ID:  <bug-275047-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275047

            Bug ID: 275047
           Summary: Cross-device link error when copying files over
                    NFSv4.2 on a stacked filesystem export
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: freebsd@kumba.dev

I think I've run into a bug with `copy_file_range` over NFSv4.2, and I *think*
it may be a corner case alluded to by @rmacklem in a ~2019 commit (D20584),
where he says "One thing I am not sure of is if it will work for stacked file
systems."

Some quick background on the setup in use on my network.
  - NAS server runs FreeBSD 13.2-RELEASE-p5
  - Hosts key FreeBSD trees, like /usr/ports and /usr/src, over NFSv4
    * /usr/src and /usr/ports are on their own ZFS datasets
    * Exported by direct configuration in /etc/exports
  - Kernel compile basedir is on a separate ZFS dataset
    * Mounted under /usr/src/sys/amd64/compile

This is what the ZFS datasets for /usr/src look like on the NAS machine:
> NAME                                 USED  AVAIL     REFER  MOUNTPOINT
> nas/freebsd/src/13.2-REL             853M  3.32T      853M  /nas/freebsd/src/13.2-REL
> nas/freebsd/src/14.0-RC4             916M  3.32T      916M  /nas/freebsd/src/14.0-RC4
> nas/freebsd/src/compile-13.2        2.72G  3.32T     2.72G  /nas/freebsd/src/13.2-REL/sys/amd64/compile
> nas/freebsd/src/compile-14.0        1.23G  3.32T     1.23G  /nas/freebsd/src/14.0-RC4/sys/amd64/compile

Each FreeBSD machine on my network mounts '/usr/src' off this NAS server over
NFSv4.2 and builds its own kernel when needed, usually when there are security
or errata updates out.

I also don't build kernels the usual way; e.g., using 'make CONF=foo
buildkernel' and the like.  Rather, I follow these steps:
  - cd /usr/src/sys/amd64/conf
  - config <path to kernel config>
  - cd ../compile/<kernel build dir>
  - make <clean|cleandepend|depend|<build>|install>
    * <build> means 'no make target' here to actually build the kernel

Around the release of 13.2-RELEASE, I started noticing that some of my systems
would error out of the kernel compile phase when attempting basic 'cp'
operations, mostly on wireless firmware *.bin files.  The failure message was
"Cross-device link".  I thought this was an issue in how I have this build
environment set up, and those firmware files were not needed by the specific
machines, so I added 'nodevice' lines in my kernel config for them, then
restarted the builds, repeating until they completed without error.  I don't
recall these errors happening when everything was on 13.1-RELEASE, but I can't
confirm that.

However, I recently acquired a cheap laptop, an Asus VivoBook 15, that I wanted
to try running FreeBSD on.  While it was building a custom kernel, it started
failing on the same wireless firmware blobs, specifically in the rtw88fw
folders, when trying to copy the *.bin files from the /usr/src tree to the
kernel build directory.  That firmware is for the actual wireless device in
the laptop, so I need those build operations to succeed this time.

Since '/usr/src' and the kernel build dirs are essentially "stacked" on the NAS
server via ZFS, and are exported over NFSv4.2, I think I have triggered this
obscure corner case.  I can also trigger the failure by manually attempting to
copy the files using FreeBSD's 'cp' tool.  However, GNU Coreutils (v9.1) 'gcp'
does NOT trigger the error.  Additionally, if I remount '/usr/src' on the build
machine as NFSv4.1, then I can use FreeBSD's 'cp' to successfully copy the
file.

When I ran the failing case under 'truss', I saw that the key difference was
'copy_file_range' failing with ERR#18:

> 7629: fstatat(AT_FDCWD,"/usr/src/sys/contrib/dev/rtw88fw/rtw8821c_fw.bin",{ mode=-rw-r--r-- ,inode=99809,size=138984,blksize=4096 },0x0) = 0 (0x0)
> 7629: fstatat(AT_FDCWD,"rtw8821c_fw_nfs42.bin",0x7fffffffd430,0x0) ERR#2 'No such file or directory'
> 7629: openat(AT_FDCWD,"/usr/src/sys/contrib/dev/rtw88fw/rtw8821c_fw.bin",O_RDONLY,00) = 3 (0x3)
> 7629: openat(AT_FDCWD,"rtw8821c_fw_nfs42.bin",O_WRONLY|O_CREAT|O_TRUNC,0100644) = 4 (0x4)
> 7629: copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) ERR#18 'Cross-device link'
> 7629: write(2,"cp: ",4)                     = 4 (0x4)
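For anyone sifting through the full truss logs, a small filter can pull out only
the failing calls.  This is a minimal Python sketch, not part of the original
report: the line pattern is inferred from the excerpt above rather than from
truss documentation, and the name `failed_syscalls` is hypothetical.

```python
import re

# Matches truss lines of the form "PID: syscall(args) ERR#N 'message'".
# Assumption: the layout is inferred from the excerpt in this report only.
# An optional leading "> " quote marker is tolerated.
ERR_LINE = re.compile(r"^\s*(?:>\s*)?(\d+):\s+(\w+)\(.*\)\s+ERR#(\d+)\s+'([^']*)'")


def failed_syscalls(lines):
    """Yield (pid, syscall, errno, message) for each failing call."""
    for line in lines:
        match = ERR_LINE.match(line)
        if match:
            pid, name, err, message = match.groups()
            yield int(pid), name, int(err), message


sample = [
    '7629: openat(AT_FDCWD,"rtw8821c_fw_nfs42.bin",O_WRONLY|O_CREAT|O_TRUNC,0100644) = 4 (0x4)',
    "7629: copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) ERR#18 'Cross-device link'",
]
for pid, name, err, message in failed_syscalls(sample):
    print(pid, name, err, message)  # prints: 7629 copy_file_range 18 Cross-device link
```

The same function can be fed an open log file, since it only iterates over
lines.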

If I use 'truss' on FreeBSD's 'cp' while mounted over NFSv4.1 OR use GNU
Coreutils 'gcp' while mounted over NFSv4.2, 'copy_file_range' is not utilized,
so the copy command succeeds.
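To illustrate why a copy that avoids 'copy_file_range' succeeds, here is a
minimal Python sketch of a copy routine that tries `os.copy_file_range` first
and drops back to plain read/write when the call fails with EXDEV (ERR#18).
It is an illustration only: it is not how FreeBSD's 'cp' or GNU 'gcp' is
implemented, the name `copy_with_fallback` is hypothetical, and the set of
errno values treated as "fall back" is an assumption.

```python
import errno
import os

# Assumption: these errors mean "copy_file_range cannot be used here",
# so a byte-for-byte copy is attempted instead of failing outright.
FALLBACK_ERRNOS = {errno.EXDEV, errno.ENOSYS, errno.EOPNOTSUPP}


def copy_with_fallback(src, dst, chunk=1 << 20):
    """Copy src to dst, preferring copy_file_range, else read/write."""
    fd_in = os.open(src, os.O_RDONLY)
    try:
        fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            # os.copy_file_range is not available on every platform.
            use_cfr = hasattr(os, "copy_file_range")
            while True:
                if use_cfr:
                    try:
                        copied = os.copy_file_range(fd_in, fd_out, chunk)
                    except OSError as exc:
                        if exc.errno not in FALLBACK_ERRNOS:
                            raise
                        # Switch to read/write from the current offsets.
                        use_cfr = False
                        continue
                else:
                    buf = os.read(fd_in, chunk)
                    copied = len(buf)
                    view = memoryview(buf)
                    while view:  # os.write may write fewer bytes than asked
                        view = view[os.write(fd_out, view):]
                if copied == 0:  # end of file reached
                    return
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
```

Calling `copy_with_fallback("/usr/src/sys/contrib/dev/rtw88fw/rtw8821c_fw.bin",
"rtw8821c_fw_nfs42.bin")` would mirror the failing 'cp' invocation from the
truss output.  Whether the fallback path is actually taken on an NFSv4.2 mount
depends on the error the kernel returns.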

I am unsure where the actual fault lies at the moment.  The laptop is currently
running FreeBSD 14.0-RC4-p1, and will be updated to 14.0-RELEASE when that is
available via freebsd-update.  The NAS server will probably remain on
13.2-RELEASE-p5 for a few days after 14.0 is available.  I am unsure whether
it's a bug in 13.2's NFSv4.2 bits, or in 14.0-RC4's implementation of
copy_file_range when used on an NFSv4.2 mountpoint whose mounts are stacked on
the server.

I will attach text files of the full truss output for the failing and
successful cases.  If additional details are needed, please let me know.  I
believe this bug should be assigned to @rmacklem, as I am somewhat certain that
it is NFSv4.2-related.

-- 
You are receiving this mail because:
You are the assignee for the bug.


