Date: Mon, 13 Nov 2023 00:35:40 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 275047] Cross-device link error when copying files over NFSv4.2 on a stacked filesystem export
Message-ID: <bug-275047-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275047

Bug ID: 275047
Summary: Cross-device link error when copying files over NFSv4.2 on a stacked filesystem export
Product: Base System
Version: 14.0-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: freebsd@kumba.dev

I think I've run into a bug w/ `copy_file_range` over NFSv4.2, and I *think* it may be a corner case alluded to by @rmacklem in a ~2019 commit (D20584), where he says "One thing I am not sure of is if it will work for stacked file systems."

Some quick background on the setup in use on my network.

- NAS server runs FreeBSD 13.2-RELEASE-p5
- Hosts key FreeBSD elements, like /usr/ports and /usr/src, over NFSv4
  * /usr/src and /usr/ports are on their own ZFS datasets
  * Exported by direct configuration in /etc/exports
- Kernel compile basedir is on a separate ZFS dataset
  * Mounted under /usr/src/sys/amd64/compile

This is what the ZFS datasets for /usr/src look like on the NAS machine:

> NAME                           USED  AVAIL  REFER  MOUNTPOINT
> nas/freebsd/src/13.2-REL       853M  3.32T   853M  /nas/freebsd/src/13.2-REL
> nas/freebsd/src/14.0-RC4       916M  3.32T   916M  /nas/freebsd/src/14.0-RC4
> nas/freebsd/src/compile-13.2  2.72G  3.32T  2.72G  /nas/freebsd/src/13.2-REL/sys/amd64/compile
> nas/freebsd/src/compile-14.0  1.23G  3.32T  1.23G  /nas/freebsd/src/14.0-RC4/sys/amd64/compile

Each FreeBSD machine on my network mounts '/usr/src' off this NAS server over NFSv4.2 and builds its own kernel when needed, usually when there are security or errata updates out. I also don't build kernels the usual way; e.g., using 'make CONF=foo buildkernel' and the like.
Rather, I follow these steps:

- cd /usr/src/sys/amd64/conf
- config <path to kernel config>
- cd ../compile/<kernel build dir>
- make <clean|cleandepend|depend|<build>|install>
  * <build> means 'no make target' here to actually build the kernel

Around the release of 13.2-RELEASE, I started noticing that some of my systems would error out of the kernel compile phase when attempting to do basic 'cp' operations, mostly on wireless firmware *.bin files. The failure message was "Cross-device link". I thought this was an issue in how I have this build environment set up, and those firmware files were not needed by the specific machines, so I added 'nodevice' lines in my kernel config for them, then restarted the builds, repeating until they completed w/o error. I don't recall these errors happening when everything was on 13.1-RELEASE, but I can't confirm that.

However, I recently acquired a cheap laptop, an Asus VivoBook 15, that I wanted to try and run FreeBSD on. While it was building a custom kernel, it started failing on the same wireless firmware blobs, specifically in the rtw88fw folders, when trying to copy the *.bin files from the /usr/src tree location to the kernel build directory. That's the actual wireless device in the laptop, so I need those build operations to actually succeed this time.

Since '/usr/src' and the kernel build dirs are essentially "stacked" on the NAS server via ZFS, and are being exported over NFSv4.2, I think I have triggered this obtuse corner case. I can also trigger the failure by manually attempting to copy the files using FreeBSD's 'cp' tool. However, GNU Coreutils (v9.1) 'gcp' does NOT trigger the error. Additionally, if I remount '/usr/src' on the build machine as NFSv4.1, then I can use FreeBSD's 'cp' to successfully copy the file.
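For illustration only, here is a minimal Python sketch of how a copy routine could tolerate the "Cross-device link" (EXDEV) failure: it tries `copy_file_range` first and falls back to a plain read/write loop. This is not the code of FreeBSD's 'cp' or of 'gcp'; the function name `copy_with_fallback` is hypothetical, and the set of errno values that trigger the fallback (anything beyond EXDEV) is an assumption made for portability.

```python
import errno
import os

# EXDEV is the error seen in this report; the other values are assumptions
# covering platforms or filesystems where copy_file_range is unsupported.
FALLBACK_ERRNOS = (errno.EXDEV, errno.ENOSYS, errno.EINVAL, errno.EOPNOTSUPP)


def copy_with_fallback(src_path, dst_path, chunk=1 << 20):
    """Copy src_path to dst_path and return the name of the method used."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            cfr = getattr(os, "copy_file_range", None)  # missing on some platforms
            if cfr is not None:
                try:
                    # Without explicit offsets the call advances both file
                    # positions; a return value of 0 means end of file.
                    while cfr(src, dst, chunk) > 0:
                        pass
                    return "copy_file_range"
                except OSError as exc:
                    if exc.errno not in FALLBACK_ERRNOS:
                        raise
                    # Discard any partial copy and start over.
                    os.lseek(src, 0, os.SEEK_SET)
                    os.lseek(dst, 0, os.SEEK_SET)
                    os.ftruncate(dst, 0)
            while True:
                buf = os.read(src, chunk)
                if not buf:
                    return "read/write"
                view = memoryview(buf)
                while view:  # handle short writes
                    view = view[os.write(dst, view):]
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Calling `copy_with_fallback("/usr/src/sys/contrib/dev/rtw88fw/rtw8821c_fw.bin", "rtw8821c_fw.bin")` would report which path was taken, which may help tell a `copy_file_range` failure apart from a general I/O problem on the mount.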
When I triggered the failure case under 'truss', I saw that the key difference was 'copy_file_range' failing with ERR#18:

> 7629: fstatat(AT_FDCWD,"/usr/src/sys/contrib/dev/rtw88fw/rtw8821c_fw.bin",{ mode=-rw-r--r-- ,inode=99809,size=138984,blksize=4096 },0x0) = 0 (0x0)
> 7629: fstatat(AT_FDCWD,"rtw8821c_fw_nfs42.bin",0x7fffffffd430,0x0) ERR#2 'No such file or directory'
> 7629: openat(AT_FDCWD,"/usr/src/sys/contrib/dev/rtw88fw/rtw8821c_fw.bin",O_RDONLY,00) = 3 (0x3)
> 7629: openat(AT_FDCWD,"rtw8821c_fw_nfs42.bin",O_WRONLY|O_CREAT|O_TRUNC,0100644) = 4 (0x4)
> 7629: copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) ERR#18 'Cross-device link'
> 7629: write(2,"cp: ",4) = 4 (0x4)

If I use 'truss' on FreeBSD's 'cp' while mounted over NFSv4.1 OR use GNU Coreutils 'gcp' while mounted over NFSv4.2, 'copy_file_range' is not utilized, so the copy command succeeds.

I am unsure where the actual fault lies at the moment. The laptop is currently running FreeBSD 14.0-RC4-p1, and will be updated to 14.0-RELEASE when that is available via freebsd-update. The NAS server will probably remain on 13.2-RELEASE-p5 for a few days after 14.0 is available. I am unsure if it's a bug in 13.2's NFSv4.2 bits, or in 14.0-RC4's implementation of copy_file_range when on an NFSv4.2 mountpoint and the mounts are stacked on the server.

I will attach text files of the full truss output for the failing and successful cases. If additional details are needed, please let me know. I believe this bug should be assigned to @rmacklem, as I am somewhat certain that it is NFSv4.2-related.

-- 
You are receiving this mail because:
You are the assignee for the bug.