Date: Tue, 29 Nov 2022 13:41:47 -0800
From: Mark Millard <marklmi@yahoo.com>
To: "mckusick@freebsd.org" <mckusick@FreeBSD.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Question(s) related to "cylinder checksum failed" during "cp" of huge files (RPi4B EDK2 UEFI/ACPI context)
Message-ID: <438C7F0B-DFB5-4BB3-BEC5-63FD6FFA7879@yahoo.com>
References: <438C7F0B-DFB5-4BB3-BEC5-63FD6FFA7879.ref@yahoo.com>
The later included question(s) are only intended to gather background information that might help gather evidence for why certain failure reports are generated in a particular (somewhat odd) RPi4B context. I'm not suggesting any problems with the cylinder checks themselves.

If I do a:

# cp -aRx larger-than-RAM.tar larger-than-RAM.tar.copied_via_RPi4B_C0T_Rev_1.5

in a UFS context with a file like, for example,

-rw-r--r-- 1 root wheel 27707039744 Nov 28 19:11:23 2022 larger-than-RAM.tar

I eventually get messages like the following before the system completely fails during the attempted copy (I've been testing only 13.1-STABLE so far):

UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 92, cgp: 0x0 != bp: 0x43bc4552
. . .
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 107, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 123, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 155, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 219, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 347, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 236, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 94, cgp: 0x0 != bp: 0x43bc4552
. . .
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 99, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 100, cgp: 0x0 != bp: 0x8b9e9592
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 101, cgp: 0x0 != bp: 0x43bc4552
. . .
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 293, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 295, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 296, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 298, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 302, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 310, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 326, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 358, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 55, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 183, cgp: 0x0 != bp: 0x43bc4552
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 72, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 297, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 298, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 299, cgp: 0xffffffff != bp: 0x544dd2da
UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 300, cgp: 0xffffffff != bp: 0x544dd2da
. . .

(That such messages happen may make for better validation that media I/O is well supported in the kernel for specific contexts than has historically been the case. This may be an example of that.)

(Note: My use of "-aRx" is habitual, not special to causing the messages.)

I've not seen problems during basic normal operation, but that might just be a context in which it takes a long time to have a significant probability of observing such a failure. The "cp" activity above has never completed in this recent testing.
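
For anyone collecting similar evidence, the console messages quoted above can be summarized mechanically. The following is only a minimal sketch and was not part of the testing described here: the function name tally_checksum_failures and the regular expression are illustrative assumptions, with the message shape taken from the quoted lines rather than from the kernel source.

```python
import re
from collections import Counter

# Assumed message shape, copied from the quoted console output:
#   "cylinder checksum failed: cg N, cgp: 0x... != bp: 0x..."
CG_RE = re.compile(
    r"cylinder checksum failed: cg (\d+), "
    r"cgp: (0x[0-9a-fA-F]+)\s*!=\s*bp: (0x[0-9a-fA-F]+)"
)


def tally_checksum_failures(text):
    """Return (cg numbers in order of appearance, Counter of (cgp, bp) pairs)."""
    groups = []
    pairs = Counter()
    for match in CG_RE.finditer(text):
        groups.append(int(match.group(1)))
        pairs[(match.group(2), match.group(3))] += 1
    return groups, pairs


if __name__ == "__main__":
    # Three lines taken from the messages quoted above.
    sample = (
        "UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 99, cgp: 0x0 != bp: 0x43bc4552\n"
        "UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 100, cgp: 0x0 != bp: 0x8b9e9592\n"
        "UFS /dev/ufs/rootfs (/) cylinder checksum failed: cg 295, cgp: 0xffffffff != bp: 0x544dd2da\n"
    )
    groups, pairs = tally_checksum_failures(sample)
    print("cylinder groups:", groups)
    for (cgp, bp), count in pairs.most_common():
        print(f"cgp={cgp} bp={bp}: {count}")
```

Feeding a saved console log through such a tally makes it easier to report how many distinct cgp/bp combinations occur and in what order the cylinder groups were reported.
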
Are the above messages likely to be based on the cylinder validation updates of fairly recent times? Or are they from code for which the checks would have been involved long before such changes as far as such "cp" activity goes? (I've been guessing that the tests involved are fairly new.)

Is there anything about, say, when the checks happen vs. not, relative to other activity that happens during a "cp", that might be important to gathering or reporting evidence?

(There might be other questions that I should ask but did not manage to think of.)

It seems unlikely that I'll get to the point of being able to point at specific source code that has problems. But it would be nice to have presented more/better evidence if I can gather some.

Notes:

The oddity with the context is using EDK2 UEFI/ACPI instead of U-Boot/DeviceTree. What is reported here only happens with UEFI/ACPI. An apparently separate problem happens for U-Boot and can happen after the above kind of messages for UEFI/ACPI if the system manages to run long enough.

The only media is a USB3 SSD in my testing.

I first got such messages via use of:

FreeBSD-13.1-STABLE-arm64-aarch64-RPI-20221123-b51ee7ac252c-253133.img

but I also got such via my somewhat older builds that have some past experimental patches for ACPI and DMA range handling that I'd been using for some time, but mostly in a ZFS context for UEFI/ACPI as far as on-going use was concerned. (I've reverted to U-Boot for that on-going-use ZFS environment that I do not want corrupted.)

What started this was getting access to an 8 GiByte RPi4B that no longer has the DMA size restrictions that the original parts had (a "C0T" part instead of a "B0T" at the end of the part identification printed on the SOC top). I was testing and comparing vs. old "B0T" parts, repeating old experiments that had originally shown the ACPI support did not work for the "B0T" parts. I'd not run such tests in some time and all the failures seem to be new types of evidence.
My normal builds had patches that, prior to this, I thought were handling the "B0T" DMA range limitation associated with XHCI, as presented via ACPI. Part of what I had intended was to see if the behavior still looked good for the new "C0T" RPi4B and if things were well behaved without the patches (for official FreeBSD builds). Instead I found failures spanning into the old type of tests done on a "B0T" RPi4B.

I had not thought to rerun the tests as the cylinder related tests were being added. Too bad. This possibly could have been noticed earlier.

So far, I've not seen problems via U-Boot/DeviceTree. So that is what I now use in every context I do not want corrupted (avoiding likely needing regeneration from scratch). (My ZFS use is for bectl use, not redundancy.)

===
Mark Millard
marklmi at yahoo.com