Date: Tue, 1 Feb 2022 09:58:38 -0800
From: Mark Millard <marklmi@yahoo.com>
To: bob prohaska <fbsd@www.zefox.net>
Cc: freebsd-arm@freebsd.org
Subject: Re: Error detection for microSD-based swap, buildworld failures on pi3
Message-ID: <304EFD6E-2D92-42F6-AB13-285BE0B15363@yahoo.com>
In-Reply-To: <20220201161808.GA73977@www.zefox.net>
References: <20220129022255.GA59340@www.zefox.net> <6B822440-6F01-4578-803C-20A51DADF10C@yahoo.com> <20220130020546.GA63792@www.zefox.net> <1964F2B7-EC41-42C8-9C18-5E2B79EE0271@yahoo.com> <F4CAC6F9-B9E8-4BD3-BFA0-1706BE56A2AD@yahoo.com> <5B3DF910-23B1-4246-999E-0196E90269F2@yahoo.com> <20220131165333.GA69543@www.zefox.net> <9E0510D2-9FAC-4F01-89A3-E6D8C7C21FDA@yahoo.com> <20220131221405.GA70251@www.zefox.net> <14716537-6E22-44F5-B6AA-841E3EB2AD04@yahoo.com> <20220201161808.GA73977@www.zefox.net>
On 2022-Feb-1, at 08:18, bob prohaska <fbsd@www.zefox.net> wrote:

> [new subject, different emphasis, old problem]
>
> On Mon, Jan 31, 2022 at 03:06:01PM -0800, Mark Millard wrote:
>>
>> One thing that could fit the behavior is if small part(s)
>> of the system c++ compiler (or libraries it uses) were
>> corrupted on that specific media. In that case, nothing
>> elsewhere would replicate the failures but a lot might
>> work without using the corrupted part(s), making the
>> failures not random.
>
> [spaced for emphasis]
>
>> Checking on that is part of why
>> I'd hoped to get an lldb report for a .sh/.cpp pair
>> leading to failure on your RPi3* in question.
>>
>
> If/when the stable/13 Pi3 finishes its -j1 single-user
> build/install cycle I'll make a point of trying the
> .sh/.cpp test under lldb.
>
> For most of their operational history both troublesome Pi3
> systems have had some of their swap on microSD. If there
> is no error detection at all for microSD-based storage
> then undetected corruption of data from swap is a real
> possibility.

Getting a systematic error (SEGV) at a specific point in a compile across many attempts, with various prior histories, reboots, etc. involved, is not likely to come from somehow hitting the same bad page in the swap space each time. This variety includes varying -jN figures, which can change which compile gets an error first. But when a specific file gets the failure, the detail seems repeatable. The same holds for the .sh/.cpp pairs that fail reliably for you, especially given that they work for me even without swap enabled: 1 GiByte of RAM is enough. (Swap is required for running under lldb.) If the problem is a corruption, it would most likely be in some file in use by the compiler (possibly the compiler's own file): a file in the UFS file system.

> I expected that storage errors would be
> reported but maybe not, especially outside file systems.
Not likely to be a swap space issue.

> Mechanical disks have some internal error detection and
> report explicitly when data can't be retrieved. As I think
> back on it at least one flash device (a USB thumb drive)
> failed silently, no reported errors but also no-write.
> That was on a filesystem, so the OS noticed and so did I.

Storage media cannot generally detect whether the data being written was already corrupt before it was written.

> Is there any error detection/correction employed by the
> virtual memory system as it reads and writes mass storage?
>

No separate one that I know of. But getting a systematic error at a systematic point across a wide variety of histories is not likely to be a swap I/O problem.

If I understand correctly, the normal recommendation is to avoid using a microSD card for heavily used swap space, since reliability over time is an issue. (But I'm no expert.)

For reference, as I understand it, the following is a repeatable part of the failure notice when compiling contrib/googletest/googletest/src/gtest-all.cc:

1. /usr/obj/usr/src/arm64.aarch64/tmp/usr/include/private/gtest/internal/gtest-type-util.h:806:37: current parser token '{'
2. /usr/obj/usr/src/arm64.aarch64/tmp/usr/include/private/gtest/internal/gtest-type-util.h:58:1: parsing namespace 'testing'

But we know that when I make a copy of the .cpp/.sh pair and execute the .sh, the compile works fine. This is evidence against the compiled source code itself being corrupt.

===
Mark Millard
marklmi at yahoo.com
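The reasoning above (a failure that repeats at the same point suggests a corrupted file used by the compiler rather than bad swap I/O) points at two simple checks. What follows is a minimal Python sketch under stated assumptions: `tally_reruns` reruns a reproducer command several times and counts exit statuses, and `digest_files` records SHA-256 digests of chosen files so they can be compared against a machine where the same .sh/.cpp pair compiles fine. The function names, the reproducer file name, and the choice of which files to fingerprint are all hypothetical.

```python
import hashlib
import subprocess
from collections import Counter
from pathlib import Path


def tally_reruns(cmd, runs=5):
    """Run the same command `runs` times and count exit statuses.

    The same nonzero status on every run suggests a systematic cause
    (for example, a corrupted file); a scattered mix of successes and
    failures suggests something intermittent.
    """
    outcomes = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        outcomes[result.returncode] += 1
    return outcomes


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_files(paths):
    """Map each path that is a regular file to its digest; skip the rest."""
    return {p: sha256_of(p) for p in paths if Path(p).is_file()}


def differing(local, reference):
    """Return the paths present in both maps whose digests disagree."""
    return sorted(p for p in local.keys() & reference.keys()
                  if local[p] != reference[p])
```

For example, `tally_reruns(["sh", "gtest-all-repro.sh"], runs=3)` (hypothetical file name) reruns a reproducer, and `digest_files(["/usr/bin/c++"])` fingerprints the compiler driver. A digest mismatch is only meaningful if both systems were built from the same sources with the same options, and even then it is a lead rather than proof.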