Date: Thu, 25 Jan 2018 21:16:56 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: FreeBSD CURRENT <freebsd-current@freebsd.org>
Cc: Warner Losh <imp@bsdimp.com>, Mark Johnston <markj@freebsd.org>, Michael Tuexen <tuexen@freebsd.org>, Ed Maste <emaste@freebsd.org>
Subject: Re: r327359: cylinder checksum failed: cg0, cgp: 0x4515d2a3 != bp: 0xd9fba319 Dec 30 23:29:24 <0.2>
Message-ID: <20180125211723.6e65329f@thor.intern.walstatt.dynvpn.de>
In-Reply-To: <CANCZdfqsYr2xfM=Wjhbb0XixrrCnYKfFxF24Zm6JgWSk4uC9ew@mail.gmail.com>
References: <20171231004137.4f9ad496@thor.intern.walstatt.dynvpn.de> <CANCZdfoMdgCrAAXadc-G6v1r0wA-qv=Ms_XKYPd7cFqSc5%2B9GQ@mail.gmail.com> <23651B78-E31C-4BDD-BCA3-408B8F907884@freebsd.org> <20180108153356.GA2412@raichu> <CANCZdfqsYr2xfM=Wjhbb0XixrrCnYKfFxF24Zm6JgWSk4uC9ew@mail.gmail.com>
[-- Attachment #1 --]
On Mon, 8 Jan 2018 09:12:16 -0700
Warner Losh <imp@bsdimp.com> wrote:

> On Jan 8, 2018 8:34 AM, "Mark Johnston" <markj@freebsd.org> wrote:
>
> On Thu, Jan 04, 2018 at 09:10:37AM +0100, Michael Tuexen wrote:
> > > On 31. Dec 2017, at 02:45, Warner Losh <imp@bsdimp.com> wrote:
> > >
> > > On Sat, Dec 30, 2017 at 4:41 PM, O. Hartmann <ohartmann@walstatt.org> wrote:
> > >
> > >> On most recent CURRENT I face the error shown below on /tmp filesystem
> > >> (UFS2) residing on a Samsung 850 Pro SSD:
> > >>
> > >> UFS /dev/gpt/tmp (/tmp) cylinder checksum failed: cg 0, cgp: 0x4515d2a3 != bp: 0xd9fba319
> > >> handle_workitem_freefile: got error 5 while accessing filesystem
> > >> UFS /dev/gpt/tmp (/tmp) cylinder checksum failed: cg 0, cgp: 0x4515d2a3 != bp: 0xd9fba319
> > >> handle_workitem_freefile: got error 5 while accessing filesystem
> > >> UFS /dev/gpt/tmp (/tmp) cylinder checksum failed: cg 0, cgp: 0x4515d2a3 != bp: 0xd9fba319
> > >> handle_workitem_freefile: got error 5 while accessing filesystem
> > >> UFS /dev/gpt/tmp (/tmp) cylinder checksum failed: cg 0, cgp: 0x4515d2a3 != bp: 0xd9fba319
> > >> handle_workitem_freefile: got error 5 while accessing filesystem
> > >> UFS /dev/gpt/tmp (/tmp) cylinder checksum failed: cg 0, cgp: 0x4515d2a3 != bp: 0xd9fba319
> > >> handle_workitem_freefile: got error 5 while accessing filesystem
> > >>
> > >> I've already formatted the /tmp filesystem, but obviously without any
> > >> success.
> > >>
> > >> Since I face such strange errors also on NanoBSD images dd'ed to SD cards,
> > >> I guess there is something fishy ...
> > >
> > >
> > > It indicates a problem. We've seen these 'corruptions' on data in motion at
> > > work, but I hacked fsck to report checksum mismatches (it silently corrects
> > > them today) and we've not seen any mismatch when we unmount and fsck the
> > > filesystem.
> >
> > Not sure this helps: But we have seen this also after system panics
> > when having soft update journaling enabled. With soft update journaling
> > disabled, we did not observe this after several panics.
> > Just to be clear: The panics are not related to this issue,
> > but to other network development we do.
>
> I saw the same issue this morning on a mirrored root filesystem after my
> workstation came up following a power failure. fsck recovered using the
> journal, and I subsequently saw a number of these checksum failures.
> Upon shutdown, I saw the same handle_workitem_freefile errors as above.
> I then ran a full fsck from single-user mode, which didn't turn up any
> inconsistencies, and after that the checksum failure errors disappeared,
> presumably because fsck fixed them.
>
>
> Yes. Fsck automatically fixes issues like that. It does it silently. I have
> patched it to make it noisy, and in the dozen cases where I saw the errors,
> fsck was silent with my whiny patches. I can put them up for review if
> people want...
>
> Warner

Within the past couple of weeks, or since the first occurrence of these
strange reports, I have had mysterious crashes: when installing FreeBSD, even
the proper (recommended) way, the box suddenly crashes out of the blue. The
symptoms are always the same, and so is the result: the box is unusable, the
boot process is stuck at "BTX halted" with a list of dumped CPU registers (I
guess they are the CPU registers), and the filesystem is corrupt.

I have had this strange problem on several hosts with SSDs; I reported those
crashes at the end of November/beginning of December 2017. On one machine I
reformatted the SSD and restored from a 'dump' backup; since then the crashes
have gone away. The box now in question is the last of them that has not been
treated that way.
It seems there is a minefield hidden somewhere, and I have no clue what it
could be :-(

I'm going to do the very same soon with the SSD of the remaining box: dump and
restore. I just wanted to note this for the record.

The crash happened with FreeBSD 12.0-CURRENT #14 r328409: Thu Jan 25 20:40:27
CET 2018 amd64.

Kind regards,

Oliver

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for
market or opinion research (§ 28 Abs. 4 BDSG).

[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----

iLUEARMKAB0WIQQZVZMzAtwC2T/86TrS528fyFhYlAUCWmo7UwAKCRDS528fyFhY
lJeQAf0XYgWzSL4pWHomsqM9lPYnUYhN3hHA+pKBjv/BdPWKVsn4vLOjADwmn/Xn
f2nyB6LIabgQ9HnAThOCPeFXoEjyAf9Y5KQDS0n+6WC/TpL4HBSQYXjW9Kx2yTBu
EgDJ9XIRiZiSaQ3+unW/q7LmNZaNL7sj340RIxNJ1E8HgDbCHoza
=DIAb
-----END PGP SIGNATURE-----
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180125211723.6e65329f>
