Date: Thu, 22 Feb 2018 08:37:07 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: Gary Jennejohn <gljennjohn@gmail.com>
Cc: "Chris H" <bsd-lists@BSDforge.com>, "FreeBSD Current" <freebsd-current@freebsd.org>, Warner Losh <imp@bsdimp.com>, Ed Maste <emaste@freebsd.org>, Michael Tuexen <tuexen@freebsd.org>, Mark Johnston <markj@freebsd.org>
Subject: Re: kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
Message-ID: <20180222083707.73ae3036@freyja.zeit4.iv.bundesimmobilien.de>
In-Reply-To: <20180220123953.5e987691@ernst.home>
References: <f7ffa21203887e43e2acd399cf93871d@udns.ultimatedns.net> <20180220123953.5e987691@ernst.home>
On Tue, 20 Feb 2018 12:39:53 +0100
Gary Jennejohn <gljennjohn@gmail.com> wrote:

> On Mon, 19 Feb 2018 14:18:15 -0800
> "Chris H" <bsd-lists@BSDforge.com> wrote:
>
> > I'm seeing a number of messages like the following:
> > kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
> >
> > and was wondering if it's anything to be concerned with, or whether
> > fsck(8) is fixing them.
> > This began to happen when the power went out on a new install:
> > FreeBSD dns0 12.0-CURRENT FreeBSD 12.0-CURRENT #0: Wed Dec 13 06:07:59 PST
> > 2017 root@dns0:/usr/obj/usr/src/amd64.amd64/sys/DNS0 amd64
> > which hadn't yet been hooked up to the UPS.
> > I performed an fsck in single user mode upon power-up. Which ended with the
> > mount points being masked CLEAN. I was asked if I wanted to use the JOURNAL.
> > I answered Y.
> > FWIW the systems are UFS2 (ffs) have gpart labels, and were newfs'd thusly:
> > newfs -U -j
> >
> > Thank you for all your time, and consideration.
> >
>
> fsck fixes these errors only when the user does NOT use the journal.
> You should re-do the fsck.
>

When these mysterious errors first occurred on several boxes running CURRENT (in December 2017, if I remember correctly), I also witnessed mysterious and frequent crashes on several SSD-driven machines on which the error described above occurred. While the error has somehow vanished in the meantime as CURRENT progressed, the crashes continued. On two boxes, I dumped and restored the OS on the system's SSD, reformatting the SSD from scratch (UFS2, soft updates + journaling). On those boxes the mysterious crashes have not recurred since.

One box is left so far: my workstation. This box continues to crash, and it started crashing again today while compiling world/kernel. The fun part is: even after a clean shutdown, where I cannot detect any filesystem inconsistencies, and a reboot that again shows no reported inconsistencies on the console or in the messages/logs, the box crashes spontaneously.
Today I could trigger the reboot by starting "make -j4 buildworld buildkernel": after showing the initial compiler and build framework statements, the box went to Nirvana. This is a well-known phenomenon by now. I then checked the consistency of the filesystems; here is the result for the /usr/obj tree, which is a dedicated GPT partition (label: /dev/gpt/usr.obj):

[...]
root@box1:~ # fsck -fy /dev/gpt/usr.obj
** /dev/gpt/usr.obj
** Last Mounted on /usr/obj
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNALLOCATED I=515 OWNER=root MODE=0
SIZE=0 MTIME=Feb 22 07:25 2018
NAME=/usr/src/amd64.amd64/sys/BOX1/config.c.new

UNEXPECTED SOFT UPDATE INCONSISTENCY

REMOVE? yes

DIRECTORY CORRUPTED I=169691 OWNER=root MODE=40775
SIZE=1536 MTIME=Feb 22 05:16 2018
DIR=/usr/src/amd64.amd64/sys/BOX1/modules/usr/src/sys/modules/nfsd

UNEXPECTED SOFT UPDATE INCONSISTENCY

SALVAGE? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

126922 files, 848197 used, 1178482 free (89210 frags, 136159 blocks, 4.4% fragmentation)

***** FILE SYSTEM MARKED DIRTY *****

***** FILE SYSTEM WAS MODIFIED *****

***** PLEASE RERUN FSCK *****
[...]

When doing an installworld, I pre-emptively run "fsck -yf" twice in single-user mode before mounting the partitions. In most cases the filesystems are reported clean, but sometimes, especially for those under high I/O (/usr/src and mostly /usr/obj on this build machine), corruption is reported. As I reported, the very same behaviour occurred on three boxes simultaneously, and I got rid of it by completely reformatting the SSDs (I have never had such issues so far with HDD-based boxes!).
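
For anyone who wants to track such runs over time, the fsck output quoted above can be summarized mechanically. The following is only a rough sketch and not part of the original report: the function name summarize_fsck, the SAMPLE constant, and the idea of counting answered prompts are illustrative assumptions, and the marker strings are simply the ones visible in the quoted output (other fsck versions may word them differently). The script only parses text and never runs fsck itself.

```python
import re

# Abridged copy of the fsck output quoted in this message (illustrative data).
SAMPLE = """\
** /dev/gpt/usr.obj
** Last Mounted on /usr/obj
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNEXPECTED SOFT UPDATE INCONSISTENCY
REMOVE? yes
UNEXPECTED SOFT UPDATE INCONSISTENCY
SALVAGE? yes
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
***** FILE SYSTEM MARKED DIRTY *****
***** FILE SYSTEM WAS MODIFIED *****
***** PLEASE RERUN FSCK *****
"""


def summarize_fsck(output: str) -> dict:
    """Summarize captured fsck text (hypothetical helper, not an fsck API).

    Returns the phase numbers seen, the number of prompts answered "yes",
    and whether the output asks for another fsck pass.
    """
    # Phase headers look like "** Phase 2 - Check Pathnames".
    phases = re.findall(r"^\*\* Phase (\d+) - ", output, flags=re.MULTILINE)
    # Answered prompts look like "REMOVE? yes" or "SALVAGE? yes".
    repairs = len(re.findall(r"\b[A-Z]+\? yes\b", output))
    return {
        "phases": phases,
        "repairs": repairs,
        "modified": "FILE SYSTEM WAS MODIFIED" in output,
        "rerun": "PLEASE RERUN FSCK" in output,
    }


if __name__ == "__main__":
    print(summarize_fsck(SAMPLE))
```

Under these assumptions, a second pass would be needed whenever "rerun" is true, which matches the habit described above of running fsck twice before an installworld. The captured text would come from something like the console output of "fsck -fy /dev/gpt/usr.obj" saved to a file.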
I hope I can refurbish the remaining box this weekend, and I can report, if desired, whether this box returns to a healthy state like the others or whether my observation was a simple coincidence of issues ...

Thanks for your patience,
Oliver
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180222083707.73ae3036>