Date: Thu, 22 Feb 2018 08:37:07 +0100
From: "O. Hartmann"
To: Gary Jennejohn
Cc: "Chris H", "FreeBSD Current", Warner Losh, Ed Maste, Michael Tuexen, Mark Johnston
Subject: Re: kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
Message-ID: <20180222083707.73ae3036@freyja.zeit4.iv.bundesimmobilien.de>
In-Reply-To: <20180220123953.5e987691@ernst.home>
Organization: Walstatt

On Tue, 20 Feb 2018 12:39:53 +0100
Gary Jennejohn wrote:

> On Mon, 19 Feb 2018 14:18:15 -0800
> "Chris H" wrote:
>
> > I'm seeing a number of messages like the following:
> > kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
> >
> > and was wondering if it's anything to be concerned with, or whether
> > fsck(8) is fixing them.
> > This began to happen when the power went out on a new install:
> > FreeBSD dns0 12.0-CURRENT FreeBSD 12.0-CURRENT #0: Wed Dec 13 06:07:59 PST
> > 2017 root@dns0:/usr/obj/usr/src/amd64.amd64/sys/DNS0 amd64
> > which hadn't yet been hooked up to the UPS.
> > I performed an fsck in single user mode upon power-up. Which ended with the
> > mount points being masked CLEAN. I was asked if I wanted to use the JOURNAL.
> > I answered Y.
> > FWIW the systems are UFS2 (ffs) have gpart labels, and were newfs'd thusly:
> > newfs -U -j
> >
> > Thank you for all your time, and consideration.
> >
>
> fsck fixes these errors only when the user does NOT use the journal.
> You should re-do the fsck.
>

When these mysterious errors first occurred on several boxes running CURRENT
(in December 2017, if I remember correctly), I also witnessed mysterious and
frequent crashes on several SSD-driven machines that showed the error described
above. The error itself vanished somehow as CURRENT progressed, but the crashes
continued. On two boxes I dumped and restored the OS onto the system's SSD
after reformatting the SSD from scratch (UFS2, soft updates + journaling). On
those boxes the mysterious crashes have not recurred since!

One box is left so far: my workstation. This box continues to crash, and it
started crashing again today while compiling world/kernel. The fun part is that
the box crashes spontaneously even after a clean shutdown, where I cannot
detect any filesystem inconsistencies, and after a reboot that again reports no
inconsistencies on the console or in the messages/logs.

Today I could trigger the reboot by starting "make -j4 buildworld buildkernel":
after showing the initial compiler/build framework statements, the box went to
nirvana. A well-known phenomenon by now.

I then checked the consistency of the filesystems. Here is the result for the
/usr/obj tree, which is a dedicated GPT partition (label: /dev/gpt/usr.obj):

[...]
root@box1:~ # fsck -fy /dev/gpt/usr.obj
** /dev/gpt/usr.obj
** Last Mounted on /usr/obj
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNALLOCATED I=515 OWNER=root MODE=0
SIZE=0 MTIME=Feb 22 07:25 2018
NAME=/usr/src/amd64.amd64/sys/BOX1/config.c.new

UNEXPECTED SOFT UPDATE INCONSISTENCY

REMOVE? yes

DIRECTORY CORRUPTED I=169691 OWNER=root MODE=40775
SIZE=1536 MTIME=Feb 22 05:16 2018
DIR=/usr/src/amd64.amd64/sys/BOX1/modules/usr/src/sys/modules/nfsd

UNEXPECTED SOFT UPDATE INCONSISTENCY

SALVAGE? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

126922 files, 848197 used, 1178482 free (89210 frags, 136159 blocks, 4.4% fragmentation)

***** FILE SYSTEM MARKED DIRTY *****

***** FILE SYSTEM WAS MODIFIED *****

***** PLEASE RERUN FSCK *****
[...]

When doing an installworld, I pre-emptively run "fsck -yf" twice in single-user
mode before mounting the partitions. In most cases the filesystems are reported
clean, but sometimes, especially for those under high I/O (/usr/src and mostly
/usr/obj on this build machine), there are reports of corruption.

As I reported, the very same behaviour occurred on three boxes simultaneously,
and I got rid of it by completely reformatting the SSDs (I have never had
issues so far with HDD-based boxes!). I hope I can refurbish the remaining box
this weekend. I could then report, if desired, whether it returns to a healthy
state like the others or whether my observation was a simple coincidence of
issues ...

Thanks for the patience,

Oliver
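
The repeated pre-installworld check described above could be scripted roughly as follows. This is only a sketch, assuming a POSIX sh: it reruns fsck on one device until the output no longer contains the "FILE SYSTEM WAS MODIFIED" or "PLEASE RERUN FSCK" lines seen in the log. The names FSCK_CMD, MAX_PASSES, needs_rerun and check_until_clean are hypothetical and chosen for illustration; they are not part of fsck(8).

```shell
#!/bin/sh
# Sketch only: repeat fsck on one device until it stops asking for a rerun.
# FSCK_CMD and MAX_PASSES are hypothetical knobs, not fsck(8) options.
: "${FSCK_CMD:=fsck -fy}"
: "${MAX_PASSES:=3}"

# Return 0 if the given fsck output reports that the file system was
# modified or that another pass is requested.
needs_rerun() {
    printf '%s\n' "$1" | grep -Eq 'PLEASE RERUN FSCK|FILE SYSTEM WAS MODIFIED'
}

# Check one device, repeating up to MAX_PASSES times.
# Returns 0 once a pass produces no rerun/modified message, 1 otherwise.
check_until_clean() {
    dev=$1
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        # The exit status of fsck is ignored here; only its text is inspected.
        out=$($FSCK_CMD "$dev" 2>&1) || true
        printf '%s\n' "$out"
        if ! needs_rerun "$out"; then
            return 0
        fi
        pass=$((pass + 1))
    done
    return 1
}
```

In single-user mode, before mounting, this would be called as "check_until_clean /dev/gpt/usr.obj". A non-zero return means the file system still reported changes after MAX_PASSES passes.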