From owner-freebsd-current@freebsd.org Thu Feb 22 08:26:23 2018
Date: Thu, 22 Feb 2018 09:26:20 +0100
From: Gary Jennejohn
To: "O. Hartmann"
Cc: "Chris H", "FreeBSD Current", Warner Losh, Ed Maste, Michael Tuexen, Mark Johnston
Subject: Re: kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
Message-ID: <20180222092620.7c327329@ernst.home>
In-Reply-To: <20180222083707.73ae3036@freyja.zeit4.iv.bundesimmobilien.de>
References: <20180220123953.5e987691@ernst.home> <20180222083707.73ae3036@freyja.zeit4.iv.bundesimmobilien.de>
Reply-To: gljennjohn@gmail.com
List-Id: Discussions about the use of FreeBSD-current

On Thu, 22 Feb 2018 08:37:07 +0100
"O. Hartmann" wrote:

> On Tue, 20 Feb 2018 12:39:53 +0100
> Gary Jennejohn wrote:
>
> > On Mon, 19 Feb 2018 14:18:15 -0800
> > "Chris H" wrote:
> >
> > > I'm seeing a number of messages like the following:
> > > kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
> > >
> > > and was wondering if it's anything to be concerned with, or whether
> > > fsck(8) is fixing them.
> > > This began to happen when the power went out on a new install:
> > > FreeBSD dns0 12.0-CURRENT FreeBSD 12.0-CURRENT #0: Wed Dec 13 06:07:59 PST
> > > 2017 root@dns0:/usr/obj/usr/src/amd64.amd64/sys/DNS0 amd64
> > > which hadn't yet been hooked up to the UPS.
> > > I performed an fsck in single user mode upon power-up. Which ended with the
> > > mount points being masked CLEAN. I was asked if I wanted to use the JOURNAL.
> > > I answered Y.
> > > FWIW the systems are UFS2 (ffs) have gpart labels, and were newfs'd thusly:
> > > newfs -U -j
> > >
> > > Thank you for all your time, and consideration.
> > >
> >
> > fsck fixes these errors only when the user does NOT use the journal.
> > You should re-do the fsck.
> >
>
> When first these mysterious errors occured on several boxes running CURRENT,
> that was in December 2017 if I'm right, I also whitnessed mysterious and
> frequent crashes on several SSD driven machines, where this error described
> above occured.
>
> While the error vanished somehow in the meanwhile while CURRENT proceeds, the
> crashes continued - on two boxes, I dumped restore the OS on the system's SSD
> by reformatting the SSD from sratch (UFS2, soft update+ journaling). On those
> boxes the mysterious crashes vanished since then!
>
> On box left so far, my workstation. And this box continous to crash now and
> started crashing today again while compiling world/kernel.
>
> The fun-part is: even after a clean shutdown, where I can not detect any
> filesystem inconsistencies and rebooting and, again: no reported
> inconsistencies on the console/messages/logs, the box crashes spontanously. Now
> (today) I could trigger the reboot by starting "make -j4 buildworld
> buildkernel" and after showing the initial compiler statements/build framework
> statements, the box went to Nirwana. A well known phenomenon right now.
>
> I checked now the consistency of the filesystem, here is the result of
> the /usr/obj tree, which is a dedicated GPT partition
> (label: /dev/gpt/usr.obj):
>
>
> [...]
> root@box1:~ # fsck -fy /dev/gpt/usr.obj
> ** /dev/gpt/usr.obj
> ** Last Mounted on /usr/obj
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> UNALLOCATED I=515 OWNER=root MODE=0
> SIZE=0 MTIME=Feb 22 07:25 2018
> NAME=/usr/src/amd64.amd64/sys/BOX1/config.c.new
>
> UNEXPECTED SOFT UPDATE INCONSISTENCY
>
> REMOVE? yes
>
> DIRECTORY CORRUPTED I=169691 OWNER=root MODE=40775
> SIZE=1536 MTIME=Feb 22 05:16 2018
> DIR=/usr/src/amd64.amd64/sys/BOX1/modules/usr/src/sys/modules/nfsd
>
> UNEXPECTED SOFT UPDATE INCONSISTENCY
>
> SALVAGE? yes
>
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> FREE BLK COUNT(S) WRONG IN SUPERBLK
> SALVAGE? yes
>
> SUMMARY INFORMATION BAD
> SALVAGE? yes
>
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? yes
>
> 126922 files, 848197 used, 1178482 free (89210 frags, 136159 blocks, 4.4%
> fragmentation)
>
> ***** FILE SYSTEM MARKED DIRTY *****
>
> ***** FILE SYSTEM WAS MODIFIED *****
>
> ***** PLEASE RERUN FSCK *****
>
> [...]
>
> When doing a installworld, I pre-emptively perform in single user mode before
> mounting the partitions a "fsck -yf" two times. In most cases, the filesystem
> are reported clean, but sometimes especially those under high I/O (/usr/src and
> mostly /usr/obj on this build machine) there are reports of corruption.
>
> As I reported, the very same behaviour occured on three boxes simultanously and
> I got rid of it by completely reformatting the SSDs (never had issues so far
> with HDD based boxes!).
>
> I hope I can refurbish this weekend the remaining box and I could report, if
> desired, whether this box returns to a healthy state as the others or if my
> observation was a simple coincidence of issues ...
>
> Thanks for the patience,
>

I also see such problems only with SSDs.
Probably because the SSDs are buffering writes internally which never make
it into the flash chips, although the SSDs report that the writes were
completed. HDDs apparently don't do that, or have a smaller cache.

I then also run fsck in single-user mode, but I explicitly do NOT use the
journal, i.e., I do NOT run fsck -y. But I guess that using fsck -fy is
equivalent to not using the journal.

In my case the SSDs are error free after doing the fsck without using the
journal until the next crash happens.

My box with a Ryzen 5 1600 tends to hang for no apparent reason, so I see
these errors fairly frequently because I have to reset the box :(

--
Gary Jennejohn