From: Kirk McKusick
To: Mateusz Guzik
cc: Rick Macklem, FreeBSD Current
Subject: Re: panic "ffs_checkblk: bad block" on recent -head kernels
Date: Thu, 03 Dec 2015 15:07:48 -0800
Message-Id: <201512032307.tB3N7mMl001027@chez.mckusick.com>
In-reply-to: <20151203224752.GA19134@dft-labs.eu>

> Date: Thu, 3 Dec 2015 23:47:52 +0100
> From: Mateusz Guzik
> To: Rick Macklem
> Cc: FreeBSD Current
> Subject: Re: panic "ffs_checkblk: bad block" on recent -head kernels
>
> On Thu, Dec 03, 2015 at 05:08:27PM -0500, Rick Macklem wrote:
>> Hi,
>>
>> I get a fairly reproducible panic when doing a full kernel build
>> on a 256Mbyte single core i386 when running recent kernels from -head.
>>
>> The panic is "ffs_checkblk: bad block ..". I don't actually have the
>> block # (although I think it's just 0xfffffffffffffff, given the backtrace),
>> because it runs off the screen. (I looked up the message via the debugger
>> from the first arg. to panic.)
>>
>> Here's the backtrace without all the numbers:
>> panic(c14f4b55, ffffffff, ffffffff, 0, 64,...)
>> ffs_checkblk(ffffffff, 8000, fffffff9c, ffffffff, c4a02454,...)
>> ffs_reallocblks
>> VOP_REALLOCBLKS_APV
>> cluster_write
>> ffs_write
>> VOP_WRITE_APV
>> vn_write
>> vn_io_fault_doio
>> vn_io_fault1
>> vn_io_fault
>> dofilewrite
>> kern_writev
>> sys_write
>> syscall
>>
>> It doesn't happen on a kernel dated Sep. 30, but does happen on a Nov. 30 one.
>> (I was away from home, so I didn't upgrade kernels for 2 months.)
>>
>> I am slowly doing a binary search for the first kernel rev. where it occurs,
>> but since each build takes hours, it's going to take a while;-).
>>
>> At this point, it doesn't appear to happen on r289278 (just before jeff@'s buffer
>> cache patch).
>> With kernels between r289279-->r290480, I get into the "R" state that
>> was fixed by r290481 before I get a crash.
>> I tried reverting r289405 and r290047 from a recent kernel and the crashes still
>> occurred, so it doesn't appear to be these commits.
>>
>> I am currently testing r290481 to see if the crash occurs for this rev.
>>
>> If anyone has some insight into which commit might cause this,
>> please let me know.
>
> Well, did it crash with r291460 or later?
>
> If so, try the kernel just before that and if that helps, try:
>
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index ff37de8..0ad6ef7 100644
> --- a/sys/kern/vfs_subr.c
> +++ b/sys/kern/vfs_subr.c
> @@ -2783,6 +2783,7 @@ _vdrop(struct vnode *vp, bool locked)
>  	vp->v_op = NULL;
>  #endif
>  	bzero(&vp->v_un, sizeof(vp->v_un));
> +	vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
>  	vp->v_iflag = 0;
>  	vp->v_vflag = 0;
>  	bo->bo_flag = 0;
>
> -- 
> Mateusz Guzik

I concur with trying this suggestion. Starting with r291460, these fields
are no longer zeroed when a vnode is allocated, so you may have some
residual values in them that are causing trouble.

	Kirk McKusick