From owner-freebsd-bugs@FreeBSD.ORG Wed Oct 17 07:40:06 2007 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B52AB16A418 for ; Wed, 17 Oct 2007 07:40:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 91E3813C44B for ; Wed, 17 Oct 2007 07:40:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l9H7e6s1053293 for ; Wed, 17 Oct 2007 07:40:06 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l9H7e6IG053292; Wed, 17 Oct 2007 07:40:06 GMT (envelope-from gnats) Date: Wed, 17 Oct 2007 07:40:06 GMT Message-Id: <200710170740.l9H7e6IG053292@freefall.freebsd.org> To: freebsd-bugs@FreeBSD.org From: Nate Eldredge Cc: Subject: Re: bin/115174: growfs(8) needs zero-writing for safe filesystem expansion [PATCH] X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Nate Eldredge List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2007 07:40:06 -0000 The following reply was made to PR bin/115174; it has been noted by GNATS. From: Nate Eldredge To: bug-followup@FreeBSD.org, etc@fluffles.net, scottl@freebsd.org Cc: Subject: Re: bin/115174: growfs(8) needs zero-writing for safe filesystem expansion [PATCH] Date: Wed, 17 Oct 2007 00:32:57 -0700 (PDT) I took a look at this. Let me say up front that I am not an fs expert, I'm just reading the code, so any of this could be wrong. It looks like this was introduced in revision 1.23 of src/sbin/growfs/growfs.c, commited by scottl. There are actually two different bugs for UFS1 and UFS2. The idea is that inodes have to be initialized to zero before they are used, and fsck complains if this isn't the case, even if those inodes are not in use according to the cylinder group bitmap. In UFS1, newfs is expected to initialize all inodes. In UFS2, inode initialization is "lazy"; there is a field cg_initediblk in the cylinder group indicating how many inodes have been initialized, starting from the first. (It looks like this and cg_niblk count inodes, not blocks, despite the name.) If more are needed, the kernel initializes them as necessary. For some reason, newfs initializes the first two blocks worth of inodes in every cg. growfs works by creating new cylinder groups in the filesystem, each of which comes with its own chunk of inodes. Prior to revision 1.23, growfs initialized the inodes in all the cgs it created, for both UFS1 and UFS2. Revision 1.23 attempted to optimize this behavior. For UFS2, it removed the inode initialization completely, but did not adjust cg_initediblk accordingly, so the kernel and fsck still thought that all inodes were initialized when actually they were not. My patch fixes this by setting cg_initedblk to zero. It would be possible to behave like newfs and initialize the first few inodes, but I don't see why that is necessary. It is simpler to skip it entirely, and in my tests it works fine. For UFS1, revision 1.23 changes it so that all *but* the first two blocks of inodes are initialized. It looks like the author was confused by some out-of-context code from newfs (which I'm not sure is exactly right itself), since this doesn't make any sense. My patch changes it back to initializing all inodes, and removes some leftover UFS2 initialization code as well, which is no longer used. I tested this patch on a md filesystem, filling it with random bytes, creating and filling a 100 MB filesystem, then growing it to 200 MB. fsck passes the new filesystem and all files remain intact, with both UFS1 and UFS2. --- /usr/src/sbin/growfs/growfs.c Tue Nov 29 23:20:16 2005 +++ ./growfs.c Tue Oct 16 23:07:47 2007 @@ -376,7 +376,6 @@ long d, dlower, dupper, blkno, start; ufs2_daddr_t i, cbase, dmax; struct ufs1_dinode *dp1; - struct ufs2_dinode *dp2; struct csum *cs; if (iobuf == NULL && (iobuf = malloc(sblock.fs_bsize)) == NULL) { @@ -401,7 +400,7 @@ acg.cg_magic = CG_MAGIC; acg.cg_cgx = cylno; acg.cg_niblk = sblock.fs_ipg; - acg.cg_initediblk = sblock.fs_ipg; + acg.cg_initediblk = 0; acg.cg_ndblk = dmax - cbase; if (sblock.fs_contigsumsize > 0) acg.cg_nclusterblks = acg.cg_ndblk / sblock.fs_frag; @@ -445,26 +444,16 @@ setbit(cg_inosused(&acg), i); acg.cg_cs.cs_nifree--; } - /* - * XXX Newfs writes out two blocks of initialized inodes - * unconditionally. Should we check here to make sure that they - * were actually written? - */ if (sblock.fs_magic == FS_UFS1_MAGIC) { bzero(iobuf, sblock.fs_bsize); - for (i = 2 * sblock.fs_frag; i < sblock.fs_ipg / INOPF(&sblock); + for (i = 0; i < sblock.fs_ipg / INOPF(&sblock); i += sblock.fs_frag) { dp1 = (struct ufs1_dinode *)iobuf; - dp2 = (struct ufs2_dinode *)iobuf; #ifdef FSIRAND - for (j = 0; j < INOPB(&sblock); j++) - if (sblock.fs_magic == FS_UFS1_MAGIC) { - dp1->di_gen = random(); - dp1++; - } else { - dp2->di_gen = random(); - dp2++; - } + for (j = 0; j < INOPB(&sblock); j++) { + dp1->di_gen = random(); + dp1++; + } #endif wtfs(fsbtodb(&sblock, cgimin(&sblock, cylno) + i), sblock.fs_bsize, iobuf, fso, Nflag); -- Nate Eldredge nge@cs.hmc.edu