From owner-svn-src-all@FreeBSD.ORG Wed Feb 4 01:02:57 2009
Message-Id: <200902040102.n1412u40047515@svn.freebsd.org>
From: Kirk McKusick
Date: Wed, 4 Feb 2009 01:02:56 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
X-SVN-Group: head
Subject: svn commit: r188110 - head/sbin/fsck_ffs

Author: mckusick
Date: Wed Feb 4 01:02:56 2009
New Revision: 188110
URL: http://svn.freebsd.org/changeset/base/188110

Log:
  Update the actions previously attempted by the -D option to make them
  robust. With these changes fsck is now able to detect and reliably
  rebuild corrupted cylinder group maps.
  The -D option is no longer necessary as it has been replaced by a
  prompt asking whether the corrupted cylinder group should be rebuilt
  and doing so when requested. These actions are only offered and taken
  when running fsck in manual mode. Corrupted cylinder groups found
  during preen mode cause the fsck to fail.

  Add the -r option to free up excess unused inodes. Decreasing the
  number of preallocated inodes reduces the running time of future runs
  of fsck and frees up space that can be allocated to files. The -r
  option is ignored when running in preen mode.

  Reviewed by: Xin LI
  Sponsored by: Rsync.net

Modified:
  head/sbin/fsck_ffs/fsck.h
  head/sbin/fsck_ffs/fsck_ffs.8
  head/sbin/fsck_ffs/fsutil.c
  head/sbin/fsck_ffs/inode.c
  head/sbin/fsck_ffs/main.c
  head/sbin/fsck_ffs/pass1.c

Modified: head/sbin/fsck_ffs/fsck.h
==============================================================================
--- head/sbin/fsck_ffs/fsck.h Wed Feb 4 00:45:25 2009 (r188109)
+++ head/sbin/fsck_ffs/fsck.h Wed Feb 4 01:02:56 2009 (r188110)
@@ -270,7 +270,7 @@ char yflag; /* assume a yes response *
 int bkgrdflag; /* use a snapshot to run on an active system */
 int bflag; /* location of alternate super block */
 int debug; /* output debugging info */
-char damagedflag; /* run in damaged mode */
+int inoopt; /* trim out unused inodes */
 char ckclean; /* only do work if not cleanly unmounted */
 int cvtlevel; /* convert to newer file system format */
 int bkgrdcheck; /* determine if background check is possible */
@@ -337,7 +337,7 @@ void cacheino(union dinode *dp, ino_t i
 void catch(int);
 void catchquit(int);
 int changeino(ino_t dir, const char *name, ino_t newnum);
-void check_cgmagic(int cg, struct cg *cgp);
+int check_cgmagic(int cg, struct cg *cgp);
 int chkrange(ufs2_daddr_t blk, int cnt);
 void ckfini(int markclean);
 int ckinode(union dinode *dp, struct inodesc *);
@@ -362,7 +362,7 @@ int ftypeok(union dinode *dp);
 void getblk(struct bufarea *bp, ufs2_daddr_t blk, long size);
 struct bufarea *getdatablk(ufs2_daddr_t blkno, long size);
 struct inoinfo *getinoinfo(ino_t inumber);
-union dinode *getnextinode(ino_t inumber);
+union dinode *getnextinode(ino_t inumber, int rebuildcg);
 void getpathname(char *namebuf, ino_t curdir, ino_t ino);
 union dinode *ginode(ino_t inumber);
 void infohandler(int sig);

Modified: head/sbin/fsck_ffs/fsck_ffs.8
==============================================================================
--- head/sbin/fsck_ffs/fsck_ffs.8 Wed Feb 4 00:45:25 2009 (r188109)
+++ head/sbin/fsck_ffs/fsck_ffs.8 Wed Feb 4 01:02:56 2009 (r188110)
@@ -38,7 +38,7 @@
 .Nd file system consistency check and interactive repair
 .Sh SYNOPSIS
 .Nm
-.Op Fl BDFpfny
+.Op Fl BFprfny
 .Op Fl b Ar block
 .Op Fl c Ar level
 .Op Fl m Ar mode
@@ -216,22 +216,6 @@ are being converted at once.
 The format of a file system can be determined from the
 first line of output from
 .Xr dumpfs 8 .
-.It Fl D
-Run
-.Nm
-in 'damaged recovery' mode, which will enable certain aggressive
-operations that can make
-.Nm
-to survive with file systems that has very serious data damage, which
-is an useful last resort when on disk data damage is very serious
-and causes
-.Nm
-to crash otherwise. Be
-.Em very careful
-using this flag, it is dangerous if there are data transmission hazards
-because a false positive cylinder group magic number mismatch could
-cause
-.Em irrevertible data loss!
 .Pp
 This option implies the
 .Fl f
@@ -259,6 +243,15 @@ which is assumed to be affirmative;
 do not open the file system for writing.
 .It Fl p
 Preen file systems (see above).
+.It Fl r
+Free up excess unused inodes.
+Decreasing the number of preallocated inodes reduces the
+running time of future runs of
+.Nm
+and frees up space that can allocated to files.
+The
+.Fl r
+option is ignored when running in preen mode.
 .It Fl y
 Assume a yes response to all questions asked by
 .Nm ;

Modified: head/sbin/fsck_ffs/fsutil.c
==============================================================================
--- head/sbin/fsck_ffs/fsutil.c Wed Feb 4 00:45:25 2009 (r188109)
+++ head/sbin/fsck_ffs/fsutil.c Wed Feb 4 01:02:56 2009 (r188110)
@@ -333,9 +333,13 @@ ckfini(int markclean)
 			if (!markclean)
 				rerun = 1;
 		}
-	} else if (!preen && !markclean) {
-		printf("\n***** FILE SYSTEM STILL DIRTY *****\n");
-		rerun = 1;
+	} else if (!preen) {
+		if (markclean) {
+			printf("\n***** FILE SYSTEM IS CLEAN *****\n");
+		} else {
+			printf("\n***** FILE SYSTEM STILL DIRTY *****\n");
+			rerun = 1;
+		}
 	}
 	if (debug && totalreads > 0)
 		printf("cache missed %ld of %ld (%d%%)\n", diskreads,
@@ -418,32 +422,73 @@ blwrite(int fd, char *buf, ufs2_daddr_t
 }
 
 /*
- * Check cg's magic number. If catastrophic mode is enabled and the cg's
- * magic number is bad, offer an option to clear the whole cg.
+ * Verify cylinder group's magic number and other parameters. If the
+ * test fails, offer an option to rebuild the whole cylinder group.
  */
-void
+int
 check_cgmagic(int cg, struct cg *cgp)
 {
 
-	if (!cg_chkmagic(cgp)) {
-		pwarn("CG %d: BAD MAGIC NUMBER\n", cg);
-		if (damagedflag) {
-			if (reply("CLEAR CG")) {
-				memset(cgp, 0, (size_t)sblock.fs_cgsize);
-				cgp->cg_initediblk = sblock.fs_ipg;
-				cgp->cg_old_niblk = sblock.fs_ipg;
-				cgp->cg_old_ncyl = sblock.fs_old_cpg;
-				cgp->cg_cgx = cg;
-				cgp->cg_niblk = sblock.fs_ipg;
-				cgp->cg_ndblk = sblock.fs_size - cgbase(&sblock, cg);
-				cgp->cg_magic = CG_MAGIC;
-				cgdirty();
-				printf("PLEASE RERUN FSCK.\n");
-				rerun = 1;
-			}
-		} else
-			printf("YOU MAY NEED TO RERUN FSCK WITH -D IF IT CRASHED.\n");
+	/*
+	 * Extended cylinder group checks.
+	 */
+	if (cg_chkmagic(cgp) &&
+	    ((sblock.fs_magic == FS_UFS1_MAGIC &&
+	      cgp->cg_old_niblk == sblock.fs_ipg &&
+	      cgp->cg_ndblk <= sblock.fs_fpg &&
+	      cgp->cg_old_ncyl == sblock.fs_old_cpg) ||
+	     (sblock.fs_magic == FS_UFS2_MAGIC &&
+	      cgp->cg_niblk == sblock.fs_ipg &&
+	      cgp->cg_ndblk <= sblock.fs_fpg &&
+	      cgp->cg_initediblk <= sblock.fs_ipg))) {
+		return (1);
+	}
+	pfatal("CYLINDER GROUP %d: BAD MAGIC NUMBER", cg);
+	if (!reply("REBUILD CYLINDER GROUP")) {
+		printf("YOU WILL NEED TO RERUN FSCK.\n");
+		rerun = 1;
+		return (1);
 	}
+	/*
+	 * Zero out the cylinder group and then initialize critical fields.
+	 * Bit maps and summaries will be recalculated by later passes.
+	 */
+	memset(cgp, 0, (size_t)sblock.fs_cgsize);
+	cgp->cg_magic = CG_MAGIC;
+	cgp->cg_cgx = cg;
+	cgp->cg_niblk = sblock.fs_ipg;
+	cgp->cg_initediblk = sblock.fs_ipg < 2 * INOPB(&sblock) ?
+	    sblock.fs_ipg : 2 * INOPB(&sblock);
+	if (cgbase(&sblock, cg) + sblock.fs_fpg < sblock.fs_size)
+		cgp->cg_ndblk = sblock.fs_fpg;
+	else
+		cgp->cg_ndblk = sblock.fs_size - cgbase(&sblock, cg);
+	cgp->cg_iusedoff = &cgp->cg_space[0] - (u_char *)(&cgp->cg_firstfield);
+	if (sblock.fs_magic == FS_UFS1_MAGIC) {
+		cgp->cg_niblk = 0;
+		cgp->cg_initediblk = 0;
+		cgp->cg_old_ncyl = sblock.fs_old_cpg;
+		cgp->cg_old_niblk = sblock.fs_ipg;
+		cgp->cg_old_btotoff = cgp->cg_iusedoff;
+		cgp->cg_old_boff = cgp->cg_old_btotoff +
+		    sblock.fs_old_cpg * sizeof(int32_t);
+		cgp->cg_iusedoff = cgp->cg_old_boff +
+		    sblock.fs_old_cpg * sizeof(u_int16_t);
+	}
+	cgp->cg_freeoff = cgp->cg_iusedoff + howmany(sblock.fs_ipg, CHAR_BIT);
+	cgp->cg_nextfreeoff = cgp->cg_freeoff + howmany(sblock.fs_fpg,CHAR_BIT);
+	if (sblock.fs_contigsumsize > 0) {
+		cgp->cg_nclusterblks = cgp->cg_ndblk / sblock.fs_frag;
+		cgp->cg_clustersumoff =
+		    roundup(cgp->cg_nextfreeoff, sizeof(u_int32_t));
+		cgp->cg_clustersumoff -= sizeof(u_int32_t);
+		cgp->cg_clusteroff = cgp->cg_clustersumoff +
+		    (sblock.fs_contigsumsize + 1) * sizeof(u_int32_t);
+		cgp->cg_nextfreeoff = cgp->cg_clusteroff +
+		    howmany(fragstoblks(&sblock, sblock.fs_fpg), CHAR_BIT);
+	}
+	cgdirty();
+	return (0);
 }
 
 /*
@@ -470,7 +515,8 @@ allocblk(long frags)
 			}
 			cg = dtog(&sblock, i + j);
 			getblk(&cgblk, cgtod(&sblock, cg), sblock.fs_cgsize);
-			check_cgmagic(cg, cgp);
+			if (!check_cgmagic(cg, cgp))
+				return (0);
 			baseblk = dtogd(&sblock, i + j);
 			for (k = 0; k < frags; k++) {
 				setbmap(i + j + k);

Modified: head/sbin/fsck_ffs/inode.c
==============================================================================
--- head/sbin/fsck_ffs/inode.c Wed Feb 4 00:45:25 2009 (r188109)
+++ head/sbin/fsck_ffs/inode.c Wed Feb 4 01:02:56 2009 (r188110)
@@ -309,10 +309,12 @@ static long readcnt, readpercg, fullcnt,
 static caddr_t inodebuf;
 
 union dinode *
-getnextinode(ino_t inumber)
+getnextinode(ino_t inumber, int rebuildcg)
 {
+	int j;
 	long size;
-	ufs2_daddr_t dblk;
+	mode_t mode;
+	ufs2_daddr_t ndb, dblk;
 	union dinode *dp;
 	static caddr_t nextinop;
 
@@ -336,6 +338,54 @@ getnextinode(ino_t inumber)
 		nextinop = inodebuf;
 	}
 	dp = (union dinode *)nextinop;
+	if (rebuildcg && nextinop == inodebuf) {
+		/*
+		 * Try to determine if we have reached the end of the
+		 * allocated inodes.
+		 */
+		mode = DIP(dp, di_mode) & IFMT;
+		if (mode == 0) {
+			if (memcmp(dp->dp2.di_db, ufs2_zino.di_db,
+				NDADDR * sizeof(ufs2_daddr_t)) ||
+			      memcmp(dp->dp2.di_ib, ufs2_zino.di_ib,
+				NIADDR * sizeof(ufs2_daddr_t)) ||
+			      dp->dp2.di_mode || dp->dp2.di_size)
+				return (NULL);
+			goto inodegood;
+		}
+		if (!ftypeok(dp))
+			return (NULL);
+		ndb = howmany(DIP(dp, di_size), sblock.fs_bsize);
+		if (ndb < 0)
+			return (NULL);
+		if (mode == IFBLK || mode == IFCHR)
+			ndb++;
+		if (mode == IFLNK) {
+			/*
+			 * Fake ndb value so direct/indirect block checks below
+			 * will detect any garbage after symlink string.
+			 */
+			if (DIP(dp, di_size) < (off_t)sblock.fs_maxsymlinklen) {
+				ndb = howmany(DIP(dp, di_size),
+				    sizeof(ufs2_daddr_t));
+				if (ndb > NDADDR) {
+					j = ndb - NDADDR;
+					for (ndb = 1; j > 1; j--)
+						ndb *= NINDIR(&sblock);
+					ndb += NDADDR;
+				}
+			}
+		}
+		for (j = ndb; ndb < NDADDR && j < NDADDR; j++)
+			if (DIP(dp, di_db[j]) != 0)
+				return (NULL);
+		for (j = 0, ndb -= NDADDR; ndb > 0; j++)
+			ndb /= NINDIR(&sblock);
+		for (; j < NIADDR; j++)
+			if (DIP(dp, di_ib[j]) != 0)
+				return (NULL);
+	}
+inodegood:
 	if (sblock.fs_magic == FS_UFS1_MAGIC)
 		nextinop += sizeof(struct ufs1_dinode);
 	else
@@ -617,7 +667,8 @@ allocino(ino_t request, int type)
 		return (0);
 	cg = ino_to_cg(&sblock, ino);
 	getblk(&cgblk, cgtod(&sblock, cg), sblock.fs_cgsize);
-	check_cgmagic(cg, cgp);
+	if (!check_cgmagic(cg, cgp))
+		return (0);
 	setbit(cg_inosused(cgp), ino % sblock.fs_ipg);
 	cgp->cg_cs.cs_nifree--;
 	switch (type & IFMT) {

Modified: head/sbin/fsck_ffs/main.c
==============================================================================
--- head/sbin/fsck_ffs/main.c Wed Feb 4 00:45:25 2009 (r188109)
+++ head/sbin/fsck_ffs/main.c Wed Feb 4 01:02:56 2009 (r188110)
@@ -81,8 +81,8 @@ main(int argc, char *argv[])
 
 	sync();
 	skipclean = 1;
-	damagedflag = 0;
-	while ((ch = getopt(argc, argv, "b:Bc:CdDfFm:npy")) != -1) {
+	inoopt = 0;
+	while ((ch = getopt(argc, argv, "b:Bc:CdfFm:npry")) != -1) {
 		switch (ch) {
 		case 'b':
 			skipclean = 0;
@@ -106,10 +106,6 @@ main(int argc, char *argv[])
 			debug++;
 			break;
 
-		case 'D':
-			damagedflag = 1;
-			/* FALLTHROUGH */
-
 		case 'f':
 			skipclean = 0;
 			break;
@@ -138,6 +134,10 @@ main(int argc, char *argv[])
 			ckclean++;
 			break;
 
+		case 'r':
+			inoopt++;
+			break;
+
 		case 'y':
 			yflag++;
 			nflag = 0;
@@ -606,7 +606,7 @@ static void
 usage(void)
 {
 	(void) fprintf(stderr,
-	    "usage: %s [-BCFpfny] [-b block] [-c level] [-m mode] "
+	    "usage: %s [-BFprfny] [-b block] [-c level] [-m mode] "
 	    "filesystem ...\n",
 	    getprogname());
 	exit(1);

Modified: head/sbin/fsck_ffs/pass1.c
==============================================================================
--- head/sbin/fsck_ffs/pass1.c Wed Feb 4 00:45:25 2009 (r188109)
+++ head/sbin/fsck_ffs/pass1.c Wed Feb 4 01:02:56 2009 (r188110)
@@ -54,17 +54,17 @@ static ufs2_daddr_t badblk;
 static ufs2_daddr_t dupblk;
 static ino_t lastino; /* last inode in use */
 
-static void checkinode(ino_t inumber, struct inodesc *);
+static int checkinode(ino_t inumber, struct inodesc *, int rebuildcg);
 
 void
 pass1(void)
 {
 	struct inostat *info;
 	struct inodesc idesc;
-	ino_t inumber, inosused;
+	ino_t inumber, inosused, mininos;
 	ufs2_daddr_t i, cgd;
 	u_int8_t *cp;
-	int c;
+	int c, rebuildcg;
 
 	/*
 	 * Set file system reserved blocks in used block map.
@@ -93,7 +93,10 @@ pass1(void)
 		inumber = c * sblock.fs_ipg;
 		setinodebuf(inumber);
 		getblk(&cgblk, cgtod(&sblock, c), sblock.fs_cgsize);
-		if (sblock.fs_magic == FS_UFS2_MAGIC) {
+		rebuildcg = 0;
+		if (!check_cgmagic(c, &cgrp))
+			rebuildcg = 1;
+		if (!rebuildcg && sblock.fs_magic == FS_UFS2_MAGIC) {
 			inosused = cgrp.cg_initediblk;
 			if (inosused > sblock.fs_ipg)
 				inosused = sblock.fs_ipg;
@@ -117,9 +120,7 @@ pass1(void)
 		 * to find the inodes that are really in use, and then
 		 * read only those inodes in from disk.
 		 */
-		if (preen && usedsoftdep) {
-			if (!cg_chkmagic(&cgrp))
-				pfatal("CG %d: BAD MAGIC NUMBER\n", c);
+		if ((preen || inoopt) && usedsoftdep && !rebuildcg) {
 			cp = &cg_inosused(&cgrp)[(inosused - 1) / CHAR_BIT];
 			for ( ; inosused > 0; inosused -= CHAR_BIT, cp--) {
 				if (*cp == 0)
@@ -152,24 +153,60 @@ pass1(void)
 		 */
 		for (i = 0; i < inosused; i++, inumber++) {
 			if (inumber < ROOTINO) {
-				(void)getnextinode(inumber);
+				(void)getnextinode(inumber, rebuildcg);
 				continue;
 			}
-			checkinode(inumber, &idesc);
+			/*
+			 * NULL return indicates probable end of allocated
+			 * inodes during cylinder group rebuild attempt.
+			 * We always keep trying until we get to the minimum
+			 * valid number for this cylinder group.
+			 */
+			if (checkinode(inumber, &idesc, rebuildcg) == 0 &&
+			    i > cgrp.cg_initediblk)
+				break;
 		}
-		lastino += 1;
-		if (inosused < sblock.fs_ipg || inumber == lastino)
+		/*
+		 * This optimization speeds up future runs of fsck
+		 * by trimming down the number of inodes in cylinder
+		 * groups that formerly had many inodes but now have
+		 * fewer in use.
+		 */
+		mininos = roundup(inosused + INOPB(&sblock), INOPB(&sblock));
+		if (inoopt && !preen && !rebuildcg &&
+		    sblock.fs_magic == FS_UFS2_MAGIC &&
+		    cgrp.cg_initediblk > 2 * INOPB(&sblock) &&
+		    mininos < cgrp.cg_initediblk) {
+			i = cgrp.cg_initediblk;
+			if (mininos < 2 * INOPB(&sblock))
+				cgrp.cg_initediblk = 2 * INOPB(&sblock);
+			else
+				cgrp.cg_initediblk = mininos;
+			pwarn("CYLINDER GROUP %d: RESET FROM %ju TO %d %s\n",
+			    c, i, cgrp.cg_initediblk, "VALID INODES");
+			cgdirty();
+		}
+		if (inosused < sblock.fs_ipg)
 			continue;
+		lastino += 1;
+		if (lastino < (c * sblock.fs_ipg))
+			inosused = 0;
+		else
+			inosused = lastino - (c * sblock.fs_ipg);
+		if (rebuildcg && inosused > cgrp.cg_initediblk &&
+		    sblock.fs_magic == FS_UFS2_MAGIC) {
+			cgrp.cg_initediblk = roundup(inosused, INOPB(&sblock));
+			pwarn("CYLINDER GROUP %d: FOUND %d VALID INODES\n", c,
+			    cgrp.cg_initediblk);
+		}
 		/*
 		 * If we were not able to determine in advance which inodes
 		 * were in use, then reduce the size of the inoinfo structure
 		 * to the size necessary to describe the inodes that we
 		 * really found.
 		 */
-		if (lastino < (c * sblock.fs_ipg))
-			inosused = 0;
-		else
-			inosused = lastino - (c * sblock.fs_ipg);
+		if (inumber == lastino)
+			continue;
 		inostathead[c].il_numalloced = inosused;
 		if (inosused == 0) {
 			free(inostathead[c].il_stat);
@@ -187,8 +224,8 @@ pass1(void)
 	freeinodebuf();
 }
 
-static void
-checkinode(ino_t inumber, struct inodesc *idesc)
+static int
+checkinode(ino_t inumber, struct inodesc *idesc, int rebuildcg)
 {
 	union dinode *dp;
 	off_t kernmaxfilesize;
@@ -196,7 +233,8 @@ checkinode(ino_t inumber, struct inodesc
 	mode_t mode;
 	int j, ret, offset;
 
-	dp = getnextinode(inumber);
+	if ((dp = getnextinode(inumber, rebuildcg)) == NULL)
+		return (0);
 	mode = DIP(dp, di_mode) & IFMT;
 	if (mode == 0) {
 		if ((sblock.fs_magic == FS_UFS1_MAGIC &&
@@ -220,7 +258,7 @@ checkinode(ino_t inumber, struct inodesc
 			}
 		}
 		inoinfo(inumber)->ino_state = USTATE;
-		return;
+		return (1);
 	}
 	lastino = inumber;
 	/* This should match the file size limit in ffs_mountfs(). */
@@ -352,7 +390,7 @@ checkinode(ino_t inumber, struct inodesc
 		if (preen)
 			printf(" (CORRECTED)\n");
 		else if (reply("CORRECT") == 0)
-			return;
+			return (1);
 		if (bkgrdflag == 0) {
 			dp = ginode(inumber);
 			DIP_SET(dp, di_blocks, idesc->id_entryno);
@@ -368,7 +406,7 @@ checkinode(ino_t inumber, struct inodesc
 				rwerror("ADJUST INODE BLOCK COUNT", cmd.value);
 		}
 	}
-	return;
+	return (1);
 unknown:
 	pfatal("UNKNOWN FILE TYPE I=%lu", (u_long)inumber);
 	inoinfo(inumber)->ino_state = FCLEAR;
@@ -378,6 +416,7 @@ unknown:
 		clearinode(dp);
 		inodirty();
 	}
+	return (1);
 }
 
 int