From owner-svn-src-all@FreeBSD.ORG Thu May 28 00:11:38 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 34F16C22; Thu, 28 May 2015 00:11:38 +0000 (UTC) (envelope-from mckusick@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 007C59BE; Thu, 28 May 2015 00:11:38 +0000 (UTC) (envelope-from mckusick@FreeBSD.org) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.9/8.14.9) with ESMTP id t4S0BbvA029417; Thu, 28 May 2015 00:11:37 GMT (envelope-from mckusick@FreeBSD.org) Received: (from mckusick@localhost) by svn.freebsd.org (8.14.9/8.14.9/Submit) id t4S0BbiQ029414; Thu, 28 May 2015 00:11:37 GMT (envelope-from mckusick@FreeBSD.org) Message-Id: <201505280011.t4S0BbiQ029414@svn.freebsd.org> X-Authentication-Warning: svn.freebsd.org: mckusick set sender to mckusick@FreeBSD.org using -f From: Kirk McKusick Date: Thu, 28 May 2015 00:11:37 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r283640 - in stable/10/sys/ufs: ffs ufs X-SVN-Group: stable-10 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 May 2015 00:11:38 -0000 Author: mckusick Date: Thu May 28 00:11:36 2015 New Revision: 283640 URL: https://svnweb.freebsd.org/changeset/base/283640 Log: MFC of 269533: Limit the number of cylinder groups that will be searched when trying to build a cluster. The limit is tunable using the sysctl vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups per block allocation. It was previously limited to the number of cylinder groups in the filesystem per block allocation. When there were no clusters of the needed size left, it repeatedly searched the whole filesystem for a non-existent cluster on every block allocation. The result was very slow filesystem allocation with 100% CPU utilization. The old behavior can be had by setting vfs.ffs.maxclustersearch to a huge number (1,000,000). This change affects only the layout policy routines so is not able to interfere with the integrity of the filesystem. Reported by: Dmitry Sivachenko (demon@) Tested by: Dmitry Sivachenko (demon@) Modified: stable/10/sys/ufs/ffs/ffs_alloc.c stable/10/sys/ufs/ffs/ffs_vfsops.c stable/10/sys/ufs/ufs/inode.h Directory Properties: stable/10/ (props changed) Modified: stable/10/sys/ufs/ffs/ffs_alloc.c ============================================================================== --- stable/10/sys/ufs/ffs/ffs_alloc.c Wed May 27 22:30:21 2015 (r283639) +++ stable/10/sys/ufs/ffs/ffs_alloc.c Thu May 28 00:11:36 2015 (r283640) @@ -112,8 +112,7 @@ static void ffs_blkfree_trim_task(void * #ifdef INVARIANTS static int ffs_checkblk(struct inode *, ufs2_daddr_t, long); #endif -static ufs2_daddr_t ffs_clusteralloc(struct inode *, u_int, ufs2_daddr_t, int, - int); +static ufs2_daddr_t ffs_clusteralloc(struct inode *, u_int, ufs2_daddr_t, int); static ino_t ffs_dirpref(struct inode *); static ufs2_daddr_t ffs_fragextend(struct inode *, u_int, ufs2_daddr_t, int, int); @@ -460,10 +459,16 @@ nospace: SYSCTL_NODE(_vfs, OID_AUTO, ffs, CTLFLAG_RW, 0, "FFS filesystem"); static int doasyncfree = 1; -SYSCTL_INT(_vfs_ffs, OID_AUTO, doasyncfree, CTLFLAG_RW, &doasyncfree, 0, ""); +SYSCTL_INT(_vfs_ffs, OID_AUTO, doasyncfree, CTLFLAG_RW, &doasyncfree, 0, +"do not force synchronous writes when blocks are reallocated"); static int doreallocblks = 1; -SYSCTL_INT(_vfs_ffs, OID_AUTO, doreallocblks, CTLFLAG_RW, &doreallocblks, 0, ""); +SYSCTL_INT(_vfs_ffs, OID_AUTO, doreallocblks, CTLFLAG_RW, &doreallocblks, 0, +"enable block reallocation"); + +static int maxclustersearch = 10; +SYSCTL_INT(_vfs_ffs, OID_AUTO, maxclustersearch, CTLFLAG_RW, &maxclustersearch, +0, "max number of cylinder group to search for contigous blocks"); #ifdef DEBUG static volatile int prtrealloc = 0; @@ -510,7 +515,7 @@ ffs_reallocblks_ufs1(ap) ufs1_daddr_t soff, newblk, blkno; ufs2_daddr_t pref; struct indir start_ap[NIADDR + 1], end_ap[NIADDR + 1], *idp; - int i, len, start_lvl, end_lvl, ssize; + int i, cg, len, start_lvl, end_lvl, ssize; vp = ap->a_vp; ip = VTOI(vp); @@ -597,18 +602,39 @@ ffs_reallocblks_ufs1(ap) ebap = (ufs1_daddr_t *)ebp->b_data; } /* - * Find the preferred location for the cluster. + * Find the preferred location for the cluster. If we have not + * previously failed at this endeavor, then follow our standard + * preference calculation. If we have failed at it, then pick up + * where we last ended our search. */ UFS_LOCK(ump); - pref = ffs_blkpref_ufs1(ip, start_lbn, soff, sbap); + if (ip->i_nextclustercg == -1) + pref = ffs_blkpref_ufs1(ip, start_lbn, soff, sbap); + else + pref = cgdata(fs, ip->i_nextclustercg); /* * Search the block map looking for an allocation of the desired size. + * To avoid wasting too much time, we limit the number of cylinder + * groups that we will search. + */ + cg = dtog(fs, pref); + for (i = min(maxclustersearch, fs->fs_ncg); i > 0; i--) { + if ((newblk = ffs_clusteralloc(ip, cg, pref, len)) != 0) + break; + cg += 1; + if (cg >= fs->fs_ncg) + cg = 0; + } + /* + * If we have failed in our search, record where we gave up for + * next time. Otherwise, fall back to our usual search citerion. */ - if ((newblk = ffs_hashalloc(ip, dtog(fs, pref), pref, - len, len, ffs_clusteralloc)) == 0) { + if (newblk == 0) { + ip->i_nextclustercg = cg; UFS_UNLOCK(ump); goto fail; } + ip->i_nextclustercg = -1; /* * We have found a new contiguous block. * @@ -737,7 +763,7 @@ ffs_reallocblks_ufs2(ap) ufs_lbn_t start_lbn, end_lbn; ufs2_daddr_t soff, newblk, blkno, pref; struct indir start_ap[NIADDR + 1], end_ap[NIADDR + 1], *idp; - int i, len, start_lvl, end_lvl, ssize; + int i, cg, len, start_lvl, end_lvl, ssize; vp = ap->a_vp; ip = VTOI(vp); @@ -824,18 +850,39 @@ ffs_reallocblks_ufs2(ap) ebap = (ufs2_daddr_t *)ebp->b_data; } /* - * Find the preferred location for the cluster. + * Find the preferred location for the cluster. If we have not + * previously failed at this endeavor, then follow our standard + * preference calculation. If we have failed at it, then pick up + * where we last ended our search. */ UFS_LOCK(ump); - pref = ffs_blkpref_ufs2(ip, start_lbn, soff, sbap); + if (ip->i_nextclustercg == -1) + pref = ffs_blkpref_ufs2(ip, start_lbn, soff, sbap); + else + pref = cgdata(fs, ip->i_nextclustercg); /* * Search the block map looking for an allocation of the desired size. + * To avoid wasting too much time, we limit the number of cylinder + * groups that we will search. + */ + cg = dtog(fs, pref); + for (i = min(maxclustersearch, fs->fs_ncg); i > 0; i--) { + if ((newblk = ffs_clusteralloc(ip, cg, pref, len)) != 0) + break; + cg += 1; + if (cg >= fs->fs_ncg) + cg = 0; + } + /* + * If we have failed in our search, record where we gave up for + * next time. Otherwise, fall back to our usual search citerion. */ - if ((newblk = ffs_hashalloc(ip, dtog(fs, pref), pref, - len, len, ffs_clusteralloc)) == 0) { + if (newblk == 0) { + ip->i_nextclustercg = cg; UFS_UNLOCK(ump); goto fail; } + ip->i_nextclustercg = -1; /* * We have found a new contiguous block. * @@ -1785,12 +1832,11 @@ gotit: * take the first one that we find following bpref. */ static ufs2_daddr_t -ffs_clusteralloc(ip, cg, bpref, len, unused) +ffs_clusteralloc(ip, cg, bpref, len) struct inode *ip; u_int cg; ufs2_daddr_t bpref; int len; - int unused; { struct fs *fs; struct cg *cgp; Modified: stable/10/sys/ufs/ffs/ffs_vfsops.c ============================================================================== --- stable/10/sys/ufs/ffs/ffs_vfsops.c Wed May 27 22:30:21 2015 (r283639) +++ stable/10/sys/ufs/ffs/ffs_vfsops.c Thu May 28 00:11:36 2015 (r283640) @@ -1687,6 +1687,7 @@ ffs_vgetf(mp, ino, flags, vpp, ffs_flags ip->i_dev = dev; ip->i_number = ino; ip->i_ea_refs = 0; + ip->i_nextclustercg = -1; #ifdef QUOTA { int i; Modified: stable/10/sys/ufs/ufs/inode.h ============================================================================== --- stable/10/sys/ufs/ufs/inode.h Wed May 27 22:30:21 2015 (r283639) +++ stable/10/sys/ufs/ufs/inode.h Thu May 28 00:11:36 2015 (r283640) @@ -87,6 +87,8 @@ struct inode { daddr_t *snapblklist; /* Collect expunged snapshot blocks. */ } i_un; + int i_nextclustercg; /* last cg searched for cluster */ + /* * Data for extended attribute modification. */