Date: Wed, 3 Jul 2013 12:24:15 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Kirk McKusick <mckusick@FreeBSD.org> Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org Subject: Re: svn commit: r252527 - head/sys/ufs/ffs Message-ID: <20130703113444.S1064@besplex.bde.org> In-Reply-To: <201307022107.r62L786A087838@svn.freebsd.org> References: <201307022107.r62L786A087838@svn.freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 2 Jul 2013, Kirk McKusick wrote: > Log: > Make better use of metadata area by avoiding using it for data blocks > that no should no longer immediately follow their indirect blocks. I use the following changes for allocation at the start of cylinder group. They seem to be related. % diff -u2 ffs_alloc.c~ ffs_alloc.c % --- ffs_alloc.c~ Sun Jun 20 12:53:30 2004 % +++ ffs_alloc.c Mon Jan 2 16:52:39 2012 % @@ -1010,5 +1062,6 @@ % * Select the desired position for the next block in a file. The file is % * logically divided into sections. The first section is composed of the % - * direct blocks. Each additional section contains fs_maxbpg blocks. % + * direct blocks and the next fs_maxbpg blocks. Each additional section % + * contains fs_maxbpg blocks. % * % * If no blocks have been allocated in the first section, the policy is to % @@ -1022,12 +1075,11 @@ % * continues until a cylinder group with greater than the average number % * of free blocks is found. If the allocation is for the first block in an % - * indirect block, the information on the previous allocation is unavailable; % + * indirect block or the previous block is a hole, then the information on % + * the previous allocation is unavailable; % * here a best guess is made based upon the logical block number being % * allocated. % * % * If a section is already partially allocated, the policy is to % - * contiguously allocate fs_maxcontig blocks. The end of one of these % - * contiguous blocks and the beginning of the next is laid out % - * contiguously if possible. % + * allocate blocks contiguously within the section if possible. % */ % ufs2_daddr_t IIRC, the comments were out of date before this change. Especially the final paragraph -- fs_maxcontig is not used for anything. % @@ -1039,12 +1091,18 @@ % { % struct fs *fs; % - int cg; % - int avgbfree, startcg; % + int avgbfree, cg, firstsection, newsection, startcg; BTW, I don't like changing the type of these variables to u_int in -current. This asks for new sign extension bugs to break the warnings about old ones. Even 16-bit signed int is almost large enough for cg, so it is far from necessary to change it from signed to unsigned to double its range. % % fs = ip->i_fs; % - if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) { % - if (lbn < NDADDR + NINDIR(fs)) { % + if (lbn < NDADDR + fs->fs_maxbpg) { % + firstsection = 1; % + newsection = 0; % + } else { % + firstsection = 0; % + newsection = ((lbn - NDADDR) % fs->fs_maxbpg == 0); % + } % + if (indx == 0 || bap[indx - 1] == 0 || newsection) { % + if (firstsection) { % cg = ino_to_cg(fs, ip->i_number); % - return (fs->fs_fpg * cg + fs->fs_frag); % + return (cgdmin(fs, cg)); % } % /* % @@ -1052,8 +1110,17 @@ % * unused data blocks. % */ % - if (indx == 0 || bap[indx - 1] == 0) % - startcg = % - ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg; % - else % + if (indx == 0 || bap[indx - 1] == 0) { % + cg = ino_to_cg(fs, ip->i_number) + % + (lbn - NDADDR) / fs->fs_maxbpg; % + if (!newsection) { % + /* % + * Actually, use our best guess if the % + * section is not new. % + */ % + cg %= fs->fs_ncg; % + return (cgdmin(fs, cg)); % + } % + startcg = cg; % + } else % startcg = dtog(fs, bap[indx - 1]) + 1; % startcg %= fs->fs_ncg; % @@ -1062,16 +1129,13 @@ % if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) { % fs->fs_cgrotor = cg; % - return (fs->fs_fpg * cg + fs->fs_frag); % + return (cgdmin(fs, cg)); % } % for (cg = 0; cg <= startcg; cg++) % if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) { % fs->fs_cgrotor = cg; % - return (fs->fs_fpg * cg + fs->fs_frag); % + return (cgdmin(fs, cg)); % } % return (0); % } % - /* % - * We just always try to lay things out contiguously. % - */ % return (bap[indx - 1] + fs->fs_frag); % } % @@ -1095,5 +1159,5 @@ % if (lbn < NDADDR + NINDIR(fs)) { % cg = ino_to_cg(fs, ip->i_number); % - return (fs->fs_fpg * cg + fs->fs_frag); % + return (cgdmin(fs, cg)); % } % /* % @@ -1111,10 +1175,10 @@ % if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) { % fs->fs_cgrotor = cg; % - return (fs->fs_fpg * cg + fs->fs_frag); % + return (cgdmin(fs, cg)); % } % for (cg = 0; cg <= startcg; cg++) % if (fs->fs_cs(fs, cg).cs_nbfree >= avgbfree) { % fs->fs_cgrotor = cg; % - return (fs->fs_fpg * cg + fs->fs_frag); % + return (cgdmin(fs, cg)); % } % return (0); I remembered the changes to use cgdmin() as just style fixes, but looking at the macro expansions now, this is far from clear. The macros are deeply nested and depend on the ffs version. The inline expressions don't depend on the ffs version, and are also missing addition of fs_dblkno. I now remember that all the changes are related to correctly skipping over the early blocks whose count is given by fs_dblkno. The direct expression gives a block number that is too small to be for a data block. Blocks early in the cg are already allocated, so the preference for using them given by returning the too-small value is ignored and the caller skips over the allocated blocks. However, IIRC this results in a tiny cluster of data blocks being allocated early in the cg, probably because all the blocks before fs_dblkno are not allocated for metadata. This showed up in tests for contiguous allocation. Large files on new file systems should be allocated perfectly contiguously on each cg after the first (accept for indirect blocks), but instead they have a gap at the start of each cg. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130703113444.S1064>