Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 May 2016 05:11:01 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        fs@freebsd.org
Subject:   Re: quick fix for slow directory shrinking in ffs
Message-ID:  <20160518035413.L4357@besplex.bde.org>
In-Reply-To: <20160517111715.GC89104@kib.kiev.ua>
References:  <20160517072705.F2157@besplex.bde.org> <20160517082050.GX89104@kib.kiev.ua> <20160517192933.U4573@besplex.bde.org> <20160517111715.GC89104@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 17 May 2016, Konstantin Belousov wrote:

> On Tue, May 17, 2016 at 08:26:26PM +1000, Bruce Evans wrote:
>> On Tue, 17 May 2016, Konstantin Belousov wrote:
>>
>>> On Tue, May 17, 2016 at 07:54:27AM +1000, Bruce Evans wrote:
>>>> ffs does very slow shrinking of directories after removing some files
>>>> leaves unused blocks at the end, by always doing synchronous truncation.
>>>> ...
>>>> X Index: ufs_lookup.c
>>>> X ===================================================================
>>>> X --- ufs_lookup.c	(revision 299263)
>>>> X +++ ufs_lookup.c	(working copy)
>>>> X @@ -1131,9 +1131,9 @@
>>>> X  		if (tvp != NULL)
>>>> X  			VOP_UNLOCK(tvp, 0);
>>>> X  		error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff,
>>>> X -		    IO_NORMAL | IO_SYNC, cr);
>>>> X +		    IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr);
>>>> X  		if (error != 0)
>>>> X -			vprint("ufs_direnter: failted to truncate", dvp);
>>>> X +			vprint("ufs_direnter: failed to truncate", dvp);

I keep looking at wrong versions.  I checked the old version and now see
another problem with this "failted" message (which you fixed).  It is
debugging code and shouldn't be printed at all.  Old versions ignored
errors from the truncation since the truncation is supposed to be optional
but that was broken for dirhash so r262812 added error handling.  If
the error handling actually works, then this becomes a non-error.

>> Some relevant code in ffs_truncate:

This was from an old versions.  Perhaps r181717.  FreeBSD-8 is
similar, but FreeBSD-9+ has most of my DOINGASYNC() additions and
-current has just 2 more of them than FreeBSD-9.

>> Y 	/*
>> Y 	 * Shorten the size of the file. If the file is not being
>> Y 	 * truncated to a block boundary, the contents of the
>> Y 	 * partial block following the end of the file must be
>> Y 	 * zero'ed in case it ever becomes accessible again because
>> Y 	 * of subsequent file growth. Directories however are not
>> Y 	 * zero'ed as they should grow back initialized to empty.
>> Y 	 */
>> Y 	offset = blkoff(fs, length);
>> Y 	if (offset == 0) {
>> Y 		ip->i_size = length;
>> Y 		DIP_SET(ip, i_size, length);
>> Y 	} else {
>> Y 		lbn = lblkno(fs, length);
>> Y 		flags |= BA_CLRBUF;
>> Y 		error = UFS_BALLOC(vp, length - 1, 1, cred, flags, &bp);
>> Y 		if (error) {
>> Y 			return (error);
>> Y 		}
>> Y 		/*
>> Y 		 * When we are doing soft updates and the UFS_BALLOC
>> Y 		 * above fills in a direct block hole with a full sized
>> Y 		 * block that will be truncated down to a fragment below,
>> Y 		 * we must flush out the block dependency with an FSYNC
>> Y 		 * so that we do not get a soft updates inconsistency
>> Y 		 * when we create the fragment below.
>> Y 		 */
>> Y 		if (DOINGSOFTDEP(vp) && lbn < NDADDR &&
>> Y 		    fragroundup(fs, blkoff(fs, length)) < fs->fs_bsize &&
>> Y 		    (error = ffs_syncvnode(vp, MNT_WAIT)) != 0)
>> Y 			return (error);
>> Y 		ip->i_size = length;
>> Y 		DIP_SET(ip, i_size, length);
>> Y 		size = blksize(fs, ip, lbn);
>> Y 		if (vp->v_type != VDIR)
>> Y 			bzero((char *)bp->b_data + offset,
>> Y 			    (u_int)(size - offset));
>> Y 		/* Kirk's code has reallocbuf(bp, size, 1) here */
>> Y 		allocbuf(bp, size);
>> Y 		if (bp->b_bufsize == fs->fs_bsize)
>> Y 			bp->b_flags |= B_CLUSTEROK;
>> Y 		if (flags & IO_SYNC)
>> Y 			bwrite(bp);
>> Y 		else
>> Y 			bawrite(bp);

FreeBSD-9+ already has my DOINGASYNC() fix here.  However, an async
write is still done when DOINGASYNC().  It is done by vtruncbuf() 50
lines after here.  vtruncbuf() doesn't know about DOINGASYNC().  It
turns delayed writes into unconditional async ones.

>> Y 	}
>>
>> I think we usually arrive here and honor the IO_SYNC flag.  This is correct.
>> Otherwise, we always do an async write, but that is wrong for async mounts.
>> Here is my old fix for this:
>>
>> Z diff -u2 ffs_inode.c~ ffs_inode.c
>> Z --- ffs_inode.c~	Wed Apr  7 21:22:26 2004
>> Z +++ ffs_inode.c	Sat Mar 23 01:23:16 2013
>> Z @@ -345,4 +431,6 @@
>> Z  		if (flags & IO_SYNC)
>> Z  			bwrite(bp);
>> Z +		else if (DOINGASYNC(ovp))
>> Z +			bdwrite(bp);
>> Z  		else
>> Z  			bawrite(bp);
>>
>> This fix must be sprinkled in most places where there is a bwrite()/
>> bawrite() decision.
> No, I do not think that it would be correct for SU mounts.  It is essential

SU silently ignores the async mount flag (by silently killing it instead
of ignoring it later), so the DOINGASYNC() checks don't affect it.

> for the correct operation of e.g. ffs_indirtrunc() that writes for SU
> case are synchronous, since no dependencies on the indirect block updates
> are recorded. The fact that syncvnode() is done before is similarly
> important, because no existing dependencies are cleared.
>
> On the other hand, I agree with the note that the final ffs_update()
> must honour IO_SYNC requests.
>
> Anyway, my point was that your patch does not change the hardest source
> of sync writes, only the write of the final block.  I will commit the
> following.

Er, it fixes all cases of directory shrinking for async mounts.

All cases should probably use watermarks and shrink at block or frag
boundaries instead of 512-boundaries.  E.g., for small directories,
shrink if size - endoff >= fs_fsize && <shrinking would leave at
least half of fs_fsize for expansion without adding another frag>.
WIth fs_fsize = 2K, this gives for example:
- size <= 2K: never shrink
- size nearly 4K but endoff between 1K and 2K: don't shrink, because
   shrinking would free a frag but not leave much space for expansion.

> diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c
> index 0202820..50b456b 100644
> --- a/sys/ufs/ffs/ffs_inode.c
> +++ b/sys/ufs/ffs/ffs_inode.c
> @@ -610,7 +610,7 @@ extclean:
> 		softdep_journal_freeblocks(ip, cred, length, IO_EXT);
> 	else
> 		softdep_setup_freeblocks(ip, length, IO_EXT);
> -	return (ffs_update(vp, !DOINGASYNC(vp)));
> +	return (ffs_update(vp, (flags & IO_SYNC) != 0 || !DOINGASYNC(vp)));
> }
>
> /*

Oops, this needs fixing in my version, but in -current the fix has
little effect since in -current ffs_update() still dishonors the waitfor
flag for its bwrite()/bdwrite() decision if DOINGASYNC().  This is
essentially the same as dishonoring the IO_SYNC flag here.

ffs_update() needs the same fix in 4 more places.

> diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c
> index 43b4e5c..53536ff 100644
> --- a/sys/ufs/ufs/ufs_lookup.c
> +++ b/sys/ufs/ufs/ufs_lookup.c
> @@ -1131,7 +1131,7 @@ ufs_direnter(dvp, tvp, dirp, cnp, newdirbp, isrename)
> 		if (tvp != NULL)
> 			VOP_UNLOCK(tvp, 0);
> 		error = UFS_TRUNCATE(dvp, (off_t)dp->i_endoff,
> -		    IO_NORMAL | IO_SYNC, cr);
> +		    IO_NORMAL | (DOINGASYNC(dvp) ? 0 : IO_SYNC), cr);
> 		if (error != 0)
> 			vprint("ufs_direnter: failed to truncate", dvp);
> #ifdef UFS_DIRHASH
>

OK.  I want this to avoid _any_ sync writes here for async mounts even
after the excessive truncations are fixed.

Perhaps vtruncbuf() should just check the async mount flag to avoid
async writes (except possibly when buf_dirty_count_severe()).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160518035413.L4357>