Date:      Tue, 1 Nov 2016 14:53:10 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r308026 - in head/sys: kern sys ufs/ffs
Message-ID:  <20161101125310.GD54029@kib.kiev.ua>
In-Reply-To: <20161101000246.GQ27748@FreeBSD.org>
References:  <201610281143.u9SBhxrN008547@repo.freebsd.org> <20161101000246.GQ27748@FreeBSD.org>

On Mon, Oct 31, 2016 at 05:02:46PM -0700, Gleb Smirnoff wrote:
>   Hi,
> 
> On Fri, Oct 28, 2016 at 11:43:59AM +0000, Konstantin Belousov wrote:
> K> Author: kib
> K> Date: Fri Oct 28 11:43:59 2016
> K> New Revision: 308026
> K> URL: https://svnweb.freebsd.org/changeset/base/308026
> K> 
> K> Log:
> K>   Generalize UFS buffer pager to allow it serving other filesystems
> K>   which also use buffer cache.
> K>   
> K>   Most important addition to the code is the handling of filesystems
> K>   where the block size is less than the machine page size, which might
> K>   require reading several buffers to validate single page.
> K>   
> K>   Tested by:	pho
> K>   Sponsored by:	The FreeBSD Foundation
> K>   MFC after:	2 weeks
> K> 
> K> Modified:
> K>   head/sys/kern/vfs_bio.c
> K>   head/sys/sys/buf.h
> K>   head/sys/ufs/ffs/ffs_vnops.c
> K> 
> K> Modified: head/sys/kern/vfs_bio.c
> K> ==============================================================================
> K> --- head/sys/kern/vfs_bio.c	Fri Oct 28 11:35:06 2016	(r308025)
> K> +++ head/sys/kern/vfs_bio.c	Fri Oct 28 11:43:59 2016	(r308026)
> K> @@ -75,9 +75,10 @@ __FBSDID("$FreeBSD$");
> K>  #include <vm/vm.h>
> K>  #include <vm/vm_param.h>
> K>  #include <vm/vm_kern.h>
> K> -#include <vm/vm_pageout.h>
> K> -#include <vm/vm_page.h>
> K>  #include <vm/vm_object.h>
> K> +#include <vm/vm_page.h>
> K> +#include <vm/vm_pageout.h>
> K> +#include <vm/vm_pager.h>
> K>  #include <vm/vm_extern.h>
> K>  #include <vm/vm_map.h>
> K>  #include <vm/swap_pager.h>
> K> @@ -4636,6 +4637,161 @@ bdata2bio(struct buf *bp, struct bio *bi
> K>  	}
> K>  }
> K>  
> K> +static int buf_pager_relbuf;
> K> +SYSCTL_INT(_vfs, OID_AUTO, buf_pager_relbuf, CTLFLAG_RWTUN,
> K> +    &buf_pager_relbuf, 0,
> K> +    "Make buffer pager release buffers after reading");
> K> +
> K> +/*
> K> + * The buffer pager.  It uses buffer reads to validate pages.
> K> + *
> K> + * In contrast to the generic local pager from vm/vnode_pager.c, this
> K> + * pager correctly and easily handles volumes where the underlying
> K> + * device block size is greater than the machine page size.  The
> K> + * buffer cache transparently extends the requested page run to be
> K> + * aligned at the block boundary, and does the necessary bogus page
> K> + * replacements in the addends to avoid obliterating already valid
> K> + * pages.
> K> + *
> K> + * The only non-trivial issue is that the exclusive busy state for
> K> + * pages, which is assumed by the vm_pager_getpages() interface, is
> K> + * incompatible with the VMIO buffer cache's desire to share-busy the
> K> + * pages.  This function performs a trivial downgrade of the pages'
> K> + * state before reading buffers, and a less trivial upgrade from the
> K> + * shared-busy to excl-busy state after the read.
> 
> IMHO, it should be noted that the pager ignores the requested rbehind and
> rahead values, and uses the rbehind and rahead sizes that it prefers.
The pager interface treats the page-in of the ahead/behind pages as
insignificant, in particular because those pages can be recycled or
invalidated during the pager operation, when the pager drops the object
lock.

More importantly, due to its structure this pager de facto uses the
optimal filesystem-dependent aligned I/O size, compared with the bmap
pager.  For this reason, I consider additional attempts to follow the
optional upper-level hints not very useful.  Measurements show no
difference in real workload times, and only marginal improvements (on
the order of 5%) for microbenchmarks.

I might do something more aggressive when the upper-level-specified
rahead is (significantly) above the natural block size limit, such as
using breadn() instead of bread().  Practice suggests that this would
not help and might even be a pessimization due to increased buffer
cache thrashing.
