Date: Wed, 9 Nov 2016 15:31:09 -0800
From: Gleb Smirnoff
To: Konstantin Belousov
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r308026 - in head/sys: kern sys ufs/ffs
Message-ID: <20161109233109.GY27748@FreeBSD.org>
References: <201610281143.u9SBhxrN008547@repo.freebsd.org> <20161101000246.GQ27748@FreeBSD.org> <20161101125310.GD54029@kib.kiev.ua>
In-Reply-To: <20161101125310.GD54029@kib.kiev.ua>

Konstantin,

On Tue, Nov 01, 2016 at 02:53:10PM +0200, Konstantin Belousov wrote:
K> > K> +static int buf_pager_relbuf;
K> > K> +SYSCTL_INT(_vfs, OID_AUTO, buf_pager_relbuf, CTLFLAG_RWTUN,
K> > K> +    &buf_pager_relbuf, 0,
K> > K> +    "Make buffer pager release buffers after reading");
K> > K> +
K> > K> +/*
K> > K> + * The buffer pager. It uses buffer reads to validate pages.
K> > K> + *
K> > K> + * In contrast to the generic local pager from vm/vnode_pager.c, this
K> > K> + * pager correctly and easily handles volumes where the underlying
K> > K> + * device block size is greater than the machine page size. The
K> > K> + * buffer cache transparently extends the requested page run to be
K> > K> + * aligned at the block boundary, and does the necessary bogus page
K> > K> + * replacements in the addends to avoid obliterating already valid
K> > K> + * pages.
K> > K> + *
K> > K> + * The only non-trivial issue is that the exclusive busy state for
K> > K> + * pages, which is assumed by the vm_pager_getpages() interface, is
K> > K> + * incompatible with the VMIO buffer cache's desire to share-busy the
K> > K> + * pages. This function performs a trivial downgrade of the pages'
K> > K> + * state before reading buffers, and a less trivial upgrade from the
K> > K> + * shared-busy to excl-busy state after the read.
K> >
K> > IMHO, should be noted that the pager ignores requested rbehind and rahead
K> > values, and does the rbehind and rahead sizes that he prefers.
K> Pager interface considers the ahead/behind pages' page-in as unsignificant,
K> in particular because the pages can be recycled or invalidated during the
K> pager operation, when pager drops the object lock.
K>
K> More important, this pager de-facto uses the optimal filesystem-depended
K> aligned io size due to its structure, comparing with the bmap pager.
K> For this reason, I consider additional attempts to follow optional
K> upper-level hints not very useful. Measurements show no difference in
K> the real workload times, and marginal improvements for microbenchmarks
K> (5% scale).

The buildworld isn't the only true workload. If we do readbehind or
readahead, we allocate pages for that, which means that some other pages
need to be purged. There are cases when the pager has absolutely no idea
what is optimal. So not following hints from the upper layers is a bug.

Note that I'm not asking you to fix it. I'm just asking you to document
that behaviour.

K> I might do something more aggressive when upper-level specified rahead is
K> (significantly) above the natural block size limit, like using breadn()
K> instead of bread(). Practice suggests that this would not help or even
K> be a pessimisation due to higher buf cache trashing.

-- 
Totus tuus, Glebius.