Date: Sun, 27 Mar 2016 15:34:00 +1100 (EST)
From: Bruce Evans
To: fs@freebsd.org
Subject: nfs pessimized by vnode pages changes
Message-ID: <20160327144755.Y4269@besplex.bde.org>

I debugged another pessimization of nfs.  ncl_getpages() is now almost
always called with a count of 1 page, because r292373 changed the count
in vm_fault() from faultcount to 1.  The only exception seems to be for
the initial pagein for exec -- this is still normally the nfs rsize.
ncl_getpages() doesn't do any readahead stuff like
vnode_pager_generic_getpages() does, so it normally does read RPCs of
size 1 page instead of the nfs rsize.  This gives the following
increases in read RPCs for makeworld of an old world:
- with rsize = 16K, from 24k to 39k (the worst case would be 4 times as
  many)
- with rsize = 8K, from 39k to 44k (the worst case would be 2 times as
  many).

Also, ncl_getpages() has buggy logic which works accidentally if the
count is 1:

X diff -c2 ./fs/nfsclient/nfs_clbio.c~ ./fs/nfsclient/nfs_clbio.c
X *** ./fs/nfsclient/nfs_clbio.c~ Sun Mar 27 01:31:38 2016
X --- ./fs/nfsclient/nfs_clbio.c Sun Mar 27 02:35:32 2016
X ***************
X *** 135,140 ****
X            */
X           VM_OBJECT_WLOCK(object);
X !         if (pages[npages - 1]->valid != 0 && --npages == 0)
X                   goto out;
X           VM_OBJECT_WUNLOCK(object);
X
X --- 135,155 ----
X            */
X           VM_OBJECT_WLOCK(object);
X ! #if 0
X !         /* This matches the comment. but doesn't work (has little effect). */
X !         if (pages[0]->valid != 0)
X                   goto out;

The comment still says that the code checks the requested page, but
that is no longer passed to the function in a_reqpage.  The first page
is a better guess at the requested page than the last one, but when
npages is 1 these pages are the same.

X + #else
X +         if (pages[0]->valid != 0)
X +                 printf("ncl_getpages: page 0 valid; npages %d\n", npages);
X +         for (i = 0; i < npages; i++)
X +                 if (pages[i]->valid != 0)
X +                         printf("ncl_getpages: page %d valid; npages %d\n",
X +                             i, npages);
X +         for (i = 0; i < npages; i++)
X +                 if (pages[i]->valid != 0)
X +                         npages = i;
X +         if (npages == 0)
X +                 goto out;
X + #endif

Debugging and more forceful guessing code.  This makes little
difference except of course to spam the console.

X           VM_OBJECT_WUNLOCK(object);
X
X ***************
X *** 199,202 ****
X --- 214,220 ----
X                           KASSERT(m->dirty == 0,
X                               ("nfs_getpages: page %p is dirty", m));
X +                         printf("ncl_getpages: partial page %d of %d %s\n",
X +                             i, npages,
X +                             pages[i]->valid != 0 ? "valid" : "invalid");
X                   } else {
X                           /*
X ***************
X *** 210,215 ****
X --- 228,239 ----
X                            */
X                           ;
X +                         printf("ncl_getpages: short page %d of %d %s\n",
X +                             i, npages,
X +                             pages[i]->valid != 0 ? "valid" : "invalid");
X                   }
X           }
X +         for (i = 0; i < npages; i++)
X +                 printf("ncl_getpages: page %d of %d %s\n",
X +                     i, npages, pages[i]->valid != 0 ? "valid" : "invalid");
X   out:
X           VM_OBJECT_WUNLOCK(object);

Further debugging code.

Similar debugging code in the old working version shows that normal
operation for paging in a 15K file with an rsize of 16K is:
- call here with npages = 4
- page in 3 full pages and 1 partial page using 1 RPC
- call here again with npages = 1 for the partial page
- use the optimization of returning early for this page -- don't do
  another RPC

The buggy version does:
- call here with npages = 1; page in 1 full page using 1 RPC
- call here with npages = 1; page in 1 full page using 1 RPC
- call here with npages = 1; page in 1 full page using 1 RPC
- call here with npages = 1; page in 1 partial page using 1 RPC
- call here again with npages = 1 for the partial page; the
  optimization works as before.

The partial page isn't handled very well, but at least there is no
extra physical i/o for it, at least if it is at EOF.
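
As a rough illustration of the two call sequences above, here is a
minimal sketch in C that counts read RPCs under a simplified model.  It
assumes a 4K page size and exactly one read RPC per call that covers
every page passed in.  pages_for() and count_read_rpcs() are
hypothetical helpers written for this example, not anything in
nfs_clbio.c, and the extra early-return call for the partial page is
left out because it does no RPC.

```c
#include <assert.h>

/* Assumed page size for this model; the real value is per-platform. */
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical helper: pages needed to hold file_size bytes. */
static unsigned long
pages_for(unsigned long file_size)
{
	return ((file_size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE);
}

/*
 * Hypothetical model: number of read RPCs needed to page in a file
 * when every getpages call is handed pages_per_call pages and issues
 * one RPC for all of them.  This is an assumption made for
 * illustration, not a description of the real code path.
 */
static unsigned long
count_read_rpcs(unsigned long file_size, unsigned long pages_per_call)
{
	unsigned long pages;

	assert(pages_per_call > 0);
	pages = pages_for(file_size);
	return ((pages + pages_per_call - 1) / pages_per_call);
}
```

Under these assumptions, count_read_rpcs(15 * 1024, 4) models the old
behaviour (rsize = 16K, npages = 4) and gives 1 RPC, while
count_read_rpcs(15 * 1024, 1) models npages = 1 and gives 4.  The ratio
is bounded by rsize divided by the page size, which is where the worst
cases of 4 times and 2 times for rsize = 16K and 8K come from.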
vfs clustering handles partial pages even worse than this.

Bruce