From owner-freebsd-fs@freebsd.org Sun Nov 1 21:00:41 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AED63A23F68 for ; Sun, 1 Nov 2015 21:00:41 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A25BB1F43 for ; Sun, 1 Nov 2015 21:00:41 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA1L0fqH077034 for ; Sun, 1 Nov 2015 21:00:41 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201511012100.tA1L0fqH077034@kenobi.freebsd.org> From: bugzilla-noreply@FreeBSD.org To: freebsd-fs@FreeBSD.org Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 Date: Sun, 01 Nov 2015 21:00:41 +0000 Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 Nov 2015 21:00:41 -0000 To view an individual PR, use: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases. 
Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
New         |    203492 | mount_unionfs -o below causes panic
Open        |    136470 | [nfs] Cannot mount / in read-only, over NFS
Open        |    139651 | [nfs] mount(8): read-only remount of NFS volume d
Open        |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f

4 problems total for which you should take action.

From owner-freebsd-fs@freebsd.org Mon Nov 2 09:32:26 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 118B6A24CC5 for ; Mon, 2 Nov 2015 09:32:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id F0CAA1AB0 for ; Mon, 2 Nov 2015 09:32:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id F0614A24CC4; Mon, 2 Nov 2015 09:32:25 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F004CA24CC3 for ; Mon, 2 Nov 2015 09:32:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id BAC481AAE for ; Mon, 2 Nov 2015 09:32:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id A3035423A03 for ; Mon, 2 Nov 2015 20:06:32 +1100 (AEDT) Date: Mon, 2 Nov 2015 20:06:31 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: fs@freebsd.org Subject: an easy (?)
question on namecache sizing Message-ID: <20151102193756.L1475@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R6/+YolX c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=Lc8tAO_lvDoG1r83-oEA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 09:32:26 -0000

At least in old versions before cache_changesize() (should be nc_chsize()) existed, the name cache is supposed to have size about 2 * desiredvnodes, but its effective size seems to be only about desiredvnodes / 4? Why is this?

This shows up in du -s on a large directory like /usr. Whenever the directory has more than about desiredvnodes / 4 entries under it, the namecache thrashes. The number of cached vnodes is also limited to about desiredvnodes / 4.

The problem might actually be in vnode caching. Indeed, if all the data in the directory is read using tar cf /dev/zero, then, at least if it all fits in the data and vnode caches, the vnode cache starts working and caches more than desiredvnodes / 4 files. The name cache then starts working and caches more than desiredvnodes / 4 files too.

The test directory had 6896 directories and 49643 files under it. With desiredvnodes = 123141, du -s caches only about 34000 vnodes. This is less than 48643 and the namecache thrashed with repeated du -s's. The vnode cache probably thrashed too, but this was not so easy to see. This was on an nfs client where it gave a slowdown of 20-30 times.
On the server, desiredvnodes was only 70240 and only about 17000 vnodes were cached and the problem was not so evident (I think because the VMIO cache actually works when the data + metadata is not too large to fit in it; reconstituting vnodes from it wastes a lot of CPU but is not as slow as fetching the metadata again from a disk or network).

Bruce

From owner-freebsd-fs@freebsd.org Mon Nov 2 10:07:28 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3562A236F6 for ; Mon, 2 Nov 2015 10:07:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 922D9187E for ; Mon, 2 Nov 2015 10:07:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA2A7S8U007317 for ; Mon, 2 Nov 2015 10:07:28 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 203201] on zfs extattr behaviour broken after unlink Date: Mon, 02 Nov 2015 10:07:28 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: commit-hook@freebsd.org X-Bugzilla-Status: Open X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: 
https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 10:07:28 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203201 --- Comment #9 from commit-hook@freebsd.org --- A commit references this bug: Author: avg Date: Mon Nov 2 10:07:21 UTC 2015 New revision: 290266 URL: https://svnweb.freebsd.org/changeset/base/290266 Log: zfs: allow the lookup of extended attributes of an unlinked file That's required for extattr_get_fd(2) and the like to work properly. PR: 203201 MFC after: 17 days Changes: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Mon Nov 2 10:07:55 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DE4CBA23755 for ; Mon, 2 Nov 2015 10:07:55 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CADBC1992 for ; Mon, 2 Nov 2015 10:07:55 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA2A7t4W007731 for ; Mon, 2 Nov 2015 10:07:55 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 203201] on zfs extattr behaviour broken after unlink Date: Mon, 02 Nov 2015 10:07:55 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed 
X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 10:07:56 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203201 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Status|Open |In Progress -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-fs@freebsd.org Mon Nov 2 10:09:34 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16D37A23868 for ; Mon, 2 Nov 2015 10:09:34 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0357A1D0E for ; Mon, 2 Nov 2015 10:09:34 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA2A9XBb009186 for ; Mon, 2 Nov 2015 10:09:33 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 203201] on zfs extattr behaviour broken after unlink Date: Mon, 02 Nov 2015 10:09:33 +0000 X-Bugzilla-Reason: AssignedTo CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 10:09:34 -0000 
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203201 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |freebsd-fs@FreeBSD.org Assignee|freebsd-fs@FreeBSD.org |freebsd-bugs@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug. You are on the CC list for the bug. From owner-freebsd-fs@freebsd.org Mon Nov 2 10:09:47 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DB97AA238AD for ; Mon, 2 Nov 2015 10:09:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C822B1DD0 for ; Mon, 2 Nov 2015 10:09:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA2A9lSS009430 for ; Mon, 2 Nov 2015 10:09:47 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 203201] on zfs extattr behaviour broken after unlink Date: Mon, 02 Nov 2015 10:09:48 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: avg@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit 
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 10:09:48 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203201 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |avg@FreeBSD.org -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-fs@freebsd.org Mon Nov 2 11:30:12 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F158AA2301E for ; Mon, 2 Nov 2015 11:30:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id DAE0D1E01 for ; Mon, 2 Nov 2015 11:30:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id D949FA2301C; Mon, 2 Nov 2015 11:30:12 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8D17A2301B for ; Mon, 2 Nov 2015 11:30:12 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 80A8C1E00 for ; Mon, 2 Nov 2015 11:30:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id 0D425D40A3F; Mon, 2 Nov 
2015 22:29:56 +1100 (AEDT) Date: Mon, 2 Nov 2015 22:29:56 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: fs@freebsd.org Subject: Re: an easy (?) question on namecache sizing In-Reply-To: <20151102193756.L1475@besplex.bde.org> Message-ID: <20151102210750.S1908@besplex.bde.org> References: <20151102193756.L1475@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R6/+YolX c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=HVedyWbAMsSLzbH5fYkA:9 a=O0TBwVC9xROfhtfE:21 a=WfTo9GhQe92Om4TV:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 11:30:13 -0000

On Mon, 2 Nov 2015, Bruce Evans wrote:

> At least in old versions before cache_changesize() (should be nc_chsize())
> existed, the name cache is supposed to have size about 2 * desiredvnodes,
> but its effective size seems to be only about desiredvnodes / 4? Why is
> this?
>
> This shows up in du -s on a large directory like /usr. Whenever the
> directory has more than about desiredvnodes / 4 entries under it, the
> namecache thrashes. The number of cached vnodes is also limited to
> about desiredvnodes / 4.
>
> The problem might actually be in vnode caching. ...

This was easy to answer. The problem is in vnode caching. Its only relationship with the namecache is that if you increase the bogus vnode cache limit then cache_changesize() now adjusts the associated namecache limit to match but doesn't increase the associated non-bogus vnode cache limits to match.

From vfs_subr.c:

X /*
X  * Number of vnodes we want to exist at any one time. This is mostly used
X  * to size hash tables in vnode-related code.
X  * It is normally not used in
X  * getnewvnode(), as wantfreevnodes is normally nonzero.)
X  *
X  * XXX desiredvnodes is historical cruft and should not exist.
X  */
X int desiredvnodes;

I probably helped eivind write the XXX comment in 2000. I only just noticed the error in the main part of the comment. This is not the number of vnodes that we want to exist, but about 4 times that number.

X ....
X SYSCTL_PROC(_kern, KERN_MAXVNODES, maxvnodes,
X     CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, &desiredvnodes, 0,
X     sysctl_update_desiredvnodes, "I", "Maximum number of vnodes");

The maximum is not bogus. Only "desired" in the name is bogus. Note that the sysctl name doesn't say "desired". But systat -v still uses the raw variable name (abbreviated to "desvn"). It is important for understanding systat -v output and the source code to know that this variable is actually the maximum and not the desired number, unlike what its name suggests.

X SYSCTL_ULONG(_kern, OID_AUTO, minvnodes, CTLFLAG_RW,
X     &wantfreevnodes, 0, "Minimum number of vnodes (legacy)");

Further obfuscations. The desiredvnodes / 4 number comes from here. This value really is the "wanted" or "desired" number of vnodes. The sysctl obfuscates it by renaming it to "minvnodes"; it is not a minimum except in the sense that when the current number exceeds it, the vnlru daemon tries (not very hard) to reduce the number to this limit. The description of this sysctl as legacy is confusing. Perhaps the name of this sysctl is legacy, but its value is less legacy than that of desiredvnodes by any name. This is further obfuscated by exporting this variable twice (once here and once with its correct name under vfs). See below for an example of sysctl output.

X ...
X     wantfreevnodes = desiredvnodes / 4;

This is where the length used below is initialized.

X ...
X /*
X  * Attempt to keep the free list at wantfreevnodes length.
X  */
X static void
X vnlru_free(int count)

Misplaced comment.
This function actually attempts to keep the list at a certain length that is decided elsewhere.

X ...
X static void
X vnlru_proc(void)
X {
X ...
X     for (;;) {
X         kproc_suspend_check(p);
X         mtx_lock(&vnode_free_list_mtx);
X         if (freevnodes > wantfreevnodes)
X             vnlru_free(freevnodes - wantfreevnodes);

This is where the length used above is passed. This length is defaulted by the above initialization and may be changed by either of the 2 sysctls for it. But the attempt usually fails, and then the vnode cache works better by growing nearly 4 times as large as is "wanted", up to nearly its "desired" size which is actually the limit on its size.

In my du -s test, the attempt succeeds and breaks the caching almost perfectly when the number of files is slightly larger than wantfreevnodes = desiredvnodes / 4, but when the files are read the attempt fails and the caching works when the number of files is smaller than desiredvnodes. Reading the files is probably closer to normal operation.

On freefall now: "sysctl -a | grep vnode" gives:

kern.maxvnodes: 485993
kern.minvnodes: 121498
vfs.freevnodes: 121453
vfs.wantfreevnodes: 121498
vfs.vnodes_created: 360808607
vfs.numvnodes: 408313

Note that the current number is about 3.5 times as large as the "wanted" number. This shows that the attempts to reduce to the "wanted" number usually fail, so the cache is almost 4 times as large as is "wanted". The kern values are limits, with a hard maximum and a soft minimum. The kern.minvnodes number is duplicated under its better name vfs.wantfreevnodes.

The sysctl for desiredvnodes is now the SYSCTL_PROC() shown above. The function for this now updates vfs_hash and the namecache, but not wantfreevnodes.

Code earlier in vfs_subr.c shows that it is wantfreevnodes that is primary and its duplication for minvnodes really is legacy. It is vfs.wantfreevnodes that should be the SYSCTL_PROC():

X /*
X  * Free vnode target. Free vnodes may simply be files which have been stat'd
X  * but not read.
X  * This is somewhat common, and a small cache of such files
X  * should be kept to avoid recreation costs.
X  */
X static u_long wantfreevnodes;
X SYSCTL_ULONG(_vfs, OID_AUTO, wantfreevnodes, CTLFLAG_RW, &wantfreevnodes, 0, "");

According to this, keeping vnodes as free but inactive for files that have been stat'ed but not read is intentional. The du -s example shows that this works almost perfectly as foot-shooting, except VMIO stops the foot being blown very far away.

X /* Number of vnodes in the free list. */
X static u_long freevnodes;
X SYSCTL_ULONG(_vfs, OID_AUTO, freevnodes, CTLFLAG_RD, &freevnodes, 0,
X     "Number of vnodes in the free list");

Data for the foot-shooting:

kern.maxvnodes: 70000
kern.minvnodes: 30774
vfs.numvnodes: 38157
vfs.vnodes_created: 461556
vfs.wantfreevnodes: 30774
vfs.freevnodes: 30775

Here maxvnodes started at 4*30774 but I reduced it to 70000 for comparison with another system. I didn't reduce minvnodes from 30774 since neither I nor the sysctl knew about it. Then du -s on a directory tree with 49683 files gave the (now even more misconfigured) "wanted" number of free vnodes almost perfectly. 70000 - 30774 = 39226 is less than the number of files, so this asks for thrashing of the vnode and name caches. 38157 instead of 39226 vnodes were left cached.

The default misconfiguration gives more mysterious numbers:

kern.maxvnodes: 123096
kern.minvnodes: 30774
vfs.numvnodes: 38143
vfs.vnodes_created: 50136
vfs.wantfreevnodes: 30774
vfs.freevnodes: 30774

Now 1/4 of maxvnodes can be forced to be free without limiting the number of non-free ones too much -- 3/4 can remain used; 3/4 of maxvnodes is about 90000 and that is plenty for caching about 49000 files. Apparently, the freeing is too active, so when there aren't many vnodes in use it reaches the target by discarding useful vnodes. For the du -s access pattern with lots of stat'ed files, the total number of vnodes in use never grows large enough to justify freeing any.
But on freefall or any system that has been up for a while doing a variety of tasks, the number of vnodes in use is large, so discarding some earlier than necessary works right. A large du -s then discards lots of old vnodes but not the ones that it looks at unless there are just too many. Bruce From owner-freebsd-fs@freebsd.org Mon Nov 2 12:42:37 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95447A24CC6 for ; Mon, 2 Nov 2015 12:42:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 7ACD41BD9 for ; Mon, 2 Nov 2015 12:42:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 79258A24CC5; Mon, 2 Nov 2015 12:42:37 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5EBF9A24CC4 for ; Mon, 2 Nov 2015 12:42:37 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 7BC331BD8 for ; Mon, 2 Nov 2015 12:42:35 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 528A21A073E; Mon, 2 Nov 2015 23:10:08 +1100 (AEDT) Date: Mon, 2 Nov 2015 23:10:07 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: fs@freebsd.org Subject: Re: an easy (?) 
question on namecache sizing In-Reply-To: <20151102210750.S1908@besplex.bde.org> Message-ID: <20151102224910.E2203@besplex.bde.org> References: <20151102193756.L1475@besplex.bde.org> <20151102210750.S1908@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R6/+YolX c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=bIdjGDvwme-37HNbKVAA:9 a=yhAzks1SUNIc4Cd7:21 a=rElAy0kzJFS6DwtZ:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Nov 2015 12:42:37 -0000

On Mon, 2 Nov 2015, Bruce Evans wrote:

> On Mon, 2 Nov 2015, Bruce Evans wrote:
>
>> At least in old versions before cache_changesize() (should be nc_chsize())
>> existed, the name cache is supposed to have size about 2 * desiredvnodes,
>> but its effective size seems to be only about desiredvnodes / 4? Why is
>> this?
>>
>> This shows up in du -s on a large directory like /usr. Whenever the
>> directory has more than about desiredvnodes / 4 entries under it, the
>> namecache thrashes. The number of cached vnodes is also limited to
>> about desiredvnodes / 4.
>>
>> The problem might actually be in vnode caching. ...
>
> This was easy to answer. The problem is in vnode caching. Its only
> ...
> X ....
> X SYSCTL_PROC(_kern, KERN_MAXVNODES, maxvnodes,
> X     CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, &desiredvnodes, 0,
> X     sysctl_update_desiredvnodes, "I", "Maximum number of vnodes");
>
> The maximum is not bogus. Only "desired" in the name is bogus. Note
> that the sysctl name doesn't say "desired". But systat -v still uses
> the raw variable name (abbreviated to "desvn").
> It is important for
> understanding systat -v output and the source code to know that this
> variable is actually the maximum and not the desired number, unlike
> what its name suggests.
>
> X SYSCTL_ULONG(_kern, OID_AUTO, minvnodes, CTLFLAG_RW,
> X     &wantfreevnodes, 0, "Minimum number of vnodes (legacy)");
>
> Further obfuscations. The desiredvnodes / 4 number comes from here.
> This value really is the "wanted" or "desired" number of vnodes.
> The sysctl obfuscates it by renaming it to "minvnodes"; it is not
> a minimum except in the sense that when the current number exceeds
> it, the vnlru daemon tries (not very hard) to reduce the number to
> this limit.
> The description of this sysctl as legacy is confusing. Perhaps the
> name of this sysctl is legacy, but its value is less legacy than
> that of desiredvnodes by any name. This is further obfuscated by
> exporting this variable twice (once here and once with its correct
> name under vfs). See below for an example of sysctl output.
>
> X ...
> X     wantfreevnodes = desiredvnodes / 4;
>
> This is where the length used below is initialized.

In old versions, only desiredvnodes was bogus, and the policy worked correctly. wantfreevnodes defaulted to only 25 and the goal of reaching that was independent of the goal of reaching the minimum number of vnodes. Now wantfreevnodes is ridiculously large (it is about 1 million when desiredvnodes is 4 million on freefall), and the goal of reaching a minimum number of vnodes is as confused in the code as in the variable names and is only reached accidentally.

My version of some relevant code from an old version (with some changes, but only to style):

X int
X getnewvnode(tag, mp, vops, vpp)
X ...
X     /*
X      * Attempt to reuse a vnode already on the free list, allocating
X      * a new vnode if we can't find one or if we have not reached a
X      * good minimum for good LRU performance.
X      */
X     if (freevnodes >= wantfreevnodes && numvnodes >= minvnodes) {

Current version:

X int
X getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops,
X     struct vnode **vpp)
X {
X ...
X     /*
X      * Lend our context to reclaim vnodes if they've exceeded the max.
X      */
X     if (freevnodes > wantfreevnodes)
X         vnlru_free(1);

This no longer attempts to reach a good minimum, and in fact only reaches one accidentally.

Bruce

From owner-freebsd-fs@freebsd.org Tue Nov 3 04:47:51 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 50BBDA24AE3 for ; Tue, 3 Nov 2015 04:47:51 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 3CF92157C for ; Tue, 3 Nov 2015 04:47:51 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: by mailman.ysv.freebsd.org (Postfix) id 3B650A24AE1; Tue, 3 Nov 2015 04:47:51 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3B059A24AE0 for ; Tue, 3 Nov 2015 04:47:51 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:d250:99ff:fe57:4030]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 05CB2157B for ; Tue, 3 Nov 2015 04:47:50 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (localhost [IPv6:::1]) by chez.mckusick.com (8.15.2/8.14.9) with ESMTP id tA34lo5O090332; Mon, 2 Nov 2015 20:47:50 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201511030447.tA34lo5O090332@chez.mckusick.com> From: Kirk McKusick To: 
Bruce Evans Subject: Re: an easy (?) question on namecache sizing cc: fs@freebsd.org In-reply-to: <20151102224910.E2203@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <90330.1446526070.1@chez.mckusick.com> Date: Mon, 02 Nov 2015 20:47:50 -0800 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Nov 2015 04:47:51 -0000 You seem to be proposing several approaches. One is to make wantfreevnodes bigger (half or three-quarters of the maximum). Another seems to be reverting to the previous (freevnodes >= wantfreevnodes && numvnodes >= minvnodes). So what is your proposed change? Kirk McKusick From owner-freebsd-fs@freebsd.org Tue Nov 3 09:04:56 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 03FAAA1E0F9 for ; Tue, 3 Nov 2015 09:04:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id DA18C1BE1 for ; Tue, 3 Nov 2015 09:04:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id DAA50A1E0F8; Tue, 3 Nov 2015 09:04:55 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DA433A1E0F6 for ; Tue, 3 Nov 2015 09:04:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 53BC81BDE for ; Tue, 3 Nov 2015 09:04:55 +0000 (UTC) 
(envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id tA394o5D080675 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 3 Nov 2015 11:04:50 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua tA394o5D080675 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id tA394m8T080673; Tue, 3 Nov 2015 11:04:48 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 3 Nov 2015 11:04:48 +0200 From: Konstantin Belousov To: Kirk McKusick Cc: Bruce Evans , fs@freebsd.org Subject: Re: an easy (?) question on namecache sizing Message-ID: <20151103090448.GC2257@kib.kiev.ua> References: <20151102224910.E2203@besplex.bde.org> <201511030447.tA34lo5O090332@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201511030447.tA34lo5O090332@chez.mckusick.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Nov 2015 09:04:56 -0000 On Mon, Nov 02, 2015 at 08:47:50PM -0800, Kirk McKusick wrote: > You seem to be proposing several approaches. One is to make > wantfreevnodes bigger (half or three-quarters of the maximum). > Another seems to be reverting to the previous (freevnodes >= wantfreevnodes > && numvnodes >= minvnodes). So what is your proposed change? 
Free vnodes could be freed in the soft fashion by the vnlru daemon, or in the hard manner, by getnewvnode(), when the max for the vnode count is reached. The 'soft' way skips vnodes which are directories, to make it more probable that vn_fullpath() would succeed, and also has a threshold for the count of cached pages. The 'hard' way waits up to 1 sec for the vnlru daemon to succeed, before forcing a recycle for any vnode, regardless of the 'soft' stoppers. This causes the ticking behaviour of the system when only one vnode operation in a single thread succeeds in a second. A large wantfreevnodes value is the safety measure to prevent the tick steps in practice. My initial reaction to the complaint was to just suggest increasing desiredvnodes, at least this is what I do on machines where there is a lot of both KVA and memory and intensive file loads are expected. From owner-freebsd-fs@freebsd.org Tue Nov 3 10:17:27 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 94BB6A23908 for ; Tue, 3 Nov 2015 10:17:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 7C710116E for ; Tue, 3 Nov 2015 10:17:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 79F42A23905; Tue, 3 Nov 2015 10:17:27 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5FE49A23902 for ; Tue, 3 Nov 2015 10:17:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 0D37D116D for ; Tue, 3 Nov 2015 10:17:26 +0000 (UTC) 
(envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id BE1EF7841B4; Tue, 3 Nov 2015 21:17:15 +1100 (AEDT) Date: Tue, 3 Nov 2015 21:17:15 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Kirk McKusick cc: fs@freebsd.org Subject: Re: an easy (?) question on namecache sizing In-Reply-To: <201511030447.tA34lo5O090332@chez.mckusick.com> Message-ID: <20151103173042.K1103@besplex.bde.org> References: <201511030447.tA34lo5O090332@chez.mckusick.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R6/+YolX c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=--HVLufID-KFUJQOt9cA:9 a=OLDHnaEm-nmj-5Ct:21 a=wVcrO8NxHoPPZAWr:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Nov 2015 10:17:27 -0000 On Mon, 2 Nov 2015, Kirk McKusick wrote: > You seem to be proposing several approaches. One is to make > wantfreevnodes bigger (half or three-quarters of the maximum). > Another seems to be reverting to the previous (freevnodes >= wantfreevnodes > && numvnodes >= minvnodes). So what is your proposed change? For a quick fix, I will try: wantfreevnodes = current value (perhaps too large) minvnodes = maxvnodes = desiredevnodes with the old code that creates new vnodes up to maxvnodes instead of attempting to recycle old vnodes well below maxvnodes. Only one variable is needed for this, and the very old name desiredvnodes is best for this, but the separate variables are useful for trying variations. More dynamic configuration when the variables are changed is needed. 
Old versions of FreeBSD already have the separate variables, but changing them using SYSCTL_INT() doesn't work. E.g., reducing desiredvnodes below numvnodes doesn't eventually reduce numvnodes, but eventually causes deadlock.

The old version in its default configuration didn't really work for du -s either. It appears to work initially, but it basically asks for thrashing through just 25 vnodes when only stat()s are done, so even ls -l /bin thrashes once the system has created 1/4 of its "desired" number of vnodes. The caches actually work initially because the silly limits are inactive initially.

To see the silly behaviour in an old version of FreeBSD:
- read more than minvnodes files. This enters the silly region where the silly wantfreevnodes limit starts being applied
- do a few du's and ls's in loops and watch them using systat -v. Verify that numvnodes > minvnodes is fairly stable and freevnodes <= wantfreevnodes = 25 (default)
- choose any directory with > 25 files in it that has not been looked at before. Repeat the previous test with du or ls -l on this directory. Observe that at least the namecache is broken (I see 11% hits for namecache and 51% for dircache for a directory with 31 entries (counting "." but not ".."))
- increase wantfreevnodes to the number of entries in the directory (possibly counting both "." and "..") and repeat the previous test. Observe that this unbreaks at least the namecache.

The old minvnodes limit (default desiredvnodes / 4) limited the silly behaviour to above that limit, but since numvnodes was (still is?) never reduced (even by unmount), the silly region is reached in normal operation (after reading lots of files) and is fairly sticky after that (unmount does help by creating lots of free vnodes and then it takes reading lots of files to reach the silly region again).

Now there is no minvnodes limit, and the wantfreevnodes defaults to the old minvnodes default.
This gives slightly worse behaviour than the old version with the default for wantfreevnodes changed from 25 to desiredvnodes / 4. 25 was far too small and desiredvnodes / 4 is probably too large for most purposes. However, desiredvnodes can be enlarged to leave space for a larger than necessary wantfreevnodes, and the larger than necessary wantfreevnodes is sometimes useful.

Many problems remain, especially for initialization. The old defaults worked perfectly for initialization up to the minvnodes limit. vnodes were never recycled below that. Now the default for wantfreevnodes gives an identical limit with different semantics. Silly caching for stat()ed files sometimes occurs below this limit instead of always occurring above this limit. E.g., soon after booting, numvnodes is about 200 and freevnodes is about 100. In the old version, all stat()s of new files increase both numvnodes and freevnodes until the limit is reached -- the caching works. In the current version, stat()s of new files cycle through the old free vnodes if possible -- the caching doesn't work, but instead thrashes especially well when freevnodes is small. It takes non-stat() accesses to files to increase maxvnodes. Eventually maxvnodes becomes large enough for freevnodes to also become large on average, so the cases with perfect thrashing become rare.

But the special caching for stat() still gives a deterministic thrashing case. That is when, although wantfreevnodes is larger than necessary for most cases (and freevnodes is almost as large), it is not large enough to hold the current working set of stat()s, as can easily happen for tree walks. Sometimes you want to walk the tree more than once and know that it all should fit in caches, but the recycling makes the caches ineffective.

I think there is a worse subcase of this, like the one for initialization. Suppose that freevnodes is small at the start of a tree walk.
Then I think the vnode caching prefers to recycle with this small number rather than to create new vnodes. Reading directories makes freevnodes even smaller.

The quick fix is supposed to make the free vnodes management almost null. A non-quick fix would only turn it off when numvnodes < maxvnodes. When numvnodes >= maxvnodes, recycling is still bad if it is mostly through free vnodes and freevnodes is small. Here "small" is relative. Anything smaller than the working set is too small, but if the working set is too large to fit then it is better to let it thrash in a small part of the cache than in a large part. I would try letting freevnodes grow to 3/4 of desiredvnodes for tree walks but try to keep it lower than 1/4 of desiredvnodes in normal use.

The current limit of 1/4 of desiredvnodes works better on larger systems. Such systems might never reach the limits, especially with the slow ramp-up of numvnodes. E.g., ref11-amd64 is not nearly as large as freefall, but now has 11 users and has been up for 29 days; it still hasn't reached the maxvnodes limit:

kern.maxvnodes: 621596
kern.minvnodes: 155399
vfs.freevnodes: 154789
vfs.wantfreevnodes: 155399
vfs.vnodes_created: 117384264
vfs.numvnodes: 537910

Most FreeBSD systems run big tree walks every night. This one has about 6M inodes in / and 900M inodes elsewhere. It would soon reach maxvnodes if it cached all of these. Limiting it to 537910 instead of 621596 is not useful, but this type of automatic big tree walk where the results are rarely used shouldn't be allowed to thrash through more than a small fraction of maxvnodes. So it can't be automatic to go from a fraction of 1/4 to 3/4.
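To make the comparison of the two guards concrete, here is a small user-space sketch in C. It is not kernel code: the struct and the function names old_guard() and current_guard() are made up for illustration, and only the two conditions themselves are taken from the code quoted earlier in this thread. The sample values are the ref11-amd64 sysctl numbers above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters; not a kernel structure. */
struct vnode_counts {
	unsigned long numvnodes;	/* vfs.numvnodes */
	unsigned long freevnodes;	/* vfs.freevnodes */
	unsigned long wantfreevnodes;	/* vfs.wantfreevnodes */
	unsigned long minvnodes;	/* kern.minvnodes */
};

/* Old condition: freevnodes >= wantfreevnodes && numvnodes >= minvnodes. */
bool
old_guard(const struct vnode_counts *c)
{
	return (c->freevnodes >= c->wantfreevnodes &&
	    c->numvnodes >= c->minvnodes);
}

/* Current condition: freevnodes > wantfreevnodes. */
bool
current_guard(const struct vnode_counts *c)
{
	return (c->freevnodes > c->wantfreevnodes);
}

/* The ref11-amd64 values quoted above. */
const struct vnode_counts ref11 = {
	.numvnodes = 537910,
	.freevnodes = 154789,
	.wantfreevnodes = 155399,
	.minvnodes = 155399,
};
```

With these numbers both conditions are false, since freevnodes (154789) is just below wantfreevnodes (155399). The conditions only differ once freevnodes passes wantfreevnodes while numvnodes is still below minvnodes: the old one stays false there and the current one becomes true.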
Bruce From owner-freebsd-fs@freebsd.org Tue Nov 3 17:36:50 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ABF6BA2556F for ; Tue, 3 Nov 2015 17:36:50 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 90286182A for ; Tue, 3 Nov 2015 17:36:50 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: by mailman.ysv.freebsd.org (Postfix) id 8E6F4A2556E; Tue, 3 Nov 2015 17:36:50 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E04CA2556D for ; Tue, 3 Nov 2015 17:36:50 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: from mail.michaelwlucas.com (mail.michaelwlucas.com [104.236.197.233]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2764F1829 for ; Tue, 3 Nov 2015 17:36:49 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: from mail.michaelwlucas.com (localhost [127.0.0.1]) by mail.michaelwlucas.com (8.15.2/8.15.2) with ESMTP id tA3HZQZX017332 for ; Tue, 3 Nov 2015 12:35:26 -0500 (EST) (envelope-from mwlucas@mail.michaelwlucas.com) Received: (from mwlucas@localhost) by mail.michaelwlucas.com (8.15.2/8.15.2/Submit) id tA3HZQRg017331 for fs@freebsd.org; Tue, 3 Nov 2015 12:35:26 -0500 (EST) (envelope-from mwlucas) Date: Tue, 3 Nov 2015 12:35:26 -0500 From: "Michael W. 
Lucas" To: fs@freebsd.org Subject: hast exec vs devd for handling CARP events Message-ID: <20151103173526.GA17299@mail.michaelwlucas.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mail.michaelwlucas.com X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (mail.michaelwlucas.com [127.0.0.1]); Tue, 03 Nov 2015 12:35:27 -0500 (EST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Nov 2015 17:36:50 -0000 Hi, There are lots of recipes out there for HAST failover based on devd. HAST also has the ability to run scripts on events, with the exec function in hast.conf. It *seems* to make more sense to have HAST mount filesystems and start processes when it claims the master role for a resource, as opposed to triggering that mount via a devd event and waiting for HAST to perform the switch. Thanks for any insight. I'm researching the "specialty filesystems" book, and want to give the best advice. ==ml -- Michael W. 
Lucas - mwlucas@michaelwlucas.com, Twitter @mwlauthor http://www.MichaelWLucas.com/, http://blather.MichaelWLucas.com/ From owner-freebsd-fs@freebsd.org Wed Nov 4 16:41:21 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 89ADDA2662F for ; Wed, 4 Nov 2015 16:41:21 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 75DDB1197 for ; Wed, 4 Nov 2015 16:41:21 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA4GfLnp035404 for ; Wed, 4 Nov 2015 16:41:21 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204254] page fault kernel panic on ZFS operations Date: Wed, 04 Nov 2015 16:41:21 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2015 16:41:21 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204254 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Wed Nov 4 17:04:13 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D14CA26F21 for ; Wed, 4 Nov 2015 17:04:13 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 38A6F17AE for ; Wed, 4 Nov 2015 17:04:13 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA4H4D9L017035 for ; Wed, 4 Nov 2015 17:04:13 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204037] Poudriere panics zfs with 'solaris assert: zrl->zr_refcount == 0 (0x6 == 0x0)' Date: Wed, 04 Nov 2015 17:04:11 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: kib@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc attachments.created Message-ID: In-Reply-To: References: Content-Type: 
text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2015 17:04:13 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204037 Konstantin Belousov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kib@FreeBSD.org --- Comment #2 from Konstantin Belousov --- Created attachment 162782 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=162782&action=edit Possible fix -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Wed Nov 4 17:28:46 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 07877A262D8 for ; Wed, 4 Nov 2015 17:28:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E796711FC for ; Wed, 4 Nov 2015 17:28:45 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA4HSjAW062736 for ; Wed, 4 Nov 2015 17:28:45 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204037] Poudriere panics zfs with 'solaris assert: zrl->zr_refcount == 0 (0x6 == 0x0)' Date: Wed, 04 Nov 2015 17:28:45 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None 
X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: will@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2015 17:28:46 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204037 Will Andrews changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |will@FreeBSD.org --- Comment #3 from Will Andrews --- I filed an issue against zrlock a while ago: https://www.illumos.org/issues/3746 Here is a rebased version of the original patch against FreeBSD/head: http://people.freebsd.org/~will/patches/zrlock-fix-atomicity.diff I haven't tested this specific version of the patch, but another version of it has been in our tree in production for 3 years. -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-fs@freebsd.org Wed Nov 4 19:49:21 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24B69A2626D for ; Wed, 4 Nov 2015 19:49:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 0B5BB1D81 for ; Wed, 4 Nov 2015 19:49:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 079A3A2626C; Wed, 4 Nov 2015 19:49:21 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 062EAA26269 for ; Wed, 4 Nov 2015 19:49:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id A506C1D80 for ; Wed, 4 Nov 2015 19:49:20 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 1985F3C576F; Thu, 5 Nov 2015 06:49:16 +1100 (AEDT) Date: Thu, 5 Nov 2015 06:49:15 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: Kirk McKusick , Bruce Evans , fs@freebsd.org Subject: Re: an easy (?) 
question on namecache sizing In-Reply-To: <20151103090448.GC2257@kib.kiev.ua> Message-ID: <20151105043607.K3175@besplex.bde.org> References: <20151102224910.E2203@besplex.bde.org> <201511030447.tA34lo5O090332@chez.mckusick.com> <20151103090448.GC2257@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R6/+YolX c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=jm51WUJoxU2LgMh7JXUA:9 a=bnPOoLb-eUIJAd_c:21 a=zAURBNcse8cAlum9:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Nov 2015 19:49:21 -0000 On Tue, 3 Nov 2015, Konstantin Belousov wrote: > On Mon, Nov 02, 2015 at 08:47:50PM -0800, Kirk McKusick wrote: >> You seem to be proposing several approaches. One is to make >> wantfreevnodes bigger (half or three-quarters of the maximum). >> Another seems to be reverting to the previous (freevnodes >= wantfreevnodes >> && numvnodes >= minvnodes). So what is your proposed change? > > Free vnodes could be freed in the soft fashion by vnlru daemon, or in > hard manner, by the getnewvnode(), when the max for the vnode count is > reached. The 'soft' way skips vnodes which are directories, to make it > more probable that vn_fullpath() would succeed, and also has threshold > for the count of cached pages. The 'hard' way waits up to 1 sec for > the vnlru daemon to succeed, before forcing a recycle for any vnode, > regardless of the 'soft' stoppers. This causes the ticking behaviour of > the system when only one vnode operation in single thread succeeds in a > second. This ticking behaviour seems to be mostly wrong. 
numvnodes is allowed to grow to its nominal limit (desiredvnodes) in most places (except an off by 1 error allows it to grow 1 larger), but vnlru_proc() uses a limit of 9/10 of that (without the off by 1 error). So the usual behaviour is to grow a little larger than the 9/10 limit, then churn by reclaiming too much to significantly below the 9/10 limit, then build up to above the 9/10 again...

I didn't see the 1 vnode per second behaviour, except when I set desiredvnodes to values like 1, 10 or 100; then I saw very interesting behaviours. A little larger than 100 works almost normally, but 10 should give 1 vnode per second since 1 is 1/10 of 10. I think it actually gives 2 vnodes per second due to the off by 1 error.

I need to know if the churn is intentional before removing it completely. See the comment below.

> Large wantfreevnodes value is the safety measure to prevent the tick
> steps in practice. My initial reaction on the complain was to just
> suggest to increase desiredvnodes, at least this is what I do on
> machines where there is a lot of both KVA and memory and intensive file
> loads are expected.

Yes, desiredvnodes should be sized large enough to allow space for plenty of "free" vnodes, but there are some problems with this:
- non-problem: dynamic resizing of all the limits now works quite well. Better than I said earlier. -current ramps up both the total and free vnodes better than I said earlier
- not many people know about this magic
- i386's don't have enough KVA for much enlargement. The defaults here give 70K for desiredvnodes with 1GB of RAM and 123K with 3G of usable RAM. Even 70K is really too many. Memory use for each vnode seems to be unreasonably large (more than 1K average even when most vnodes are free). That gives about 100 MB of wired memory.
- there are some bugs with large numbers of vnodes. 300K of them almost fit and almost work on i386.
However: - in old versions of FreeBSD, perhaps with VM_BCACHE_SIZE_MAX larger than the default, null pointer panics occur - at least in -current, uma is reluctant to free vnode allocations. Reducing desiredvnodes now works (on the next tick) to reduce numvnodes, but uma doesn't free anything, since we don't ask it to. I think it will free later if memory becomes short - in -current, there is a bug freeing memory when memory becomes short. I saw this for cached data, not vnodes. Most memory was "Inact" for VMIO, with a very large number of non-free but non-referenced vnodes referencing the cached data. Then the tar programs used to set this up get killed by an out of memory error (IIRC, the kernel one, not malloc failure). I don't use swap but haven't seen this error for 10-15 years except in much larger programs. Here is my work in progress: X diff -u2 vfs_subr.c~ vfs_subr.c X --- vfs_subr.c~ 2015-09-28 06:29:43.000000000 +0000 X +++ vfs_subr.c 2015-11-04 14:25:42.476369000 +0000 X @@ -147,14 +147,15 @@ X X /* X - * Free vnode target. Free vnodes may simply be files which have been stat'd X - * but not read. This is somewhat common, and a small cache of such files X - * should be kept to avoid recreation costs. X + * "Free" vnode target. Free vnodes are rarely actually free. Usually X + * they are for files which have been stat'd but not read; these usually X + * have inode and namecache data attached to them. A large cache of such X + * vnodes should be kept to minimise recreation costs. X */ X static u_long wantfreevnodes; X -SYSCTL_ULONG(_vfs, OID_AUTO, wantfreevnodes, CTLFLAG_RW, &wantfreevnodes, 0, ""); X -/* Number of vnodes in the free list. 
*/ X +SYSCTL_ULONG(_vfs, OID_AUTO, wantfreevnodes, CTLFLAG_RW, X + &wantfreevnodes, 0, "Target for minimum number of \"free\" vnodes"); X static u_long freevnodes; X -SYSCTL_ULONG(_vfs, OID_AUTO, freevnodes, CTLFLAG_RD, &freevnodes, 0, X - "Number of vnodes in the free list"); X +SYSCTL_ULONG(_vfs, OID_AUTO, freevnodes, CTLFLAG_RD, X + &freevnodes, 0, "Number of \"free\" vnodes"); X X static int vlru_allow_cache_src; The comments were very out of date. The cache needs to be large (thousands), or small (25). Also improve descriptions and fix style bugs in sysctls. X @@ -275,9 +276,9 @@ X X /* X - * Number of vnodes we want to exist at any one time. This is mostly used X - * to size hash tables in vnode-related code. It is normally not used in X - * getnewvnode(), as wantfreevnodes is normally nonzero.) X + * Target for maximum number of vnodes. X * X - * XXX desiredvnodes is historical cruft and should not exist. X + * XXX "desired" is a not very good historical name for this. We only X + * desire to use the full maximum under loads where caching vnodes is X + * the best use of memory resources. X */ X int desiredvnodes; This was so out of date that it was almost correct again except for the parts saying that it was out of date. X @@ -292,4 +293,5 @@ X return (error); X if (old_desiredvnodes != desiredvnodes) { X + wantfreevnodes = desiredvnodes / 4; X vfs_hash_changesize(desiredvnodes); X cache_changesize(desiredvnodes); Keep this in sync. I considered telling uma the new allocation size here (also in init), but that shouldn't be needed and would need to be done elsewhere if done. desiredvnodes is a soft limit so uma should not be limited to that, and this is too early to reduce. It would be OK for vnrlu_proc() to tell uma to reduce to the current active allocation. 
X @@ -300,7 +302,7 @@ X SYSCTL_PROC(_kern, KERN_MAXVNODES, maxvnodes, X CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, &desiredvnodes, 0, X - sysctl_update_desiredvnodes, "I", "Maximum number of vnodes"); X + sysctl_update_desiredvnodes, "I", "Target for maximum number of vnodes"); X SYSCTL_ULONG(_kern, OID_AUTO, minvnodes, CTLFLAG_RW, X - &wantfreevnodes, 0, "Minimum number of vnodes (legacy)"); X + &wantfreevnodes, 0, "Old name for vfs.wantfreevnodes (legacy)"); X static int vnlru_nowhere; X SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLAG_RW, X @@ -844,5 +846,5 @@ X X /* X - * Attempt to keep the free list at wantfreevnodes length. X + * Attempt to reduce the free list by the requested amount. X */ X static void X @@ -922,7 +924,31 @@ X kproc_suspend_check(p); X mtx_lock(&vnode_free_list_mtx); X - if (freevnodes > wantfreevnodes) X - vnlru_free(freevnodes - wantfreevnodes); X - if (numvnodes <= desiredvnodes * 9 / 10) { X + if (numvnodes > desiredvnodes && X + freevnodes > wantfreevnodes) X + vnlru_free(ulmin(numvnodes - desiredvnodes, X + freevnodes - wantfreevnodes)); Magic related to the 9/10 factor is not removed by this, but moved below and changed. This just lets freevnodes grow larger than wantfreevnodes if it safely can. X + /* X + * Sleep if there are not too many allocated vnodes or X + * exactly the right number of allocated vnodes but too X + * few free vnodes. When below the target maximum, we X + * will prefer to allocate a new vnode to reusing a "free" X + * vnode. When at the target maximum, we prefer to discard X + * some non-free vnodes now to cycling through a too-small X + * number of free vnodes later. When above the target X + * maximum, we must discard some vnodes now to reduce to X + * the maximum. The discarding here is not smart and tends X + * to discard too many vnodes with wrong choices, but it is X + * good enough since it rarely done. Occasional excessive X + * random discarding may even be a feature for evolution. 
X + * X + * Old versions did even more discarding here, by allowing X + * the target maximum to be reached elsewhere but reducing X + * to 9/10 of that here. I think this was intended to X + * reduce activity here, but actually gave the bug of the X + * target maximum being 9/10 of its requested value, and X + * the evolutionary feature. X + */ X + if (numvnodes < desiredvnodes || (numvnodes == desiredvnodes && X + freevnodes >= wantfreevnodes)) { X vnlruproc_sig = 0; X wakeup(&vnlruproc_sig); The comment describes this too verbosely. I plan to remove the numvnodes == desiredvnodes subcase here, but limit numvnodes elsewhere if necessary to allow room for expansion of freevnodes up to wantfreevnodes without numvnodes growing too large. The above gives churning much like before, but only above the full limit instead of above 9/10 of that, or in the subcase. The 9/10 limit is often exceeded, but the new ones rarely are. The planned version will make the subcase unreachable. X @@ -1031,5 +1057,5 @@ X X /* X - * Wait for available vnodes. X + * Wait if necessary for a new vnode to become available. X */ X static int X @@ -1038,12 +1064,11 @@ X X mtx_assert(&vnode_free_list_mtx, MA_OWNED); X - if (numvnodes > desiredvnodes) { X + if (numvnodes >= desiredvnodes) { Fix off by 1 error. This allowed creation of 1 more than the max. This bug was masked by the 9/10 magic reducing the actual limit to a smaller fuzzier value. E.g., a limit of 100000 allows 100001 here, but the actual limit is 90000 in vnlru_proc(). It is still 100001 here, but it is hard to reach much above 90000 after reducing to less than that. X if (suspended) { X /* X - * File system is beeing suspended, we cannot risk a X - * deadlock here, so allocate new vnode anyway. X + * The file system is being suspended. We cannot X + * risk a deadlock here, so allow allocation of X + * another vnode even if this would give too many.
X */ X - if (freevnodes > wantfreevnodes) X - vnlru_free(freevnodes - wantfreevnodes); Fix spelling and improve wording in comment. Remove some code. The caller can do it and now does. X return (0); X } X @@ -1055,5 +1080,5 @@ X "vlruwk", hz); X } X - return (numvnodes > desiredvnodes ? ENFILE : 0); X + return (numvnodes >= desiredvnodes ? ENFILE : 0); Fix off by 1 error. X } X X @@ -1113,8 +1138,10 @@ X mtx_lock(&vnode_free_list_mtx); X /* X - * Lend our context to reclaim vnodes if they've exceeded the max. X + * Lend our context to reclaim vnodes if one more would be too many X + * and there are plenty to reclaim. X */ X - if (freevnodes > wantfreevnodes) X - vnlru_free(1); X + if (numvnodes >= desiredvnodes && freevnodes > wantfreevnodes) X + vnlru_free(ulmin(numvnodes + 1 - desiredvnodes, X + freevnodes - wantfreevnodes)); Policy change. This is supposed to be the same as above except for adjustments to add 1 for the new vnode, but is buggy. It should do: - if numvnodes < desiredvnodes, don't use the free list - if numvnodes > desiredvnodes (rare), always use the free list if it is nonempty - if numvnodes == desiredvnodes (usual), use the free list if it is large enough (perhaps just nonempty, or the old threshold of 25) Then if the free list ends up below wantfreevnodes, depend on the churn in vnlru_proc() to grow it. Since the condition for waiting (with the old off by 1 error or the above bugs) rarely occurs, the churn normally only occurs once per second. That limits its damage but also limits the rate of growing the free list. Planned version: - if numvnodes - freevnodes >= desiredvnodes - wantfreevnodes, then the cache is becoming unbalanced with low chances of recovery (it has too many non-free vnodes and not enough space to add free ones). Then wait more often here, as in the above numvnodes == desiredvnodes case. Whenever we wait, we wake up vnlru_proc(). We must do this more often than before to grow the free list, but not too often.
It does the right things except for also shrinking the free list. X error = getnewvnode_wait(mp != NULL && (mp->mnt_kern_flag & X MNTK_SUSPEND)); This doesn't really fix the original problem. du -s now works better starting with a cold cache by allowing freevnodes to grow much larger than wantfreevnodes while numvnodes is not near its limit. But when warm, numvnodes is near its limit and freevnodes may grow small. Then the modifications are to attempt to grow freevnodes up to wantfreevnodes, but it would be unreasonable to discard many non-free vnodes to grow freevnodes larger than that. The churning might do that accidentally, after perhaps too long, if the usage pattern changes from mostly reads to mostly stats. Bruce From owner-freebsd-fs@freebsd.org Thu Nov 5 11:48:49 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 011C0A2516C for ; Thu, 5 Nov 2015 11:48:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D73491CF2 for ; Thu, 5 Nov 2015 11:48:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA5BmmEX036424 for ; Thu, 5 Nov 2015 11:48:48 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204037] Poudriere panics zfs with 'solaris assert: zrl->zr_refcount == 0 (0x6 == 0x0)' Date: Thu, 05 Nov 2015 11:48:48 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT
X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: Andrew@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: Andrew@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 11:48:49 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204037 Andrew Turner changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-fs@FreeBSD.org |Andrew@FreeBSD.org --- Comment #4 from Andrew Turner --- I think the issue is atomic_cas_32 is broken on arm64. We use the version in sys/cddl/compat/opensolaris/kern/opensolaris_atomic.c when we need to write it in assembler to use the correct atomic instructions. I've hacked in the code from atomic_cmpset_32 for testing. -- You are receiving this mail because: You are the assignee for the bug. 
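Editorial sketch, not part of the bug comment: as far as the commonly documented interfaces go, the two primitives named above report their result differently. The Solaris-style atomic_cas_32() returns the value that was found in memory, while the FreeBSD-style atomic_cmpset_32() returns whether the store happened. The C11 model below only illustrates that difference and how a compare-and-swap result can be derived from a compare-exchange operation. The sketch_* names are hypothetical, and this is neither the code in opensolaris_atomic.c nor the arm64 change being tested.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a cmpset-style primitive: store src into *dst
 * only if *dst equals expect, and report whether the store happened.
 */
static bool
sketch_cmpset_32(_Atomic uint32_t *dst, uint32_t expect, uint32_t src)
{
	return (atomic_compare_exchange_strong(dst, &expect, src));
}

/*
 * Hypothetical model of a cas-style primitive: same conditional store,
 * but the caller gets back the value that was in *target before the call.
 * On failure, C11 compare-exchange writes the observed value into 'old';
 * on success 'old' still equals cmp, which was the previous value.
 */
static uint32_t
sketch_cas_32(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	uint32_t old = cmp;

	atomic_compare_exchange_strong(target, &old, newval);
	return (old);
}
```

A caller of the cas-style form detects success by comparing the returned value with cmp, so a version that returns the wrong word (or a boolean) would make callers such as reference-count loops misjudge whether their update landed.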
From owner-freebsd-fs@freebsd.org Thu Nov 5 18:56:57 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5E233A264F4 for ; Thu, 5 Nov 2015 18:56:57 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 496951B6C for ; Thu, 5 Nov 2015 18:56:57 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: by mailman.ysv.freebsd.org (Postfix) id 471B4A264F0; Thu, 5 Nov 2015 18:56:57 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 45CA6A264EF for ; Thu, 5 Nov 2015 18:56:57 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:d250:99ff:fe57:4030]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2B0A61B6A for ; Thu, 5 Nov 2015 18:56:57 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (localhost [IPv6:::1]) by chez.mckusick.com (8.15.2/8.14.9) with ESMTP id tA5Iutge064958; Thu, 5 Nov 2015 10:56:55 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201511051856.tA5Iutge064958@chez.mckusick.com> From: Kirk McKusick To: Konstantin Belousov Subject: Re: an easy (?) 
question on namecache sizing cc: Bruce Evans , fs@freebsd.org In-reply-to: <20151103090448.GC2257@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <64956.1446749815.1@chez.mckusick.com> Date: Thu, 05 Nov 2015 10:56:55 -0800 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 18:56:57 -0000 > Date: Tue, 3 Nov 2015 11:04:48 +0200 > From: Konstantin Belousov > To: Kirk McKusick > Subject: Re: an easy (?) question on namecache sizing > > Free vnodes could be freed in the soft fashion by vnlru daemon, or in > hard manner, by the getnewvnode(), when the max for the vnode count is > reached. The 'soft' way skips vnodes which are directories, to make it > more probable that vn_fullpath() would succeed, and also has threshold > for the count of cached pages. The 'hard' way waits up to 1 sec for > the vnlru daemon to succeed, before forcing a recycle for any vnode, > regardless of the 'soft' stoppers. This causes the ticking behaviour of > the system when only one vnode operation in single thread succeeds in a > second. > > Large wantfreevnodes value is the safety measure to prevent the tick > steps in practice. My initial reaction on the complain was to just > suggest to increase desiredvnodes, at least this is what I do on > machines where there is a lot of both KVA and memory and intensive file > loads are expected. 
I propose that we update wantfreevnodes in sysctl_update_desiredvnodes() so that it tracks the change in desiredvnodes: Index: /sys/kern/vfs_subr.c =================================================================== --- /sys/kern/vfs_subr.c (revision 290387) +++ /sys/kern/vfs_subr.c (working copy) @@ -293,6 +293,7 @@ if (old_desiredvnodes != desiredvnodes) { vfs_hash_changesize(desiredvnodes); cache_changesize(desiredvnodes); + wantfreevnodes = desiredvnodes / 4; } return (0); } Otherwise bumping up desiredvnodes will be less effective than expected. I see that Bruce has also suggested this change in his more extensive revisions. Kirk McKusick From owner-freebsd-fs@freebsd.org Thu Nov 5 19:56:59 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E788AA273EA for ; Thu, 5 Nov 2015 19:56:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id CB6C4112D for ; Thu, 5 Nov 2015 19:56:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id CAB17A273E9; Thu, 5 Nov 2015 19:56:58 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA487A273E8 for ; Thu, 5 Nov 2015 19:56:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5D09B112C for ; Thu, 5 Nov 2015 19:56:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id 
tA5JuqAp001976 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 5 Nov 2015 21:56:52 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua tA5JuqAp001976 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id tA5JumUg001974; Thu, 5 Nov 2015 21:56:49 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 5 Nov 2015 21:56:48 +0200 From: Konstantin Belousov To: Kirk McKusick Cc: Bruce Evans , fs@freebsd.org Subject: Re: an easy (?) question on namecache sizing Message-ID: <20151105195648.GK2257@kib.kiev.ua> References: <20151103090448.GC2257@kib.kiev.ua> <201511051856.tA5Iutge064958@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201511051856.tA5Iutge064958@chez.mckusick.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 19:56:59 -0000 On Thu, Nov 05, 2015 at 10:56:55AM -0800, Kirk McKusick wrote: > > Date: Tue, 3 Nov 2015 11:04:48 +0200 > > From: Konstantin Belousov > > To: Kirk McKusick > > Subject: Re: an easy (?) question on namecache sizing > > > > Free vnodes could be freed in the soft fashion by vnlru daemon, or in > > hard manner, by the getnewvnode(), when the max for the vnode count is > > reached. The 'soft' way skips vnodes which are directories, to make it > > more probable that vn_fullpath() would succeed, and also has threshold > > for the count of cached pages. 
The 'hard' way waits up to 1 sec for > > the vnlru daemon to succeed, before forcing a recycle for any vnode, > > regardless of the 'soft' stoppers. This causes the ticking behaviour of > > the system when only one vnode operation in single thread succeeds in a > > second. > > > > Large wantfreevnodes value is the safety measure to prevent the tick > > steps in practice. My initial reaction on the complain was to just > > suggest to increase desiredvnodes, at least this is what I do on > > machines where there is a lot of both KVA and memory and intensive file > > loads are expected. > > I propose that we update wantfreevnodes in sysctl_update_desiredvnodes() > so that it tracks the change in desiredvnodes: > > Index: /sys/kern/vfs_subr.c > =================================================================== > --- /sys/kern/vfs_subr.c (revision 290387) > +++ /sys/kern/vfs_subr.c (working copy) > @@ -293,6 +293,7 @@ > if (old_desiredvnodes != desiredvnodes) { > vfs_hash_changesize(desiredvnodes); > cache_changesize(desiredvnodes); > + wantfreevnodes = desiredvnodes / 4; > } > return (0); > } > > Otherwise bumping up desiredvnodes will be less effective than expected. > > I see that Bruce has also suggested this change in his more extensive > revisions. I think the idea is right, but the implementation is not. Just changing wantfreevnodes after desirevnodes was reduced, creates a window where an other thread could see small value for desiredvnodes, but large value for wantfreevnodes. Then, e.g. vlrureclaim() would go wild. IMO it should ensure that the observable values are non-contradictory. 
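Editorial sketch, not part of the message: one simple way to keep the two values non-contradictory for observers is to write and read them only as a pair under a single lock. The user-space model below assumes that approach. The names limits_mtx, set_limits() and get_limits() are hypothetical, the division by 4 only mirrors the patch quoted above, and pthreads stand in for whatever mutex the kernel code would really use. It is not what vfs_subr.c does.

```c
#include <pthread.h>

/* Hypothetical user-space stand-ins for desiredvnodes and wantfreevnodes. */
static unsigned long model_desired = 100000;
static unsigned long model_wantfree = 25000;
static pthread_mutex_t limits_mtx = PTHREAD_MUTEX_INITIALIZER;

/*
 * Change both limits inside one critical section, so that no reader that
 * also takes the lock can see a new maximum paired with an old free target.
 */
static void
set_limits(unsigned long new_desired)
{
	pthread_mutex_lock(&limits_mtx);
	model_desired = new_desired;
	model_wantfree = new_desired / 4;
	pthread_mutex_unlock(&limits_mtx);
}

/* Take a consistent snapshot of the pair. */
static void
get_limits(unsigned long *desired, unsigned long *wantfree)
{
	pthread_mutex_lock(&limits_mtx);
	*desired = model_desired;
	*wantfree = model_wantfree;
	pthread_mutex_unlock(&limits_mtx);
}
```

The cost of this design is that every reader has to take the lock too; readers that look at the variables without it can still see a mixed pair, whatever order the writer uses.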
From owner-freebsd-fs@freebsd.org Thu Nov 5 20:25:40 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 72B47A278F1 for ; Thu, 5 Nov 2015 20:25:40 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 5F0F511A7 for ; Thu, 5 Nov 2015 20:25:40 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: by mailman.ysv.freebsd.org (Postfix) id 5BCA6A278F0; Thu, 5 Nov 2015 20:25:40 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5B623A278EF for ; Thu, 5 Nov 2015 20:25:40 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:d250:99ff:fe57:4030]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 409A111A6 for ; Thu, 5 Nov 2015 20:25:40 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (localhost [IPv6:::1]) by chez.mckusick.com (8.15.2/8.14.9) with ESMTP id tA5KPcLF066724; Thu, 5 Nov 2015 12:25:38 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201511052025.tA5KPcLF066724@chez.mckusick.com> From: Kirk McKusick To: Konstantin Belousov Subject: Re: an easy (?) 
question on namecache sizing cc: fs@freebsd.org In-reply-to: <20151105195648.GK2257@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <66722.1446755138.1@chez.mckusick.com> Date: Thu, 05 Nov 2015 12:25:38 -0800 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 20:25:40 -0000 > Date: Thu, 5 Nov 2015 21:56:48 +0200 > From: Konstantin Belousov > To: Kirk McKusick > Subject: Re: an easy (?) question on namecache sizing > Cc: fs@freebsd.org > > On Thu, Nov 05, 2015 at 10:56:55AM -0800, Kirk McKusick wrote: >> >> I propose that we update wantfreevnodes in sysctl_update_desiredvnodes() >> so that it tracks the change in desiredvnodes: >> >> Index: /sys/kern/vfs_subr.c >> =================================================================== >> --- /sys/kern/vfs_subr.c (revision 290387) >> +++ /sys/kern/vfs_subr.c (working copy) >> @@ -293,6 +293,7 @@ >> if (old_desiredvnodes != desiredvnodes) { >> + wantfreevnodes = desiredvnodes / 4; >> vfs_hash_changesize(desiredvnodes); >> cache_changesize(desiredvnodes); >> } >> return (0); >> } >> >> Otherwise bumping up desiredvnodes will be less effective than expected. >> >> I see that Bruce has also suggested this change in his more extensive >> revisions. > > I think the idea is right, but the implementation is not. Just changing > wantfreevnodes after desirevnodes was reduced, creates a window where an > other thread could see small value for desiredvnodes, but large value > for wantfreevnodes. Then, e.g. vlrureclaim() would go wild. IMO it should > ensure that the observable values are non-contradictory. Does moving the setting of wantfreevnodes before the cache size changes (as redone above) close the window enough? The vlrureclaim() function operates slowly enough that a brief period of inconsistency seems unimportant. 
Changing desiredvnodes happens very rarely. And at the moment we are not correcting wantfreevnodes at all. Or am I missing some key point? Kirk McKusick From owner-freebsd-fs@freebsd.org Thu Nov 5 20:53:01 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E04F7A27F55 for ; Thu, 5 Nov 2015 20:53:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id C4FED1246 for ; Thu, 5 Nov 2015 20:53:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id C33C0A27F54; Thu, 5 Nov 2015 20:53:00 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C2D3FA27F52 for ; Thu, 5 Nov 2015 20:53:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3D9421243 for ; Thu, 5 Nov 2015 20:53:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id tA5KqtHJ014920 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 5 Nov 2015 22:52:55 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua tA5KqtHJ014920 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id tA5Kqt2m014919; Thu, 5 Nov 2015 22:52:55 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 5 Nov 2015 22:52:55 +0200 
From: Konstantin Belousov To: Kirk McKusick Cc: fs@freebsd.org Subject: Re: an easy (?) question on namecache sizing Message-ID: <20151105205255.GL2257@kib.kiev.ua> References: <20151105195648.GK2257@kib.kiev.ua> <201511052025.tA5KPcLF066724@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201511052025.tA5KPcLF066724@chez.mckusick.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 20:53:01 -0000 On Thu, Nov 05, 2015 at 12:25:38PM -0800, Kirk McKusick wrote: > > Date: Thu, 5 Nov 2015 21:56:48 +0200 > > From: Konstantin Belousov > > To: Kirk McKusick > > Subject: Re: an easy (?) question on namecache sizing > > Cc: fs@freebsd.org > > > > On Thu, Nov 05, 2015 at 10:56:55AM -0800, Kirk McKusick wrote: > >> > >> I propose that we update wantfreevnodes in sysctl_update_desiredvnodes() > >> so that it tracks the change in desiredvnodes: > >> > >> Index: /sys/kern/vfs_subr.c > >> =================================================================== > >> --- /sys/kern/vfs_subr.c (revision 290387) > >> +++ /sys/kern/vfs_subr.c (working copy) > >> @@ -293,6 +293,7 @@ > >> if (old_desiredvnodes != desiredvnodes) { > >> + wantfreevnodes = desiredvnodes / 4; > >> vfs_hash_changesize(desiredvnodes); > >> cache_changesize(desiredvnodes); > >> } > >> return (0); > >> } > >> > >> Otherwise bumping up desiredvnodes will be less effective than expected. > >> > >> I see that Bruce has also suggested this change in his more extensive > >> revisions. 
> > > > I think the idea is right, but the implementation is not. Just changing > > wantfreevnodes after desirevnodes was reduced, creates a window where an > > other thread could see small value for desiredvnodes, but large value > > for wantfreevnodes. Then, e.g. vlrureclaim() would go wild. IMO it should > > ensure that the observable values are non-contradictory. > > Does moving the setting of wantfreevnodes before the cache size changes > (as redone above) close the window enough? The vlrureclaim() function > operates slowly enough that a brief period of inconsistency seems > unimportant. Changing desiredvnodes happens very rarely. And at the moment > we are not correcting wantfreevnodes at all. Or am I missing some key point? I think wantfreevnodes should be set before the cache size changes when desiredvnodes is decreased, but kept at the place in your patch for the increasing case. From owner-freebsd-fs@freebsd.org Thu Nov 5 21:07:28 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4F811A27215 for ; Thu, 5 Nov 2015 21:07:28 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 3B6B51BCC for ; Thu, 5 Nov 2015 21:07:28 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: by mailman.ysv.freebsd.org (Postfix) id 37917A27214; Thu, 5 Nov 2015 21:07:28 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 37223A27213 for ; Thu, 5 Nov 2015 21:07:28 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:d250:99ff:fe57:4030]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 198841BCA for ; Thu, 5 Nov 2015 21:07:28 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (localhost [IPv6:::1]) by chez.mckusick.com (8.15.2/8.14.9) with ESMTP id tA5L7QBB067460; Thu, 5 Nov 2015 13:07:26 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201511052107.tA5L7QBB067460@chez.mckusick.com> From: Kirk McKusick To: Konstantin Belousov Subject: Re: an easy (?) question on namecache sizing cc: fs@freebsd.org In-reply-to: <20151105205255.GL2257@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <67458.1446757646.1@chez.mckusick.com> Content-Transfer-Encoding: quoted-printable Date: Thu, 05 Nov 2015 13:07:26 -0800 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 21:07:28 -0000 > Date: Thu, 5 Nov 2015 22:52:55 +0200 > From: Konstantin Belousov > To: Kirk McKusick > Subject: Re: an easy (?) question on namecache sizing > Cc: fs@freebsd.org > > On Thu, Nov 05, 2015 at 12:25:38PM -0800, Kirk McKusick wrote: >> >> Does moving the setting of wantfreevnodes before the cache size changes >> (as redone above) close the window enough? The vlrureclaim() function >> operates slowly enough that a brief period of inconsistency seems >> unimportant. Changing desiredvnodes happens very rarely. And at the moment >> we are not correcting wantfreevnodes at all. Or am I missing some key point? > > I think wantfreevnodes should be set before the cache size changes when > desiredvnodes is decreased, but kept at the place in your patch for the > increasing case. What is the benefit of waiting until after the caches are resized for setting wantfreevnodes when desiredvnodes is increasing?
It seems like it just complicates the code to conditionally do the update in two places, so I am inclined to just do it at the beginning as there is good reason for doing it there when downsizing and for the upsizing it does not really matter much. Kirk McKusick From owner-freebsd-fs@freebsd.org Thu Nov 5 23:08:10 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 08E27A2731A for ; Thu, 5 Nov 2015 23:08:10 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id E245C1C3C for ; Thu, 5 Nov 2015 23:08:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id DED19A27319; Thu, 5 Nov 2015 23:08:09 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C49EBA27318 for ; Thu, 5 Nov 2015 23:08:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 595F01C3B for ; Thu, 5 Nov 2015 23:08:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 6282D3CA31E; Fri, 6 Nov 2015 10:07:59 +1100 (AEDT) Date: Fri, 6 Nov 2015 10:07:58 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: Kirk McKusick , fs@freebsd.org Subject: Re: an easy (?) 
question on namecache sizing In-Reply-To: <20151105205255.GL2257@kib.kiev.ua> Message-ID: <20151106094022.N3939@besplex.bde.org> References: <20151105195648.GK2257@kib.kiev.ua> <201511052025.tA5KPcLF066724@chez.mckusick.com> <20151105205255.GL2257@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=R6/+YolX c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=pGLkceISAAAA:8 a=2rIISSrIAAAA:8 a=6I5d2MoRAAAA:8 a=yuHgCiGg1UHkK4P56rYA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Nov 2015 23:08:10 -0000 On Thu, 5 Nov 2015, Konstantin Belousov wrote: > On Thu, Nov 05, 2015 at 12:25:38PM -0800, Kirk McKusick wrote: >>> Date: Thu, 5 Nov 2015 21:56:48 +0200 >>> From: Konstantin Belousov >>> To: Kirk McKusick >>> Subject: Re: an easy (?) question on namecache sizing >>> Cc: fs@freebsd.org >>> >>> On Thu, Nov 05, 2015 at 10:56:55AM -0800, Kirk McKusick wrote: >>>> >>>> I propose that we update wantfreevnodes in sysctl_update_desiredvnodes() >>>> so that it tracks the change in desiredvnodes: >>>> >>>> Index: /sys/kern/vfs_subr.c >>>> =================================================================== >>>> --- /sys/kern/vfs_subr.c (revision 290387) >>>> +++ /sys/kern/vfs_subr.c (working copy) >>>> @@ -293,6 +293,7 @@ >>>> if (old_desiredvnodes != desiredvnodes) { >>>> + wantfreevnodes = desiredvnodes / 4; >>>> vfs_hash_changesize(desiredvnodes); >>>> cache_changesize(desiredvnodes); >>>> } >>>> return (0); >>>> } >>>> >>>> Otherwise bumping up desiredvnodes will be less effective than expected. >>>> >>>> I see that Bruce has also suggested this change in his more extensive >>>> revisions. >>> >>> I think the idea is right, but the implementation is not. 
Just changing >>> wantfreevnodes after desirevnodes was reduced, creates a window where an >>> other thread could see small value for desiredvnodes, but large value >>> for wantfreevnodes. Then, e.g. vlrureclaim() would go wild. IMO it should >>> ensure that the observable values are non-contradictory. I found that this isn't a problem, provided some care is taken to work around bogus variable types in expressions like (desiredvnodes - wantfreevnodes). (These variables are bogusly unsigned, so expressions like this can give unsign extension bugs by having huge unsigned values instead of small negative ones. The variables are also bogusly long, so they can have preposterous values which overflow when used in expressions like vnlru_free(freevnodes - wantfreevnodes) since vnlru_free()'s arg type is not consistently bogus. Their longness is hard to change since it is part of the sysctl ABI.) The code is quite robust to inconsistent and preposterous values. vlrureclaim() explicitly fixes up the preposterous user value of desiredvnodes <= 0 (it must be precisely 0 then, and the compiler should warn about the '<' part), but most fixups occur implicitly. It has always been possible to set wantfreevnodes to a much larger value than desiredvnodes. Nothing bad happens, but I found it convenient that a change like the above made wantfreevnodes reasonable when making large changes to desiredvnodes. >> Does moving the setting of wantfreevnodes before the cache size changes >> (as redone above) close the window enough? The vlrureclaim() function >> operates slowly enough that a brief period of inconsistency seems >> unimportant. Changing desiredvnodes happens very rarely. And at the moment >> we are not correcting wantfreevnodes at all. Or am I missing some key point? What happens is that there can be a long period of inconsistency before vlrureclaim() runs. It then makes everything consistent. You already depend on this by not bothering to call vlrureclaim() from the sysctl.
There are also some unlocked accesses to the variables (including in the sysctl?), so we depend on wrong values of the variables due to races only giving transient inconsistencies. (The long types also ask for races :-).) Perfect locking for all of this would be painful. Bruce From owner-freebsd-fs@freebsd.org Fri Nov 6 06:24:24 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 061FEA27509 for ; Fri, 6 Nov 2015 06:24:24 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id DC18312E9 for ; Fri, 6 Nov 2015 06:24:23 +0000 (UTC) (envelope-from josh@tcbug.org) Received: by mailman.ysv.freebsd.org (Postfix) id D97FBA27508; Fri, 6 Nov 2015 06:24:23 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BF36EA27507 for ; Fri, 6 Nov 2015 06:24:23 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from out4-smtp.messagingengine.com (out4-smtp.messagingengine.com [66.111.4.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 92E5A12E8 for ; Fri, 6 Nov 2015 06:24:22 +0000 (UTC) (envelope-from josh@tcbug.org) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 2991820242 for ; Fri, 6 Nov 2015 01:24:21 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Fri, 06 Nov 2015 01:24:21 -0500 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references 
:subject:to:x-sasl-enc:x-sasl-enc; s=smtpout; bh=EoLDP7rTRF12id9 e83v9zZeQH1Q=; b=VP8q2YPgdDPPngR04WG3zReLKjSSrhCbo61QaiNG0LBu56l A9/MIqvLEuVtdKRllImOXLkAX/XNHY/EB8geddhpQZragujfxurc0JmEn21kbK2i /Bdds7XMNw1yi24I54EiAA9r3/fpDrmq0IJEKX6x/SCUUQej0PoLyGdil+Pg= X-Sasl-enc: 0Nh0t4QQ8uUJKliPgMxee/Te8USCB3FZjuzA3kXUlH8D 1446791060 Received: from [192.168.8.142] (184-158-23-49.dyn.centurytel.net [184.158.23.49]) by mail.messagingengine.com (Postfix) with ESMTPA id C808BC016DB; Fri, 6 Nov 2015 01:24:20 -0500 (EST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: hast exec vs devd for handling CARP events From: Josh Paetzel X-Mailer: iPhone Mail (13B143) In-Reply-To: <20151103173526.GA17299@mail.michaelwlucas.com> Date: Fri, 6 Nov 2015 00:24:20 -0600 Cc: fs@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <64BC4405-50B7-424F-9CB6-00A6475C4204@tcbug.org> References: <20151103173526.GA17299@mail.michaelwlucas.com> To: "Michael W. Lucas" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Nov 2015 06:24:24 -0000 Both hast and carp have an amazing propensity to go split brain. This has devastating effects. Hundreds of years ago sailing vessels could determine their latitude by shooting the sun with an astrolabe and knowing the precise time. (You can also calculate the time if you know your latitude and have an astrolabe.) The wisdom of that time was to go to sea with one clock or three. If you have one clock you have no choice but to believe it. If you have three you believe the two that agree. If you have two clocks and they disagree you are in a world of hurt. Such is it with computers too. You really need three systems to do HA.
The best you can do with two systems is require a positive ACK from both nodes before one node will take over, otherwise you are forced to go passive-passive and require administrative action to proceed. Any other strategy will eventually hit a situation where it goes split brain. If you are in a directly connected universe you can use SPC-3 SCSI reservations to lock the disks and prevent the nodes from destroying the storage. In a hast universe you really need quorum. Thanks, Josh Paetzel > On Nov 3, 2015, at 11:35 AM, Michael W. Lucas wrote: > > Hi, > > There's lots of recipes out there for HAST failover based on devd. > > HAST also has the ability to run scripts on events, with the exec > function in hast.conf. > > It *seems* it makes more sense to have HAST mount filesystems and start > processes when it claims the master role for a resource, as opposed to > triggering that mount via a devd event and waiting for HAST to perform > the switch. > > Thanks for any insight. I'm researching the "specialty filesystems" > book, and want to give the best advice. > > ==ml > > -- > Michael W.
Lucas - mwlucas@michaelwlucas.com, Twitter @mwlauthor > http://www.MichaelWLucas.com/, http://blather.MichaelWLucas.com/ > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Fri Nov 6 13:33:36 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 22653A26D84 for ; Fri, 6 Nov 2015 13:33:36 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 033BE1389 for ; Fri, 6 Nov 2015 13:33:36 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 01FA4A26D83; Fri, 6 Nov 2015 13:33:36 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DBD2DA26D82 for ; Fri, 6 Nov 2015 13:33:35 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-wm0-x231.google.com (mail-wm0-x231.google.com [IPv6:2a00:1450:400c:c09::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 76ECA1388 for ; Fri, 6 Nov 2015 13:33:35 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: by wmww144 with SMTP id w144so30277307wmw.1 for ; Fri, 06 Nov 2015 05:33:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=PPuigyGif3pIoBFhlVPjPXtkvuo+8vM2GQFLAer/IVY=;
b=ediiTjOJf4jB5cK+fk0Nz3wPwVIAj/CdcfWV5tx3/v+QGX73UrQbdNLiinWoKTEsV6 rNMW5ssL//PqXt0K5GAbgsJ3seyM3nPbJVnLmNz6J5VbOW9eMl7q9xPyXcJBRI3i22Lv z1Kh0v2vEwhqof8aiTgJWmpFmpDMS0RwXn7TvrfN+kK8s6t+l2WVE0gezxI6UU6ijWVu Z3okLPnbzrxx/uprShawAKtDQ+EtkliFdQgK4rDxN6JdAxsFwjUZ8juq3X/qvlv4tQSX SKIL8RjP+sd8iDStnA30kML+o3ksw0d8YnoRnM37eqpylguzVnTyIFaJqfXOWShGa3D4 w0PA== MIME-Version: 1.0 X-Received: by 10.28.18.3 with SMTP id 3mr10761130wms.67.1446816814062; Fri, 06 Nov 2015 05:33:34 -0800 (PST) Received: by 10.194.16.231 with HTTP; Fri, 6 Nov 2015 05:33:34 -0800 (PST) In-Reply-To: <64BC4405-50B7-424F-9CB6-00A6475C4204@tcbug.org> References: <20151103173526.GA17299@mail.michaelwlucas.com> <64BC4405-50B7-424F-9CB6-00A6475C4204@tcbug.org> Date: Fri, 6 Nov 2015 07:33:34 -0600 Message-ID: Subject: Re: hast exec vs devd for handling CARP events From: Adam Vande More To: Josh Paetzel Cc: "Michael W. Lucas" , fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Nov 2015 13:33:36 -0000 On Fri, Nov 6, 2015 at 12:24 AM, Josh Paetzel wrote: > Both hast and carp have an amazing propensity to go split brain. This has > devastating affects. > HAST has no more propensity for split brain than any other 2 node cluster HA storage IME. Proper fencing is really the best, but even a not quite perfect fencing setup is entirely sufficient for a great many use cases. In the cases where split-brain does occur, resolving it isn't an ELE either in a well designed HA system. HAST and heartbeat are the best combo I've used on FreeBSD and it's as good as the equivalents on other platforms. Real high end clustered HA like lustre is a different story on FreeBSD. 
-- Adam From owner-freebsd-fs@freebsd.org Fri Nov 6 16:55:26 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68100A26008 for ; Fri, 6 Nov 2015 16:55:26 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 39A2A1C60 for ; Fri, 6 Nov 2015 16:55:26 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA6GtQYV005626 for ; Fri, 6 Nov 2015 16:55:26 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204337] ZFS can have a non-empty directory, but the files don't exist. 
Date: Fri, 06 Nov 2015 16:55:26 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: new X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: Andrew@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform op_sys bug_status bug_severity priority component assigned_to reporter Message-ID: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Nov 2015 16:55:26 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204337 Bug ID: 204337 Summary: ZFS can have a non-empty directory, but the files don't exist. Product: Base System Version: 11.0-CURRENT Hardware: arm64 OS: Any Status: New Severity: Affects Some People Priority: --- Component: kern Assignee: freebsd-fs@FreeBSD.org Reporter: Andrew@FreeBSD.org I can easily get ZFS into a state where a directory is non-empty, but the files within it don't exist. I can reproduce this by extracting base.txz from a weekly snapshot. When I extracted the tarball I got: root@cavium:/tank/andrew # tar -xpf base.txz -C /tank/andrew/test/ ./usr/share/man/man3/remainderl.3.gz: Can't create 'usr/share/man/man3/remainderl.3.gz' ./usr/share/man/man3/jdate.3.gz: Can't create 'usr/share/man/man3/jdate.3.gz' ./usr/share/man/man3/archive_write_add_filter_xz.3.gz: Can't create 'usr/share/man/man3/archive_write_add_filter_xz.3.gz' ... 
./usr/share/man/man3/winnstr.3.gz: Can't create 'usr/share/man/man3/winnstr.3.gz' ./usr/share/man/man3/dwarf_func_cu_offset.3.gz: Can't create 'usr/share/man/man3/dwarf_func_cu_offset.3.gz' ./usr/share/man/man3/remainder.3.gz: Can't create 'usr/share/man/man3/remainder.3.gz' tar: Error exit delayed from previous errors. I've trimmed most of the errors as there were over 1000 lines. When I tried to remove the test directory I got: root@cavium:/tank/andrew # rm -fr test rm: test/usr/share/man/man3: Directory not empty rm: test/usr/share/man: Directory not empty rm: test/usr/share: Directory not empty rm: test/usr: Directory not empty rm: test: Directory not empty root@cavium:/tank/andrew # ls -lh test/usr/share/man/man3 ls: MD4FileChunk.3.gz: No such file or directory ls: catanf.3.gz: No such file or directory ls: gelf_getmove.3.gz: No such file or directory ls: krb5_config_vget_strings.3.gz: No such file or directory ls: quota_read.3.gz: No such file or directory ls: rpc_clnt_create.3.gz: No such file or directory ls: ufs_disk_close.3.gz: No such file or directory ls: wcslcpy.3.gz: No such file or directory total 0 Neither remounting nor rebooting fixed the error; it seems to be an issue on the disk. -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-fs@freebsd.org Fri Nov 6 18:51:00 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 33438A283DC for ; Fri, 6 Nov 2015 18:51:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 16BE31E91 for ; Fri, 6 Nov 2015 18:51:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 16483A283DB; Fri, 6 Nov 2015 18:51:00 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 15DDEA283DA for ; Fri, 6 Nov 2015 18:51:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 83E261E8F for ; Fri, 6 Nov 2015 18:50:59 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id tA6IosTV042274 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 6 Nov 2015 20:50:54 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua tA6IosTV042274 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id tA6IorXZ042271; Fri, 6 Nov 2015 20:50:53 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 6 Nov 2015 20:50:53 +0200 From: Konstantin Belousov To: Kirk McKusick Cc: fs@freebsd.org Subject: Re: an easy (?) 
question on namecache sizing Message-ID: <20151106185053.GS2257@kib.kiev.ua> References: <20151105205255.GL2257@kib.kiev.ua> <201511052107.tA5L7QBB067460@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201511052107.tA5L7QBB067460@chez.mckusick.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Nov 2015 18:51:00 -0000 On Thu, Nov 05, 2015 at 01:07:26PM -0800, Kirk McKusick wrote: > > Date: Thu, 5 Nov 2015 22:52:55 +0200 > > From: Konstantin Belousov > > To: Kirk McKusick > > Subject: Re: an easy (?) question on namecache sizing > > Cc: fs@freebsd.org > > > > On Thu, Nov 05, 2015 at 12:25:38PM -0800, Kirk McKusick wrote: > >> > >> Does moving the setting of wantfreevnodes before the cache size changes > >> (as redone above) close the window enough? The vlrureclaim() function > >> operates slowly enough that a brief period of inconsistency seems > >> unimportant. Changing desiredvnodes happens very rarely. And at the moment > >> we are not correcting wantfreevnodes at all. Or am I missing some key point? > > > > I think wantfreevnodes should be set before the cache size changes when > > desiredvnodes is decreased, but kept at the place in your patch for the > > increasing case. > > What is the benefit of waiting until after the caches are resized > for setting wantfreevnodes when desiredvnodes is increasing? 
It > seems like it just complicates the code to conditionally do the > update in two places, so I am inclined to just do it at the beginning > as there is good reason for doing it there when downsizing and for > the upsizing it does not really matter much. With upsizing, if wantfreevnodes are set before desiredvnodes are increased, you can again get into the contradictory state, where the wantfreevnodes is larger than desiredvnodes. From owner-freebsd-fs@freebsd.org Fri Nov 6 19:09:37 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4045CA28793 for ; Fri, 6 Nov 2015 19:09:37 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 2BD301A1F for ; Fri, 6 Nov 2015 19:09:37 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: by mailman.ysv.freebsd.org (Postfix) id 2B29DA28792; Fri, 6 Nov 2015 19:09:37 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2ACE0A28791 for ; Fri, 6 Nov 2015 19:09:37 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:d250:99ff:fe57:4030]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1330C1A1E for ; Fri, 6 Nov 2015 19:09:37 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (localhost [IPv6:::1]) by chez.mckusick.com (8.15.2/8.14.9) with ESMTP id tA6J9aqJ091368; Fri, 6 Nov 2015 11:09:36 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201511061909.tA6J9aqJ091368@chez.mckusick.com> From: Kirk 
McKusick To: Konstantin Belousov Subject: Re: an easy (?) question on namecache sizing cc: fs@freebsd.org In-reply-to: <20151106185053.GS2257@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <91366.1446836976.1@chez.mckusick.com> Date: Fri, 06 Nov 2015 11:09:36 -0800 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Nov 2015 19:09:37 -0000 > Date: Fri, 6 Nov 2015 20:50:53 +0200 > From: Konstantin Belousov > To: Kirk McKusick > Subject: Re: an easy (?) question on namecache sizing > Cc: fs@freebsd.org > >>> I think wantfreevnodes should be set before the cache size changes when >>> desiredvnodes is decreased, but kept at the place in your patch for the >>> increasing case. >> >> What is the benefit of waiting until after the caches are resized >> for setting wantfreevnodes when desiredvnodes is increasing? It >> seems like it just complicates the code to conditionally do the >> update in two places, so I am inclined to just do it at the beginning >> as there is good reason for doing it there when downsizing and for >> the upsizing it does not really matter much. > > With upsizing, if wantfreevnodes are set before desiredvnodes are > increased, you can again get into the contradictory state, where > the wantfreevnodes is larger than desiredvnodes. Setting wantfreevnodes before adjusting the cache sizes always has first set desiredvnodes. Waiting to set wantfreevnodes until after the caches have been resized has no effect on desiredvnodes. It is just less efficient to find them when desiredvnodes is rising. 
Kirk McKusick From owner-freebsd-fs@freebsd.org Sat Nov 7 18:04:27 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B954A28BBF for ; Sat, 7 Nov 2015 18:04:27 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E44191251 for ; Sat, 7 Nov 2015 18:04:26 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tA7I4QT1080467 for ; Sat, 7 Nov 2015 18:04:26 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 201677] unionfs or tmpfs kernel panic Date: Sat, 07 Nov 2015 18:04:26 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-BETA1 X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: franco@opnsense.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 Nov 2015 18:04:27 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201677 --- 
Comment #2 from Franco Fichtner --- Need to pick this back up. Would like to see this brought into 11-CURRENT if there are no objections. -- You are receiving this mail because: You are the assignee for the bug.