From: Palle Girgensohn
Date: Mon, 10 Jun 2013 11:55:03 +0200
To: Kirk McKusick
Cc: freebsd-fs@freebsd.org, Dan Thomas, Jeff Roberson, Julian Akehurst
Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
Message-ID: <51B5A277.2060904@FreeBSD.org>
In-Reply-To: <201306022101.r52L19vg033389@chez.mckusick.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kirk McKusick wrote:
>> Date: Sun, 02 Jun 2013 22:35:23 +0200
>> From: Palle Girgensohn
>> To: Kirk McKusick
>> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>> Cc: freebsd-fs@freebsd.org, Dan Thomas, Jeff Roberson, Julian Akehurst
>>
>> --On 31 May 2013 11.25.40 -0700 Kirk McKusick wrote:
>>
>>> Your results are very enlightening.
>>> Especially the fact that you have to do a forcible unmount of the
>>> filesystem. What that tells me is that somehow we are getting vnodes
>>> that have phantom references. That is, there is some system call
>>> where we get a reference on a vnode (vref, vget, or similar) that
>>> does not ultimately have a corresponding drop of the reference
>>> (vrele, vput, or similar). The net effect is that the file is held
>>> open despite the fact that there are no longer any connections to it.
>>> When you do the forcible unmount, the kernel walks the list of vnodes
>>> associated with the filesystem and does a vgone on each of them. That
>>> causes each to be inactivated, which then triggers the release of
>>> their associated disk space. The unmount takes 20 seconds because it
>>> has to process the release of all that space. My guess is that there
>>> is an error path in some system call that is missing the vrele or
>>> vput.
>>>
>>> Assuming that you are able to run some more tests on your test
>>> machine, the next step in narrowing down the set of code to look at
>>> is to try running your system with soft updates disabled. The idea is
>>> to find out whether the mismatched references are in the soft updates
>>> code or are in one of the filesystem system calls themselves. To
>>> disable soft updates run the command `tunefs -n disable /pgsql' on
>>> the unmounted /pgsql filesystem. If the system then runs without the
>>> problem, I will know to search the soft updates code. If the problem
>>> persists, then I'll know to look in the system calls themselves. You
>>> may want to do some preliminary tests to see how quickly the problem
>>> manifests itself. You can do this by running it for a short time (10
>>> minutes, say) and then checking to see if you need to do a forcible
>>> unmount of the filesystem.
>>> Once you establish how long you have to run before you reliably have
>>> to do a forcible unmount, you will know how long to run the test with
>>> soft updates turned off. If you find that running with soft updates
>>> turned off makes your application run too slowly, you can mount your
>>> filesystem asynchronously. Note, however, that you should not run
>>> asynchronously if the data on the filesystem is critical, as you may
>>> end up with an unrecoverable filesystem after a power failure or
>>> system crash. So only run asynchronously if you can afford to lose
>>> your filesystem.
>>>
>>> Finally, it would be helpful if you could add two more commands to
>>> your diskspacecheck.sh script:
>>>
>>>     sysctl -a | egrep vnode
>>>     mount -v
>>>
>>> The first shows the vnode usage and the second shows the operational
>>> state of your filesystems.
>>>
>>> Kirk McKusick
>>
>> OK, I have now turned off soft updates. This is on the test server.
>> It is not as busy as the production machine, but I'll keep an eye on
>> it and will mail new results as soon as I see evidence either that
>> soft updates is the culprit or that it is not.
>>
>> FWIW, I attach the script from this remount process as well, which
>> includes
>>
>>     sysctl -a | grep vnode ; mount -v
>>
>> Note that it is all in one script file this time.
>>
>> Cheers, Palle
>
> This looks good. Keep me posted.

After running for a number of days without soft updates, it seems to me
that the culprit is indeed in the soft updates code.

# df -k /pgsql; du -sk /pgsql
Filesystem  1024-blocks     Used    Avail Capacity  Mounted on
/dev/da2s1d   134763348 86339044 37643238      70%  /pgsql
86303252        /pgsql

Palle
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJRtaJ3AAoJEIhV+7FrxBJD+IkH/3FOoZ95VGE0fOWSuFIwVn8I
jvHiJ6qTx0zh17pZNnc+G0UpU5fHxCazD1yT6yCwfkWebWKXELXtfQMeZUMGi0AX
e94P0HJ2O4RQSMHC1rlWSLUidAB6m1ZtAtpXzgziB9P/Jonk78uFqRcTmZyMycsy
pxPFHsbywsjJm9FLF4ZuhiSPX57tbAKLQM3HYDMFQ/rHPJiBlkx7VVeON6svtmMO
bRZWnQTUXUAAMT1NDUEL8opGAO2S72+hFBiCjJsgS22SSq7KIMzAlJqq01L2svhH
o7KNAkN6lIMuJS9B2idjJWLVXG/vNQ1QBOha0VY80fIQYSYeZt25EGlXf3rYL6Y=
=Zmu2
-----END PGP SIGNATURE-----
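
For repeated checks, the df/du comparison above can be scripted. What
follows is only a minimal sketch and is not taken from the
diskspacecheck.sh script mentioned in the thread. The helper names
space_gap_kb and check_fs are hypothetical, and the -P flag (POSIX
one-line-per-filesystem output) is an addition to the df -k command
used above. Some gap between the two figures is presumably normal; a
gap that keeps growing from run to run would be the interesting signal.

```shell
#!/bin/sh
# Sketch: compare the "Used" column of df with the total reported by du.
# space_gap_kb and check_fs are hypothetical helper names.

# Print df-used minus du-total, both given in 1K blocks.
space_gap_kb() {
    echo $(( $1 - $2 ))
}

# Measure one mount point and report both figures and their difference.
check_fs() {
    mnt=$1
    # With -P the data is on line 2; column 3 is "Used".
    used=$(df -kP "$mnt" | awk 'NR == 2 { print $3 }')
    # du -sk prints "<kilobytes><TAB><path>".
    total=$(du -sk "$mnt" | awk '{ print $1 }')
    echo "$mnt: df used=${used}K du=${total}K gap=$(space_gap_kb "$used" "$total")K"
}

# Figures from the df/du output above.
space_gap_kb 86339044 86303252    # prints 35792

# Optionally measure a live mount point given as the first argument.
if [ -n "${1:-}" ]; then
    check_fs "$1"
fi
```

Run with a mount point as argument, e.g. `sh gapcheck.sh /pgsql` (the
file name is arbitrary), it prints one line per invocation that can be
appended to a log and compared over time.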