Date: Mon, 10 Jun 2013 11:55:03 +0200
From: Palle Girgensohn <girgen@FreeBSD.org>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: freebsd-fs@freebsd.org, Dan Thomas <godders@gmail.com>, Jeff Roberson <jroberson@jroberson.net>, Julian Akehurst <julian@pingpong.se>
Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
Message-ID: <51B5A277.2060904@FreeBSD.org>
In-Reply-To: <201306022101.r52L19vg033389@chez.mckusick.com>
References: <201306022101.r52L19vg033389@chez.mckusick.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kirk McKusick wrote:
>> Date: Sun, 02 Jun 2013 22:35:23 +0200
>> From: Palle Girgensohn <girgen@freebsd.org>
>> To: Kirk McKusick <mckusick@mckusick.com>
>> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>> Cc: freebsd-fs@freebsd.org, Dan Thomas <godders@gmail.com>,
>>     Jeff Roberson <jroberson@jroberson.net>,
>>     Julian Akehurst <julian@pingpong.se>
>>
>> --On 31 May 2013 11.25.40 -0700 Kirk McKusick
>> <mckusick@mckusick.com> wrote:
>>
>>> Your results are very enlightening. Especially the fact that you
>>> have to do a forcible unmount of the filesystem. What that tells
>>> me is that somehow we are getting vnodes that have phantom
>>> references. That is, there is some system call where we get a
>>> reference on a vnode (vref, vget, or similar) that does not
>>> ultimately have a corresponding drop of the reference (vrele,
>>> vput, or similar). The net effect is that the file is held open
>>> despite the fact that there are no longer any connections to it.
>>> When you do the forcible unmount, the kernel walks the list of
>>> vnodes associated with the filesystem and does a vgone on each of
>>> them. That causes each to be inactivated which then triggers the
>>> release of their associated disk space. The reason that the
>>> unmount takes 20 seconds is to process all the releasing of the
>>> space. My guess is that there is an error path in some system
>>> call that is missing the vrele or vput.
>>>
>>> Assuming that you are able to run some more tests on your test
>>> machine, the next step in narrowing down the set of code to look
>>> at is to try running your system with soft updates disabled. The
>>> idea is to find out whether the mismatched references are in
>>> the soft updates code or are in one of the filesystem system
>>> calls themselves. To disable soft updates run the command `tunefs
>>> -n disable /pgsql' on the unmounted /pgsql filesystem. If the
>>> system then runs without the problem, I will know to search the
>>> soft updates code. If the problem persists, then I'll know to
>>> look in the system calls themselves. You may want to do some
>>> preliminary tests to see how quickly the problem manifests
>>> itself. You can do this by running it for a short time (10
>>> minutes say) and then checking to see if you need to do a
>>> forcible unmount of the filesystem. Once you establish how long
>>> you have to run before you reliably have to do a forcible
>>> unmount, you will know how long to run the test with soft updates
>>> turned off. If you find that running with soft updates turned off
>>> makes your application run too slowly you can mount your
>>> filesystem asynchronously. Note however, that you should not run
>>> asynchronously if the data on the filesystem is critical as you
>>> may end up with an unrecoverable filesystem after a power
>>> failure or system crash. So only run asynchronously if you can
>>> afford to lose your filesystem.
>>>
>>> Finally, it would be helpful if you could add two more commands
>>> to your diskspacecheck.sh script:
>>>
>>>     sysctl -a | egrep vnode
>>>     mount -v
>>>
>>> The first shows the vnode usage and the second shows the
>>> operational state of your filesystems.
>>>
>>> Kirk McKusick
>>
>> OK, I have now turned off soft updates. This is on the test server.
>> It is not as busy as the production machine, but I'll keep an eye
>> on it and will mail new results as soon as I see any evidence
>> either that soft updates is the culprit or that it is not.
>>
>> FWIW, I attach the script from this remount process as well, which
>> includes
>>
>>     sysctl -a | grep vnode ; mount -v.
>>
>> Note that it is all in one script file this time.
>>
>> Cheers, Palle
>
> This looks good. Keep me posted.

After running for a number of days without soft updates, it seems to
me that the culprit is indeed in the soft updates code.

# df -k /pgsql; du -sk /pgsql
Filesystem  1024-blocks     Used    Avail Capacity  Mounted on
/dev/da2s1d   134763348 86339044 37643238    70%    /pgsql
86303252        /pgsql

Palle
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJRtaJ3AAoJEIhV+7FrxBJD+IkH/3FOoZ95VGE0fOWSuFIwVn8I
jvHiJ6qTx0zh17pZNnc+G0UpU5fHxCazD1yT6yCwfkWebWKXELXtfQMeZUMGi0AX
e94P0HJ2O4RQSMHC1rlWSLUidAB6m1ZtAtpXzgziB9P/Jonk78uFqRcTmZyMycsy
pxPFHsbywsjJm9FLF4ZuhiSPX57tbAKLQM3HYDMFQ/rHPJiBlkx7VVeON6svtmMO
bRZWnQTUXUAAMT1NDUEL8opGAO2S72+hFBiCjJsgS22SSq7KIMzAlJqq01L2svhH
o7KNAkN6lIMuJS9B2idjJWLVXG/vNQ1QBOha0VY80fIQYSYeZt25EGlXf3rYL6Y=
=Zmu2
-----END PGP SIGNATURE-----
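
Editorial note on the figures above: the df/du pair is useful because space held by unreferenced inodes is counted by df but is invisible to du, so the gap between the two is a rough indicator of leaked space. The sketch below computes that gap. The function names df_du_gap_kb and gap_for_mount are hypothetical helpers introduced here, not part of the poster's diskspacecheck.sh, and gap_for_mount assumes that `df -k` prints each filesystem on a single line with Used in the third column.

```sh
#!/bin/sh
# Hypothetical helper (not from the original thread):
# print the difference in KiB between the "Used" column of `df -k`
# and the total reported by `du -sk`.
df_du_gap_kb() {
    # $1 = Used (1024-byte blocks) from df -k
    # $2 = total (1024-byte blocks) from du -sk
    echo $(( $1 - $2 ))
}

# Hypothetical wrapper that reads live values for a mount point.
# Assumes df prints the filesystem on one line (row 2, column 3 = Used).
gap_for_mount() {
    used=$(df -k "$1" | awk 'NR==2 {print $3}')
    total=$(du -sk "$1" | awk '{print $1}')
    df_du_gap_kb "$used" "$total"
}

# With the numbers posted in the message:
df_du_gap_kb 86339044 86303252    # prints 35792
```

With the posted values the gap is 35792 KiB, roughly 35 MiB on a filesystem with about 82 GiB in use. A small difference is expected in any case, since df also counts filesystem metadata that du does not see. On a live system, `gap_for_mount /pgsql` would produce the same kind of figure for periodic logging.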