From: Kirk McKusick
To: Palle Girgensohn
Cc: freebsd-fs@FreeBSD.org, Dan Thomas, Jeff Roberson, Julian Akehurst
Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
Date: Fri, 31 May 2013 11:25:40 -0700
Message-Id: <201305311825.r4VIPeFV077457@chez.mckusick.com>
In-reply-to: <51A73076.8020609@FreeBSD.org>

> Date: Thu, 30 May 2013 12:56:54 +0200
> From: Palle Girgensohn
> To: Kirk McKusick
> CC: freebsd-fs@FreeBSD.org, Jeff Roberson,
>     Dan Thomas, Julian Akehurst
> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>
> Hello again!
>
> I have now remounted the postgresql filesystem on a test server that
> experiences the same problem. The production server is not remounted
> yet; we're planning that in a week's time approximately, but I thought I
> could gain some time by running the suggested procedure on the test box.
>
> The base problem was this:
>
> # df -h /pgsql ; du -hxs /pgsql
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/da2s1d    128G    101G     17G    86%    /pgsql
>  82G    /pgsql
>
> df says 101 GB used, but du only finds 82 GB, and fstat cannot find any
> open files that are unreferenced in the file system. Stopping postgresql
> does not help. It seems the OS is leaking inode references.
>
> FreeBSD 9.1, postgresql 9.2.3 from port.
>
> I ran the suggested commands (in the attached diskspacecheck) before
> stopping postgresql (before.log), after stopping postgresql but before
> unmounting /pgsql (before2.log), and then I unmounted /pgsql (I had to
> run umount -f /pgsql, and it took about 20 seconds). I did not enter
> single-user mode, since I really did not have to this time. (On the
> production server, the disk is /usr, so that will require more shutting
> down...)
>
> I've attached the logs here. Hope it helps!
>
> The commands run in diskspacecheck are
>
> #! /bin/sh
> df -ih /pgsql
> vmstat -z
> vmstat -m
> sysctl debug
> fstat -f /pgsql
>
> as suggested by Kirk.

Your results are very enlightening, especially the fact that you have to do a forcible unmount of the filesystem. What that tells me is that somehow we are getting vnodes that have phantom references. That is, there is some system call where we get a reference on a vnode (vref, vget, or similar) that does not ultimately have a corresponding drop of the reference (vrele, vput, or similar). The net effect is that the file is held open even though there are no longer any connections to it. When you do the forcible unmount, the kernel walks the list of vnodes associated with the filesystem and does a vgone on each of them. That causes each one to be inactivated, which in turn triggers the release of its associated disk space. The unmount takes 20 seconds because it has to process the release of all of that space.

My guess is that there is an error path in some system call that is missing the vrele or vput.
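The df/du gap Palle quoted can be measured with a small script. This is a minimal sketch, assuming POSIX-format `df -P` output and that the argument is the filesystem's mount point; the function name `unreachable_kb` is hypothetical and not part of FreeBSD. The result is only a rough indicator of space held by unlinked files whose vnodes are still referenced: filesystem metadata and files hidden underneath other mount points can also make df and du disagree.

```sh
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): estimate, in KiB, how much
# space df reports as used but du cannot reach by walking the tree
# below a mount point.
unreachable_kb() {
    fs=$1
    # POSIX-format df output in 1024-byte blocks:
    # line 2, field 3 is the "Used" column.
    used=$(df -P -k "$fs" | awk 'NR == 2 { print $3 }')
    # du: -s one total, -x do not cross into other filesystems,
    # -k 1024-byte units; unreadable directories are skipped silently.
    seen=$(du -sxk "$fs" 2>/dev/null | awk 'END { print $1 }')
    echo $((used - seen))
}

# Example: unreachable_kb /pgsql
```

With the figures quoted above (101 GB used according to df, 82 GB found by du) the gap is roughly 19 GB; since the -h output is rounded, only the order of magnitude is meaningful. Taking a reading before and after stopping postgresql shows whether the gap shrinks, which it should not if the space is held by leaked vnode references.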
Assuming that you are able to run some more tests on your test machine, the next step in narrowing down the set of code to look at is to try running your system with soft updates disabled. The idea is to find out whether the mismatched references are in the soft updates code or in one of the filesystem system calls themselves. To disable soft updates, run the command `tunefs -n disable /pgsql' on the unmounted /pgsql filesystem. If the system then runs without the problem, I will know to search the soft updates code. If the problem persists, then I'll know to look in the system calls themselves.

You may want to do some preliminary tests to see how quickly the problem manifests itself. You can do this by running for a short time (10 minutes, say) and then checking to see whether you need to do a forcible unmount of the filesystem. Once you establish how long you have to run before you reliably have to do a forcible unmount, you will know how long to run the test with soft updates turned off.

If you find that running with soft updates turned off makes your application run too slowly, you can mount your filesystem asynchronously. Note, however, that you should not run asynchronously if the data on the filesystem is critical, as you may end up with an unrecoverable filesystem after a power failure or system crash. So only run asynchronously if you can afford to lose your filesystem.

Finally, it would be helpful if you could add two more commands to your diskspacecheck.sh script:

    sysctl -a | egrep vnode
    mount -v

The first shows the vnode usage and the second shows the operational state of your filesystems.

    Kirk McKusick
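For reference, the diskspacecheck script with the two requested commands appended might look like the sketch below. It is not the script that was attached to the thread: wrapping the commands in a function, taking the mount point as an optional argument (defaulting to /pgsql), and printing a header line before each command are assumptions added so that logs from separate checkpoints (such as before.log and before2.log) are easier to compare side by side.

```sh
#!/bin/sh
# diskspacecheck.sh (sketch): snapshot filesystem and vnode state.
# Usage: sh diskspacecheck.sh [mountpoint] > label.log
diskspacecheck() {
    fs=${1:-/pgsql}
    # The original five commands, followed by the two added ones.
    for cmd in "df -ih $fs" "vmstat -z" "vmstat -m" "sysctl debug" \
               "fstat -f $fs" "sysctl -a | egrep vnode" "mount -v"; do
        echo "=== $cmd ==="
        # Run through sh -c so the pipeline works; record a failure
        # instead of stopping, so one missing tool does not end the log.
        sh -c "$cmd" 2>&1 || echo "(exit status $?)"
    done
}

diskspacecheck "$@"
```

Running it once per checkpoint, e.g. `sh diskspacecheck.sh /pgsql > before.log`, keeps the set and order of commands identical across logs, so a diff between two logs shows only what changed.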