From: Kirk McKusick <mckusick@mckusick.com>
Date: Mon, 15 Jul 2013 12:32:28 -0700
To: Dan Thomas
Cc: freebsd-fs@freebsd.org, Palle Girgensohn, Jeff Roberson, Julian Akehurst
Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
Message-Id: <201307151932.r6FJWSxM087108@chez.mckusick.com>

> Date: Mon, 15 Jul 2013 10:51:10 +0100
> From: Dan Thomas
> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>
> On 11 June 2013 01:17, Kirk McKusick wrote:
> > OK, good to have it narrowed down. I will look to devise some
> > additional diagnostics that hopefully will help tease out the
> > bug. I'll hopefully get back to you soon.
>
> Hi,
>
> Is there any news on this issue? We're still running several servers
> that are exhibiting this problem (most recently, one that seems to
> be leaking around 10 GB/hour), and it's getting to the point where
> we're looking at moving to a different OS until it's resolved.
>
> We have access to several production systems with this problem and,
> at least from time to time, we will have systems with a significant
> leak that we can experiment with. Is there any way we can assist
> with tracking this down? Any diagnostics or testing that would be
> useful?
>
> Thanks,
> Dan

Hi Dan (and Palle),

Sorry for the long delay with no help or news. I have gotten
side-tracked on several projects and have had little time to devise
tests that would help find the cause of the lost space. The problem is
almost certainly a one-line fix (probably a missing vput or vrele in
some error path), but finding where that line goes is the hard part :-)
I have had little success inserting code that tracks reference counts
(too many false positives), so I am going to need some help from you
to narrow it down.

My belief is that some set of filesystem operations (system calls) is
leading to the problem. Specifically, a file is created, data is put
into it, and then the file is deleted, either before or after being
closed. Somehow a reference to that file persists even though nothing
valid refers to it any longer, so the filesystem considers it still
live and does not delete it. When you do the forcible unmount, these
stray files get cleared and the space shows back up. What I need to
devise is a small test program that makes the same set of system calls
and so exhibits the leak.
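To make the target concrete, here is a rough sketch (untested, and
only a guess at the shape) of the sort of minimal test program I have
in mind. Whether the unlink() happens before or after the close(), and
what other calls are involved, is exactly what the tracing described
below should tell us:

/*
 * Hypothetical sketch of the suspected leak-triggering sequence:
 * create a file, fill it with data, then delete it either before
 * or after closing it. A guess at the shape of the eventual test
 * program, not a known reproduction.
 */
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	const char *path = argc > 1 ? argv[1] : "leaktest.tmp";
	char buf[8192] = { 0 };
	int fd;

	if ((fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600)) == -1)
		err(1, "open %s", path);
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		err(1, "write");

	/* Variant 1: delete the file while it is still open ... */
	if (unlink(path) == -1)
		err(1, "unlink");
	if (close(fd) == -1)
		err(1, "close");
	/* Variant 2 would close() first and unlink() afterwards. */

	exit(0);
}

If something of this shape, run under ktrace, ever leaves the
filesystem busy at unmount time, we will have our reproduction.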
The way that I would like to pin down that set of calls is to have you
`ktrace -i' your application, then run it just long enough to create
at least one of these lost files. The goal is to minimize the amount
of ktrace data through which we need to sift.

In preparation for this test you need a kernel compiled with `options
DIAGNOSTIC', or, if you prefer, just add `#define DIAGNOSTIC 1' to the
top of sys/kern/vfs_subr.c. You will know you have at least one
offending file when you try to unmount the affected filesystem and
find it busy. Before doing the `umount -f', enable busy printing with
`sysctl debug.busyprt=1', then capture the console output, which will
show the details of all the vnodes that had to be forcibly flushed.

Hopefully we will then be able to correlate those vnodes back to the
files (NAMI records in the ktrace output) with which they were
associated. We may need to augment the NAMI data with the inode number
of the associated file to make the association with the busyprt output.
Once we have that, we can look at all the system calls done on those
files and write a small test program that exhibits the problem. Given
a small test program, Jeff or I can track down the offending system
call path and nail this pernicious bug once and for all.

	Kirk McKusick
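P.S. For augmenting the NAMI data with inode numbers, a small hook
along these lines in the traced application would do (log_inode is
just an illustrative name, not anything that exists today; only the
fstat(2) call and st_ino field are standard):

/*
 * Illustrative helper: after opening a file, log its pathname and
 * inode number on stderr so that NAMI records in the ktrace output
 * can be matched against the inode numbers shown in the busyprt
 * console output.
 */
#include <sys/stat.h>

#include <stdint.h>
#include <stdio.h>

static void
log_inode(const char *path, int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == 0)
		fprintf(stderr, "leak-trace: %s -> inode %ju\n",
		    path, (uintmax_t)sb.st_ino);
}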