From: Kirk McKusick <mckusick@mckusick.com>
Date: Mon, 15 Jul 2013 12:32:28 -0700
To: Dan Thomas
Cc: freebsd-fs@freebsd.org, Palle Girgensohn, Jeff Roberson, Julian Akehurst
Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
Message-Id: <201307151932.r6FJWSxM087108@chez.mckusick.com>

> Date: Mon, 15 Jul 2013 10:51:10 +0100
> From: Dan Thomas
> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>
> On 11 June 2013 01:17, Kirk McKusick wrote:
> > OK, good to have it narrowed down. I will look to devise some
> > additional diagnostics that hopefully will help tease out the
> > bug. I'll hopefully get back to you soon.
>
> Hi,
>
> Is there any news on this issue? We're still running several servers
> that are exhibiting this problem (most recently, one that seems to
> be leaking around 10 GB/hour), and it's getting to the point where
> we're looking at moving to a different OS until it's resolved.
>
> We have access to several production systems with this problem and,
> at least from time to time, we will have systems with a significant
> leak that we can experiment with. Is there any way we can assist
> with tracking this down? Any diagnostics or testing that would be
> useful?
>
> Thanks,
> Dan

Hi Dan (and Palle),

Sorry for the long delay with no help or news. I have gotten
side-tracked on several projects and have had little time to devise
tests that would help find the cause of the lost space. The problem is
almost certainly a one-line fix (probably a missing vput or vrele in
some error path), but finding where that line goes is the hard part :-)
I have had little success inserting code that tracks reference counts
(too many false positives), so I am going to need some help from you
to narrow it down.

My belief is that some set of filesystem operations (system calls) is
leading to the problem. Specifically, a file is created, data is put
into it, and then the file is deleted, either before or after being
closed. Somehow a reference to that file persists even though nothing
valid refers to it any longer, so the filesystem considers it still
live and does not delete it. When you do the forcible unmount, these
stray files get cleared and the space shows back up. What I need to
devise is a small test program that makes the same set of system calls
and so exhibits the leak.
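To make the target concrete, here is a rough sketch (untested, and
only a guess at the shape) of the sort of minimal test program I have
in mind. Whether the unlink() happens before or after the close(), and
what other calls are involved, is exactly what the tracing described
below should tell us:

/*
 * Hypothetical sketch of the suspected leak-triggering sequence:
 * create a file, fill it with data, then delete it either before
 * or after closing it. A guess at the shape of the eventual test
 * program, not a known reproduction.
 */
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	const char *path = argc > 1 ? argv[1] : "leaktest.tmp";
	char buf[8192] = { 0 };
	int fd;

	if ((fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600)) == -1)
		err(1, "open %s", path);
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		err(1, "write");

	/* Variant 1: delete the file while it is still open ... */
	if (unlink(path) == -1)
		err(1, "unlink");
	if (close(fd) == -1)
		err(1, "close");
	/* Variant 2 would close() first and unlink() afterwards. */

	exit(0);
}

If something of this shape, run under ktrace, ever leaves the
filesystem busy at unmount time, we will have our reproduction.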
The way that I would like to pin down that set of calls is to have you
`ktrace -i' your application, then run it just long enough to create
at least one of these lost files. The goal is to minimize the amount
of ktrace data through which we need to sift.

In preparation for this test you need a kernel compiled with `options
DIAGNOSTIC', or, if you prefer, just add `#define DIAGNOSTIC 1' to the
top of sys/kern/vfs_subr.c. You will know you have at least one
offending file when you try to unmount the affected filesystem and
find it busy. Before doing the `umount -f', enable busy printing with
`sysctl debug.busyprt=1', then capture the console output, which will
show the details of all the vnodes that had to be forcibly flushed.

Hopefully we will then be able to correlate those vnodes back to the
files (NAMI records in the ktrace output) with which they were
associated. We may need to augment the NAMI data with the inode number
of the associated file to make the association with the busyprt output.
Once we have that, we can look at all the system calls done on those
files and write a small test program that exhibits the problem. Given
a small test program, Jeff or I can track down the offending system
call path and nail this pernicious bug once and for all.

	Kirk McKusick
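P.S. For augmenting the NAMI data with inode numbers, a small hook
along these lines in the traced application would do (log_inode is
just an illustrative name, not anything that exists today; only the
fstat(2) call and st_ino field are standard):

/*
 * Illustrative helper: after opening a file, log its pathname and
 * inode number on stderr so that NAMI records in the ktrace output
 * can be matched against the inode numbers shown in the busyprt
 * console output.
 */
#include <sys/stat.h>

#include <stdint.h>
#include <stdio.h>

static void
log_inode(const char *path, int fd)
{
	struct stat sb;

	if (fstat(fd, &sb) == 0)
		fprintf(stderr, "leak-trace: %s -> inode %ju\n",
		    path, (uintmax_t)sb.st_ino);
}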