From owner-freebsd-fs@FreeBSD.ORG Tue Jul 16 22:47:27 2013
From: Palle Girgensohn
Date: Wed, 17 Jul 2013 00:47:22 +0200
To: Kirk McKusick
Cc: freebsd-fs@freebsd.org, Jeff Roberson, Julian Akehurst
Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
Message-ID: <51E5CD7A.2020109@FreeBSD.org>
In-Reply-To: <201307151932.r6FJWSxM087108@chez.mckusick.com>

Kirk McKusick wrote:
>> Date: Mon, 15 Jul 2013 10:51:10 +0100
>> From: Dan Thomas
>> To: Kirk McKusick
>> Cc: Palle Girgensohn, freebsd-fs@freebsd.org, Jeff Roberson, Julian Akehurst
>> Subject: Re: leaking lots of unreferenced inodes (pg_xlog files?)
>>
>> On 11 June 2013 01:17, Kirk McKusick wrote:
>>> OK, good to have it narrowed down. I will look to devise some
>>> additional diagnostics that hopefully will help tease out the bug.
>>> I'll hopefully get back to you soon.
>>
>> Hi,
>>
>> Is there any news on this issue? We're still running several servers
>> that are exhibiting this problem (most recently, one that seems to be
>> leaking around 10 GB/hour), and it's getting to the point where we're
>> looking at moving to a different OS until it's resolved.
>>
>> We have access to several production systems with this problem and
>> (at least from time to time) will have systems with a significant
>> leak on them that we can experiment with. Is there any way we can
>> assist with tracking this down? Any diagnostics or testing that would
>> be useful?
>>
>> Thanks,
>> Dan
>
> Hi Dan (and Palle),
>
> Sorry for the long delay with no help or news. I have gotten
> side-tracked on several projects and have had little time to devise
> tests that would help find the cause of the lost space. It is almost
> certainly a one-line fix (a missing vput or vrele, probably in some
> error path), but finding where it goes is the hard part :-)
>
> I have had little success in inserting code that tracks reference
> counts (too many false positives), so I am going to need some help
> from you to narrow it down. My belief is that there is some set of
> filesystem operations (system calls) that leads to the problem.
> Notably, a file is created, data is put into it, and then the file
> is deleted (either before or after being closed). Somehow a reference
> to that file persists even though there is no longer any valid
> reference to it. Hence the filesystem thinks the file is still live
> and does not delete it. When you do the forcible unmount, these files
> get cleared and the space shows back up.
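For concreteness, a minimal sketch of that sequence (not from the
thread; the mount point, file name, and size are made up here). On a
healthy UFS filesystem the blocks come back as soon as the last
reference disappears at the final close:

    cd /mnt/pgdata                        # hypothetical path on the affected filesystem
    exec 3> leak-test.tmp                 # create the file and keep a descriptor open
    dd if=/dev/zero bs=1m count=16 >&3    # put some data into it
    rm leak-test.tmp                      # delete it while the descriptor is still open
    exec 3>&-                             # close: the last reference goes away here
    df -h .                               # the freed space should show up again here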
> What I need to devise is a small test program that performs the set
> of system calls that cause this to happen. The way I would like to
> get it is to have you run `ktrace -i' on your application and then
> run the application just long enough to create at least one of these
> lost files. The goal is to minimize the amount of ktrace data through
> which we need to sift.
>
> In preparation for this test you need a kernel compiled with
> `options DIAGNOSTIC', or, if you prefer, just add `#define DIAGNOSTIC 1'
> to the top of sys/kern/vfs_subr.c. You will know you have at least
> one offending file when you try to unmount the affected filesystem
> and find it busy. Before doing the `umount -f', enable busy printing
> with `sysctl debug.busyprt=1'. Then capture the console output, which
> will show the details of all the vnodes that had to be forcibly
> flushed. Hopefully we will then be able to correlate them back to the
> files (NAMI records in the ktrace output) with which they were
> associated. We may need to augment the NAMI data with the inode
> number of the associated file to make the association with the
> busyprt output. Anyway, once we have that, we can look at all the
> system calls done on those files and create a small test program that
> exhibits the problem. Given a small test program, Jeff or I can track
> down the offending system call path and nail this pernicious bug once
> and for all.
>
> Kirk McKusick

Hi,

I have run ktrace -i on pg_ctl (which forks off all the PostgreSQL
processes), and I got two "busy" files that were "lost" after a few
hours. dmesg reveals this:

    vflush: busy vnode
    0xfffffe067cdde960: tag ufs, type VREG
        usecount 1, writecount 0, refcount 2 mountedhere 0
        flags (VI(0x200)) VI_LOCKed
        v_object 0xfffffe0335922000 ref 0 pages 0
        lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723)
        ino 11047146, on dev da2s1d
    vflush: busy vnode
    0xfffffe039f35bb40: tag ufs, type VREG
        usecount 1, writecount 0, refcount 3 mountedhere 0
        flags (VI(0x200)) VI_LOCKed
        v_object 0xfffffe03352701d0 ref 0 pages 0
        lock type ufs: EXCL by thread 0xfffffe01600eb8e0 (pid 56723)
        ino 11045961, on dev da2s1d

I had to umount -f, so they were "lost".

So, now I have 55 GB of ktrace output... ;) Is there anything I can do
to filter it, or shall I compress it and put it on a web server for you
to fetch as it is?

Palle
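One possible way to cut the trace down before shipping it (a sketch,
not from the thread; it assumes the raw dump is in the default
ktrace.out file and that only the name-lookup records are needed for
the correlation Kirk describes):

    # decode the raw trace and keep only the NAMI (name lookup) records
    kdump -f ktrace.out | grep -w NAMI | gzip > nami-only.txt.gz

    # quick look at which paths turn up most often
    kdump -f ktrace.out | grep -w NAMI | sort | uniq -c | sort -rn | head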