From owner-freebsd-stable@FreeBSD.ORG  Wed Jan  2 13:24:47 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 7D463561;
 Wed,  2 Jan 2013 13:24:47 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44])
 by mx1.freebsd.org (Postfix) with ESMTP id F1A508FC0A;
 Wed,  2 Jan 2013 13:24:46 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq02AKk05FCDaFvO/2dsb2JhbABFFoFphDu3KHOCHgEBBSNWGw4KAgINGQJZBogmDKdukEuBIo5lgRMDiGKNKoEcjyyDEoII
X-IronPort-AV: E=Sophos;i="4.84,396,1355115600"; 
   d="scan'208";a="8578889"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 02 Jan 2013 08:24:39 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6A7F5B3F7D;
 Wed,  2 Jan 2013 08:24:39 -0500 (EST)
Date: Wed, 2 Jan 2013 08:24:39 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Hiroki Sato <hrs@FreeBSD.org>
Message-ID: <1914428061.1617223.1357133079421.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20130102.105304.1817355190360003433.hrs@allbsd.org>
Subject: Re: NFS-exported ZFS instability
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: Konstantin Belousov <kostikbel@gmail.com>, alc <alc@freebsd.org>,
 stable@FreeBSD.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 13:24:47 -0000

Hiroki Sato wrote:
> Hello,
> 
> I have been having trouble with my NFS server for a long time. The
> symptom is that it stops working one or two weeks after boot. I
> have not been able to track down the cause yet, but it is reproducible
> and only occurs under very high I/O load.
> 
> It did not panic; it just stopped working. It still responded to
> ping, but userland programs did not seem to be running. I was able to
> break into DDB and get a kernel dump. The following URLs hold a log
> of ps, trace, and so on:
> 
> http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
> http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
> 
> Does anyone know how to debug this? I suspect a deadlock
> somewhere. I have suffered from this problem for almost two years.
> The above log is from stable/9 as of Dec 19, but the problem has
> persisted since 8.X.
> 
Well, I took a quick glance at the log and there are a lot of processes
sleeping on "pfault" (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
vm guy, so I'm not sure when/why that will happen. The comment on the
function suggests they are waiting for free pages.
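
For anyone who wants to put a number on "a lot of processes", here is a
rough sketch in Python. It assumes the wait message shows up as its own
whitespace-separated field on each line of the DDB ps output; I have not
checked that against every line format in the log, and the function name
is just something I made up for illustration.

```python
from collections import Counter


def count_wait_messages(lines, names):
    """Count the lines on which each wait message appears as a whole field.

    Assumption: each process/thread occupies one line of the ps output and
    the wait message (e.g. "pfault") is a separate whitespace-delimited
    field on that line.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        for name in names:
            if name in fields:
                counts[name] += 1
    return counts
```

With a local copy of the log saved as pool.log.20130102, something like
count_wait_messages(open("pool.log.20130102"), ["pfault"]) would give the
tally. A large count would fit the "waiting for free pages" reading, but
it would not say why the pages are not being freed.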

Maybe it is something as simple as running out of swap space, or a problem
talking to the disk(s) that hold the swap partition(s), or something else.
(I'm talking through my hat here, because I'm not conversant with
 the vm side of things.)
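
If you want to rule out the swap theory before the next hang, one option
is to record swap usage periodically. Below is a minimal sketch, in
Python, that sums the per-device rows of "swapinfo -k" output. It assumes
the usual column layout (Device, 1K-blocks, Used, Avail, Capacity) with
an optional "Total" row; the function name is hypothetical.

```python
def swap_usage(swapinfo_text):
    """Return (used_kb, total_kb) summed over the per-device rows.

    Assumption: the text is the output of `swapinfo -k`, i.e. a header
    row starting with "Device", one row per swap device, and possibly a
    trailing "Total" row when there is more than one device.
    """
    used = 0
    total = 0
    for line in swapinfo_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the summary row.
        if len(fields) < 5 or fields[0] in ("Device", "Total"):
            continue
        total += int(fields[1])
        used += int(fields[2])
    return used, total
```

Feeding it the output of subprocess.run(["swapinfo", "-k"],
capture_output=True, text=True).stdout from a cron job and logging the
result would show whether swap is close to full in the days before the
machine stops responding.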

I might take a closer look this evening and see if I can spot anything
in the log, rick
ps: I hope Alan and Kostik don't mind being added to the cc list.

> -- Hiroki