From: Rick Macklem
To: Hiroki Sato
Cc: Konstantin Belousov, alc, stable@FreeBSD.org
Date: Wed, 2 Jan 2013 08:24:39 -0500 (EST)
Subject: Re: NFS-exported ZFS instability
Message-ID: <1914428061.1617223.1357133079421.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20130102.105304.1817355190360003433.hrs@allbsd.org>

Hiroki Sato wrote:
> Hello,
>
> I have been having trouble with my NFS server for a long time. The
> symptom is that it stops working one or two weeks after a boot.
> I could not track down the cause yet, but it is reproducible and has
> only occurred under a very high I/O load.
>
> It did not panic, it just stopped working; while it responded to ping,
> userland programs did not seem to be working. I could break into DDB
> and get a kernel dump. The following URLs are logs of ps, trace,
> etc.:
>
> http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
> http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
>
> Does anyone see how to debug this? I guess this is due to a deadlock
> somewhere. I have suffered from this problem for almost two years.
> The above log is from stable/9 as of Dec 19, but this has persisted
> since 8.X.
>
Well, I took a quick glance at the log and there are a lot of processes
sleeping on "pfault" (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
vm guy, so I'm not sure when/why that will happen. The comment on the
function suggests they are waiting for free pages.

Maybe something as simple as running out of swap space or a problem
talking to the disk(s) that has the swap partition(s) or ???
(I'm talking through my hat here, because I'm not conversant with the
vm side of things.)

I might take a closer look this evening and see if I can spot anything
in the log, rick
ps: I hope Alan and Kostik don't mind being added to the cc list.

> -- Hiroki
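
For anyone who wants to repeat the "lots of processes sleeping on pfault" observation on their own log, here is a rough sketch of tallying wait messages from a ps-style listing. It assumes a simplified, whitespace-separated layout with a header row that names a "wmesg" column. The function name count_wait_messages and the SAMPLE rows are made up for illustration and are not taken from the actual log, so the parsing will likely need adjusting for real DDB output.

```python
from collections import Counter


def count_wait_messages(ps_text):
    """Tally the wmesg column of a whitespace-separated ps-style listing.

    Assumption: the first non-blank line is a header containing 'wmesg',
    and each data row has one whitespace-separated field per header column.
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    if "wmesg" not in header:
        raise ValueError("no 'wmesg' column in header")
    col = header.index("wmesg")
    counts = Counter()
    for ln in lines[1:]:
        fields = ln.split()
        # Rows with fewer fields than the header (e.g. a running process
        # with an empty wmesg) cannot be aligned reliably, so skip them.
        if len(fields) < len(header):
            continue
        counts[fields[col]] += 1
    return counts


# Hypothetical sample rows, NOT copied from the log discussed above.
SAMPLE = """\
pid ppid pgrp uid state wmesg wchan cmd
101 1 101 0 S pfault 0xffffffff80a1 nfsd
102 1 102 0 S pfault 0xffffffff80a1 sshd
103 1 103 0 S select 0xfffffe0001b2 syslogd
104 1 104 0 R
"""

if __name__ == "__main__":
    for wmesg, n in count_wait_messages(SAMPLE).most_common():
        print(wmesg, n)
```

A wait message that dominates the tally is only a hint about where to look next; it does not by itself show which resource is exhausted or what holds it.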