From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 15 15:29:01 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4F2621065672
	for <freebsd-fs@freebsd.org>; Wed, 15 Sep 2010 15:29:01 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 0AF818FC32
	for <freebsd-fs@freebsd.org>; Wed, 15 Sep 2010 15:29:00 +0000 (UTC)
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 15 Sep 2010 11:29:00 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 0AFC5B3F21;
	Wed, 15 Sep 2010 11:29:00 -0400 (EDT)
Date: Wed, 15 Sep 2010 11:28:59 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Eric Crist <ecrist@secure-computing.net>,
	Thomas Johnson <tom@claimlynx.com>
Message-ID: <1260697257.960376.1284564539991.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <D3BB029B-C385-438C-ADA6-809E2B6709C7@claimlynx.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [24.65.230.102]
X-Mailer: Zimbra 6.0.7_GA_2476.RHEL4 (ZimbraWebClient - SAF3
	(Mac)/6.0.7_GA_2473.RHEL4_64)
Cc: freebsd-fs@freebsd.org
Subject: Re: NFS nfs_getpages errors
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Sep 2010 15:29:01 -0000

> Hey folks,
> 
> We've got 4 servers running FreeBSD 8.1-RELEASE which PXE boot with
> NFS root. On these machines, we run proftpd and apache 2.2. Over the
> past couple weeks, we've seen a ton of errors as follows:
> 
> Sep 14 20:28:59 lion-3 proftpd[31761]: 0.0.0.0
> (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD terminating
> (signal 11)
> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
> (proftpd)
> Sep 14 20:28:59 lion-3 kernel: Sep 14 20:28:59 lion-3 proftpd[31761]:
> 0.0.0.0 (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD
> terminating (signal 11)
> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
> (proftpd)
> Sep 14 20:28:59 lion-3 kernel: pid 31761 (proftpd), uid 0: exited on
> signal 11
> 
> These, in this case, occurred on three of the four machines until
> midnight after which all three of the machines had proftpd exit on
> signal 11. The message above was for child processes. At midnight, the
> logfile rotated, and newsyslog sent signal 1 to the parent process,
> which I think finally finished it off. The fourth machine remained
> running and did not display these messages.
> 
> The number following 'nfs_getpages: error' changes for each cycle and
> I'm not certain if any of them repeat.
> 
Well, at a quick glance, those errors seem to be coming from the NFS
server in a Read reply. Also, the error values look bogus, since they
should be small positive numbers (roughly 1 to 70, plus a few just
above 10000).
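For reference, here is a minimal sketch of that range check applied to
the kernel log lines quoted above. It assumes the plausible values are
the NFSv3 nfsstat3 codes from RFC 1813: NFS3ERR_PERM (1) through
NFS3ERR_REMOTE (71), and NFS3ERR_BADHANDLE (10001) through
NFS3ERR_JUKEBOX (10008). The function names and the log-line regex are
hypothetical, written only for illustration.

```python
import re

# Assumed plausible ranges, based on the nfsstat3 values in RFC 1813:
# NFS3ERR_PERM (1) .. NFS3ERR_REMOTE (71), and
# NFS3ERR_BADHANDLE (10001) .. NFS3ERR_JUKEBOX (10008).
LOW_RANGE = range(1, 72)
HIGH_RANGE = range(10001, 10009)

# Hypothetical pattern for kernel lines such as
# "kernel: nfs_getpages: error 1046353552".
ERROR_RE = re.compile(r"nfs_getpages: error (-?\d+)")


def is_plausible_nfs_error(value: int) -> bool:
    """Return True if value falls in one of the assumed error ranges."""
    return value in LOW_RANGE or value in HIGH_RANGE


def bogus_errors(log_lines):
    """Yield error values from nfs_getpages log lines that look bogus."""
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            value = int(match.group(1))
            if not is_plausible_nfs_error(value):
                yield value
```

Running the two "nfs_getpages: error 1046353552" lines from the log
above through bogus_errors() would flag that value both times. A small
value such as 5 (EIO / NFS3ERR_IO) would pass.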

Could you possibly get a packet capture when one of these happens?
("tcpdump -s 0 -w xxx host <nfs-server>" would suffice, but it needs to
 be running when the error occurs. If you can reproduce the error by
 talking to the proftpd server, so that the tcpdump doesn't have to run
 for too long, that would be best.)

You can look at the tcpdump output in wireshark and see what is being
returned for the Read RPCs at that time. (You can email me the "xxx"
packet trace as an attachment and I can look at it, if you get that far.)
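Wireshark decodes this for you. As a rough sketch of what it is looking
at, the function below pulls the NFS status out of one reply. It
assumes the ONC RPC reply message (RFC 5531) has already been extracted
from the capture with any TCP record marker removed, and that it is an
NFSv3 reply (RFC 1813), where the READ result begins with an nfsstat3
status. The function name is hypothetical.

```python
import struct

MSG_REPLY = 1     # RPC msg_type for a reply
MSG_ACCEPTED = 0  # reply_stat when the server accepted the call
SUCCESS = 0       # accept_stat when the procedure actually ran


def _u32(buf: bytes, offset: int) -> tuple[int, int]:
    """Decode one big-endian XDR unsigned int; return (value, next offset)."""
    (value,) = struct.unpack_from(">I", buf, offset)
    return value, offset + 4


def nfs3_reply_status(msg: bytes) -> int:
    """Return the nfsstat3 value at the start of an NFSv3 reply's results."""
    off = 0
    _xid, off = _u32(msg, off)
    mtype, off = _u32(msg, off)
    if mtype != MSG_REPLY:
        raise ValueError("not an RPC reply")
    reply_stat, off = _u32(msg, off)
    if reply_stat != MSG_ACCEPTED:
        raise ValueError("RPC reply was denied")
    _flavor, off = _u32(msg, off)
    verf_len, off = _u32(msg, off)
    off += (verf_len + 3) & ~3  # skip verifier body, padded to 4 bytes
    accept_stat, off = _u32(msg, off)
    if accept_stat != SUCCESS:
        raise ValueError(f"RPC accept_stat {accept_stat}")
    status, _ = _u32(msg, off)
    return status
```

A status of 0 (NFS3_OK) or one of the small values above is normal; a
huge value here would mean the server really is sending garbage, while
sane values would point back at the client side.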

rick
ps: Otherwise, I'd go look at your NFS server and see if it's logging
    errors or if there are indications of problems.