From: Eric Crist <ecrist@secure-computing.net>
In-Reply-To: <4C90E88D.9050608@comcast.net>
Date: Wed, 15 Sep 2010 10:44:11 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <ABD892F1-F41D-483C-8571-34DEAC51CF11@secure-computing.net>
References: <1260697257.960376.1284564539991.JavaMail.root@erie.cs.uoguelph.ca>
	<4C90E88D.9050608@comcast.net>
To: Steve Polyack <korvus@comcast.net>
X-Mailer: Apple Mail (2.1081)
Cc: freebsd-fs@freebsd.org, Thomas Johnson <tom@claimlynx.com>
Subject: Re: NFS nfs_getpages errors

On Sep 15, 2010, at 10:38:53, Steve Polyack wrote:

> On 09/15/10 11:28, Rick Macklem wrote:
>>> Hey folks,
>>>
>>> We've got 4 servers running FreeBSD 8.1-RELEASE which PXE boot with
>>> NFS root. On these machines, we run proftpd and apache 2.2. Over the
>>> past couple weeks, we've seen a ton of errors as follows:
>>>
>>> Sep 14 20:28:59 lion-3 proftpd[31761]: 0.0.0.0
>>> (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD terminating
>>> (signal 11)
>>> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
>>> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
>>> (proftpd)
>>> Sep 14 20:28:59 lion-3 kernel: Sep 14 20:28:59 lion-3 proftpd[31761]:
>>> 0.0.0.0 (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD
>>> terminating (signal 11)
>>> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
>>> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
>>> (proftpd)
>>> Sep 14 20:28:59 lion-3 kernel: pid 31761 (proftpd), uid 0: exited on
>>> signal 11
>>>
>>> In this case, these occurred on three of the four machines until
>>> midnight, after which proftpd on all three machines exited on
>>> signal 11. The messages above were for child processes. At midnight,
>>> the logfile rotated, and newsyslog sent signal 1 to the parent
>>> process, which I think finally finished it off. The fourth machine
>>> remained running and did not display these messages.
>>>
>>> The number following 'nfs_getpages: error' changes for each cycle and
>>> I'm not certain if any of them repeat.
>>>
>> Well, at a quick glance, those errors seem to be coming from the NFS
>> server in a read reply. Also, the error values seem bogus, since they
>> should be small positive numbers (1<->70 + a few just above 10000).
> We see these errors on some 8.1 clients as well:
> nfs_getpages: error 1110586608
> nfs_getpages: error 1108948624
> vm_fault: pager read error, pid 56216 (php)
> nfs_getpages: error 1114969744
> vm_fault: pager read error, pid 54770 (php)
> nfs_getpages: error 1137006224
> vm_fault: pager read error, pid 50578 (php)
>
> They do not show up often, so we haven't spent much time looking into
> it (no tcpdumps yet).  Our NFS server is an 8-STABLE system backed by
> ZFS, so maybe it's related to that (again :) ).
>
> Eric, is your NFS server backed by ZFS as well?
>
> The NFS server doesn't seem to be logging any errors, but the
> ret-failed count is always increasing:
> Server Info:
>    Getattr    Setattr     Lookup   Readlink       Read      Write     Create     Remove
>  543523097   14397049 1949982185       6380   17587820   14002952    8980955    8070238
>     Rename       Link    Symlink      Mkdir      Rmdir    Readdir   RdirPlus     Access
>    6966495          9       1668    1117125     904969    5567689      22307  184929325
>      Mknod     Fsstat     Fsinfo   PathConf     Commit
>          0  338500745         57          0    7129262
> Server Ret-Failed
>         29089796
> Server Faults
>            0
> Server Cache Stats:
>   Inprog      Idem  Non-idem    Misses
>        0         0         0         0
> Server Write Gathering:
> WriteOps  WriteRPC   Opsaved
> 14001235  14002952      1717
>
>> Could you possibly get a packet capture when one of these happens?
>> ("tcpdump -s 0 -w xxx host <nfs-server>" would suffice, but you need to
>>  have it running when the error occurs. If you can reproduce it by
>>  talking to the proftpd server, so the tcpdump doesn't run for too
>>  long, that would be best.)
>>
>> You can look in the tcpdump via wireshark and see what is being returned
>> for the Read RPCs at that time. (You can email me the "xxx" packet trace
>> as an attachment and I can look at it, if you get that far.)
>>=20
>> rick
>> ps: Otherwise, I'd go look at your NFS server and see if it's logging
>>     errors or if there are indications of problems.

The NFS server is logging nothing at all related to NFS.  It *is*
running 8.1-RC2, so there is potential for an update.  If/when we notice
these errors again, we'll try to get a packet capture and forward it to
you.  Our NFS server is backed by ZFS, as well.
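
For triaging the logs in the meantime, here is a minimal sketch of the
sanity check Rick describes above: a value is treated as plausible only if
it falls in 1..70 or slightly above 10000. This is an illustration, not
anything taken from the kernel source. The function names are hypothetical,
and the upper cutoff of 10100 is an assumption, since "a few just above
10000" does not give an exact bound.

```python
# Rough sanity check for the number printed after "nfs_getpages: error".
# Assumption: valid values are errno-style codes in 1..70, or NFS status
# codes slightly above 10000. The cutoff of 10100 is a guess, not a
# documented limit.
ERRNO_RANGE = range(1, 71)
NFS_STATUS_RANGE = range(10001, 10101)  # hypothetical upper bound


def is_plausible_nfs_error(value):
    """Return True if value falls inside one of the assumed valid ranges."""
    return value in ERRNO_RANGE or value in NFS_STATUS_RANGE


def describe(value):
    """Format the value in decimal and hex together with a verdict."""
    verdict = "plausible" if is_plausible_nfs_error(value) else "bogus"
    return "%d (0x%08x): %s" % (value, value, verdict)


if __name__ == "__main__":
    # Values copied from the log excerpts quoted in this thread.
    for v in (1046353552, 1110586608, 1108948624, 1114969744, 1137006224):
        print(describe(v))
```

Every value reported so far in this thread lands outside both ranges, which
matches Rick's read that the numbers are bogus rather than real error codes.
Printing them in hex may also make it easier to spot a pattern once there is
a packet capture to compare against.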

Eric