From: Rick Macklem
To: Eric Crist, Thomas Johnson
Cc: freebsd-fs@freebsd.org
Date: Wed, 15 Sep 2010 11:28:59 -0400 (EDT)
Subject: Re: NFS nfs_getpages errors
Message-ID: <1260697257.960376.1284564539991.JavaMail.root@erie.cs.uoguelph.ca>

> Hey folks,
>
> We've got 4 servers running FreeBSD 8.1-RELEASE which PXE boot with
> NFS root. On these machines, we run proftpd and apache 2.2.
> Over the past couple weeks, we've seen a ton of errors as follows:
>
> Sep 14 20:28:59 lion-3 proftpd[31761]: 0.0.0.0
> (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD terminating
> (signal 11)
> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
> (proftpd)
> Sep 14 20:28:59 lion-3 kernel: Sep 14 20:28:59 lion-3 proftpd[31761]:
> 0.0.0.0 (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD
> terminating (signal 11)
> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
> (proftpd)
> Sep 14 20:28:59 lion-3 kernel: pid 31761 (proftpd), uid 0: exited on
> signal 11
>
> These, in this case, occurred on three of the four machines until
> midnight after which all three of the machines had proftpd exit on
> signal 11. The message above was for child processes. At midnight, the
> logfile rotated, and newsyslog sent signal 1 to the parent process,
> which I think finally finished it off. The fourth machine remained
> running and did not display these messages.
>
> The number following 'nfs_getpages: error' changes for each cycle and
> I'm not certain if any of them repeat.
>

Well, at a quick glance, those errors seem to be coming from the NFS
server in a read reply. Also, the error values seem bogus, since they
should be small positive numbers (1<->70 + a few just above 10000).

Could you possibly get a packet capture when one of these happens?
("tcpdump -s 0 -w xxx host " would suffice, but you need to have it
running when the error occurs. If you can reproduce it by talking to
the proftpd server, so the tcpdump doesn't run for too long, that
would be best.)

You can look in the tcpdump via wireshark and see what is being
returned for the Read RPCs at that time. (You can email me the "xxx"
packet trace as an attachment and I can look at it, if you get that
far.)

rick
ps: Otherwise, I'd go look at your NFS server and see if it's logging
    errors or if there are indications of problems.
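
As a rough illustration of the sanity check described in the reply (this
is not part of the original thread), the sketch below scans syslog lines
for "nfs_getpages: error" values and flags any that fall outside the
ranges mentioned: 1 to 70, plus a few just above 10000. The function
names and the 10099 upper bound are assumptions made for illustration;
the reply does not give an exact cutoff.

```python
import re

# Ranges taken from the reply: 1..70, plus "a few just above 10000".
# The 10099 upper bound is an assumption chosen for illustration only.
PLAUSIBLE_RANGES = ((1, 70), (10001, 10099))

# Matches the kernel message format shown in the quoted log lines.
ERROR_RE = re.compile(r"nfs_getpages: error (-?\d+)")


def is_plausible(code):
    """Return True if code falls inside one of the assumed ranges."""
    return any(lo <= code <= hi for lo, hi in PLAUSIBLE_RANGES)


def bogus_errors(log_lines):
    """Return nfs_getpages error values that fall outside the ranges."""
    found = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match:
            code = int(match.group(1))
            if not is_plausible(code):
                found.append(code)
    return found


if __name__ == "__main__":
    sample = [
        "Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552",
    ]
    for code in bogus_errors(sample):
        # Print the value in hex as well, which can make garbage easier to spot.
        print(f"implausible NFS error: {code} (0x{code & 0xFFFFFFFF:08x})")
```

Feeding it /var/log/messages line by line would show whether any of the
logged values repeat, which was one of the open questions in the original
report.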