Date: Wed, 15 Sep 2010 11:28:59 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Eric Crist <ecrist@secure-computing.net>, Thomas Johnson <tom@claimlynx.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: NFS nfs_getpages errors
Message-ID: <1260697257.960376.1284564539991.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <D3BB029B-C385-438C-ADA6-809E2B6709C7@claimlynx.com>

> Hey folks,
>
> We've got 4 servers running FreeBSD 8.1-RELEASE which PXE boot with
> NFS root. On these machines, we run proftpd and apache 2.2. Over the
> past couple weeks, we've seen a ton of errors as follows:
>
> Sep 14 20:28:59 lion-3 proftpd[31761]: 0.0.0.0
> (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD terminating
> (signal 11)
> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
> (proftpd)
> Sep 14 20:28:59 lion-3 kernel: Sep 14 20:28:59 lion-3 proftpd[31761]:
> 0.0.0.0 (folsom-1-red.claimlynx.com[216.17.68.130]) - ProFTPD
> terminating (signal 11)
> Sep 14 20:28:59 lion-3 kernel: nfs_getpages: error 1046353552
> Sep 14 20:28:59 lion-3 kernel: vm_fault: pager read error, pid 31761
> (proftpd)
> Sep 14 20:28:59 lion-3 kernel: pid 31761 (proftpd), uid 0: exited on
> signal 11
>
> These, in this case, occurred on three of the four machines until
> midnight after which all three of the machines had proftpd exit on
> signal 11. The message above was for child processes. At midnight, the
> logfile rotated, and newsyslog sent signal 1 to the parent process,
> which I think finally finished it off. The fourth machine remained
> running and did not display these messages.
>
> The number following 'nfs_getpages: error' changes for each cycle and
> I'm not certain if any of them repeat.
>

Well, at a quick glance, those errors seem to be coming from the NFS
server in a read reply. Also, the error values seem bogus, since they
should be small positive numbers (1 to 70, plus a few just above 10000).

Could you possibly get a packet capture when one of these happens?
("tcpdump -s 0 -w xxx host <nfs-server>" would suffice, but you need to
have it running when the error occurs. If you can reproduce it by
talking to the proftpd server, so the tcpdump doesn't run for too long,
that would be best.)

You can open the capture in wireshark and see what is being returned
for the Read RPCs at that time. (You can email me the "xxx" packet trace
as an attachment and I can look at it, if you get that far.)

rick
ps: Otherwise, I'd go look at your NFS server and see if it's logging
errors or if there are indications of problems.
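
Two questions in this thread can be checked mechanically from the syslog output: whether any of the nfs_getpages error numbers repeat, and whether they fall in the range a real NFS status would. The following is only a rough sketch in Python, not anything from the FreeBSD tree. The function names are made up for illustration, and the upper bound of 10100 is an arbitrary stand-in for "a few just above 10000", not a protocol constant.

```python
import re
from collections import Counter

# Matches the kernel message quoted in the report, e.g.
# "... kernel: nfs_getpages: error 1046353552"
GETPAGES_RE = re.compile(r"nfs_getpages: error (\d+)")


def is_plausible_nfs_error(value):
    """Return True if value looks like a real NFS status.

    Assumption: the ranges follow the description in this thread
    (1 to 70, plus a few just above 10000). The 10100 cutoff is a
    guess chosen for illustration only.
    """
    return 1 <= value <= 70 or 10000 < value <= 10100


def summarize(lines):
    """Count how often each nfs_getpages error value appears."""
    counts = Counter()
    for line in lines:
        match = GETPAGES_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def report(lines):
    """Return one text line per distinct error value, most frequent first."""
    out = []
    for value, seen in summarize(lines).most_common():
        tag = "plausible" if is_plausible_nfs_error(value) else "bogus?"
        out.append(f"{value} (0x{value:08x}) seen {seen}x {tag}")
    return out
```

To use it, pass an open log file (for example /var/log/messages, if that is where the kernel messages land on the client) to report() and print the returned lines. Any value seen more than once answers the "do they repeat" question, and values tagged "bogus?" are the ones worth matching against the Read replies in the packet capture.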