Date: Wed, 9 Mar 2016 20:59:56 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Paul Mather <paul@gromit.dlib.vt.edu>
Cc: Ronald Klop <ronald-lists@klop.ws>, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: Unstable NFS on recent CURRENT
Message-ID: <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu>
References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu>
 <op.ydylazgukndu52@ronaldradial.radialsg.local>
 <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca>
 <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu>
 <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>
 <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu>
Paul Mather wrote:
> On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> > Paul Mather wrote:
> >> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >>
> >>> Paul Mather (forwarded by Ronald Klop) wrote:
> >>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather
> >>>> <paul@gromit.dlib.vt.edu>
> >>>> wrote:
> >>>>
> >>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
> >>>>> having trouble with NFS. I have been doing a buildworld and buildkernel
> >>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
> >>>>> resulted in the buildworld failing at some point, with a variety of
> >>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
> >>>>> of /usr/src doesn't manage to complete. It errors out thus:
> >>>>>
> >>>>> =====
> >>>>> [[...]]
> >>>>> total 0
> >>>>> ls: ./.svn/pristine/fe: Permission denied
> >>>>>
> >>>>> ./.svn/pristine/ff:
> >>>>> total 0
> >>>>> ls: ./.svn/pristine/ff: Permission denied
> >>>>> ls: fts_read: Permission denied
> >>>>> =====
> >>>>>
> >>>>> On the console, I get the following:
> >>>>>
> >>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> >>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> >>>>> MIDDLEWARE)
> >>>>>
> > Oh, I had forgotten this. Here's the comment related to this error
> > (about line#445 in sys/fs/nfsclient/nfs_clport.c):
> > 446  * BROKEN NFS SERVER OR MIDDLEWARE
> > 447  *
> > 448  * Certain NFS servers (certain old proprietary filers ca.
> > 449  * 2006) or broken middleboxes (e.g. WAN accelerator products)
> > 450  * will respond to GETATTR requests with results for a
> > 451  * different fileid.
> > 452  *
> > 453  * The WAN accelerator we've observed not only serves stale
> > 454  * cache results for a given file, it also occasionally serves
> > 455  * results for wholly different files. This causes surprising
> > 456  * problems; for example the cached size attribute of a file
> > 457  * may truncate down and then back up, resulting in zero
> > 458  * regions in file contents read by applications. We observed
> > 459  * this reliably with Clang and .c files during parallel build.
> > 460  * A pcap revealed packet fragmentation and GETATTR RPC
> > 461  * responses with wholly wrong fileids.
> >
> > If you can connect the client->server with a simple switch (or just an RJ45
> > cable), it might be worth testing that way. (I don't recall the name of the
> > middleware product, but I think it was shipped by one of the major switch
> > vendors. I also don't know if the product supports NFSv4?)
> >
> > rick
> >
> Currently, the client is connected to the server via a dumb gigabit switch,
> so it is already fairly direct.
>
> As for the above error, it appeared on the console only once. (Sorry if I
> made it sound like it appears every time.)
>
> I just tried another buildworld attempt via NFS and it failed again. This
> time, I get this on the BeagleBone Black console:
>
> nfs_getpages: error 13
> vm_fault: pager read error, pid 5401 (install)
>
13 is EACCES, which could be caused by what I mention below: any mount of a
file system on the server can trigger it, unless "-S" is specified as a flag
for mountd.
>
> The other thing I have noticed is that if I induce heavy load on the NFS
> server---e.g., by starting a Poudriere bulk build---then that provokes the
> client to crash much more readily. For example, I started a NFS buildworld
> on the BeagleBone Black, and it seemed to be chugging along nicely. The
> moment I kicked off a Poudriere build update of my packages on the NFS
> server, it crashed the buildworld on the NFS client.
>
Try adding "-S" to mountd_flags on the server. Any time file systems are
mounted (and Poudriere likes to do that, I am told), mount sends a SIGHUP to
mountd to reload /etc/exports.
While /etc/exports is being reloaded, there will be access errors for mounts
(which are temporarily not exported) unless you specify "-S", which makes
mountd suspend the nfsd threads during the reload of /etc/exports.

rick
> I have had problems with swap on FreeBSD/arm before. Swapping to a file does
> not appear to work for me. As a result, I switched to swapping to a
> partition on the SD card. Maybe this is unreliable, too?
>
> Cheers,
>
> Paul.
>
>