Date: Thu, 10 Mar 2016 09:29:25 -0500
From: Paul Mather <paul@gromit.dlib.vt.edu>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Ronald Klop <ronald-lists@klop.ws>, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: Unstable NFS on recent CURRENT
Message-ID: <BF9757C7-654D-4FAC-97E4-7E8B36C6E4A7@gromit.dlib.vt.edu>
In-Reply-To: <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca>
References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <op.ydylazgukndu52@ronaldradial.radialsg.local> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca>

On Mar 9, 2016, at 8:59 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Paul Mather wrote:
>> On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>>> Paul Mather wrote:
>>>> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>>> Paul Mather (forwarded by Ronald Klop) wrote:
>>>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather
>>>>>> <paul@gromit.dlib.vt.edu> wrote:
>>>>>>
>>>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
>>>>>>> having trouble with NFS. I have been doing a buildworld and buildkernel
>>>>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
>>>>>>> resulted in the buildworld failing at some point, with a variety of
>>>>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
>>>>>>> of /usr/src doesn't manage to complete. It errors out thus:
>>>>>>>
>>>>>>> =====
>>>>>>> [[...]]
>>>>>>> total 0
>>>>>>> ls: ./.svn/pristine/fe: Permission denied
>>>>>>>
>>>>>>> ./.svn/pristine/ff:
>>>>>>> total 0
>>>>>>> ls: ./.svn/pristine/ff: Permission denied
>>>>>>> ls: fts_read: Permission denied
>>>>>>> =====
>>>>>>>
>>>>>>> On the console, I get the following:
>>>>>>>
>>>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
>>>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
>>>>>>> MIDDLEWARE)
>>>>>>>
>>> Oh, I had forgotten this. Here's the comment related to this error.
>>> (about line#445 in sys/fs/nfsclient/nfs_clport.c):
>>> 446 * BROKEN NFS SERVER OR MIDDLEWARE
>>> 447 *
>>> 448 * Certain NFS servers (certain old proprietary filers ca.
>>> 449 * 2006) or broken middleboxes (e.g. WAN accelerator products)
>>> 450 * will respond to GETATTR requests with results for a
>>> 451 * different fileid.
>>> 452 *
>>> 453 * The WAN accelerator we've observed not only serves stale
>>> 454 * cache results for a given file, it also occasionally serves
>>> 455 * results for wholly different files. This causes surprising
>>> 456 * problems; for example the cached size attribute of a file
>>> 457 * may truncate down and then back up, resulting in zero
>>> 458 * regions in file contents read by applications. We observed
>>> 459 * this reliably with Clang and .c files during parallel build.
>>> 460 * A pcap revealed packet fragmentation and GETATTR RPC
>>> 461 * responses with wholly wrong fileids.
>>>
>>> If you can connect the client->server with a simple switch (or just an
>>> RJ45 cable), it might be worth testing that way. (I don't recall the name
>>> of the middleware product, but I think it was shipped by one of the major
>>> switch vendors. I also don't know if the product supports NFSv4?)
>>>
>>> rick
>>
>>
>> Currently, the client is connected to the server via a dumb gigabit switch,
>> so it is already fairly direct.
>>
>> As for the above error, it appeared on the console only once. (Sorry if I
>> made it sound like it appears every time.)
>>
>> I just tried another buildworld attempt via NFS and it failed again. This
>> time, I get this on the BeagleBone Black console:
>>
>> nfs_getpages: error 13
>> vm_fault: pager read error, pid 5401 (install)
>>
> 13 is EACCES and could be caused by what I mention below. (Any mount of a file
> system on the server unless "-S" is specified as a flag for mountd.)
>
>>
>> The other thing I have noticed is that if I induce heavy load on the NFS
>> server---e.g., by starting a Poudriere bulk build---then that provokes the
>> client to crash much more readily. For example, I started a NFS buildworld
>> on the BeagleBone Black, and it seemed to be chugging along nicely. The
>> moment I kicked off a Poudriere build update of my packages on the NFS
>> server, it crashed the buildworld on the NFS client.
>>
> Try adding "-S" to mountd_flags on the server. Any time file systems are mounted
> (and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to
> reload /etc/exports. When /etc/exports are being reloaded, there will be access
> errors for mounts (that are temporarily not exported) unless you specify "-S"
> (which makes mountd suspend the nfsd threads during the reload of /etc/exports).
>
> rick

Bingo! I think we may have a winner. I added that flag to mountd_flags on
the server and the "instability" appears to have gone away.

It may be that all along the NFS problems on the client just coincided with
Poudriere runs on the server. I build custom packages for my local machines
using Poudriere so I use it quite a lot. Maybe the Poudriere port should
come with a warning at install to those using NFS that it may provoke
disruption and suggest the addition of "-S"? (Alternatively, maybe "-S"
could become a default for mountd_flags? Is there a downside from using it
that means making it a default option is unsuitable?)

Anyway, many, many thanks for all the help, Rick. I'll keep monitoring my
BeagleBone Black, but it looks for now that this has solved the NFS
"instability."

Cheers,

Paul.
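
For reference, the fix discussed in this message amounts to a one-line change in /etc/rc.conf on the NFS server. The following is only a minimal sketch: it assumes the server otherwise uses a mountd_flags value of "-r", which is not stated anywhere in the thread, so check the existing value (for example in /etc/defaults/rc.conf) and keep whatever flags are already set there.

```sh
# /etc/rc.conf on the NFS server (sketch; "-r" is an assumed pre-existing value)
# -S makes mountd suspend the nfsd threads while /etc/exports is being reloaded,
# so clients do not see transient access errors during the reload.
mountd_flags="-r -S"
```

After editing the file, restarting mountd (for example with "service mountd restart") should make the new flag take effect.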