From owner-freebsd-arm@freebsd.org Wed Mar 9 16:12:44 2016
From: Paul Mather
Date: Wed, 9 Mar 2016 11:12:36 -0500
Subject: Re: Unstable NFS on recent CURRENT
To: Rick Macklem
Cc: Ronald Klop, freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Message-Id: <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu>
In-Reply-To: <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>
References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>
List-Id: "Porting FreeBSD to ARM processors."

On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote:

> Paul Mather wrote:
>> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote:
>>
>>> Paul Mather (forwarded by Ronald Klop) wrote:
>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather wrote:
>>>>
>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
>>>>> having trouble with NFS. I have been doing a buildworld and buildkernel
>>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
>>>>> resulted in the buildworld failing at some point, with a variety of
>>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
>>>>> of /usr/src doesn't manage to complete. It errors out thus:
>>>>>
>>>>> =====
>>>>> [[...]]
>>>>> total 0
>>>>> ls: ./.svn/pristine/fe: Permission denied
>>>>>
>>>>> ./.svn/pristine/ff:
>>>>> total 0
>>>>> ls: ./.svn/pristine/ff: Permission denied
>>>>> ls: fts_read: Permission denied
>>>>> =====
>>>>>
>>>>> On the console, I get the following:
>>>>>
>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
>>>>> MIDDLEWARE)
>>>>>
> Oh, I had forgotten this. Here's the comment related to this error.
> (about line#445 in sys/fs/nfsclient/nfs_clport.c):
> 446  * BROKEN NFS SERVER OR MIDDLEWARE
> 447  *
> 448  * Certain NFS servers (certain old proprietary filers ca.
> 449  * 2006) or broken middleboxes (e.g. WAN accelerator products)
> 450  * will respond to GETATTR requests with results for a
> 451  * different fileid.
> 452  *
> 453  * The WAN accelerator we've observed not only serves stale
> 454  * cache results for a given file, it also occasionally serves
> 455  * results for wholly different files. This causes surprising
> 456  * problems; for example the cached size attribute of a file
> 457  * may truncate down and then back up, resulting in zero
> 458  * regions in file contents read by applications. We observed
> 459  * this reliably with Clang and .c files during parallel build.
> 460  * A pcap revealed packet fragmentation and GETATTR RPC
> 461  * responses with wholly wrong fileids.
>
> If you can connect the client->server with a simple switch (or just an RJ45 cable), it
> might be worth testing that way. (I don't recall the name of the middleware product, but
> I think it was shipped by one of the major switch vendors. I also don't know if the product
> supports NFSv4?)
>
> rick

Currently, the client is connected to the server via a dumb gigabit
switch, so it is already fairly direct.

As for the above error, it appeared on the console only once. (Sorry if
I made it sound like it appears every time.)

I just tried another buildworld attempt via NFS and it failed again.
This time, I get this on the BeagleBone Black console:

nfs_getpages: error 13
vm_fault: pager read error, pid 5401 (install)

The other thing I have noticed is that if I induce heavy load on the NFS
server (e.g., by starting a Poudriere bulk build), that provokes the
client to crash much more readily. For example, I started an NFS
buildworld on the BeagleBone Black, and it seemed to be chugging along
nicely. The moment I kicked off a Poudriere build update of my packages
on the NFS server, it crashed the buildworld on the NFS client.

I have had problems with swap on FreeBSD/arm before. Swapping to a file
does not appear to work for me. As a result, I switched to swapping to
a partition on the SD card. Maybe this is unreliable, too?

Cheers,

Paul.
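
For reference, the "error 13" in the nfs_getpages console message is a raw errno value. On FreeBSD, 13 is EACCES ("Permission denied"), which would be consistent with the errors ls reported earlier in the thread. The small C sketch below decodes such a value. errno_symbol() and format_pager_error() are hypothetical helper names made up for illustration (they are not part of the NFS client), and the handful of errno values handled is an arbitrary choice.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a few errno values to their symbolic names.
 * The selection is illustrative only; anything else yields "unknown".
 */
const char *errno_symbol(int error)
{
    switch (error) {
    case EACCES: return "EACCES";   /* 13 on FreeBSD: Permission denied */
    case ESTALE: return "ESTALE";   /* stale NFS file handle */
    case EIO:    return "EIO";      /* input/output error */
    default:     return "unknown";
    }
}

/*
 * Hypothetical helper: write "error <n> = <symbol> (<strerror text>)"
 * into buf. Returns the snprintf() result.
 */
int format_pager_error(char *buf, size_t len, int error)
{
    return snprintf(buf, len, "error %d = %s (%s)",
                    error, errno_symbol(error), strerror(error));
}
```

Calling format_pager_error(buf, sizeof(buf), 13) fills buf with the number, its symbolic name, and the strerror() text for that value on the system where it runs.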