Date: Tue, 2 Feb 2016 01:26:43 -0500
From: Charles Sprickman <spork@bway.net>
To: Ben Woods <woodsb02@gmail.com>
Cc: Vick Khera <vivek@khera.org>, freebsd-fs@freebsd.org,
    "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Re: NFS unstable with high load on server
Message-ID: <5EAD4A4A-211F-451E-A3B9-752DAC6D94B4@bway.net>
In-Reply-To: <CAOc73CCHS4r-proJ_jT4T+BfcQB9pND8Ld8QZqYJOCkuq2LqiA@mail.gmail.com>
References: <CALd+dcfzPU=nMGo41BBZzt3jQnsQJaANVyA222TDM_is2Ueo0A@mail.gmail.com>
 <CAOc73CCHS4r-proJ_jT4T+BfcQB9pND8Ld8QZqYJOCkuq2LqiA@mail.gmail.com>
On Feb 2, 2016, at 1:10 AM, Ben Woods <woodsb02@gmail.com> wrote:
>
> On Monday, 1 February 2016, Vick Khera <vivek@khera.org> wrote:
>
>> I have a handful of servers at my data center, all running FreeBSD 10.2. On
>> one of them I have a copy of the FreeBSD sources shared via NFS. When this
>> server is running a large poudriere run re-building all the ports I need,
>> the clients' NFS mounts become unstable. That is, the clients keep getting
>> read failures. The interactive performance of the NFS server is just fine,
>> however. The local file system is a ZFS mirror.
>>
>> What could be causing NFS to be unstable in this situation?
>>
>> Specifics:
>>
>> Server "lorax": FreeBSD 10.2-RELEASE-p7, kernel locally compiled, with NFS
>> server and ZFS as dynamic kernel modules. 16GB RAM, Xeon 3.1GHz quad
>> processor.
>>
>> The directory /u/lorax1 is a ZFS dataset on a mirrored pool, and is NFS
>> exported via the ZFS exports file. I put the FreeBSD sources on this
>> dataset and symlink to /usr/src.
>>
>> Client "bluefish": FreeBSD 10.2-RELEASE-p5, kernel locally compiled, NFS
>> client built in to the kernel. 32GB RAM, Xeon 3.1GHz quad processor
>> (basically the same hardware but more RAM).
>>
>> The directory /n/lorax1 is NFS mounted from lorax via autofs. The NFS
>> options are "intr,nolockd". /usr/src is symlinked to the sources in that
>> NFS mount.
>>
>> What I observe:
>>
>> [lorax]~% cd /usr/src
>> [lorax]src% svn status
>> [lorax]src% w
>>  9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
>> USER   TTY   FROM                  LOGIN@  IDLE WHAT
>> vivek  pts/0 vick.int.kcilink.com  8:44AM     - tmux: client (/tmp/
>> vivek  pts/1 tmux(19747).%0        8:44AM    19 sed y%*+%pp%;s%[^_a
>> vivek  pts/2 tmux(19747).%1        8:56AM     - w
>> vivek  pts/3 tmux(19747).%2        8:56AM     - slogin bluefish-prv
>> [lorax]src% pwd
>> /u/lorax1/usr10/src
>>
>> So right now the load average is more than 1 per processor on lorax.
>> I can quite easily run "svn status" on the source directory, and the
>> interactive performance is pretty snappy for editing local files and
>> navigating around the file system.
>>
>> On the client:
>>
>> [bluefish]~% cd /usr/src
>> [bluefish]src% pwd
>> /n/lorax1/usr10/src
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
>> Partial results are valid but processing is incomplete
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
>> Partial results are valid but processing is incomplete
>> [bluefish]src% svn status
>> svn: E070008: Can't read directory
>> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results are
>> valid but processing is incomplete
>> [bluefish]src% w
>>  9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
>> USER   TTY   FROM                   LOGIN@  IDLE WHAT
>> vivek  pts/0 lorax-prv.kcilink.com  8:56AM     - w
>> [bluefish]src% df .
>> Filesystem          1K-blocks    Used     Avail Capacity  Mounted on
>> lorax-prv:/u/lorax1 932845181 6090910 926754271     1%    /n/lorax1
>>
>> What I see is more or less random failures to read the NFS volume. When the
>> server is not so busy running poudriere builds, the client never has any
>> failures.
>>
>> I also observe this kind of failure doing buildworld or installworld on
>> the client when the server is busy -- I get strange random failures reading
>> the files, causing the build or install to fail.
>>
>> My workaround is to not do builds/installs on client machines when the NFS
>> server is busy doing large jobs like building all packages, but there is
>> definitely something wrong here I'd like to fix. I observe this on all the
>> local NFS clients. I rebooted the server before to try to clear this up,
>> but it did not fix it.
>>
>> Any help would be appreciated.
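One server-side thing that may be worth ruling out while reproducing this: a
heavy poudriere run can keep all of the nfsd service threads busy, and the
stock thread count is small. This is only a sketch of an rc.conf fragment on
the server, not something established in this thread, and 64 is a guessed
value to size against a big build, not a measured recommendation:

```shell
# /etc/rc.conf fragment on the NFS server (lorax) -- a sketch, assuming
# stock defaults; bump the nfsd thread pool so parallel client reads
# aren't starved while poudriere hammers the disks.
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 64"
```

Watching `nfsstat -s` on the server while a client's svn status fails would
show whether requests are actually reaching nfsd.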
>
> I just wanted to point out that I am experiencing this exact same issue in
> my home setup.
>
> Performing an installworld from an NFS mount works perfectly, until I start
> running poudriere on the NFS server. Then I start getting NFS timeouts and
> the installworld fails.
>
> The NFS server is also using ZFS, but the NFS export in my case is being
> done via the ZFS property "sharenfs" (I am not using the /etc/exports file).

Me three. I'm actually updating a small group of servers now and started
blowing up my installworlds by trying to do some poudriere builds at the same
time. Very repeatable. Of note, I'm on 9.3, and saw this on 8.4 as well. If I
track down the client-side failures, it's always "permission denied".

Thanks,

Charles

> I suspect this will boil down to a ZFS tuning issue, where poudriere and
> installworld are both stress-testing the server. Both of these would
> obviously cause significant memory and CPU usage, and cause the "recently
> used" portion of the ARC to be constantly flushed as they access a large
> number of different files.
>
> It might be interesting if you could report the output of the heading lines
> (including memory and ARC details) from the "top" command before/after
> running poudriere and attempting the installworld.
>
> Regards,
> Ben
>
> --
> From: Benjamin Woods
> woodsb02@gmail.com
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
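For anyone who wants to capture the before/after numbers Ben is asking about
in one go, here's a minimal sketch. The sysctl OIDs are FreeBSD's ZFS
arcstats and VM counters (an assumption about the systems in question); on a
host where an OID doesn't exist it just prints "n/a" instead of failing:

```shell
#!/bin/sh
# Snapshot ARC size/targets and free memory before and after a run.
snap() {
    echo "=== $1 ==="
    for oid in kstat.zfs.misc.arcstats.size \
               kstat.zfs.misc.arcstats.c \
               kstat.zfs.misc.arcstats.c_max \
               vm.stats.vm.v_free_count; do
        # -n prints just the value; fall back to n/a if the OID is missing
        val=$(sysctl -n "$oid" 2>/dev/null) || val="n/a"
        echo "$oid: $val"
    done
}

snap before
# ... run poudriere and attempt the installworld here ...
snap after
```

Comparing the two snapshots (alongside the top header lines) would show
whether the ARC is being squeezed while the client reads fail.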