From: "Russell L. Carter"
Date: Thu, 30 Oct 2014 17:07:31 -0700
To: Garrett Wollman, freebsd-fs@freebsd.org
Subject: Re: Definite NFS bug
Message-ID: <5452D2C3.9040902@pinyon.org>
In-Reply-To: <21586.48982.64913.250497@khavrinen.csail.mit.edu>

On 10/30/14 15:44, Garrett Wollman wrote:
> Like many other users, I upgrade my FreeBSD servers by
> NFS-mounting /usr/src and /usr/obj from a shared build server.[1]
> Since I upgraded the build server to 9.3, clients running 9.3
> kernels have been randomly erroring out during installkernel and
> installworld. Today I had some time to look more closely into this
> and found that the error is definitely coming from the server: at
> some point, it just randomly starts returning errors to client
> ACCESS and GETATTR operations. The errors are a mix of NFS3ERR_IO
> and NFS3ERR_ACCES, but there is nothing on the server to indicate
> any kind of error, and restarting the operation on the client
> causes it to fail in a different place. With enough patience and
> restarts, it's possible to complete the installation in just four
> or five passes.
>
> Needless to say this is a bit worrying. Strangely, 9.1 and 9.2
> clients don't see this issue at all; it's only 9.3 clients that
> break.
>
> It's easy to reproduce, just 'cd /usr/src && find . -type f >/dev/null'.
> It does not seem to depend on the client NFS version
> (3 or 4) or implementation ("old" or "new"). I haven't tried the
> "old" server yet -- I'll need to figure out how to do that first.
>
> If anyone is willing to help debug this, I can share a packet
> trace, but I don't think it's very informative. Also, if anyone
> has a good dtrace script that I could run on the server that would
> report what's going on when that first NFS3ERR_IO is returned, that
> would be great.

This sounds sort of like what I have been complaining about. I of
course have no competency here, but if I build the world with -j1, I
have a much better chance of successful remote installs.

The problems I'm seeing on -current for the last few months seem to
me to be out-of-date targets: the failure is the remote client trying
to rebuild an out-of-date target on the read-only file system. My new
plan is to dump all of the st_atim and st_mtim values for every
.depend list on both systems when I see the problem again, to see if
something jumps out.

I just reinstalled everybody with -j1 builds of r273808M, with no
problems. Last week, however, a fast box failed. It's kind of
concerning for an install to fail, say, 2/3 of the way through. I have
to admit that when, soon after, that 2/3-installed system crashed (on
NFS unmount), I had to step out of the room for the reboot. Exciting.

I am traveling on Sunday for a week, but I've got a few days to run
things on several big fast 8-CPU boxes (my old laptop is much less
afflicted with this problem, though it occasionally fails too).

Russell

> -GAWollman
>
> [1] I'd run my own freebsd-update server but unfortunately it is
> too tied to building things that look like official FreeBSD
> security updates, and isn't really designed for (e.g.) updating
> kernels when we change a configuration option. It also doesn't
> have any obvious knobs for building with anything other than a
> default {make,src}.conf. And with a pkg-able base just around the
> corner I don't really want to put much effort into making
> freebsd-update do what I want. NFS, on the other hand, is a big
> deal and so I need to track down and fix these bugs.
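Garrett's reproduction ('find . -type f' over the NFS mount) can also be written as a small script that keeps going after errors and records exactly which paths failed. That list could then be lined up against a packet trace. This is a rough Python sketch, not something from the thread. `scan` and `WATCHED` are hypothetical names. The script assumes the server's NFS3ERR_IO and NFS3ERR_ACCES replies reach the application as EIO and EACCES. Like find, it lstat()s each entry while walking the tree. Whether a given call actually turns into a GETATTR on the wire depends on the client's attribute cache.

```python
# Hypothetical helper (not from the thread): walk a tree the way
# `find ROOT -type f` does and record EIO/EACCES failures instead of stopping.
import errno
import os
import stat
import sys

# Assumption: NFS3ERR_IO / NFS3ERR_ACCES surface on the client as EIO / EACCES.
WATCHED = {errno.EIO: "EIO", errno.EACCES: "EACCES"}


def scan(root):
    """lstat() every non-directory entry under `root` and return
    (regular_files_checked, failures), where failures is a list of
    (path, errno_name) pairs. Errors other than EIO/EACCES are re-raised."""
    failures = []
    checked = 0

    def on_walk_error(exc):
        # os.walk() hands directory-listing errors to this callback.
        if exc.errno in WATCHED:
            failures.append((exc.filename, WATCHED[exc.errno]))
        else:
            raise exc

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError as exc:
                if exc.errno in WATCHED:
                    failures.append((path, WATCHED[exc.errno]))
                    continue
                raise
            # Count only regular files, matching `-type f`.
            if stat.S_ISREG(st.st_mode):
                checked += 1
    return checked, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    n, bad = scan(sys.argv[1])
    for path, name in bad:
        print(f"{name}: {path}")
    print(f"{n} regular files checked, {len(bad)} EIO/EACCES failures")
```

You could run it as, for example, `python3 nfs_scan.py /usr/src` on an affected 9.3 client (the script name is arbitrary). Unlike `find ... >/dev/null`, it reports every failing path in one pass instead of making you restart.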
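Russell's plan of comparing st_atim and st_mtim for the .depend lists on the client and the server could be scripted roughly as shown below. This is a hypothetical Python sketch: `parse_depend`, `dump_times` and `mtime_mismatches` are illustrative names. It assumes .depend files use ordinary make rule syntax ("target: dep dep \" with backslash continuations). It also assumes relative paths resolve against the directory holding the .depend file. Python's `st_atime_ns` and `st_mtime_ns` expose the same timestamps as the C `st_atim` and `st_mtim` fields, as integer nanoseconds.

```python
# Hypothetical sketch (not from the thread): dump access/modification times
# for every path named in a make-style .depend file, then diff two dumps.
import os


def parse_depend(text):
    """Return target and prerequisite paths from make-style dependency text,
    in first-seen order. Assumes backslash-newline continuations."""
    joined = text.replace("\\\n", " ")  # fold continuation lines
    paths = []
    for line in joined.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        targets, prereqs = line.split(":", 1)
        paths.extend(targets.split() + prereqs.split())
    return list(dict.fromkeys(paths))  # de-duplicate, keep order


def dump_times(depend_file):
    """Map each listed path to (st_atime_ns, st_mtime_ns), or None if stat()
    fails. Relative paths are resolved against the .depend file's directory
    (an assumption about how the build tree is laid out)."""
    base = os.path.dirname(os.path.abspath(depend_file))
    with open(depend_file) as f:
        paths = parse_depend(f.read())
    result = {}
    for p in paths:
        full = p if os.path.isabs(p) else os.path.join(base, p)
        try:
            st = os.stat(full)
            result[p] = (st.st_atime_ns, st.st_mtime_ns)
        except OSError:
            result[p] = None
    return result


def mtime_mismatches(client, server):
    """Return paths present in both dumps whose mtime (or stat-ability) differs."""
    out = []
    for p in sorted(client.keys() & server.keys()):
        cm = client[p][1] if client[p] else None
        sm = server[p][1] if server[p] else None
        if cm != sm:
            out.append(p)
    return out
```

You would run `dump_times` on the same .depend file on each machine and save the results, for example with `json`. Feeding both dicts to `mtime_mismatches` then lists the targets whose timestamps disagree. Those are the candidates for the spurious rebuild attempts on the read-only mount.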