Date: Fri, 19 Mar 2010 21:32:50 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Steve Polyack <korvus@comcast.net>
Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions <freebsd-questions@freebsd.org>
Subject: Re: FreeBSD NFS client goes into infinite retry loop
Message-ID: <Pine.GSO.4.63.1003192120470.17841@muncher.cs.uoguelph.ca>
In-Reply-To: <4BA3DEBC.2000608@comcast.net>
References: <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org> <4BA37AE9.4060806@comcast.net> <4BA392B1.4050107@comcast.net> <4BA3DEBC.2000608@comcast.net>
On Fri, 19 Mar 2010, Steve Polyack wrote:

>
> To anyone who is interested: I did some poking around with DTrace, which led
> me to the nfsiod client code.
> In src/sys/nfsclient/nfs_nfsiod.c:
>        } else {
>            if (bp->b_iocmd == BIO_READ)
>                (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
>            else
>                (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
>        }
>

If you look at nfs_doio(), it decides whether or not to mark the buffer
invalid, based on the return value it gets. Some errors (EINTR, ETIMEDOUT,
EIO) are not considered fatal, but the others are. (When the async I/O
daemons call nfs_doio(), they are threads that couldn't care less whether
the underlying I/O op succeeded. The outcome of the I/O operation
determines what nfs_doio() does with the buffer cache block.)

>
> The result is that my problematic repeatable circumstance begins logging
> "nfssvc_iod: iod 0 nfs_doio returned errno: 5" (corresponding to
> NFSERR_INVAL?) for each repetition of the failed write. The only things
> triggering this are my failed writes. I can also see the nfsiod0 process
> waking up each iteration.
>

Nope, errno 5 is EIO and that's where the problem is. I don't know why
the server is returning EIO after the file has been deleted on the
server (I assume you did that when running your little shell script?).

> Do we need some kind of "retry x times then abort" logic within nfsiod_iod(),
> or does this belong in the subsequent functions, such as nfs_doio()? I think
> it's best to avoid these sorts of infinite loops which have the potential to
> take out the system or overload the network due to dumb decisions made by
> unprivileged users.
>

Nope, people don't like data not getting written back to a server when
it is slow or temporarily network partitioned. The only thing that should
stop a client from retrying a write back to the server is a fatal error
from the server that says "this won't ever succeed".
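The following is a minimal C sketch of the buffer-handling policy described above, built on stated assumptions. The names classify_write_error(), simulate_writeback() and enum buf_action are hypothetical, and the code is a simplified model rather than the FreeBSD nfs_doio() source. It shows why a server that keeps answering EIO produces an endless retry loop: EIO is in the non-fatal set, so the dirty buffer is kept. An error outside that set stops the retries after one attempt. The max_attempts cap exists only so the demo terminates. The reply above argues against adding such a cap to the real client.

```c
#include <errno.h>

/*
 * Hypothetical, simplified model of the policy described in the reply;
 * this is NOT the FreeBSD kernel code. After an asynchronous write-back
 * attempt the client either finishes with the buffer, keeps it dirty so
 * the write is retried later, or invalidates it and drops the data.
 */
enum buf_action {
    BUF_DONE,       /* write succeeded; buffer is clean */
    BUF_RETRY,      /* non-fatal error; keep buffer dirty, retry later */
    BUF_INVALIDATE  /* fatal error; give up on this buffer */
};

/* Hypothetical helper: map the error from one write attempt to an action. */
static enum buf_action classify_write_error(int error)
{
    switch (error) {
    case 0:
        return BUF_DONE;
    case EINTR:
    case ETIMEDOUT:
    case EIO:
        /* Treated as "may succeed later", so the write stays queued. */
        return BUF_RETRY;
    default:
        /* Any other error is taken to mean "this won't ever succeed". */
        return BUF_INVALIDATE;
    }
}

/*
 * Hypothetical driver: replays write-back attempts against a server that
 * answers every attempt with server_error. max_attempts exists only so the
 * demo terminates. Returns the number of attempts made.
 */
static int simulate_writeback(int server_error, int max_attempts)
{
    int attempts = 0;

    while (attempts < max_attempts) {
        attempts++;
        if (classify_write_error(server_error) != BUF_RETRY)
            break;  /* done or invalidated: no further retries */
        /* Still BUF_RETRY: in the quoted report, each such round logged
         * "nfs_doio returned errno: 5". */
    }
    return attempts;
}
```

For example, simulate_writeback(EIO, 1000) uses all 1000 attempts, while simulate_writeback(ESTALE, 1000) stops after the first one because ESTALE falls outside the non-fatal set.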
I think we need to figure out whether the server really is returning EIO
(NFS3ERR_IO in Wireshark) or whether it is sending NFS3ERR_STALE and the
client is somehow munging that into EIO, causing the confusion.

rick
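The two possibilities raised above can be illustrated with a second hypothetical sketch. The NFSv3 status values come from RFC 1813, where NFS3ERR_IO is 5 and NFS3ERR_STALE is 70. The mapping function nfs3_status_to_errno() and the helper error_is_retried() are illustrative assumptions, not FreeBSD's code.

The key point is that the retry decision only ever sees the local errno. If NFS3ERR_STALE arrives and is mapped to ESTALE, the write is abandoned. If some code path collapses it into EIO instead, the write looks transient and is retried indefinitely. This sketch's default branch does exactly that for any status it does not list. A packet capture showing the on-the-wire status is what tells the two cases apart.

```c
#include <errno.h>
#include <stdbool.h>

/* Subset of the NFSv3 nfsstat3 values defined in RFC 1813. */
enum nfsstat3 {
    NFS3_OK       = 0,
    NFS3ERR_PERM  = 1,
    NFS3ERR_NOENT = 2,
    NFS3ERR_IO    = 5,
    NFS3ERR_ACCES = 13,
    NFS3ERR_NOSPC = 28,
    NFS3ERR_STALE = 70
};

/*
 * Hypothetical mapping from an on-the-wire NFSv3 status to a local errno.
 * Statuses this sketch does not list fall into the default branch and
 * become EIO, which is one way a distinct server error can be "munged".
 */
static int nfs3_status_to_errno(int status)
{
    switch (status) {
    case NFS3_OK:       return 0;
    case NFS3ERR_PERM:  return EPERM;
    case NFS3ERR_NOENT: return ENOENT;
    case NFS3ERR_IO:    return EIO;
    case NFS3ERR_ACCES: return EACCES;
    case NFS3ERR_NOSPC: return ENOSPC;
    case NFS3ERR_STALE: return ESTALE;
    default:            return EIO;
    }
}

/* Hypothetical helper: errors the reply says nfs_doio() treats as non-fatal. */
static bool error_is_retried(int error)
{
    return error == EINTR || error == ETIMEDOUT || error == EIO;
}
```

With this mapping, error_is_retried(nfs3_status_to_errno(NFS3ERR_STALE)) is false, so a stale file handle ends the retries. A status that reaches the default branch is retried like a genuine NFS3ERR_IO.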