Date: Mon, 16 Mar 1998 17:09:01 +0200
From: Anatoly Vorobey <mellon@pobox.com>
To: fs@FreeBSD.ORG
Subject: NFS
Message-ID: <19980316170901.06534@techunix.technion.ac.il>

This is a possibly clueless question; a few days ago I knew nothing about NFS internals, so please forgive my ignorance. I'm trying to learn NFS and VFS internals by debugging a few crash scenarios.

How is NFS supposed (if it is) to deal with deadlocks resulting from upcalls?

Example: currently it's possible to hang the machine by mounting an NFS-exported fs _locally_, on the same machine, and copying a large (>2 MB) file with cp or dd from a local fs to the "imported" fs. E.g.

    mount_nfs localhost:/usr /local ; cp LARGEFILE /local

The system then times out indefinitely in the NFS client code; soft-mounting does not solve the problem. Using NFS 2 does solve the problem. None of John Dyson's latest fixes addresses this; it seems to be a more fundamental problem.

Here's why it happens. As cp keeps issuing write()'s, and they become nfs_write()'s, nfs_write() keeps filling buffer after buffer and calls nfs_doio() to write them. Since it's NFS 3, the write is async by default (until commit comes along), so nfs_doio() marks the buffer dirty and delayed-write, and sends the write (later biodone will release the buffer onto the dirty queue). The "server", which is the same machine, keeps receiving these writes, and since it's NFS 3 again, it calls bdwrite() instead of bwrite(), also putting them onto the dirty queue.

At some point, the server's bdwrite() will discover there are too many dirty buffers (numdirtybuffers>=highdirtybuffers, which is 256 by default, thus the approx. 2 MB limit), and will try to flush dirty buffers. However, some of those dirty buffers are the _client's_ dirty buffers, and flushing them will try to nfs_commit(). This nfs_commit() will fail because we still haven't returned from the previous nfs_write() (the server needs to flush buffers in order to perform it). We're in a deadlock.

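The wait-for chain described above can be sketched as a toy model in C. This is only an illustration, not kernel code: the actor names, the waits_on array and both helper functions are hypothetical, and the 8 KB buffer size is an assumption chosen so that 256 dirty buffers come to roughly 2 MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actors in the loopback scenario; NOBODY means "not blocked". */
enum actor {
    CLIENT_WRITE,   /* nfs_write() waiting for the server's reply      */
    SERVER_WRITE,   /* server write path, inside bdwrite()             */
    SERVER_FLUSH,   /* dirty-buffer flush triggered by bdwrite()       */
    CLIENT_COMMIT,  /* nfs_commit() issued while flushing client bufs  */
    N_ACTORS,
    NOBODY = -1
};

/* Assumed: bytes that fit in the cache before the dirty limit is hit. */
size_t bytes_until_flush(size_t high_dirty, size_t buf_bytes)
{
    return high_dirty * buf_bytes;
}

/* Fill in the wait-for edges as described in the text above. */
void build_loopback_graph(int waits_on[N_ACTORS])
{
    waits_on[CLIENT_WRITE]  = SERVER_WRITE;   /* write RPC needs a reply      */
    waits_on[SERVER_WRITE]  = SERVER_FLUSH;   /* too many dirty buffers       */
    waits_on[SERVER_FLUSH]  = CLIENT_COMMIT;  /* some buffers are the client's */
    waits_on[CLIENT_COMMIT] = CLIENT_WRITE;   /* previous write not returned  */
}

/* True if following the wait-for edges from 'start' leads back to 'start'. */
bool in_wait_cycle(const int waits_on[], int n, int start)
{
    assert(start >= 0 && start < n);
    int cur = waits_on[start];
    for (int steps = 0; steps < n && cur != NOBODY; steps++) {
        if (cur == start)
            return true;
        cur = waits_on[cur];
    }
    return false;
}
```

Calling build_loopback_graph() and then in_wait_cycle() for any of the four actors reports a cycle; setting any one edge to NOBODY (for instance, if the flush never had to touch the client's buffers) makes the cycle disappear, which is the sense in which each link in the chain is necessary for the hang.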
If it's a soft mount, after a few minutes nfs_write() will time out, and nfs_commit() will get a chance to receive its reply from the server; however, it won't: the server is stuck trying to nfsrv_commit(), and it can't do that before nfsrv_write()->bdwrite()->flushdirtybuffers() return. The client can't even resend the commit, since the NFS send window has shrunk after all those timeouts.

Note that although importing an NFS-exported fs locally is bizarre, the same scenario can happen on two machines which are importing from each other, when there are enough dirty buffers on each. The problem is, formally, that nfs_writerpc(), which is on a layer lower than the buffer cache, leads to an upcall on the server, which can lead to the server's calls on the buffer cache layer. There may be different possible ways to fix this, but I'm not even sure at this point that it's considered a problem, or how serious it should be considered. (I don't have two machines to test a deadlock between two.)

Note that if you cat LARGEFILE instead of using cp or dd, it never hangs. The reason is that cat sends 1024-byte blocks instead of full buffers or more to nfs_write(), and nfs_write() deals with that by bdwrite()'ing them and not calling nfs_doio() at all; it'll get called later when there's a need to purge the dirty cache.

This consideration leads to discovering a _bug_ in nfs_doio(): when it both sends a full buffer and puts it into the dirty cache, it never checks whether there's a need to flush buffers, and numdirtybuffers merrily grows much greater than highdirtybuffers. (It can't really check, because it doesn't see highdirtybuffers, which is local to vfs_bio.c; it shouldn't ++numdirtybuffers itself but rather should call bdirty() (not bdwrite()), which is currently never called by anyone and should also be slightly modified; I can send a patch for this to whoever is interested.)

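To make the accounting point concrete, here is a minimal sketch in C of the kind of helper meant above: one place that marks a buffer dirty, bumps the counter exactly once, and tells the caller when the high-water mark has been reached. It is not the real bdirty() from vfs_bio.c; struct toy_buf, struct toy_cache and toy_bdirty() are hypothetical names invented for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a buffer and the cache-wide counters. */
struct toy_buf {
    bool delwri;        /* already counted as a delayed-write buffer? */
};

struct toy_cache {
    int numdirty;       /* plays the role of numdirtybuffers  */
    int highdirty;      /* plays the role of highdirtybuffers */
};

/*
 * Mark bp dirty and account for it once.  Returns true when the caller
 * should arrange for dirty buffers to be flushed.  Callers never touch
 * the counter directly, so they don't need to see the threshold.
 */
bool toy_bdirty(struct toy_cache *c, struct toy_buf *bp)
{
    if (!bp->delwri) {
        bp->delwri = true;
        c->numdirty++;
    }
    return c->numdirty >= c->highdirty;
}
```

With a limit of 3, dirtying three distinct buffers makes the third call return true, and dirtying an already-dirty buffer again leaves the count unchanged. What the caller should do on a true return (flush immediately, or defer) is exactly the open question in the deadlock above, so the sketch deliberately leaves that to the caller.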
-- 
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton