From: Rick Macklem
Date: Wed, 25 Jul 2007 13:03:19 -0400 (EDT)
To: freebsd-fs@freebsd.org
Subject: handling unresponsive NFS servers

I have been thinking about what a client should do when an NFS server is unresponsive, and thought I'd email to see what others think.

"intr mounts" - These don't work correctly, and it is nearly impossible to make them work correctly. The problem is that the process with a termination signal posted against it is often blocked waiting for some resource (vnode lock, buffer cache block, ...) held by another process, which is itself waiting for an RPC reply from the unresponsive server.
Also, for NFSv4, a client can't just forget about an RPC that alters state on the server. If it does, the RPC may already have been performed on the server, and the client's view of state can become inconsistent with the server's view. (As such, I feel intr should be "deprecated or disabled". I don't like things that "sorta work", but I can understand why some might feel that it should remain for NFSv2,3.)

"soft mounts" - These have the problem that system calls may terminate abnormally when all you have is a slow, heavily loaded server. As such, they might be ok for read-only mounts using NFSv2,3, but seem too dangerous for anything else. (Very few apps expect an I/O system call to fail with ETIMEDOUT.)

So, about all I can think to do is make "umount -f" work properly. Since it terminates all outstanding RPCs on the mount point (and gets rid of all state for NFSv4), this can be made to work well. (Mac OS X does this.)

A problem with this is that it can only be done by someone with system privilege. However, it seems to me that most systems are either personal machines (laptops or desktops), where the user has system privilege, or servers in machine room environments. The latter usually have a sysadmin monitoring them and also tend to talk to NFS servers where connectivity seldom goes away. As such, needing system privilege doesn't seem too serious an issue to me.

Any other thoughts?

rick
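
To illustrate the soft-mount point above (few applications expect an I/O system call to fail with ETIMEDOUT), here is a minimal sketch in C of a read wrapper that tolerates it. The helper name read_retry_timedout, the retry limit, and the one-second pause are assumptions made up for this example and are not part of any NFS client; it only shows the kind of check an application would need to add around every read on a soft mount.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Reads up to len bytes from fd. If read() fails with ETIMEDOUT (as it
 * may on a soft NFS mount when the server is merely slow), the call is
 * retried up to max_retries more times, pausing one second between tries.
 * Any other error is returned to the caller immediately.
 * Returns the byte count from read(), or -1 with errno set.
 */
ssize_t read_retry_timedout(int fd, void *buf, size_t len, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;               /* success, or 0 at end of file */
        if (errno != ETIMEDOUT || attempt >= max_retries)
            return -1;              /* unrelated error, or out of retries */
        sleep(1);                   /* assumed back-off; tune per workload */
    }
}
```

The sketch covers reads only, in line with the remark that soft mounts might be ok for read-only use. It says nothing about whether retrying a timed-out write is safe.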