From: Eric Anderson
Date: Wed, 25 Jul 2007 13:37:16 -0500
To: Jim Rees
Cc: freebsd-fs@freebsd.org
Subject: Re: handling unresonsive NFS servers
Message-ID: <46A7985C.3010202@freebsd.org>
In-Reply-To: <20070725171214.GC25749@citi.umich.edu>

Jim Rees wrote:
> AFS has the same problem, and solves it by marking a server "down" when it
> doesn't respond. The timeout is very long, like a minute or more. Normally
> this would permanently hang the client, but once the server is marked down,
> any subsequent operations fail immediately. The client checks periodically
> to see if the server has come back up. Failing this way is better than
> waiting forever, because waiting forever results in a reboot when the
> machine's owner runs out of patience.

What does 'fail immediately' mean here? Does it return EIO? That might be
sufficient, although I think 1 minute is pretty low for NFS. Of course, if
it's settable, then that's good. :)

Eric
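
To make the question concrete, here is a minimal sketch of the scheme as described in the quoted text: wait out a long timeout once, mark the server down, fail later operations at once, and let an operation through again after a while to see whether the server is back. Everything in it is an assumption for illustration, not AFS or NFS code: the ServerState class, the call() method, the probe_interval parameter, the choice of EIO as the error, and the idea that the periodic check piggybacks on a regular operation rather than running in the background.

```python
import errno
import time


class ServerState:
    """Hypothetical per-server state: up, or marked down with a next-probe time."""

    def __init__(self, timeout=60.0, probe_interval=30.0, clock=time.monotonic):
        self.timeout = timeout                # settable: how long to wait for a reply
        self.probe_interval = probe_interval  # settable: how often to retry a down server
        self.clock = clock                    # injectable so the logic can be tested
        self.down = False
        self.next_probe = 0.0

    def call(self, rpc):
        """Run rpc(timeout). rpc is assumed to raise TimeoutError on no reply."""
        if self.down and self.clock() < self.next_probe:
            # Marked down and not yet time to probe: fail without waiting.
            raise OSError(errno.EIO, "server marked down")
        try:
            result = rpc(self.timeout)
        except TimeoutError:
            # No reply within the timeout: mark down and schedule the next probe.
            self.down = True
            self.next_probe = self.clock() + self.probe_interval
            raise OSError(errno.EIO, "server not responding")
        # Any successful reply marks the server up again.
        self.down = False
        return result
```

With this shape, only the first operation against a dead server pays the full timeout; the rest get EIO right away until probe_interval has passed. Both values are plain constructor arguments, which is the "settable" part.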