From: Zaphod Beeblebrox <zbeeble@gmail.com>
To: Jeremy Chadwick
Cc: freebsd-fs, Radio młodych bandytów, support@lists.pcbsd.org
Date: Sun, 14 Apr 2013 14:58:15 -0400
Subject: Re: A failed drive causes system to hang
In-Reply-To: <20130414185117.GA38259@icarus.home.lan>

I'd like to throw in my two cents here. I've seen this (drives in a RAID-1 configuration hanging whole systems) before. Back in the IDE days, when two drives shared one cable, I largely wrote it off as a deficiency of IDE hardware and resolved to buy SCSI hardware for more important systems.

Of late, the physical hardware for SCSI (SAS) and SATA drives has converged. I'm willing to accept that SAS hardware may be built to a different standard, but I'm suspicious of the fact that a bad SATA drive on an ACH* controller can hang the whole system.

The hang is not complete, however: often, pulling the drive's cable will unfreeze things. It's also not entirely consistent. Drives I have behind 4:1 port multipliers haven't (so far) hung the system they're on (which uses ACH10). Right now, I have a remote ACH10 system that has hung hard a couple of times, and it passes both its short and long SMART tests on both drives.

Is there no global timeout we can depend on here?