From: Michael Boers
To: freebsd-fs@freebsd.org
Date: Tue, 16 Nov 2010 08:32:35 -0500
Subject: Re: zfs mirror recognizing disk failures

On Nov 16, 2010, at 5:24 AM, Olivier Smedts wrote:

> 2010/11/15 Michael Boers:
>> Is there anything I can do to make a zfs mirror quicker to give up on a
>> flaky disk?
>>
>> I recently had a 100% zfs system crash when it started to have some disk
>> errors. I had hoped that by having a mirror, the system would survive this
>> type of error. Instead it just hung.
>
> You can offline the faulty drive.
> Also, I think you're interested in a feature like TLER:
> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
> But typical (cheap) drives don't implement it.

Unfortunately, I was not able to offline the drive, because I could not
get access to the machine at all. It responded to pings and, since it is
a CARP master, it was still broadcasting its "masterness", but every
attempt to ssh into the machine failed. My guess is that anything
disk-related was blocked behind the problem.

To answer Jeremy's question of "what happened next?":

The machine was not serving web requests.
The machine was not responsive via ssh.
The machine was pingable.

After waiting about 15 minutes, I used IPMI to power down the machine.
When it came back up, I found the enclosed errors in the log.

If I am following your comments correctly, the fault for this lies in
the mpt system not giving up, which could be either a driver or a
firmware issue. Is that correct? How do I protect against that?

>
>>
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SYNCHRONIZE CACHE(10).
>> CDB: 35 0 0 0 0 0 0 0 0 0
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): CAM Status: SCSI Status Error
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): SCSI Status: Check Condition
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): ABORTED COMMAND asc:0,0
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): No additional sense information
>> Nov 11 10:05:01 caprica kernel: (da2:mpt0:0:3:0): Retries Exhausted
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c87a0:2838 timed out for ccb 0xffffff0103acc000 (req->ccb 0xffffff0103acc000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c5110:2839 timed out for ccb 0xffffff035cab0800 (req->ccb 0xffffff035cab0800)
>> Nov 11 10:05:53 caprica kernel: mpt0: attempting to abort req 0xffffff80003c87a0:2838 function 0
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bef30:2840 timed out for ccb 0xffffff0007986800 (req->ccb 0xffffff0007986800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003c8560:2841 timed out for ccb 0xffffff032d985000 (req->ccb 0xffffff032d985000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bf320:2842 timed out for ccb 0xffffff0103af2000 (req->ccb 0xffffff0103af2000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cbda0:2843 timed out for ccb 0xffffff0103b0b000 (req->ccb 0xffffff0103b0b000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003bfd40:2844 timed out for ccb 0xffffff00102bf800 (req->ccb 0xffffff00102bf800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003cad50:2845 timed out for ccb 0xffffff01e6f33000 (req->ccb 0xffffff01e6f33000)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003caf00:2846 timed out for ccb 0xffffff01e6f24800 (req->ccb 0xffffff01e6f24800)
>> Nov 11 10:05:53 caprica kernel: mpt0: request 0xffffff80003ccd60:2847 timed out for ccb 0xffffff01308a4000 (req->ccb 0xffffff01308a4000)
>>
>> Is this a type of error zfs can survive or do I need a hardware mirror to
>> handle this type of problem?
>>
>> Thanks,
>>
>
> --
> Olivier Smedts                                                 _
>                                         ASCII ribbon campaign ( )
> e-mail: olivier@gid0.org        - against HTML email & vCards  X
> www: http://www.gid0.org    - against proprietary attachments / \
>
> "There are only 10 kinds of people in the world:
> those who understand binary,
> and those who don't."