From: Alexander Motin
Date: Mon, 04 Oct 2010 17:03:47 +0300
To: Alexander Leidinger
Cc: freebsd-stable, Steve Polyack, Jeremy Chadwick, Dan Langille
Subject: Re: out of HDD space - zfs degraded
Message-ID: <4CA9DEC3.1000302@FreeBSD.org>

Alexander Leidinger wrote:
> On Sat, 02 Oct 2010 22:25:18 -0400 Steve Polyack
> wrote:
>
>> I think it's worth it to think about TLER (or the absence of it) here -
>> http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery . Your
>> consumer / SATA Hitachi drives likely do not put a limit on the time
>> the drive may block on a command while handling internal errors. If
>> we consider that gpt/gisk06-live encountered some kind of error and
>> had to relocate a significant number of blocks or perform some other
>> error recovery, then it very well may have timed out long enough for
>> siis(4) to drop the device. I have no idea what the timeouts are set
>> to in the siis(4) driver, nor does anything in your SMART report
>> stick out to me (though I'm certainly no expert with SMART data, and
>> my understanding is that many drive manufacturers report the various
>> parameters in different ways).

Command timeouts are usually defined by the ada(4) peripheral driver and
the ATA transport layer of CAM. Most of these timeouts are set to 30
seconds. The only time value defined by siis(4) itself is the hard reset
time, currently 15 seconds.
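Steve's TLER argument and the timeout values above can be illustrated with a deliberately simplified model. The helper `classify_stall` and the model itself are hypothetical: it assumes a command fails after the 30-second ada(4)/CAM timeout, that the drive must then answer within the 15-second siis(4) hard reset window, and it ignores command retries. The 7-second figure is a commonly cited TLER limit, used here only as an example.

```python
import math

# Values quoted in the message above; treat them as assumptions for any
# particular FreeBSD version.
ADA_COMMAND_TIMEOUT_S = 30.0   # typical ada(4)/CAM ATA command timeout
SIIS_HARD_RESET_S = 15.0       # siis(4) hard reset time


def classify_stall(stall_s: float,
                   cmd_timeout_s: float = ADA_COMMAND_TIMEOUT_S,
                   reset_s: float = SIIS_HARD_RESET_S) -> str:
    """Simplified model of a drive that stops responding for stall_s seconds.

    Hypothetical helper: it ignores retries and assumes the drive simply
    becomes responsive again once its internal recovery finishes.
    Use math.inf for a drive that hangs until it is power cycled.
    """
    if stall_s <= cmd_timeout_s:
        return "completed"              # answered before the command timed out
    if stall_s <= cmd_timeout_s + reset_s:
        return "recovered after reset"  # answered within the hard reset window
    return "lost"                       # no answer even after the hard reset


if __name__ == "__main__":
    # Hypothetical stall durations; 7 s stands in for a TLER-limited drive.
    for stall in (7.0, 40.0, 120.0, math.inf):
        print(f"{stall:>6} s -> {classify_stall(stall)}")
```

In this model a drive that hangs until it is power cycled (`math.inf`) is "lost" for any finite timeout values, which matches the doubt that tuning timeouts alone could have helped in this case.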
Since the drive did not reappear after `camcontrol reset/rescan ...` was
run a significant period of time later, and instead required a power
cycle, I doubt that any timeout value could have helped. It is also
theoretically possible that the controller firmware got stuck, not the
drive. It would be interesting to power cycle only that specific drive
if the problem repeats.

> IIRC mav@ (CCed) made a commit regarding this to -current in the not so
> distant past. I do not know about the MFC status of this, or if it may
> have helped or not in this situation.

My last commit to siis(4), two weeks ago (merged recently), fixed a
specific bug in timeout handling that led to a system crash. I don't see
similar symptoms here.

If there were any messages before
"Oct 2 00:50:53 kraken kernel: (ada0:siisch0:0:0:0): lost device",
they could give some hints about the original problem. Messages after it
could be a consequence. Enabling verbose kernel messages could give more
information about what happened there.

-- 
Alexander Motin
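The suggestion to look at the messages logged before the "lost device" line can be sketched as a small filter. This is a hypothetical helper (`context_before_lost` is not part of any FreeBSD tool); it assumes the CAM message format quoted in the message, where the device name appears as "(ada0:...", and simply keeps a sliding window of the lines preceding the first match.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


def context_before_lost(lines: Iterable[str], device: str,
                        context: int = 20) -> Optional[List[str]]:
    """Return up to `context` lines logged before the first "lost device"
    message for `device` (e.g. "ada0"), or None if there is no such message.

    Hypothetical helper: it matches on the "(ada0:" prefix that CAM uses
    in messages like "(ada0:siisch0:0:0:0): lost device".
    """
    tag = f"({device}:"
    # Sliding window holding only the most recent `context` lines.
    recent: Deque[str] = deque(maxlen=context)
    for line in lines:
        line = line.rstrip("\n")
        if tag in line and "lost device" in line:
            return list(recent)
        recent.append(line)
    return None
```

For example, on a system where kernel messages go to /var/log/messages (the usual FreeBSD syslog setup), `with open("/var/log/messages") as f: print("\n".join(context_before_lost(f, "ada0") or []))` would print the lines leading up to the first "lost device" message for ada0. Verbose kernel messages, which would make that window more informative, can be enabled at boot with `boot_verbose="YES"` in /boot/loader.conf.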