From: Alexander Motin
Date: Fri, 22 Jul 2011 19:22:46 +0300
To: lev@FreeBSD.org
Cc: freebsd-hardware@freebsd.org
Subject: Re: ahci.ko / geom_mirror / zfs hangs up system when one of HDDs fauilts.
Message-ID: <4E29A3D6.1080609@FreeBSD.org>
In-Reply-To: <1981757790.20110720013856@serebryakov.spb.ru>

Lev Serebryakov wrote:
> I have had two identical livelocks when an HDD fails on an
> 8.2-STABLE system with two SATA HDDs with gmirror and ZFS on them.
>
> It is a Hetzner-based server, so the only access I have is the LARA
> console, but the symptoms are identical in both cases: an HDD goes bad,
> ahci.ko complains about timeouts, and after that the server stops
> responding to high-level access attempts (ssh/HTTP/SMTP), but can still
> be pinged at both its IPv4 and IPv6 addresses.
>
> The HDDs are identical, and they are split into several (BSD) partitions.
> Some partitions are mirrored with geom_mirror and one pair of
> partitions is added to a (mirrored) ZFS pool like this (I provide output
> from a rebooted one-HDD-only system, but I think it is clear how it
> looks when both HDDs are OK):
>
> A screenshot of the LARA console in such a case is attached.

The kernel messages look as if the controller or the device is stuck: it
is unable to complete some command and cannot recover from that condition
even after a device hard reset. I don't see what the driver can do about
it, except be more aggressive in dropping a faulty device after several
consecutive timeouts. If that is not the outcome you want, start by
updating the card BIOS and the device firmware.

-- 
Alexander Motin
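
To make the "drop after several consecutive timeouts" idea concrete, here is a minimal sketch of such a policy. It is not code from ahci.ko and does not reflect how the driver is implemented; the class name DeviceHealth, its methods, and the threshold of 3 are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceHealth:
    """Tracks consecutive command timeouts for one device (hypothetical helper)."""

    max_consecutive_timeouts: int = 3  # assumed threshold, not taken from ahci(4)
    consecutive_timeouts: int = 0
    dropped: bool = False

    def on_command_complete(self) -> None:
        # Any successful completion shows the device is still responding,
        # so the streak of timeouts starts over.
        if not self.dropped:
            self.consecutive_timeouts = 0

    def on_command_timeout(self) -> bool:
        """Record a timeout; return True once the device should be dropped."""
        if self.dropped:
            return True
        self.consecutive_timeouts += 1
        if self.consecutive_timeouts >= self.max_consecutive_timeouts:
            # Too many timeouts in a row: give up on the device instead of
            # resetting and retrying indefinitely.
            self.dropped = True
        return self.dropped
```

The point of counting only consecutive timeouts is that an occasional slow command does not cost a healthy disk its place in the mirror, while a device that never answers again is given up after a bounded number of attempts. Once it is dropped, the layers above (gmirror, ZFS) would presumably see the device as gone and carry on with the remaining disk.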