From owner-freebsd-current@FreeBSD.ORG  Wed Oct  5 08:24:10 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 35F70106564A;
	Wed,  5 Oct 2011 08:24:10 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id E86BC8FC0C;
	Wed,  5 Oct 2011 08:24:09 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id 5EB2028426;
	Wed,  5 Oct 2011 10:24:08 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id 571C728422;
	Wed,  5 Oct 2011 10:24:07 +0200 (CEST)
Message-ID: <4E8C1426.60107@quip.cz>
Date: Wed, 05 Oct 2011 10:24:06 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: lev@FreeBSD.org
References: <1927112464.20111004220507@serebryakov.spb.ru>	<4E8B7A27.5070908@quip.cz>
	<344794801.20111005101957@serebryakov.spb.ru>
In-Reply-To: <344794801.20111005101957@serebryakov.spb.ru>
Content-Type: text/plain; charset=windows-1251; format=flowed
Content-Transfer-Encoding: 8bit
Cc: Alexander Motin <mav@FreeBSD.org>, current@freebsd.org,
	freebsd-geom@FreeBSD.org
Subject: Re: RFC: Project geom-events
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 05 Oct 2011 08:24:10 -0000

Lev Serebryakov wrote:
> Hello, Miroslav.
> You wrote on October 5, 2011, 1:27:03:
>
>> I am still missing one thing - a dropped provider is not marked as a failed
>> RAID provider and is accessible to anything, like a normal disk device. So
>> in some edge cases, the system can boot from the failed RAID component
>> instead of the degraded RAID. This can cause data loss or damage.
>    What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
> Something else?

I am mostly using geom_mirror.

> If a GEOM class drops an underlying provider due to errors,
> it has no chance to update the metadata on it.

I understand this, but if there is (stale) metadata on the provider, the
system can read this metadata and should disallow normal operations (for
example, propagating slices, partitions and labels).

>    But most classes, if a dropped provider is attached again, will
> rebuild themselves, as they track which components are current and which
> ones are old.

I have often seen a provider (for example ada1) dropped because of some DMA
timeout (bad cables and so on); sometimes the provider (disk ada1) is
detached from the ATA channel and reattached after reboot. In both cases,
the provider has stale metadata and is marked as "broken" by geom_mirror,
and the automatic rebuild does not start.

In this case, I see gm0 with all of its slices, partitions and labels
and ada1 with the same slices, partitions and labels. This is the
problem: two devices provide the same labels, and the winner is
whichever is tasted first, even though the system (geom_mirror) knows
that ada1 is a "broken" disk.

I think that GEOM should be more robust in this case: if such stale
metadata is found on a provider, it should not publish slices,
partitions, labels and so on.
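
The check proposed above could look roughly like the following sketch. It
is a toy model in Python, not GEOM code: the Provider and MirrorMetadata
types, the generation counter and the three verdicts are all assumptions
made for illustration, and the real gmirror metadata layout is not
reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Verdict(Enum):
    PUBLISH = "publish"        # expose slices, partitions and labels as usual
    ATTACH = "attach"          # hand the provider to its mirror
    QUARANTINE = "quarantine"  # stale component: expose nothing


@dataclass
class MirrorMetadata:
    # Hypothetical, simplified stand-in for on-disk mirror metadata.
    mirror: str
    generation: int  # assumed counter the mirror bumps when it drops a component


@dataclass
class Provider:
    name: str
    metadata: Optional[MirrorMetadata] = None  # None: no mirror metadata found


def taste(provider: Provider, running: Dict[str, int]) -> Verdict:
    """Decide what to do with a provider.

    `running` maps a mirror name to the generation the live mirror is at.
    """
    md = provider.metadata
    if md is None:
        return Verdict.PUBLISH
    current = running.get(md.mirror)
    if current is not None and md.generation < current:
        # The mirror moved on without this disk, so its contents are stale.
        return Verdict.QUARANTINE
    return Verdict.ATTACH


running = {"gm0": 7}
stale = Provider("ada1", MirrorMetadata("gm0", 6))
print(stale.name, taste(stale, running).value)
```

The point of the sketch is only the ordering of the checks: the metadata
is examined before anything on the disk is published, so a stale ada1
never gets to compete with gm0 for the same labels.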

>    Do you want GEOM classes to track dropped components somewhere else
> and not even try to attach them automatically when they re-appear?

If a disk is removed and reinserted and synchronisation starts, then
everything is OK. But a situation where a component is marked as "broken"
and the system and user can still operate on it like on a normal "good and
clean" drive is wrong.

The drive's content should be inaccessible until the operator takes some
action (for example, running gmirror clear on the broken disk device).
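
The requested behaviour can be modelled roughly as below. This is a
minimal Python sketch under stated assumptions: the Component class, its
states and the clear() method are hypothetical names, and clear() only
stands in for an explicit operator step such as gmirror clear. It does not
describe what gmirror does today.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"    # in sync and part of the mirror
    BROKEN = "broken"    # dropped by the mirror, metadata is stale
    CLEARED = "cleared"  # operator wiped the metadata, usable as a plain disk


class Component:
    """Hypothetical model of a mirror component; not a gmirror API."""

    def __init__(self, name: str, state: State = State.ACTIVE) -> None:
        self.name = name
        self.state = state

    def mark_broken(self) -> None:
        # Assumed to happen when the mirror drops the component.
        self.state = State.BROKEN

    def clear(self) -> None:
        # Stands in for an explicit operator action such as `gmirror clear`.
        self.state = State.CLEARED

    def read(self) -> str:
        # A broken component refuses access until it has been cleared.
        if self.state is State.BROKEN:
            raise PermissionError(
                f"{self.name}: broken mirror component, clear it first"
            )
        return f"data from {self.name}"


ada1 = Component("ada1")
ada1.mark_broken()
try:
    ada1.read()
except PermissionError as err:
    print(err)
ada1.clear()
print(ada1.read())
```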

Miroslav Lachman