Date:      Wed, 31 Mar 2004 00:14:13 -0300
From:      João Carlos Mendes Luís <jonny@jonny.eng.br>
To:        Greg 'groggy' Lehey <grog@FreeBSD.org>
Cc:        Joao Carlos Mendes Luis <jonny@faperj.br>
Subject:   Re: Serious bug in vinum?
Message-ID:  <406A3785.1040007@jonny.eng.br>
In-Reply-To: <20040331004630.GA15929@wantadilla.lemis.com>
References:  <4068EA56.3060600@jonny.eng.br> <20040330053143.GN15929@wantadilla.lemis.com> <40697F3B.2020202@jonny.eng.br> <20040326222853.GA93269@zeus.faperj.br> <20040330143257.C72259@pcle2.cc.univie.ac.at> <20040331004630.GA15929@wantadilla.lemis.com>



Greg 'groggy' Lehey wrote:

> On Tuesday, 30 March 2004 at 14:37:00 +0200, Lukas Ertl wrote:
> 
>>On Fri, 26 Mar 2004, Joao Carlos Mendes Luis wrote:
>>
>>
>>>    I think this should be like:
>>>
>>>        if (plex->state > plex_corrupt) {                  /* something accessible, */
>>>
>>>    Or, in other words, volume state is up only if plex state is degraded
>>>or better.
>>
>>You are right, this is a bug,
> 
> No, see my reply.

     I think "maybe" is the best answer here.

>>The correct solution, of course, is to check if the data is valid
>>before changing the volume state, but this might turn out to be a
>>very complex check.
> 
> 
> Well, the minimum correct solution is to return an error if somebody
> tries to access the inaccessible part of the volume.  That should
> happen, and I'm confused that it doesn't appear to be doing so in this
> case.
> 
> On Tuesday, 30 March 2004 at 11:07:55 -0300, João Carlos Mendes Luís wrote:
> 
>>Greg 'groggy' Lehey wrote:
>>
>>>On Tuesday, 30 March 2004 at  0:32:38 -0300, João Carlos Mendes Luís wrote:
>>>
>>>Basically, this is a feature and not a bug.  A plex that is corrupt is
>>>still partially accessible, so we should allow access to it.  If you
>>>have two striped plexes both striped between two disks, with the same
>>>stripe size, and one plex starts on the first drive, and the other on
>>>the second, and one drive dies, then each plex will lose half of its
>>>data, every second stripe.  But the volume will be completely
>>>accessible.
>>
>>    A good idea if you have both striping and mirroring, to avoid discarding
>>the whole disk.  But, IMHO, if some part of the disk is inaccessible, the
>>volume should go down, and if and only if the operator wants to try
>>recovery, they should use the setstate command.  That is the safe state.
> 
> setstate is not safe.  It bypasses a lot of consistency checking.

     That's why it should be done only by a human operator, and only after 
checking the physical disk.  I use setstate frequently, when I have my wizard 
hat on, but I know the consequences of doing that.  If someone is watching, I 
carefully explain to them *not* to repeat it.   ;-)

> 
> One possibility would be: 
> 
> 1.  Based on the plex states, check if all of the volume is still
>     accessible.
> 2.  If not, take the volume into a "flaky" state.  

     This is easy if the volume is composed of a single plex (my case, and the 
case of most people who only need a big, "unsafe" disk, where "unsafe" means a 
disk that is either available or not available, not half a disk; at least, 
that is what it means to me).

     If the volume has more than one plex, then you could think of an algorithm 
that exploits this redundancy.

     But, IMO, a disk with half of it unavailable is hardly an "up and ok" one.

     Also note that, instead of marking the whole subdisk stale when a single 
I/O fails, the error could be passed up to the caller.  But this, too, only 
works with single-plex striped or concatenated configurations.


> 3.  *Somehow* ensure that the volume can't be accessed again as a file
>     system until it has been remounted.
> 4.  Refuse to remount the file system without the -f option.
> 
> The last two are outside the scope of Vinum, of course.

     And again, this violates the layering approach.  I thought newfs -v was enough...

     The first time I used vinum, I was happily thinking that I would combine 4 
whole disks (except for the boot and swap partitions, of course) into a new 
pseudo disk, which I would then disklabel again and repartition for the 
expected use.  Say, for example, that I want /var and /usr on different 
partitions, but I want both mirrored.  With real-world vinum, I need to create 
2 vinum partitions on the real disks and 2 vinum volumes.

     AFAIK, -current and GEOM fix this, right?  My last experience with 
RaidFrame was a panic, right from disk creation.  But I must confess I did 
not try that hard, since vinum and -stable were working for me.  I haven't 
been a -current hacker for a long time now.

     Greg, I like vinum, and I have used it since its release in FreeBSD.  
Before that, I used ccd(4).  When 5.x is stable, I will use GEOM, vinum or 
raidframe.  But I really think *ix is great for its reusability, recursivity 
and modularity, and vinum breaks this.  If vinum creates a virtual disk, it 
should behave like a real disk.

                                         Jonny

-- 
João Carlos Mendes Luís - Networking Engineer - jonny@jonny.eng.br


