Date: Wed, 31 Mar 2004 00:14:13 -0300
From: João Carlos Mendes Luís <jonny@jonny.eng.br>
To: Greg 'groggy' Lehey <grog@FreeBSD.org>
Cc: Joao Carlos Mendes Luis <jonny@faperj.br>
Subject: Re: Serious bug in vinum?
Message-ID: <406A3785.1040007@jonny.eng.br>
In-Reply-To: <20040331004630.GA15929@wantadilla.lemis.com>
References: <4068EA56.3060600@jonny.eng.br> <20040330053143.GN15929@wantadilla.lemis.com> <40697F3B.2020202@jonny.eng.br> <20040326222853.GA93269@zeus.faperj.br> <20040330143257.C72259@pcle2.cc.univie.ac.at> <20040331004630.GA15929@wantadilla.lemis.com>
Greg 'groggy' Lehey wrote:
> On Tuesday, 30 March 2004 at 14:37:00 +0200, Lukas Ertl wrote:
>
>>On Fri, 26 Mar 2004, Joao Carlos Mendes Luis wrote:
>>
>>> I think this should be like:
>>>
>>> if (plex->state > plex_corrupt) { /* something accessible, */
>>>
>>> Or, in other words, volume state is up only if plex state is degraded
>>>or better.
>>
>>You are right, this is a bug,
>
> No, see my reply.

I think "maybe" is the best answer here.

>>The correct solution, of course, is to check if the data is valid
>>before changing the volume state, but turn might turn out to be a
>>very complex check.
>
> Well, the minimum correct solution is to return an error if somebody
> tries to access the inaccessible part of the volume. That should
> happen, and I'm confused that it doesn't appear to be doing so in this
> case.
>
> On Tuesday, 30 March 2004 at 11:07:55 -0300, João Carlos Mendes Luís wrote:
>
>>Greg 'groggy' Lehey wrote:
>>
>>>On Tuesday, 30 March 2004 at 0:32:38 -0300, João Carlos Mendes Luís wrote:
>>>
>>>Basically, this is a feature and not a bug. A plex that is corrupt is
>>>still partially accessible, so we should allow access to it. If you
>>>have two striped plexes both striped between two disks, with the same
>>>stripe size, and one plex starts on the first drive, and the other on
>>>the second, and one drive dies, then each plex will lose half of its
>>>data, every second stripe. But the volume will be completely
>>>accessible.
>>
>> A good idea if you have both stripe and mirror, to avoid discarding the
>>whole disk. But, IMHO, if some part of the disk is inacessible, the volume
>>should go down, and IFF the operator wants to try recovery, should use the
>>setstate command. This is the safe state.
>
> setstate is not safe. It bypasses a lot of consistency checking.

That's why it should be done only by a human operator, and only after
checking the physical disk.
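The two-plex example quoted above can be checked with a small model. This is only a sketch, not vinum code: `plex_model`, `stripe_drive`, `stripe_readable` and `lost_stripes` are hypothetical names, and the model assumes exactly two drives, equal stripe sizes, and that stripe i of a plex lives on drive (first_drive + i) mod 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NDRIVES 2   /* assumption: both plexes are striped over the same two drives */

/* Hypothetical model of a striped plex: only the drive holding stripe 0 matters. */
struct plex_model {
    int first_drive;
};

/* Drive that backs stripe `stripe` of plex `p` (simple round-robin striping). */
static int stripe_drive(const struct plex_model *p, size_t stripe)
{
    assert(p->first_drive >= 0 && p->first_drive < NDRIVES);
    return (int)(((size_t)p->first_drive + stripe) % NDRIVES);
}

/* True if at least one plex still has this stripe on a live drive. */
static bool stripe_readable(const struct plex_model *plexes, size_t nplexes,
                            const bool drive_up[NDRIVES], size_t stripe)
{
    for (size_t i = 0; i < nplexes; i++)
        if (drive_up[stripe_drive(&plexes[i], stripe)])
            return true;
    return false;
}

/* Number of volume stripes that no plex can serve any more. */
static size_t lost_stripes(const struct plex_model *plexes, size_t nplexes,
                           const bool drive_up[NDRIVES], size_t nstripes)
{
    size_t lost = 0;

    for (size_t s = 0; s < nstripes; s++)
        if (!stripe_readable(plexes, nplexes, drive_up, s))
            lost++;
    return lost;
}
```

With one drive dead, two plexes starting on different drives lose no stripes between them, while a single striped plex loses every second stripe, which is the single-plex case this reply is concerned with.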
I use setstate frequently, when I have my wizard hat on, but I know the
consequences of doing that. If I have someone watching, I carefully explain
to them *not* to repeat that. ;-)

> One possibility would be:
>
> 1. Based on the plex states, check if all of the volume is still
> accessible.
> 2. If not, take the volume into a "flaky" state.

This is easy if the volume is composed of a single plex (my case, and the
case of most people who need only a big and "unsafe" disk). Here "unsafe"
means a disk that is either available or not available, and not half a
disk. At least for me. If the volume has more than one plex, then you could
think of an algorithm that exploits this redundancy. But, IMO, a disk with
half of it unavailable is hardly an "up and ok" one.

Also note that, instead of turning the whole subdisk stale when a single
I/O fails, the error could be passed up to the caller. But, again, this
only works with single-plex stripe or concat configurations.

> 3. *Somehow* ensure that the volume can't be accessed again as a file
> system until it has been remounted.
> 4. Refuse to remount the file system without the -f option.
>
> The last two are outside the scope of Vinum, of course.

And again, that violates the layering approach. I thought newfs -v was
enough...

The first time I used vinum I was happily thinking that I would combine 4
whole disks (except for boot and swap partitions, of course) and create a
new pseudo disk, which I would then disklabel and repartition for the
expected use. Say, for example, that I want to have /var and /usr on
different partitions, but I want both mirrored. With real-world vinum I
need to create 2 vinum partitions on the real disks, and have 2 vinum
volumes. AFAIK, -current and GEOM fix this, right?

My last experience with RaidFrame ended in a panic, right from disk
creation. But I must confess I did not try that hard, since vinum and
-stable were working for me. I have not been a -current hacker for a long
time now.
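Steps 1 and 2 of the proposal quoted above could look roughly like the sketch below. It is not vinum's code: the enums are simplified and hypothetical, ordered only so that "degraded or better" compares greater than "corrupt" as in the condition discussed in this thread, and `fully_covered` stands in for the (possibly complex) check that every block is still readable from some plex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; real vinum has more of them. */
enum plex_state { PLEX_DOWN, PLEX_CORRUPT, PLEX_DEGRADED, PLEX_UP };
enum volume_state { VOLUME_DOWN, VOLUME_FLAKY, VOLUME_UP };

/*
 * Derive a volume state from its plex states.
 * fully_covered: result of an address-map check (not shown) that the
 * corrupt plexes together still cover the whole volume.
 */
static enum volume_state
volume_state_from_plexes(const enum plex_state *plexes, size_t nplexes,
                         bool fully_covered)
{
    bool any_corrupt = false;

    assert(plexes != NULL || nplexes == 0);

    for (size_t i = 0; i < nplexes; i++) {
        if (plexes[i] > PLEX_CORRUPT)
            return VOLUME_UP;       /* assumed: such a plex serves every block */
        if (plexes[i] == PLEX_CORRUPT)
            any_corrupt = true;
    }
    if (!any_corrupt)
        return VOLUME_DOWN;         /* nothing accessible at all */
    /* Only partially accessible plexes are left. */
    return fully_covered ? VOLUME_UP : VOLUME_FLAKY;
}
```

Returning VOLUME_DOWN instead of VOLUME_FLAKY on the last line would give the stricter "available or not available" behaviour argued for in this message.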
Greg, I like vinum, and I have used it since its release in FreeBSD. Before
that I used ccd(4). When 5.x is stable, I will use GEOM, vinum or
raidframe. But I really think *ix is great for its reusability,
recursiveness and modularity, and vinum breaks this. If vinum creates a
virtual disk, it should behave like a real disk.

Jonny

--
João Carlos Mendes Luís - Networking Engineer - jonny@jonny.eng.br