Date: Sat, 4 Jul 2009 11:15:38 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Marcel Moolenaar <xcllnt@mac.com>
Cc: rick-freebsd2008@kiwi-computer.com, freebsd-geom@freebsd.org
Subject: Re: gmirror gm0 destroyed on shutdown; GPT corrupt
Message-ID: <20090704091538.GA2891@garage.freebsd.pl>
In-Reply-To: <D4656301-95DD-46B2-A52B-A4E9AE1CE841@mac.com>
References: <20090625110253.GA31443@mech-cluster238.men.bris.ac.uk>
 <10FCC74D-6D46-4112-AD89-BBB4C5933957@mac.com>
 <h24v15$70v$1@ger.gmane.org>
 <2FFFB36F-EFA3-4D92-98A3-692BA2D6F63E@mac.com>
 <20090629210003.GA24038@keira.kiwi-computer.com>
 <704EE47D-F0C4-4C63-AA3C-3ADF92CC8379@mac.com>
 <20090701135338.GE4372@garage.freebsd.pl>
 <D4656301-95DD-46B2-A52B-A4E9AE1CE841@mac.com>

On Wed, Jul 01, 2009 at 08:29:23AM -0700, Marcel Moolenaar wrote:
>
> On Jul 1, 2009, at 6:53 AM, Pawel Jakub Dawidek wrote:
>
> >>Answer the following:
> >>
> >>foo0 is a provider with 3 sectors.
> >>bar is a geom class that puts meta-data in the first sector.
> >>baz is a geom class that puts meta-data in the last sector.
> >>
> >>Both bar and baz get to taste foo0. Which one should go first?
> >
> >Marcel, I don't think you expect than entire world will agree on one
> >place where metadata should be stored?
>
> No, I don't expect it. But we do need to realize that there
> is a race and unless we keep track of the ordering (outside
> of GEOM), we will always run into some scenarios where the
> tasting results in warnings or errors...

This is not really a race, and the ordering is not important either.
Let's do the following:

	# gmirror create test da0
	# gpt create /dev/mirror/test

Let's assume GPT will be given providers for tasting before MIRROR on
boot:

	da0 arrives
	GEOM: GPT->taste(da0)
	GPT: Report GPT corrupted (da0 is not the size we expect)
	GEOM: MIRROR->taste(da0)
	MIRROR: g_new_providerf(mirror/test)
	GEOM: GPT->taste(mirror/test)
	GPT: GPT is ok, configure partitions, etc.

Now let's reverse the order: MIRROR goes first, then GPT:

	da0 arrives
	GEOM: MIRROR->taste(da0)
	MIRROR: g_new_providerf(mirror/test)
	GEOM: GPT->taste(da0)
	GPT: Report GPT corrupted (da0 is not the size we expect)
	GEOM: GPT->taste(mirror/test)
	GPT: GPT is ok, configure partitions, etc.

The result is the same, because GEOM will still present da0 to GPT for
tasting even if MIRROR decides to use it.

I do agree that it is hard to cope with, especially for metadata formats
that are given and that we cannot extend. The real problem here is that
in some situations (for some metadata formats) a class cannot
auto-discover its providers reliably. GPT is not alone here.
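
The two boot sequences above can be replayed with a toy model in Python.
This is only a sketch and not GEOM code: the dictionary fields, the
function names, the 1000-sector disk and the simple FIFO queue of new
providers are all assumptions made for illustration. The only behaviour
taken from the message is that every class gets to taste every provider,
that MIRROR publishes mirror/test on top of da0, and that GPT complains
when the provider is not the size it expects.

```python
from collections import deque

DISK_SECTORS = 1000  # hypothetical size of da0, in sectors


def mirror_taste(p):
    """Toy MIRROR: its metadata sits in the last sector of the component."""
    if p["mirror_name"] is None:
        return None, None
    new = {
        "name": "mirror/" + p["mirror_name"],
        "sectors": p["sectors"] - 1,      # last sector is kept for metadata
        "mirror_name": None,              # the mirror is not itself a component
        "gpt_sectors": p["gpt_sectors"],  # same on-disk data shows through
    }
    return "MIRROR: new provider " + new["name"], new


def gpt_taste(p):
    """Toy GPT: the table records the size of the provider it was made on."""
    if p["gpt_sectors"] is None:
        return None, None
    if p["gpt_sectors"] != p["sectors"]:
        return "GPT: GPT corrupted on " + p["name"], None
    return "GPT: GPT ok on " + p["name"], None


def boot(class_order):
    """Offer each provider to every class, in the given class order."""
    tastes = {"MIRROR": mirror_taste, "GPT": gpt_taste}
    da0 = {
        "name": "da0",
        "sectors": DISK_SECTORS,
        "mirror_name": "test",            # gmirror metadata found on da0
        "gpt_sectors": DISK_SECTORS - 1,  # GPT was created on mirror/test
    }
    queue, log = deque([da0]), []
    while queue:
        p = queue.popleft()
        for cls in class_order:
            msg, new = tastes[cls](p)
            if msg:
                log.append(msg)
            if new:
                queue.append(new)
    return log


gpt_first = boot(["GPT", "MIRROR"])
mirror_first = boot(["MIRROR", "GPT"])
print(gpt_first)
print(mirror_first)
```

In this model the two logs contain the same three events and differ only
in the order of the first two: the warning about da0 appears either way,
and the partitions end up on mirror/test either way.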
There is a similar issue for UFS labels. You have a 500GB disk da0, and
you also have a 200GB partition da0a starting at sector 0. You create a
UFS file system on da0a:

	# newfs -L foo /dev/da0a

The LABEL class is given disk da0 for tasting. How can it tell whether
the file system was created on da0 or on da0a? What we do now is look
inside the UFS metadata and get the file system size from there. If the
file system size is equal to the provider's size, this is our provider.
In this case the file system size is 200GB and the size of da0 is 500GB,
so we skip it. This is not perfect, because one can create a UFS file
system that is smaller than its provider:

	# newfs -s 419430400 -L foo /dev/da0

We created a 200GB file system on the 500GB da0. Now the LABEL class
will incorrectly skip da0 during tasting because of the size mismatch.

The problem is similar to GPT: neither can work reliably in
auto-discovery mode.

It is also problematic that a provider can have multiple consumers
attached, but the solution I use in some classes (which is really a
side effect) is to open the provider for writing and exclusively during
tasting. Even if the MIRROR provider isn't mounted, it keeps its
components open for writing and exclusively all the time (the main
reason was to allow synchronization). Once MIRROR opens a provider for
writing, every consumer attached to this provider gets a spoiled event
(at least those that depend on metadata). Going back to our example:
even if GPT configures partitions on da0, it should remove them on the
spoiled event once MIRROR opens this provider for writing. In the end
GPT will configure partitions on mirror/test. This is of course not
perfect, but it reduces the mess in /dev/ a bit.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
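
The UFS label size check described in the message can be sketched in a
few lines of Python. This is an illustration only and not the LABEL
class code: the function name is made up, and it assumes 512-byte
sectors and binary gigabytes, which is what makes "-s 419430400" come
out at 200GB.

```python
SECTOR = 512     # bytes per sector (assumed)
GIB = 1024 ** 3  # "GB" in the message is taken to mean 2**30 bytes


def label_claims(provider_bytes, fs_bytes):
    """Toy version of the check: claim only on an exact size match."""
    return provider_bytes == fs_bytes


da0 = 500 * GIB   # the whole disk
da0a = 200 * GIB  # partition starting at sector 0

# newfs -L foo /dev/da0a: the file system fills da0a.
fs = da0a
print(label_claims(da0, fs))   # False: da0 is correctly skipped
print(label_claims(da0a, fs))  # True: da0a is claimed

# newfs -s 419430400 -L foo /dev/da0: a smaller file system on da0 itself.
small_fs = 419430400 * SECTOR
print(small_fs == 200 * GIB)        # True: 419430400 sectors is 200GB
print(label_claims(da0, small_fs))  # False: da0 is skipped, incorrectly
```

The last line is the failure case from the message: the file system
really does live on da0, but an exact size comparison cannot tell that
apart from the da0a case.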