From: Pawel Jakub Dawidek
To: Marcel Moolenaar
Cc: Doug Rabson, John Baldwin, freebsd-hackers, Andriy Gapon, freebsd-current, "Andrey V. Elsukov"
Date: Wed, 27 Jun 2012 20:20:39 +0200
Subject: Re: [CFC/CFT] large changes in the loader(8) code
Message-ID: <20120627182038.GB1401@garage.freebsd.pl>
In-Reply-To: <468988EA-AC50-451D-ACE1-17B58E0CAF67@xcllnt.net>

On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
>
> On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
> >
> > GPT really wants the backup header at the last LBA.
> > I know you can set it,
> > but I've interpreted that as a way to see if the primary header is correct or
> > not. It seems to me that GPT tables created in this fashion (inside a GEOM
> > provider) will not work properly with partition editors for other OS's. I'm
> > hesitant to encourage the use of this as I do think putting GPT inside of a
> > gmirror violates the GPT spec.
>
> Agreed.

Guys. This doesn't violate the GPT spec in any way. The spec is
narrow-minded if it talks only about raw disks, but you should think of
gmirror as pseudo-hardware RAID. That's all. If putting GPT on top of a
RAID array is a spec violation, then I guess we just have to live with it.

> While it is a nice trick to use the last sector for meta data, it does
> create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror kept
its metadata in the first sector, gpart/gpt would find the GPT backup
header in the last sector and complain about a missing primary header.

> [...] The second is that when there's
> different metadata in the first *and* the last sector, you can't decide
> which is to take precedence without also looking at the other and know
> how to interpret it. We have not solved this second problem at all. We
> do get reports about the problems though. At best we're handwaving or
> kluging.

This is a different kind of problem. It took me a while to realize that,
but now I know :) The real problem is that not all metadata formats are
suitable for autodetection. That's all. The metadata I use in my GEOM
classes plays nice with autodetection. The solution is very easy: keep the
size of the disk device in the metadata. This allows gmirror to figure out
whether it is configured on a raw disk, the last slice, the last partition
within the last slice, etc. If GPT kept the disk size in its metadata, the
second problem you mentioned would not exist.
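The provider-size check described above can be sketched as a small simulation. Everything in it is hypothetical: the metadata layout (an 8-byte magic `HYPOMIRR` followed by a 64-bit provider size), the `Provider` class and the `write_metadata()`/`taste()` helpers are illustrative only and are not gmirror's real on-disk format or kernel API. It models a disk whose last slice ends on the disk's last sector, so the raw disk and the slice both see the same metadata sector.

```python
import struct

SECTOR = 512
MAGIC = b"HYPOMIRR"  # hypothetical 8-byte magic, not gmirror's real one
FMT = "<8sQ"         # hypothetical layout: magic + little-endian 64-bit provider size


class Provider:
    """A window (byte offset, byte size) onto a shared disk image, like a GEOM provider."""

    def __init__(self, name, disk, offset, size):
        self.name, self.disk, self.offset, self.size = name, disk, offset, size

    def last_sector_start(self):
        return self.offset + self.size - SECTOR

    def read_last_sector(self):
        start = self.last_sector_start()
        return bytes(self.disk[start:start + SECTOR])


def write_metadata(prov):
    """Store metadata in the provider's last sector, recording the provider's own size."""
    start = prov.last_sector_start()
    record = struct.pack(FMT, MAGIC, prov.size)
    prov.disk[start:start + len(record)] = record


def taste(prov, check_size=True):
    """Return True if the metadata in the provider's last sector belongs to this provider."""
    magic, provsize = struct.unpack_from(FMT, prov.read_last_sector())
    if magic != MAGIC:
        return False
    # Without the size check, every provider sharing this last sector claims the metadata.
    return not check_size or provsize == prov.size


disk = bytearray(1024 * SECTOR)                 # 512 KiB disk image
raw = Provider("ada0", disk, 0, len(disk))      # the whole disk
# Last slice: starts at sector 64 and runs to the end, sharing the disk's last sector.
last_slice = Provider("ada0s4", disk, 64 * SECTOR, len(disk) - 64 * SECTOR)

write_metadata(last_slice)                      # mirror configured on the slice
print(taste(raw, check_size=False), taste(last_slice, check_size=False))  # True True
print(taste(raw), taste(last_slice))                                      # False True
```

Without the size comparison both the raw disk and the slice would claim the same sector; recording the provider size lets only the provider the metadata was written for match.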
And to be honest, GPT kinda does that by storing the backup header's LBA
in the primary header. This is fine as long as the primary header is valid.

The same problem exists with things like UFS labels. There is no way to
properly support them using GEOM autodetection, because there is no
provider size in the UFS superblock. The UFS superblock contains the file
system size, but that is not the same thing, as one can create a file
system smaller than the underlying disk device.

> I think it's unwise to depend on FreeBSD-specific extensions or features
> in industry-standard partitioning schemes and as such make the use of
> "foreign" tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem?
Partitioning is not the end of the world. Even if you use
"industry-standard partitioning schemes", what file system are you going
to use to actually access your data? FAT? Of course, if you do share your
disk between various OSes, then your best bet is probably MBR or GPT on the
raw disk and a FAT file system. But if you use your disk with FreeBSD only,
then I see no reason not to leverage FreeBSD-specific features (be it
gmirror, geli or zfs).

> A much more flexible approach is to support out-of-band configuration
> data. This allows us to mirror GPT disks without having to become
> non-standard as it removes the need to use the last sector for meta-data.
> The ability to construct GEOM hierarchies unambiguously is very
> important and our current approach has proven to not deliver on that.
> This is actually impacting existing FreeBSD consumers already, like
> Juniper. So, we should not go deeper into this rabbit hole. We should
> finally solve this problem for real...

Marcel, nothing stops anyone from implementing a GEOM mirror class that
uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does
provide a mechanism for autoconfiguration, but it is totally optional and a
GEOM class may choose not to use it.
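To illustrate that autoconfiguration is opt-in, here is a minimal model loosely inspired by GEOM's per-class taste hook. The `GeomClass` dataclass, `boot_taste()` and `configure_out_of_band()` are hypothetical names, not part of the real GEOM framework: a class that supplies a taste function gets its devices rebuilt from on-disk metadata at boot, while a class without one is configured only explicitly, for example from an out-of-band configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class GeomClass:
    name: str
    # Optional autoconfiguration hook (hypothetical): maps a provider's
    # metadata to a device name, or returns None if nothing is found.
    taste: Optional[Callable[[dict], Optional[str]]] = None
    devices: list = field(default_factory=list)


def boot_taste(classes, providers):
    """Offer every provider to every class that opted into autoconfiguration."""
    for cls in classes:
        if cls.taste is None:
            continue  # class relies on explicit (e.g. out-of-band) configuration only
        for prov in providers:
            dev = cls.taste(prov)
            if dev is not None and dev not in cls.devices:
                cls.devices.append(dev)


def configure_out_of_band(cls, config):
    """Create devices from an external config, ignoring on-disk metadata entirely."""
    cls.devices.extend(config.get(cls.name, []))


auto = GeomClass("automirror", taste=lambda p: p.get("mirror"))
manual = GeomClass("oobmirror")  # no taste hook: never autoconfigured
providers = [{"name": "ada0", "mirror": "gm0"},
             {"name": "ada1", "mirror": "gm0"},
             {"name": "ada2"}]

boot_taste([auto, manual], providers)
configure_out_of_band(manual, {"oobmirror": ["om0"]})
print(auto.devices, manual.devices)  # ['gm0'] ['om0']
```

Both styles coexist in the same framework; which one a class uses is the class's own choice.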
As an example, take a look at two other GEOM classes of mine: gconcat(8)
and gstripe(8). You can use the 'label' subcommand to store metadata on
the component disks, which takes advantage of GEOM autodetection and
autoconfiguration. You can also use the 'create' subcommand to create an
ad hoc provider that stores no metadata and uses the entire disks, which
also means it won't be automatically recreated on the next boot.

For Juniper it might be more convenient to use out-of-band configuration,
as you know the hardware you are running on, so you know exactly where the
disks are, etc. My company builds appliances too, so I have been there.
For most of our users automatic configuration is simply better, as they
can shuffle disks around and not wonder whether the system will boot.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl