Date: Wed, 19 Oct 2011 15:42:32 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Alfred Bartsch <bartsch@dssgmbh.de>, freebsd-geom@FreeBSD.org
Subject: Re: disk partitioning with gmirror + gpt + gjournal (RFC)
Message-ID: <4E9ED3C8.7030607@quip.cz>
In-Reply-To: <4E9D99F3.40104@dssgmbh.de>
References: <4E69A152.6090408@rdtc.ru> <4E69EB15.50808@rdtc.ru> <4E9D2117.4090203@dssgmbh.de> <4E9D3B56.50300@quip.cz> <4E9D99F3.40104@dssgmbh.de>

Alfred Bartsch wrote:
> On 18.10.2011 10:39, Miroslav Lachman wrote:
>> Alfred Bartsch wrote:
>>> I am going to use the following partitioning scheme on our servers
>>> and programmers' workstations running FreeBSD 8 (system disk):
>>> physical drive - geom_mirror - geom_part_gpt - journaled UFS with
>>> separate boot and swap partitions. Partition names and sizes are
>>> taken from our environment - Your requirements may vary.
>>
>> It is not a good idea to use GPT on top of gmirror, as was discussed
>> recently on freebsd-current@. You can read more in the
>> thread "RFC: Project geom-events". In short:
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028054.html
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028109.html
>
> I know this thread. But nobody there really mentions which utilities /
> BIOSes would fail or destroy the gmirror-metadata. The only
> complaining utility I know of is gptboot (only warning during boot).
> If You know other applications which will fail due to GPT problems,
> please tell me. Most of the problems shown in this thread seem to have
> something to do with the combined usage of gpt and glabel, which I'm
> avoiding.

As is mentioned in the thread, the problem is with any GEOM class that stores its metadata at the end of the device (for example gmirror, graid3, glabel and others), because GPT keeps its backup header in the last sector as well.

> IMHO the only dangerous code is a foreign UEFI, which "repairs" the
> last sector of the GPT disk without further inquiry. None of our
> machines act in this way up to now.
> Once I will get one of those "unfriendly" machines I surely have to
> rethink my view of disk partitioning. I expect that this day either
> GEOM will be able to handle this situation or ZFS will be
> production-ready.

UEFI will replace the old BIOS sooner or later, so what will you do then? Then you will need to rework your servers and change your setup routine.
And I think it is better to avoid a known possible problem than to hope "it will not bite me". You can't avoid Murphy's law ;)

>> I am using gjournal on a few of our servers, but we are slowly
>> removing it from our setups. Data writes to gjournaled disks are
>> too slow and sometimes gjournal is not playing nice.
>
> I'm heavily interested in more details.

When I did some tests in the past, gjournal could not be used in combination with iSCSI, and I was not able to stop gjournal from tasting providers (I could not remove or disable gjournal on a device) until I stopped all of them and unloaded the gjournal kernel module. I don't know the current state.

>> Maybe ZFS or UFS+SUJ is a better option.
>
> Yes, maybe. ZFS is mainly for future use. Do you use the second option
> on large filesystems?

ZFS has been there for "a long time". I feel safe using it in production on a few of our servers. I haven't tested UFS+SUJ because it will be released in the forthcoming 9.0 and we do not deploy CURRENT on our servers.

>>> create the (journaled) data partitions:
>>> root partition
>>> # gpart add -t freebsd-ufs -s 1G mirror/gm0
>>> # gjournal label mirror/gm0p7 mirror/gm0p3
>>> note: IMHO journal size doesn't need to exceed data size
>>
>> I don't think gjournal is needed on such small partitions. Classic
>> fsck will be fast.
>
> You are right. But IMHO I can not mix journaled and not journaled R/W
> filesystems on a gmirror or I lose the main advantage of avoiding
> remirroring the whole disk after power failure or crash.

Yes, you are right, I forgot about this feature. I never used it this way.
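
To illustrate the end-of-device metadata point made earlier in this reply, here is a rough sketch of the sector arithmetic when GPT is created on top of a gmirror provider. The disk size and the helper name last_lba are made up for the example; the only facts assumed are that gmirror keeps its metadata in the last sector of the raw disk and that GPT writes its backup header to the last sector of whatever device it was created on.

```shell
#!/bin/sh
# Sketch only: sector positions for "GPT on top of gmirror".
# disk_sectors is a hypothetical example value, not taken from a real disk.

# last_lba SECTORS: print the last addressable sector of a device
# that has SECTORS sectors (hypothetical helper for this example).
last_lba() {
    echo $(( $1 - 1 ))
}

disk_sectors=1000000

# gmirror keeps its metadata in the last sector of the raw disk ...
gmirror_meta_lba=$(last_lba "$disk_sectors")

# ... so the mirror provider seen by the upper layers is one sector shorter.
mirror_sectors=$(( disk_sectors - 1 ))

# GPT created on the mirror puts its backup header in the mirror's last sector.
gpt_backup_lba=$(last_lba "$mirror_sectors")

echo "raw disk last LBA:        $(last_lba "$disk_sectors")"
echo "gmirror metadata at LBA:  $gmirror_meta_lba"
echo "GPT backup header at LBA: $gpt_backup_lba"
```

Anything that reads the raw disk instead of the mirror looks for the backup header in the disk's last sector and finds gmirror metadata there, with the backup header one sector earlier than expected.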

>>> /etc/fstab could then look like
>>> # Device            Mountpoint  FStype  Options           Dump  Pass#
>>> /dev/mirror/gm0p2   none        swap    sw                0     0
>>> /dev/ufs/fbsdroot   /           ufs     rw,noatime,async  1     1
>>> /dev/ufs/fbsdhome   /home       ufs     rw,noatime,async  2     2
>>> /dev/ufs/fbsdusr    /usr        ufs     rw,noatime,async  2     2
>>> /dev/ufs/fbsdvar    /var        ufs     rw,noatime,async  2     2
>>> =====================================================================
>>
>> And there is one more problem which I am mentioning again and again
>> - the main problem of labels and gmirror is that a "broken"
>> (dropped) provider (for example disk ad0) publishes its
>> partitioning and labels, so after a reboot with a degraded mirror, you
>> can start the system with /dev/ad0p7 mounted (because it also has
>> the label "fbsdroot") instead of the mirrored one. It depends on the
>> order of tasting devices etc. and, unless something has changed, it is
>> unpredictable to me which device will be chosen if two devices
>> have the same label.
>
> Thanks for clarifying this. As I'm looking for a robust configuration,
> I will drop these labels. This leads to some minor changes in my
> configuration:
>
> # newfs -J mirror/gm0p7.journal
> # newfs -J mirror/gm0p8.journal
> # newfs -J mirror/gm0p9.journal
> # newfs -J mirror/gm0p10.journal
>
> /etc/fstab could then look like
> # Device                     Mountpoint  FStype  Options           Dump  Pass#
> /dev/mirror/gm0p2            none        swap    sw                0     0
> /dev/mirror/gm0p7.journal    /           ufs     rw,noatime,async  1     1
> /dev/mirror/gm0p10.journal   /home       ufs     rw,noatime,async  2     2
> /dev/mirror/gm0p9.journal    /usr        ufs     rw,noatime,async  2     2
> /dev/mirror/gm0p8.journal    /var        ufs     rw,noatime,async  2     2
>
>>> Some questions: Is this disk configuration valid and robust?
>>> (I've just started testing) Are there any other proposals -
>>> usable as "best known practice", I didn't find a complete setup
>>> so far?
>>
>> We are using gmirror with good old mbr / fdisk / bsdlabel without
>> mounting by labels and with gjournal only on the big data
>> partitions.
>> Not on root, var or partitions with databases (because
>> gjournal is slow on writes)
>
> with fdisk + bsdlabel there are not enough partitions in one slice to
> hold all the journals, and as I already mentioned I really want to
> minimize recovery time.
> With gmirror + gjournal I'm able to activate disk write cache without
> losing data consistency, which improves performance significantly.

According to the following commit message, bsdlabel was extended to 26 partitions in December 2007:
http://lists.freebsd.org/pipermail/cvs-all/2007-December/239719.html
(I haven't tested it yet, because I don't need it - we are using two slices on our servers.)

>> I see what you are trying to do and it would be nice if "all works
>> as one can expect", but the reality is different. So I don't think
>> it is a good idea to set it up as you described.
>
> I'm not yet fully convinced, that my idea of disk partitioning is a
> bad one, so please let me take part in your negative experiences with
> gjournal.
> Thanks in advance.

I am not saying that your idea is bad. It just contains some things which I would rather avoid.

PS: please use Reply All, to post your reply to the mailing list as well.
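
For the label problem discussed above (a dropped mirror member still publishing the same label), a quick way to see which fstab entries are exposed is to list everything that is mounted through a label node. This is only a sketch: the function name label_mounts is made up, the sample input reuses entries from the first fstab proposal, and it assumes label nodes live under /dev/ufs/, /dev/gpt/ or /dev/label/.

```shell
#!/bin/sh
# Sketch only: list fstab entries whose device is a label node.
# Assumption: label nodes appear under /dev/ufs/, /dev/gpt/ or /dev/label/.

# label_mounts (hypothetical helper): read fstab-formatted text on stdin and
# print "device mountpoint" for every non-comment entry mounted by label.
label_mounts() {
    awk '
        /^[[:space:]]*#/ { next }    # skip comment lines
        NF < 2           { next }    # skip blank or malformed lines
        $1 ~ /^\/dev\/(ufs|gpt|label)\// { print $1, $2 }
    '
}

# Sample input taken from the first fstab proposal in this thread.
sample_fstab='# Device Mountpoint FStype Options Dump Pass#
/dev/mirror/gm0p2 none swap sw 0 0
/dev/ufs/fbsdroot / ufs rw,noatime,async 1 1
/dev/ufs/fbsdhome /home ufs rw,noatime,async 2 2'

printf '%s\n' "$sample_fstab" | label_mounts
```

On a real system the input would be /etc/fstab itself, for example: label_mounts < /etc/fstab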