Date:      Wed, 19 Oct 2011 15:42:32 +0200
From:      Miroslav Lachman <000.fbsd@quip.cz>
To:        Alfred Bartsch <bartsch@dssgmbh.de>, freebsd-geom@FreeBSD.org
Subject:   Re: disk partitioning with gmirror + gpt + gjournal (RFC)
Message-ID:  <4E9ED3C8.7030607@quip.cz>
In-Reply-To: <4E9D99F3.40104@dssgmbh.de>
References:  <4E69A152.6090408@rdtc.ru> <4E69EB15.50808@rdtc.ru> <4E9D2117.4090203@dssgmbh.de> <4E9D3B56.50300@quip.cz> <4E9D99F3.40104@dssgmbh.de>

Alfred Bartsch wrote:
> On 18.10.2011 10:39, Miroslav Lachman wrote:
>> Alfred Bartsch wrote:
>>> I am going to use the following partitioning scheme on our servers
>>> and programmers' workstations running FreeBSD 8 (system disk):
>>> physical drive - geom_mirror - geom_part_gpt - journaled UFS with
>>> separate boot and swap partitions. Partition names and sizes are
>>> taken from our environment - Your requirements may vary.
>>
>> It is not a good idea to use GPT on top of gmirror, as was discussed
>> recently on freebsd-current@. You can read more in the
>> thread "RFC: Project geom-events". In short:
>> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028054.html
>>
>>
> http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028109.html
>
> I know this thread. But nobody there really mentions which utilities /
> BIOSes would fail or destroy the gmirror-metadata. The only
> complaining utility I know of is gptboot (only warning during boot).
> If You know other applications which will fail due to GPT problems,
> please tell me. Most of the problems shown in this thread seem to have
> something to do with the combined usage of gpt and glabel, which I'm
> avoiding.

As mentioned in the thread, the problem affects any GEOM class that 
stores its metadata at the end of the device (for example gmirror, 
graid3, glabel and others), because GPT also keeps its backup header 
in the last sector.
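
To illustrate the overlap, here is a minimal sketch of the sector 
arithmetic. It assumes that gmirror keeps its metadata in the last 
sector of each component and that GPT keeps its backup header in the 
last sector of whatever provider it is written to. The function name 
and the example disk size are made up for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: show where the end-of-device metadata lands
# when GPT is created on top of a gmirror provider.
# Assumption: gmirror uses the last sector of the raw disk, so the
# mirror provider is one sector smaller than the disk.
layout() {
    disk_sectors=$1
    gmirror_meta=$((disk_sectors - 1))      # last LBA of the raw disk
    mirror_sectors=$((disk_sectors - 1))    # size of the mirror provider
    gpt_backup=$((mirror_sectors - 1))      # last LBA of the mirror provider
    firmware_expects=$((disk_sectors - 1))  # firmware only sees the raw disk
    echo "gmirror metadata LBA: $gmirror_meta"
    echo "GPT backup header LBA: $gpt_backup"
    echo "firmware expects GPT backup at LBA: $firmware_expects"
}

# Example with a made-up disk size of 1000000 sectors.
layout 1000000
```

The place where firmware looking at the raw disk expects the GPT backup 
header is exactly the sector that holds the gmirror metadata, so a 
"repair" of the GPT would overwrite it.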

> IMHO the only dangerous code is a foreign UEFI, which "repairs" the
> last sector of the GPT disk without further inquiry. None of our
> machines act in this way up to now.
> Once I will get one of those "unfriendly" machines I surely have to
> rethink my view of disk partitioning. I expect that this day either
> GEOM will be able to handle this situation or ZFS will be
> production-ready.

UEFI will replace the old BIOS sooner or later, so what will you do then? 
Then you will need to rework your servers and change your setup routine. 
And I think it is better to avoid a known possible problem than to hope 
"it will not bite me". You can't avoid Murphy's law ;)

>> I am using gjournal on a few of our servers, but we are slowly
>> removing it from our setups. Data writes to gjournaled disks are
>> too slow and sometimes gjournal is not playing nice.
>
> I'm heavily interested in more details.

When I did some tests in the past, gjournal could not be used in 
combination with iSCSI, and I was not able to stop gjournal from tasting 
providers (I was not able to remove / disable gjournal on a device) until 
I stopped all of them and unloaded the gjournal kernel module. I don't 
know the current state.

>> Maybe ZFS or UFS+SUJ is better option.
>
> Yes, maybe. ZFS is mainly for future use. Do you use the second option
> on large filesystems?

ZFS has been there for "a long time". I feel safe using it in production 
on a few of our servers. I haven't tested UFS+SUJ because it will be 
released in the forthcoming 9.0 and we do not deploy current on our servers.

>>> create the (journaled) data partitions:
>>> root partition
>>> # gpart add -t freebsd-ufs -s 1G mirror/gm0
>>> # gjournal label mirror/gm0p7 mirror/gm0p3
>>> note: IMHO journal size doesn't need to exceed data size
>>
>> I don't think gjournal is needed in such small partitions. Classic
>> fsck will be fast.
>>
> You are right. But IMHO I can not mix journaled and not journaled R/W
> filesystems on a gmirror or I lose the main advantage of avoiding
> remirroring the whole disk after power failure or crash.

Yes, you are right, I forgot about this feature. I never used it this way.

>>> /etc/fstab could then look like
>>> # Device            Mountpoint  FStype  Options          Dump    Pass#
>>> /dev/mirror/gm0p2   none        swap    sw               0       0
>>> /dev/ufs/fbsdroot   /           ufs     rw,noatime,async 1       1
>>> /dev/ufs/fbsdhome   /home       ufs     rw,noatime,async 2       2
>>> /dev/ufs/fbsdusr    /usr        ufs     rw,noatime,async 2       2
>>> /dev/ufs/fbsdvar    /var        ufs     rw,noatime,async 2       2
>>> =====================================================================
>>
>>>
>> And there is one more problem which I am mentioning again and again
>> - the main problem of labels and gmirror is that "broken"
>> (dropped) provider (for example disk ad0) publishes its
>> partitioning and labels, so after reboot with degraded mirror, you
>> can start the system with /dev/ad0p7 mounted (because it also has
>> label "fbsdroot") instead of the mirrored one. It depends on the order
>> of tasting devices etc., and unless something has changed, it is
>> unpredictable to me which device will be chosen if two devices
>> have the same label.
>
> Thanks for clarifying this. As I'm looking for a robust configuration,
> I will drop these labels. This leads to some minor changes in my
> configuration:
>
> # newfs -J mirror/gm0p7.journal
> # newfs -J mirror/gm0p8.journal
> # newfs -J mirror/gm0p9.journal
> # newfs -J mirror/gm0p10.journal
>
> /etc/fstab could then look like
> # Device                    Mountpoint  FStype  Options          Dump    Pass#
> /dev/mirror/gm0p2           none        swap    sw               0       0
> /dev/mirror/gm0p7.journal   /           ufs     rw,noatime,async 1       1
> /dev/mirror/gm0p10.journal  /home       ufs     rw,noatime,async 2       2
> /dev/mirror/gm0p9.journal   /usr        ufs     rw,noatime,async 2       2
> /dev/mirror/gm0p8.journal   /var        ufs     rw,noatime,async 2       2
>
>>
>>> Some questions: Is this disk configuration valid and robust?
>>> (I've just started testing) Are there any other proposals -
>>> usable as "best known practice", I didn't find a complete setup
>>> so far?
>>
>> We are using gmirror with good old mbr / fdisk / bsdlabel without
>> mounting by labels and with gjournal only on the big data
>> partitions. Not on root, var or partitions with databases (because
>> gjournal is slow on writes)
>
> with fdisk + bsdlabel there are not enough partitions in one slice to
> hold all the journals, and as I already mentioned I really want to
> minimize recovery time.
> With gmirror + gjournal I'm able to activate disk write cache without
> losing data consistency, which improves performance significantly.

According to the following commit message, bsdlabel was extended to 26 
partitions 3 years ago.
http://lists.freebsd.org/pipermail/cvs-all/2007-December/239719.html
(I haven't tested it yet, because I don't need it - we are using two 
slices on our servers)

>> I see what you are trying to do and it would be nice if "all works
>> as one can expect", but the reality is different. So I don't think
>> it is good idea to make it as you described.
>>
> I'm not yet fully convinced, that my idea of disk partitioning is a
> bad one, so please let me take part in your negative experiences with
> gjournal.
> Thanks in advance.

I am not saying that your idea is bad. It just contains some things 
which I would rather avoid.

PS: please use Reply All, to post your reply to the mailing list as well


