Date:      Sun, 26 Jan 2003 13:01:07 +0100
From:      phk@freebsd.org
To:        Ruslan Ermilov <ru@freebsd.org>
Cc:        current@freebsd.org, docs@freebsd.org
Subject:   Re: cvs commit: src/sbin/disklabel disklabel.8 disklabel.c 
Message-ID:  <6349.1043582467@critter.freebsd.dk>
In-Reply-To: Your message of "Sun, 26 Jan 2003 13:40:00 +0200." <20030126114000.GA58366@sunbay.com>

In message <20030126114000.GA58366@sunbay.com>, Ruslan Ermilov writes:

>and installed the new kernel (without any problems) on it.  Next
>reboot refused to boot FreeBSD by mentioning that "No operating
>system was found".  I wondered how I managed to screw my disk up.

Welcome to the club of people who were bitten by the poor design
choices in the BSD disklabel.

>Now the
>question.  Where is the code in the kernel that prevents swapping
>and/or writing to a disklabel portion of a physically first
>partition on the disk?

In GEOM it works the following way:

Assume we have a disk: ad0.

The disk has an MBR with two slices: ad0s1 ad0s2.

Assume that ad0s1 has a BSD disklabel with three partitions: ad0s1a,
ad0s1b and ad0s1c.

When nothing is opened yet, you can open any of those devices any way
you want, and (almost, see below) write anything you want to any
of them.

Now, let's say you open ad0s1a for writing.

Since the location of ad0s1a is determined by the on-disk BSD disklabel,
we cannot allow you to trash that label now.  The BSD module will
therefore open ad0s1 with an "Exclusive" bit, which means "don't
write under my feet".

If you try to open ad0s1 for writing now, you will get EPERM.
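The exclusive bit can be modelled with per-device access counts (read, write, exclusive). Below is a minimal Python sketch loosely in the spirit of GEOM's access accounting. It is not the kernel code. The `Provider` class, its `open()` method and the consumer names are hypothetical. The two refusal rules are a simplification: an exclusive open is refused while another consumer writes, and a write open is refused while another consumer holds the exclusive bit.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A device such as ad0s1 with per-consumer access counts (hypothetical model)."""
    name: str
    # consumer name -> [read, write, exclusive] counts held by that consumer
    access: dict = field(default_factory=dict)

    def others(self, consumer: str) -> tuple:
        """Sum the counts held by every consumer except `consumer`."""
        r = w = e = 0
        for who, (cr, cw, ce) in self.access.items():
            if who != consumer:
                r, w, e = r + cr, w + cw, e + ce
        return r, w, e

    def open(self, consumer: str, read: int = 0, write: int = 0, excl: int = 0) -> int:
        """Request extra access; return 0 on success or errno.EPERM on conflict."""
        _, other_w, other_e = self.others(consumer)
        if excl > 0 and other_w > 0:   # exclusive wanted, but someone else writes
            return errno.EPERM
        if write > 0 and other_e > 0:  # write wanted, but someone else is exclusive
            return errno.EPERM
        counts = self.access.setdefault(consumer, [0, 0, 0])
        counts[0] += read
        counts[1] += write
        counts[2] += excl
        return 0
```

In this model, the scenario above goes as follows. Opening ad0s1a makes the BSD module call `Provider("ad0s1").open("bsd", read=1, write=1, excl=1)`, which returns 0. A later `open("user", write=1)` on the same provider returns `errno.EPERM`, while a read-only open still succeeds.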

You can open ad0s2 for writing because the MBR module will realize
that the two slices don't overlap, so there is no danger.
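The MBR module's decision reduces to an interval-overlap test on sector ranges. The following Python sketch is illustrative only. The slice offsets and the `may_open_for_write()` helper are hypothetical, and the sketch assumes each slice is a half-open range `[start, start + length)`.

```python
def overlaps(start_a: int, len_a: int, start_b: int, len_b: int) -> bool:
    """True if the half-open sector ranges [start, start + len) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def may_open_for_write(target: str, slices: dict, held_exclusive: set) -> bool:
    """Permit writing `target` unless it overlaps a slice someone holds exclusively."""
    t_start, t_len = slices[target]
    return not any(
        overlaps(t_start, t_len, *slices[name]) for name in held_exclusive
    )
```

For example, with hypothetical slices `{"ad0s1": (63, 1000), "ad0s2": (1063, 2000)}` and ad0s1 held exclusively, ad0s2 may be opened for writing but ad0s1 may not.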

When the BSD label opens ad0s1, the same thing repeats: the MBR
module wants to protect the MBR, so it also opens /dev/ad0 with
the exclusive bit, and therefore you cannot open /dev/ad0 for writing.
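The way the protection cascades down the stack can be sketched as a recursive open. In this sketch, the module that carves a device out of its parent opens that parent with the exclusive bit. This is a deliberately simplified Python model that tracks only exclusive holders and never closes anything. The `PARENT` table, the `exclusive` map and the `"module-on-..."` holder names are hypothetical.

```python
import errno

# Hypothetical stack from the example: each device maps to the one it sits on.
PARENT = {"ad0s1a": "ad0s1", "ad0s1": "ad0", "ad0s2": "ad0", "ad0": None}

# device -> names of holders that opened it with the exclusive bit
exclusive = {dev: set() for dev in PARENT}


def open_for_write(dev: str, who: str) -> int:
    """Write-open `dev` as `who`; return 0 or errno.EPERM."""
    if exclusive[dev] - {who}:
        return errno.EPERM  # someone else guards this device's contents
    parent = PARENT[dev]
    if parent is not None:
        # The module that carves `dev` out of `parent` (BSD on ad0s1, MBR on
        # ad0) opens the parent itself, with the exclusive bit, to protect
        # its on-disk metadata from writes by anyone else.
        guard = "module-on-" + parent
        err = open_for_write(parent, guard)
        if err:
            return err
        exclusive[parent].add(guard)
    return 0
```

In this model, opening ad0s1a for writing leaves both ad0s1 and ad0 guarded. Subsequent write opens of ad0s1 or ad0 by a user then return `errno.EPERM`, while ad0s2 can still be opened for writing.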

So far so good.

You cannot trash the MBR by writing to ad0s1 or ad0s2 because, despite
all its failings, the MBR does not keep its controlling metadata
in-band.  Basically the disk layout is:

  [MBR sector][slice 1][slice 2]
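"Out-of-band" here means that every I/O through a slice is translated by the slice's start and bounds-checked against its length. As long as no slice starts at sector 0, no slice-relative write can land on the MBR. The following Python sketch assumes that conventional layout. The slice offsets and the `to_absolute()` helper are hypothetical.

```python
MBR_SECTOR = 0  # the MBR occupies the first sector of the disk


def to_absolute(slice_start: int, slice_len: int, rel: int, count: int) -> int:
    """Map a slice-relative I/O of `count` sectors at `rel` to an absolute sector."""
    if rel < 0 or count <= 0 or rel + count > slice_len:
        raise ValueError("I/O falls outside the slice")
    return slice_start + rel
```

With hypothetical slices starting at sectors 63 and 1063, the lowest sector reachable through either slice is its own start, never sector 0.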

The BSD disklabel, on the other hand, puts the label in harm's way
by allowing partitions to overlap the disklabel and bootblock area:

  [LABEL-area]
  [partition A       ]
		      [partition B]

So special magic is implemented to check whether a write you do to
one of the ad0s1? partitions would modify the label area.  If it
would, the write is intercepted and validated: it must contain a
disklabel that does not modify any of the currently open partitions,
and in general we DTRT (Do The Right Thing).
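The two steps of that check are (1) deciding whether a partition write touches the label area and (2) accepting a new label only if open partitions keep their extents. Both can be sketched in a few lines of Python. This is a hypothetical model, not the kernel implementation. It assumes 512-byte sectors and takes the label area to be the first 16k of the slice, the figure used later in this message. It also assumes the new label has already been parsed into a name-to-extent map. Parsing the on-disk disklabel is not shown.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512             # assumption: 512-byte sectors
LABEL_AREA_BYTES = 16 * 1024  # assumption: label + boot area is the first 16k


@dataclass(frozen=True)
class Part:
    offset: int  # slice-relative start, in sectors
    size: int    # length, in sectors


def write_hits_label(part: Part, byte_offset: int, length: int) -> bool:
    """True if writing `length` bytes at `byte_offset` into `part` touches the label area."""
    start = part.offset * SECTOR_SIZE + byte_offset  # slice-relative byte address
    return start < LABEL_AREA_BYTES and length > 0


def label_change_ok(old: dict, new: dict, open_parts: set) -> bool:
    """Accept a new label only if every currently open partition keeps its extent."""
    return all(new.get(name) == old[name] for name in open_parts)
```

With partition a at offset 0, any write into its first 16k is intercepted. A new label that resizes b is then accepted only while b is not open.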

This protection does not extend down the stack: if you didn't
open any of the ad0s1? partitions, nothing prevents you from opening
ad0s1 for writing and trashing the BSD label that way.

The final wrinkle is that the 'c' partition overlaps the 'a' and 'b'
partitions, so once you have opened ad0s1a for writing you are no
longer allowed to open ad0s1c for writing since that would allow you
to royally screw up the filesystem on 'a' under its feet.
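The 'c' rule amounts to refusing a write open of any partition that overlaps another partition already open for writing. The following is a small Python sketch of that rule. The partition table and the helper names are hypothetical. In this model, re-opening the same partition is not checked.

```python
# Hypothetical partition table for ad0s1: name -> (offset, size) in sectors.
PARTS = {"a": (0, 1000), "b": (1000, 2000), "c": (0, 3000)}


def overlaps(p: tuple, q: tuple) -> bool:
    """True if the half-open ranges [offset, offset + size) intersect."""
    (p_off, p_size), (q_off, q_size) = p, q
    return p_off < q_off + q_size and q_off < p_off + p_size


def may_open_for_write(name: str, open_for_write: set) -> bool:
    """Refuse a write open that overlaps another partition already open for writing."""
    return not any(
        overlaps(PARTS[name], PARTS[other])
        for other in open_for_write
        if other != name
    )
```

With 'a' open for writing, 'c' is refused because it covers 'a', while 'b' is still allowed because it starts where 'a' ends.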

Do I hate the fact that the disklabel and boot code are in-band in
BSD labels?  Yes, you bet.

If it weren't for the fact that the BSD label has so many other
restrictions, I would simply change our policy to not allow the
overlap: no partitions in the first 16k, end of story.  Sure, it
would take some time to migrate, etc. etc.

But (fortunately) the BSD label has some stupid limitations, and I
have decided that we should migrate to a more powerful format,
and that the GPT format looks like the thing, so I'm just going to
leave the BSD label with its broken semantics as it is.

Thanks for asking the question, it prompted me to explain this
stuff, and I've cc'ed the doc crew because this answer should
probably go into a FAQ/DOC of some kind.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
