From: phk@freebsd.org
To: Ruslan Ermilov
Cc: current@freebsd.org, docs@freebsd.org
Subject: Re: cvs commit: src/sbin/disklabel disklabel.8 disklabel.c
In-Reply-To: Your message of "Sun, 26 Jan 2003 13:40:00 +0200." <20030126114000.GA58366@sunbay.com>
Date: Sun, 26 Jan 2003 13:01:07 +0100
Message-ID: <6349.1043582467@critter.freebsd.dk>

In message <20030126114000.GA58366@sunbay.com>, Ruslan Ermilov writes:

>and installed the new kernel (without any problems) on it.  Next
>reboot refused to boot FreeBSD by mentioning that "No operating
>system was found".  I wondered how I managed to screw my disk up.

Welcome to the club of people who were bitten by the poor design
choices in the BSD disklabel.

>Now the
>question.  Where is the code in the kernel that prevents swapping
>and/or writing to a disklabel portion of a physically first
>partition on the disk?

In GEOM it works the following way:

Assume we have a disk: ad0. The disk has an MBR with two slices:
ad0s1 and ad0s2. Assume that ad0s1 has a BSD disklabel with three
partitions: ad0s1a, ad0s1b and ad0s1c.

When nothing is opened yet, you can open any of those devices any
way you want, and (almost, see below) write anything you want to
any of them.

Now, let's say you open ad0s1a for writing. Since the location of
ad0s1a is determined by the on-disk BSD disklabel, we cannot allow
you to trash that label now. The BSD module will therefore open
ad0s1 with an "Exclusive" bit, which means "don't write under my
feet".

If you try to open ad0s1 for writing now, you will get EPERM. You
can open ad0s2 for writing because the MBR module will realize that
they don't overlap, so there is no danger.

When the BSD module opens ad0s1, the same thing repeats: the MBR
module wants to protect the MBR, so it also opens /dev/ad0 with the
exclusive bit, and therefore you cannot open /dev/ad0 for writing.

So far so good.

You cannot trash the MBR by writing to ad0s1 or ad0s2 because,
despite all its failings, the MBR does not have its controlling
meta-data in-band. Basically the disk layout is:

	[MBR sector][slice 1][slice 2]

The BSD disklabel, on the other hand, puts the label in harm's way
by allowing partitions to overlap the disklabel and bootblock area:

	[LABEL-area]
	[partition A         ] [partition B]

So special magic is implemented to check if the writes you do to
one of the ad0s1? partitions would modify the label-area, and
intercept the write, validate that it contains a compatible
disklabel which does not modify the currently open partitions, and
generally DTRT.

This protection does not extend down in the stack: if you didn't
open any of the ad0s1? partitions, nothing prevents you from opening
ad0s1 for writing and trashing the BSD label that way.

The final wrinkle is that the 'c' partition overlaps the 'a' and
'b' partitions, so once you have opened ad0s1a for writing you are
no longer allowed to open ad0s1c for writing, since that would allow
you to royally screw up the filesystem on 'a' under its feet.

Do I hate the fact that the disklabel and boot code are in-band in
BSD labels? Yes, you bet.

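The open rules described above can be sketched as a small toy model.
This is only an illustration in Python, not GEOM code: the Provider
class, its fields and methods, the example_disk() helper and the byte
sizes are all made up here, and LABEL_AREA is an assumed 16k size for
the label and bootblock area. The refusal to open a partition while
one of its parents is already open for writing directly is likewise
an assumption, not something stated in this message.

```python
import errno

MiB = 1024 * 1024
LABEL_AREA = 16 * 1024  # assumed size of the label/bootblock area, in bytes


class Provider:
    """Toy stand-in for a disk, slice or partition (hypothetical, not GEOM's)."""

    def __init__(self, name, start, size, parent=None):
        self.name = name
        self.start = start      # byte offset inside the parent
        self.size = size
        self.parent = parent
        self.children = []
        self.writers = 0        # direct opens for writing
        self.holds = 0          # "Exclusive" opens taken by modules above us
        if parent is not None:
            parent.children.append(self)

    def overlaps(self, other):
        return (self.start < other.start + other.size
                and other.start < self.start + self.size)

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def open_write(self):
        # Somebody above us depends on our metadata: "don't write under my feet".
        if self.holds:
            raise PermissionError(errno.EPERM, f"{self.name}: held exclusively")
        # An overlapping sibling (e.g. 'c' versus 'a') is already being written.
        if self.parent is not None:
            for sib in self.parent.children:
                if sib is not self and sib.writers and sib.overlaps(self):
                    raise PermissionError(
                        errno.EPERM, f"{self.name}: overlaps open {sib.name}")
        # Assumption: the exclusive open fails if a parent is written directly.
        for node in self.ancestors():
            if node.writers:
                raise PermissionError(
                    errno.EPERM, f"{node.name}: already open for writing")
        # All checks passed: take the exclusive holds down the stack.
        for node in self.ancestors():
            node.holds += 1
        self.writers += 1

    def close_write(self):
        self.writers -= 1
        for node in self.ancestors():
            node.holds -= 1

    def write_touches_label(self, offset, length):
        """For a BSD partition: does a write at `offset` land in the label area?"""
        return length > 0 and self.start + offset < LABEL_AREA


def example_disk():
    """Build the ad0 layout from the message, with invented sizes."""
    ad0 = Provider("ad0", 0, 200 * MiB)
    s1 = Provider("ad0s1", 512, 100 * MiB, ad0)
    s2 = Provider("ad0s2", 512 + 100 * MiB, 99 * MiB, ad0)
    a = Provider("ad0s1a", 0, 60 * MiB, s1)
    b = Provider("ad0s1b", 60 * MiB, 40 * MiB, s1)
    c = Provider("ad0s1c", 0, 100 * MiB, s1)
    return ad0, s1, s2, a, b, c
```

With the providers from example_disk(), calling a.open_write() first
makes s1.open_write(), ad0.open_write() and c.open_write() raise
PermissionError with EPERM, while s2.open_write() and b.open_write()
still succeed. In this sketch write_touches_label() only flags the
writes that would need the extra validation; it does not model the
validation itself.
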
If the BSD label didn't have so many other restrictions, I would
simply change our policy to not allow the overlap: no partitions in
the first 16k, end of story. Sure, it would take some time to
migrate, etc. etc.

But (fortunately) the BSD label has some stupid limitations, and I
have decided that we should migrate to a more powerful format, and
that the GPT format looks like the thing, so I'm just going to leave
the BSD label with its broken semantics as it is.

Thanks for asking the question; it prompted me to explain this
stuff, and I've cc'ed the doc crew because this answer should
probably go into a FAQ/DOC of some kind.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.