Date: Wed, 27 Mar 1996 12:49:37 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: jerry@border.com (Jerry Kendall)
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: fdisk and partition info
Message-ID: <199603271949.MAA01736@phaeton.artisoft.com>
In-Reply-To: <96Mar27.104433est.18433-2@janus.border.com> from "Jerry Kendall" at Mar 27, 96 10:43:47 am
> I have a bit of a puzzle on my hands. As most of you know, I am
> writing a utility to 'fdisk/disklabel/newfs' a hard disk. The puzzle
> I have run across is not a show stopper, but more of an 'explain this to
> me' sort of thing..
>
> The current layout (due to shortsightedness on Microsoft's part) of the
> partition table in sector 0 only supports 1024 cylinders, 64 sectors, and
> 256 heads. The current layout of my hard disk has 1647 cylinders, 63
> sectors and 16 heads. All of it is for FreeBSD. When I do a 'fdisk wd0'
> it only reports 24MB.. Now then, can I assume the 24MB amount is due to
> the 1024 cylinder limit? Does FreeBSD look at the 'disklabel' or does it
> also look at the partition layout? Does FreeBSD, once loaded, need to be
> concerned about the partition layout ?? After all is said and done, is it
> safe to say the partition table is ONLY used at boot time? If so, this
> would explain why FreeBSD doesn't care about the 24MB reported by fdisk.
[ ... well, this will go on longer than it should, but I don't have
time to write anything smaller ... ]
OK. Here's the scoop.
The BIOS provides the INT 13 raw disk interface, which operates on C/H/S
offsets.
There is also a potential for an "LBA" interface, which is an extension
interface not supported by all hardware.
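As a rough illustration of what "C/H/S offsets" means at the INT 13 level, here is a minimal Python sketch (the function names are hypothetical) that converts between C/H/S and an absolute sector number, and packs an address into the CH/CL/DH register values of an INT 13h disk call. It assumes the classic register layout: the low 8 bits of the cylinder in CH, the sector (1 to 63) in the low 6 bits of CL, and the top 2 cylinder bits in the high 2 bits of CL.

```python
def chs_to_lba(c, h, s, heads, spt):
    """Absolute sector number for a C/H/S address (sectors count from 1)."""
    return (c * heads + h) * spt + (s - 1)


def lba_to_chs(lba, heads, spt):
    """Inverse of chs_to_lba for a given geometry."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1


def int13_pack(c, h, s):
    """Return (CH, CL, DH) register values for an INT 13h C/H/S address."""
    if not (0 <= c < 1024 and 0 <= h < 256 and 1 <= s <= 63):
        raise ValueError("C/H/S address not representable through INT 13h")
    ch = c & 0xFF                  # low 8 bits of the cylinder
    cl = ((c >> 2) & 0xC0) | s     # cylinder bits 8-9 go to CL bits 6-7; sector in bits 0-5
    dh = h                         # head number
    return ch, cl, dh
```

With the 1647/16/63 geometry from the original question, `int13_pack(1646, 0, 1)` raises, which is exactly the 1024-cylinder wall that forces the translation schemes described next.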
Now there are five types of translation a BIOS accessed drive may have:
1) No translation. This is best, because the geometry is
invariant of whether you are accessing it via BIOS or via
a protected mode driver.
2) Hardware translation. This is next best, because the
geometry is still invariant of whether you are accessing
it via BIOS or via a protected mode driver. It's only
next best because it is subject to implementation errors;
specifically, the WD1007 sector sparing is flawed because
the controller reports the non-spared sector count when
queried.
3) Linear software translation. This is the mathematical
translation of C/H/S values by the BIOS so that the
interface presented at the upper layer does not exceed
1024 cylinders (cylinder numbers 0 through 1023, the most that
can be transferred to the controller in the 10-bit cylinder
field of the INT 13 API).
It is linear because absolute sector addresses are
invariant despite translation. That is, sector 9135
is sector 9135, after the C/H/S multiplication.
4) Non-linear software translation. This is also the
mathematical translation of C/H/S values by the BIOS;
the difference is, that something like sector sparing
or other "media perfection" is stuck in the BIOS, such
that it "goes away" if you use a non-BIOS method to
access the drive. These can't be shared between
protected mode and BIOS-using OS's, or can't be easily
shared, anyway, unless the sparing mathematics used by
the BIOS are duplicated in the driver software for the
non-BIOS driver.
5) C/H/S to "LBA" translation. This is available only on
systems with BIOS support for LBA (either on the controller
ROM, loaded via POST, or in the system BIOS ROM). This
translation of C/H/S to LBA interface is done by an INT
13 redirector. There are two classes of redirector: the
first is in the controller BIOS and is loaded on POST or
integrated with the INT 13 controller firmware. The
second is loaded as part of the boot process from the
boot media (an example is the OnTrack Disk Manager 6.x
and 7.x MBR replacements). For MBR-loaded TSR's for C/H/S
to LBA translation (and generally, geometry translation
in BIOS at the same time), the MBR loader is generally
hidden from the BIOS by having the cylinder it uses
subtracted from the available cylinders it reports, and
by having the resulting sector number biased by 64 (the
translated cylinder size based on the number of sectors).
Only one redirector, the OnTrack redirector, is recognized
by FreeBSD, and that's only because they have a fake
partition table with a recognizable ID and the OnTrack
people disclosed this information to the FreeBSD camp.
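Linear translation (type 3 above) can be sketched as follows. The doubling scheme used here (halve the cylinder count and double the head count until the cylinders fit in 1024) is one common approach, assumed for illustration; a particular BIOS may use a different mapping. What it shows is the invariant from item 3: a translated C/H/S address and the physical one name the same absolute sector.

```python
def chs_to_lba(c, h, s, heads, spt):
    """Absolute sector number for a C/H/S address (sectors count from 1)."""
    return (c * heads + h) * spt + (s - 1)


def lba_to_chs(lba, heads, spt):
    """C/H/S address of an absolute sector under the given geometry."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1


def shift_translate(cyls, heads, spt):
    """Halve cylinders and double heads until the cylinder count fits INT 13.

    Assumed doubling scheme; sectors per track are left unchanged.
    """
    while cyls > 1024 and heads * 2 <= 256:
        cyls //= 2
        heads *= 2
    return cyls, heads, spt
```

For the 1647/16/63 drive this yields 823/32/63. Sector 9135 is (9, 1, 1) physically and (4, 17, 1) in the translated geometry, and both resolve to 9135. Integer halving drops the odd cylinder, so the translated drive is slightly smaller than the physical one.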
When any translation is in effect, an fdisk program must create
its DOS partition table partitions with knowledge of the geometry
that the BIOS would see were it to look at the drive. Things like
the OS/2 boot manager insist on partitions starting on cylinder
boundaries, etc.
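To make the cylinder-boundary requirement concrete, here is a hedged sketch (hypothetical helper names) of two calculations an fdisk needs once it knows the BIOS geometry: rounding a starting sector up to the next cylinder boundary, and encoding an absolute sector as the 3-byte C/H/S field of a DOS partition table entry (head byte, then sector plus the top two cylinder bits, then the low 8 cylinder bits). Clamping addresses past cylinder 1023 to the last addressable cylinder is a common convention, assumed here rather than taken from any specific fdisk.

```python
def round_up_to_cylinder(lba, heads, spt):
    """Smallest cylinder-aligned sector number >= lba."""
    per_cyl = heads * spt
    return -(-lba // per_cyl) * per_cyl


def mbr_chs_bytes(lba, heads, spt):
    """3-byte C/H/S field of a DOS partition entry for absolute sector lba."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    s = s0 + 1
    if c > 1023:
        # Beyond INT 13 reach: clamp to the last addressable C/H/S (assumed convention).
        c, h, s = 1023, heads - 1, spt
    return bytes([h, ((c >> 2) & 0xC0) | s, c & 0xFF])
```

For example, on a 16-head, 63-sector drive a partition that would otherwise start at sector 63 (the second track) and must start on a cylinder boundary is moved to sector 1008, the start of cylinder 1, whose entry bytes are 0, 1, 1.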
This means that in order to provide a FreeBSD fdisk program, you
must be able to determine the BIOS geometry of the underlying
drive. This is without regard for the underlying implementation
details of the fdisk program itself.
Generally, there are three approaches to solving this problem:
1) Assume the geometry is translated, and that it is
1024/64/256. This is what the current FreeBSD slice
code does, much to the consternation of those of us
who own WD1007 or other non-translating controllers.
2) Interpolate the geometry from an existing DOS partition
table, and the known rules of behaviour for the DOS fdisk
program. Specifically, starting and ending on cylinder
boundaries, etc. Since it is possible to have multiple
geometries result in the same values, especially when you
have a small number of partitions (for instance, one),
this is not a 100% reliable approach -- in general, it is
so *unreliable* that the slice code was invented to replace
it. It's possible to "help" this approach be more accurate
using the 32 bit absolute sector address, which is also in
the partition table. The problem with that is that older
FDISK programs did not generally fill out the 32 bit sector
offset correctly, so the result is usually no more reliable
even when the offset happens to check out against a valid
geometry [NB: the FreeBSD FDISK should always fill these
fields out correctly in any case].
3) Ask the BIOS. There are several ways to do this:
a) Implement a VM86() call through which you can do
disk I/O.
b) Implement a multi-stage boot as opposed to a two
stage boot, and use real mode code in the third
stage boot program (probably called "/boot") to
pass the information off to the kernel. This
could be a potential future problem with system
incompatibility with the OpenBoot and MultiBoot
standards.
c) Do all BSD partitioning using a real-mode FDISK
loaded under DOS or loaded directly as a "cold
capture" (running it instead of a boot program).
Such "BIOS-begging" would probably require multiple I/O's
to the disk to assemble MD5 checksums (one method) to identify
specific sectors to establish BIOS-drive-number-to-protected-
mode-driver-device-id mappings. Otherwise, you wouldn't be
sure which drive was which, and even when you knew all the
BIOS drive geometries, you couldn't match them to non-BIOS
drives.
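Approach 2 can be sketched roughly like this. It assumes, as DOS FDISK does, that each partition ends on a cylinder boundary, so the ending head plus one gives the head count and the ending sector gives the sectors per track; each guess is then checked against the 32-bit LBA fields. Entries whose C/H/S has been clamped at cylinder 1023 carry no usable geometry and are skipped. The function name and the entry tuple layout are hypothetical.

```python
def chs_to_lba(c, h, s, heads, spt):
    """Absolute sector number for a C/H/S address (sectors count from 1)."""
    return (c * heads + h) * spt + (s - 1)


def guess_geometry(entries):
    """Return the (heads, spt) guesses consistent with every usable entry.

    entries: (start_lba, size, start_chs, end_chs) tuples, C/H/S as (c, h, s).
    """
    # Clamped entries (cylinder >= 1023) carry no reliable geometry.
    usable = [e for e in entries if e[2][0] < 1023 and e[3][0] < 1023]
    # Cylinder-aligned end: last head of the cylinder, last sector of the track.
    candidates = sorted({(end[1] + 1, end[2]) for _, _, _, end in usable})
    good = []
    for heads, spt in candidates:
        # Keep a guess only if it reproduces both 32-bit LBA fields of every entry.
        if all(chs_to_lba(*start, heads, spt) == start_lba
               and chs_to_lba(*end, heads, spt) == start_lba + size - 1
               for start_lba, size, start, end in usable):
            good.append((heads, spt))
    return good
```

An empty result means the table gives no consistent answer (for example, an old FDISK left the LBA fields unset, or every entry is clamped); more than one result would mean it is ambiguous. Either way the caller has to fall back to something else, which is the unreliability described above.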
My personal opinion is that option 3c is not an option. The reason
I believe this is that I think that partitioning should occur in a
standard framework, and it will be impossible to fit all types of
partitioning into that framework. DOS partition table partitioning
is not the only type of partitioning available.
Other types of partitioning, specifically, DOS Extended partitions,
BSD disklabels, Solaris disklabels, SVR4 slicing, OSF disklabels,
etc. etc., should all be manageable with a single tool, in my opinion.
To achieve this, I would suggest leveraging the devfs code. Specifically,
given an arbitrary raw device, you could ioctl() an fd open to it to
determine the partitioning in place and the partitioning allowable.
Once a partitioning scheme was in place, additional device nodes would
appear in the devfs hierarchy as a result of a call-back from the
create ioctl(). It is possible to generate a flat name space from
this, but I don't think it is desirable for two reasons:
1) The hierarchy could get large fast. For instance, a device
with OnTrack disk manager "partitioning" with DOS partitioning
on top of that with DOS extended partitioning on top of that
with BSD disklabel partitioning on top of that, with a
transparent media perfection layer (bad144) layered on top
of that. The names would get impossibly unwieldy fast.
2) There is no natural recognition of hierarchical ordering in
a flat name space, when in fact what is presented is a logical
on physical driver hierarchy. The n-m mapping of the graph is
too complex to deal with in a flat name space and still present
a uniform user interface. Specifically, if I have an arbitrary
device in a flat name space, am I allowed to add DOS partitioning
to it or not? My argument here is that without an easy way
to traverse the hierarchy of devices to find parents, there
is no easy answer to that question, short of iterating all
devices. Directories are a natural fit. In addition, the
name/depth association is not fixed. For instance, SCSI
devices may be identified by a LUN selector, while IDE
devices may not. There is no easy way to map "offset in name"
to "depth in hierarchy".
An fdisk program based on the controller's knowledge of disk "geometry"
for use in applying DOS partitioning semantics, and a generic logical
device driver interface, with each driver knowing to which devices
it may be applied and "recognizing" the devices (à la PPs and LPs
under AIX), would allow a single program for management of all present
and future partitioning schemes (i.e., no knowledge is built into the
fdisk program itself). This also buys the ability to implement
logical devices for volume spanning operations (mirroring, striping,
soft RAID, and concatenation), and media perfection (sector sparing
using the same methods as a BIOS, applied to the whole drive, or BSD
bad144 style sector sparing, which can protect the disklabel as well
as the partition contents).
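A sketch of the "drivers recognize devices" structure: each logical driver registers the layers it may be applied to plus a probe, and the fdisk front end just asks the registry, with no scheme knowledge of its own. The registry, decorator, and probe names are hypothetical. The DOS probe checks the 0x55AA signature at the end of sector 0; the BSD probe checks for the disklabel magic 0x82564557 at the start of sector 1, which assumes the i386 label placement.

```python
import struct
from typing import Callable, List

ReadSector = Callable[[int], bytes]
DRIVERS = []  # (name, layers it may be applied to, probe) -- hypothetical registry


def driver(name, applies_to):
    """Register a probe function as a logical driver."""
    def register(probe):
        DRIVERS.append((name, frozenset(applies_to), probe))
        return probe
    return register


@driver("dos", {"disk", "ontrack"})
def probe_dos(read_sector: ReadSector) -> bool:
    # A DOS partition table sector ends with the 0x55 0xAA signature.
    return read_sector(0)[510:512] == b"\x55\xaa"


@driver("bsd-label", {"disk", "dos"})
def probe_bsd(read_sector: ReadSector) -> bool:
    # Assumes the i386 placement: label magic at offset 0 of sector 1, little-endian.
    return struct.unpack_from("<I", read_sector(1), 0)[0] == 0x82564557


def recognize(kind: str, read_sector: ReadSector) -> List[str]:
    """Names of drivers that may apply to this layer and recognize its contents."""
    return [name for name, kinds, probe in DRIVERS
            if kind in kinds and probe(read_sector)]
```

Adding a new partitioning scheme then means registering one more driver; the loop in `recognize` does not change, which is the property argued for above.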
As for intermediate solutions, the first problem is, as you've
identified, finding out the BIOS geometry for a given drive to
allow application of DOS disk partitioning and extended partitioning.
Hope this helps.
Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
