Date: Wed, 27 Mar 1996 12:49:37 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: jerry@border.com (Jerry Kendall)
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: fdisk and partition info
Message-ID: <199603271949.MAA01736@phaeton.artisoft.com>
In-Reply-To: <96Mar27.104433est.18433-2@janus.border.com> from "Jerry Kendall" at Mar 27, 96 10:43:47 am

> I have a bit of a puzzle on my hands. As most of you know, I am
> writing a utility to 'fdisk/disklabel/newfs' a hard disk. The puzzle
> I have run across is not a show stopper, but more of an 'explain this
> to me' sort of thing.
>
> The current layout (due to short-sightedness on Microsoft's part) of
> the partition table in sector 0 only supports 1024 cylinders, 64
> sectors, and 256 heads. The current layout of my hard disk has 1647
> cylinders, 63 sectors and 16 heads. All of it is for FreeBSD. When I
> do a 'fdisk wd0' it only reports 24MB. Now then, can I assume the
> 24MB amount is due to the 1024 cylinder limit? Does FreeBSD look at
> the 'disklabel' or does it also look at the partition layout? Does
> FreeBSD, once loaded, need to be concerned about the partition
> layout? After all is said and done, is it safe to say the partition
> table is ONLY used at boot time? If so, this would explain why
> FreeBSD doesn't care about the 24MB reported by fdisk.

[ ... well, this will go on longer than it should, but I don't have
  time to write anything smaller ... ]

OK. Here's the scoop.

The BIOS provides the INT 13 raw disk interface, which operates on
C/H/S offsets. There is also a potential for an "LBA" interface, which
is an extension interface not supported by all hardware.

Now there are five types of translation a BIOS-accessed drive may have:

1) No translation.
   This is best, because the geometry is the same whether you are
   accessing the drive via the BIOS or via a protected-mode driver.

2) Hardware translation.
   This is next best, because the geometry is still the same whether
   you are accessing the drive via the BIOS or via a protected-mode
   driver. It's only next best because it is subject to implementation
   errors; specifically, the WD1007 sector sparing is flawed because
   the controller reports the non-spared sector count when queried.

3) Linear software translation.
   This is the mathematical translation of C/H/S values by the BIOS so
   that the interface presented at the upper layer does not exceed a
   'C' of 1024 (the largest cylinder number that can be transferred to
   the controller using the INT 13 API). It is linear because absolute
   sector addresses are unchanged by the translation. That is, sector
   9135 is still sector 9135 after the C/H/S multiplication.

4) Non-linear software translation.
   This is also the mathematical translation of C/H/S values by the
   BIOS; the difference is that something like sector sparing or other
   "media perfection" is stuck in the BIOS, such that it "goes away"
   if you use a non-BIOS method to access the drive. Such drives can't
   be shared between protected-mode and BIOS-using OS's, or can't be
   easily shared, anyway, unless the sparing mathematics used by the
   BIOS are duplicated in the driver software for the non-BIOS driver.

5) C/H/S to "LBA" translation.
   This is available only on systems with BIOS support for LBA (either
   on the controller ROM, loaded via POST, or in the system BIOS ROM).
   This translation of C/H/S to the LBA interface is done by an INT 13
   redirector. There are two classes of redirector: the first is in
   the controller BIOS and is loaded on POST or integrated with the
   INT 13 controller firmware. The second is loaded as part of the
   boot process from the boot media (an example is the OnTrack Disk
   Manager 6.x and 7.x MBR replacements).

   For MBR-loaded TSR's for C/H/S to LBA translation (and generally,
   geometry translation in BIOS at the same time), the MBR loader is
   generally hidden from the BIOS by having the cylinder it uses
   subtracted from the available cylinders it reports, and by having
   the resulting sector number biased by 64 (the translated cylinder
   size based on the number of sectors).

   Only one redirector, the OnTrack redirector, is recognized by
   FreeBSD, and that's only because they have a fake partition table
   with a recognizable ID and the OnTrack people disclosed this
   information to the FreeBSD camp.
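
To make the "linear" property in case 3 concrete, here is a minimal
Python sketch. It is not taken from any BIOS or from the FreeBSD
sources: the function names are made up for illustration, and the
halve-the-cylinders/double-the-heads scheme is only an assumed example
of a translation (real BIOSes differ in detail). It shows that the
absolute sector number stays the same under either geometry, using the
1647/16/63 drive from the question and sector 9135 from the text.

```python
def chs_to_lba(c, h, s, heads, spt):
    """Absolute sector number for a C/H/S address (sectors are 1-based)."""
    return (c * heads + h) * spt + (s - 1)


def lba_to_chs(lba, heads, spt):
    """C/H/S address of an absolute sector under a given geometry."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1


def translate_geometry(cyls, heads, spt, max_cyls=1024, max_heads=256):
    """Assumed scheme: halve cylinders and double heads until they fit.

    A trailing odd cylinder is dropped by the integer division.
    """
    while cyls > max_cyls and heads * 2 <= max_heads:
        cyls //= 2
        heads *= 2
    return cyls, heads, spt


if __name__ == "__main__":
    native = (1647, 16, 63)                # geometry from the question
    bios = translate_geometry(*native)     # geometry the BIOS would present
    for name, (cyls, heads, spt) in (("native", native), ("BIOS", bios)):
        chs = lba_to_chs(9135, heads, spt)
        # The C/H/S triple differs per geometry; the absolute sector does not.
        print(name, (cyls, heads, spt), chs, chs_to_lba(*chs, heads, spt))
```

A non-linear translation (case 4) is exactly one where this round trip
would not hold for a driver that bypasses the BIOS, because the BIOS
applies a sparing map that the driver does not know about.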

When any translation is in effect, an fdisk program must create its
DOS partition table partitions with knowledge of the geometry that the
BIOS would see were it to look at the drive. Things like the OS/2 boot
manager insist on partitions starting on cylinder boundaries, etc.

This means that in order to provide a FreeBSD fdisk program, you must
be able to determine the BIOS geometry of the underlying drive. This
is without regard for the underlying implementation details of the
fdisk program itself.

Generally, there are three approaches to solving this problem:

1) Assume the geometry is translated, and that it is 1024/64/256.
   This is what the current FreeBSD slice code does, much to the
   consternation of those of us who own WD1007 or other
   non-translating controllers.

2) Interpolate the geometry from an existing DOS partition table and
   the known rules of behaviour for the DOS fdisk program
   (specifically, starting and ending on cylinder boundaries, etc.).
   Since it is possible to have multiple geometries result in the same
   values, especially when you have a small number of partitions (for
   instance, one), this is not a 100% reliable approach -- in general,
   it is so *unreliable* that the slice code was invented to replace
   it.

   It's possible to "help" this approach be more accurate using the
   32 bit absolute sector address, which is also in the partition
   table. The problem with that is that older FDISK programs did not
   generally fill out the 32 bit sector offset correctly, so
   determining this information is usually no more reliable, even if
   the offset happens to check against a valid geometry [NB: the
   FreeBSD FDISK should always fill these fields out correctly in any
   case].

3) Ask the BIOS. There are several ways to do this:

   a) Implement a VM86() call through which you can do disk I/O.

   b) Implement a multi-stage boot as opposed to a two stage boot,
      and use real mode code in the third stage boot program
      (probably called "/boot") to pass the information off to the
      kernel.
      This could be a potential future problem of system
      incompatibility with the OpenBoot and MultiBoot standards.

   c) Do all BSD partitioning using a real-mode FDISK loaded under
      DOS or loaded directly as a "cold capture" (running it instead
      of a boot program).

   Such "BIOS-begging" would probably require multiple I/O's to the
   disk to assemble MD5 checksums (one method) to identify specific
   sectors to establish BIOS-drive-number-to-protected-mode-driver-
   device-id mappings. Otherwise, you wouldn't be sure which drive was
   which, and even when you knew all the BIOS drive geometries, you
   couldn't match them to non-BIOS drives.

My personal opinion is that option 3c is not an option. The reason I
believe this is that I think that partitioning should occur in a
standard framework, and it will be impossible to fit all types of
partitioning into that framework.

DOS partition table partitioning is not the only type of partitioning
available. Other types of partitioning, specifically, DOS Extended
partitions, BSD disklabels, Solaris disklabels, SVR4 slicing, OSF
disklabels, etc. etc., should all be manageable with a single tool, in
my opinion.

To achieve this, I would suggest leveraging the devfs code.
Specifically, given an arbitrary raw device, you could ioctl() an fd
open to it to determine the partitioning in place and the partitioning
allowable. Once a partitioning scheme was in place, additional device
nodes would appear in the devfs hierarchy as a result of a call-back
from the create ioctl().

It is possible to generate a flat name space from this, but I don't
think it is desirable for two reasons:

1) The hierarchy could get large fast. Consider, for instance, a
   device with OnTrack disk manager "partitioning" with DOS
   partitioning on top of that with DOS extended partitioning on top
   of that with BSD disklabel partitioning on top of that, with a
   transparent media perfection layer (bad144) layered on top of that.
   The names would get impossibly unwieldy fast.

2) There is no natural recognition of hierarchical ordering in a flat
   name space, when in fact what is presented is a logical-on-physical
   driver hierarchy. The n-m mapping of the graph is too complex to
   deal with in a flat name space and still present a uniform user
   interface. Specifically, if I have an arbitrary device in a flat
   name space, am I allowed to add DOS partitioning to it or not? My
   argument here is that without an easy way to traverse the hierarchy
   of devices to find parents, there is no easy answer to that
   question, short of iterating over all devices. Directories are a
   natural fit.

In addition, the name/depth association is not fixed. For instance,
SCSI devices may be identified by a LUN selector, while IDE devices
may not. There is no easy way to map "offset in name" to "depth in
hierarchy".

An fdisk program based on a controller's knowledge of disk "geometry"
for use in applying DOS partitioning semantics, and a generic logical
device driver interface, with each driver knowing to which devices it
may be applied and "recognizing" the devices (a la PP's and LP's under
AIX), would allow a single program for management of all present and
future partitioning schemes (ie: no knowledge is built into the fdisk
program itself).

This also buys the ability to implement logical devices for volume
spanning operations (mirroring, striping, soft RAID, and
concatenation), and media perfection (sector sparing using the same
methods as a BIOS and applied to the whole drive, or BSD bad144 style
sector sparing, which can protect the disklabel as well as the
partition contents).

As for intermediate solutions, the first problem is, as you've
identified, finding out the BIOS geometry for a given drive to allow
application of DOS disk partitioning and extended partitioning.

Hope this helps.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.