Message-Id: <199712092209.PAA07923@home.gtcs.com>
Date: Tue, 9 Dec 1997 15:09:42 -0700 (MST)
From: bgingery@gtcs.com
Subject: Re: blocksize on devfs entries (and related)
To: hackers@FreeBSD.ORG
In-Reply-To: <199712082322.PAA27177@hub.freebsd.org>

Julian Elischer alerted:
-> In spec_getpages() the size of the device's blocks is incorrectly
-> deduced from the blocksize of the filesystem in which the device
-> resided ... This is so obviously wrong that I'm not worried about
-> whether it SHOULD be fixed, just HOW?
[munch]
-> How does this information GET to this location?
[munch]
-> When a device is 'upgraded' to read-write from read-only, the vnode
-> is consulted, to see if it is permissible, but the device itself is
-> not notified of the change.

Theoretically, the physical layout of the device should be stored
whether or not there's any filesystem on it. Besides, an optimizing
filesystem may *not* match the parameters of the device. Slew may be
intrinsic to the device's low-level formatting, or may be applied as
high up as the filesystem on it. Logical blocksize MAY or MAY NOT be a
multiple or sub-multiple of the physical sector (or other block) size.
With read-aheads and large device buffering, the optimal physical
handling size for a device may be quite different from its basic
blocksize, but the constraints of a specific filesystem above it may
require storing values that are NOT directly related to the device
parameters.

Yet, especially with slicing, each slice creation needs hardware info
AND possibly enclosing-slice info, both for the best slice arrangement
and for passing through to a filesystem creation routine.

I see this devfs as some departure from previous handling, and a move
in the right direction. Yet, let's not lose anything that's there in
the move, and let's try to take up anything that's been overlooked in
the past.

To me, some answers to these ...

1. The physical block/sector size needs to be stored by DEVICE; this
   may or may not match the logical blocksize of any filesystem
   resident on the device. The optimal transfer blocksize for each of
   read and write ALSO needs to be stored.

2. The physical layout (sect/track, tracks/cyl) also needs to be
   stored for any DASD, along with any OTHER known info which may be
   used to optimize the filesystem building process for the device,
   such as rotational speed, seek timing ... If this is not stored
   with the driver info in the devfs, then some pointer or common
   reference point should be made to the "file entry" that contains
   the info.

3. If at the controller level it is possible to concatenate or RAID
   join devices, that information needs to be stored for the device.
   Whether this is intrinsic to the device driver or to the physical
   device does not matter.

4. If the device is "virtual", built on a vnode structure with a
   variable-sized (or, as is more common today, FIXED-size)
   underlying file, this needs to be known. I don't think I've seen
   variable-sized devices anyplace but in NeXT's old swapfile
   structure, yet. With various emulators, I can see this becoming a
   VERY useful thing.
   I've seen times recently when it would be handy to have a
   secondary swap in a variable-sized file, as the primary swap is on
   my old NeXTcube!

5. Some kind of "relative timing" metric should be available for the
   device, separately for writing and for reading.

6. When a device is opened ro, if the underlying hardware has ANY
   indication that it's a ro open, then if it is later upgraded there
   should at least be a hook for it to be notified that it has been
   upgraded. The current state (ro/rw) should be available to user
   processes without "testing it by opening a write file" on a
   filesystem (or even on the raw device).

Other thoughts.

Especially WRT possible experimental work and emulators, it will be
QUITE convenient to have everything that can be used to optimize the
construction of a filesystem (of any of many, many kinds), or to
slice out a region and construct a filesystem on it.

As wine, dosemu and bochs (to name just three) expand the emulations
supporting other OSs, being free with filesystems for those OSs,
beyond the purely "native" ones, becomes all the more important.
SoftPC/SoftWindows and Bochs both create internally what amounts to a
FAT filesystem within a file - a vnode filesystem, but not using the
system provisions for it. That pretty well eliminates "device" access
to the filesystem and (e.g.) doing a mount_msdos on 'em for other
processing and data exchange, without adapting the emulator's code to
*parallel* what we've already got with FreeBSD.

Ideally, wine, softpc, dosemu, bochs, mtools, and mount_msdos (etc)
would have NO idea what device the FAT filesystem resides on. That is
not the business of that "layer". The same goes for a MINIX or OS/2
under bochs, etc, for the filesystem in use. These would not care if
it's a whole drive somewhere, a slice of a drive, a slice of a slice,
or a virtual filesystem that resides totally within a file in a UFS.
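
As a rough sketch only, points 1 through 6 above could be gathered
into one per-device record along the following lines. Every name here
(struct dev_params, dev_transfer_size(), dev_set_access(), the
access_changed hook) is made up for illustration; none of it is
existing devfs or FreeBSD kernel code, and a real design would still
have to settle where such a record lives and who fills it in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device record; not an existing FreeBSD structure. */
enum dev_backing { DEV_PHYSICAL, DEV_VNODE_FIXED, DEV_VNODE_VARIABLE };
enum dev_access  { DEV_RO, DEV_RW };

struct dev_params {
    unsigned phys_blksize;     /* 1. physical block/sector size, bytes  */
    unsigned opt_read_size;    /* 1. optimal read transfer, 0 = unknown */
    unsigned opt_write_size;   /* 1. optimal write transfer, 0 = unknown */
    unsigned sect_per_track;   /* 2. geometry, 0 = unknown or not a DASD */
    unsigned tracks_per_cyl;   /* 2. */
    unsigned n_members;        /* 3. devices joined below us, 1 = none  */
    enum dev_backing backing;  /* 4. physical, or file-backed "virtual" */
    unsigned read_cost;        /* 5. relative timing metric for reads   */
    unsigned write_cost;       /* 5. relative timing metric for writes  */
    enum dev_access access;    /* 6. current ro/rw state                */
    /* 6. optional hook, called when the ro/rw state changes */
    void (*access_changed)(struct dev_params *dp, enum dev_access now);
};

/*
 * Pick a transfer size for a read (is_write == 0) or a write
 * (is_write != 0): the advertised optimum if the device gave one,
 * otherwise fall back to the physical block size.
 */
unsigned
dev_transfer_size(const struct dev_params *dp, int is_write)
{
    unsigned opt;

    assert(dp != NULL);
    opt = is_write ? dp->opt_write_size : dp->opt_read_size;
    return opt != 0 ? opt : dp->phys_blksize;
}

/*
 * Record a new ro/rw state and notify the device through its hook,
 * if it registered one. Returns 1 if the state changed, 0 if not.
 */
int
dev_set_access(struct dev_params *dp, enum dev_access now)
{
    assert(dp != NULL);
    if (dp->access == now)
        return 0;
    dp->access = now;
    if (dp->access_changed != NULL)
        dp->access_changed(dp, now);
    return 1;
}
```

A filesystem builder or an emulator could then ask dev_transfer_size()
for a size without knowing whether the device is a whole drive, a
slice, or a file in a UFS, and a driver that cares about ro-to-rw
upgrades would simply fill in access_changed.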
Yet why deny these programs the optimization information which will
allow them to map (within the constraints of their architecture) a new
filesystem for best throughput, if it's actually available?

Now let me raise some additional questions --

Should a DASD be mappable ONLY with horizontal slices? With what we're
all doing today, it seems that taking a certain number of cylinders
for slices is best - but other access methods may find an underlying
physical structure more convenient if a slice specifies a range of
heads and cylinders without presuming that every head/cylinder from
start to end of the physical layout is part of the same slice.

It may be quite convenient to have a cluster of heads across physical
devices forming a logical device or slice, without fully dedicating
those physical devices to that use.

And, I'll mention again, DISK formats are not the only random-access
mass-storage formats on the horizon! I'm guessing that for speed of
inclusion into product lines, all will emulate a disk drive - but that
may not be the most efficient way of using them (in fact, probably
not). They also can be expected to have "direct access" methods
according to their physical architecture, with some form of
tree-access the MOST efficient!

Finally - one of the most powerful potentials of the devfs is handling
non-DASD devices! The connecting or turning-on of a device
(nic/fax/printer/external-modem/scanner/parallel-to-parallel
connection to another PC, even industrial controls of some kind)
SHOULD cause it to "arrive". If its turn-on generates a signal that
can be caught by a minimal driver, that may trigger a load of a full
driver (arrival event) and its inclusion in the devfs listings.
Similarly, killing such a device might trigger an immediate or delayed
unloading of the same driver, and its removal from the devfs.

Bruce Gingery