Date: Sun, 7 Dec 1997 21:46:34 -0800 (PST)
From: Julian Elischer
To: hackers@freebsd.org
cc: Julian Elischer, mckusick@mckusick.com
Subject: [hackers:] Architectural advice needed

This starts out discussing a single problem and then goes on to discuss more general problems and ideas. Stick with it.

BDE pointed out a problem in the system that showed up when using my device filesystem. In spec_getpages() the size of the device's blocks is incorrectly deduced from the blocksize of the filesystem in which the device node resides (e.g. if you are accessing sd2 with a blocksize of 1K, you will get 512 bytes because /dev/ is in / and THAT is on sd0 and has a blocksize of 512 bytes).

This is so obviously wrong that I'm not worried about whether it SHOULD be fixed, just HOW.

The obvious place to store the blocksize is in the specinfo struct pointed to by the vnode for the device. It might be possible to make a request to the device to get this info, but it would require doing an ioctl to the device every time you wanted it, and that would seem a very slow operation for retrieving a single int.
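As a rough illustration of the mismatch described above, the following user-space C sketch models the two ways of arriving at a block size. It is not kernel code: the `*_model` structures, the `fs_blksize` field, and both function names are hypothetical stand-ins, and only `si_blksize` is a name taken from this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the filesystem that contains the /dev entry. */
struct mount_model {
    long fs_blksize;                    /* block size of that filesystem */
};

/* Hypothetical stand-in for specinfo, with the proposed extra field. */
struct specinfo_model {
    long si_blksize;                    /* block size of the device itself */
};

/* Hypothetical stand-in for a device vnode. */
struct vnode_model {
    struct mount_model    *v_mount;     /* filesystem holding the device node */
    struct specinfo_model *v_specinfo;  /* per-device information */
};

/* The behaviour the message objects to: the answer depends on where the
 * device node lives, not on the device it names. */
long blksize_from_containing_fs(const struct vnode_model *vp)
{
    return vp->v_mount->fs_blksize;
}

/* The proposed behaviour: read a value cached in the per-device struct,
 * so no ioctl is needed on each lookup. */
long blksize_from_device(const struct vnode_model *vp)
{
    assert(vp->v_specinfo != NULL);
    return vp->v_specinfo->si_blksize;
}
```

With a node for a 1K-block device stored on a 512-byte-block root filesystem, the first function reports the root filesystem's size and the second reports the device's own.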
Does anyone have a better place to stash this info? vn->v_blksize (as a macro)

        #define v_blksize v_specinfo->si_blksize

would seem the correct scope and placement for this information.

Now, Part 2. How does this information GET to this location? It needs to be put there at the time that either

1/ the vnode is allocated (the same time it's put on the vnode alias list), or
2/ the device is opened.

In either case, the problem is that there is no easy call that can be made to the device to find out its blocksize. The device drivers are only accessible through the devsw interface, and while there is a 'size' call, there is no 'blksize' call. This leaves the IOCTL interface. Should I just use the 'read disklabel' ioctl, or should there be a separate call of some sort?

Open would be the right place except that the open call does not get a vnode as an argument but rather a dev_t, so it can't fill in the field in the vnode. The lookup() that allocated the vnode cannot do the right thing because it is a vnop for the ufs (or devfs) that HOLDS the device rather than a representative of the device itself. So either the specfs open code should do an ioctl to get the blocksize, or the checkalias() code that is called when a device vnode is allocated should do this ioctl. One worry is that within the kernel it is possible to access the device without doing an open() on it, so the checkalias() (or nearby) position would be safer, but the open() would seem more correct.

Question 3. One raised some time ago by PHK: when a device is 'upgraded' to read-write from read-only, the vnode is consulted to see if it is permissible, but the device itself is not notified of the change. If we (phk and myself) think about this and come up with a change for this, would it be considered a useful thing?

FINALLY: I have a long-term thought that eventually dev_t is going to be a rather silly thing. The devsw calls should all get a vnode pointer as the first argument.
In this case they can always extract the minor number needed, but they have a way to interact more correctly with the vnode. This would eventually result in device-driver-implemented vnops. Is this a way to go? Overall it's about 3 months' work and I'm really part of the way there already, but I've reached a point where the magnitude of the changes scares me. Not because of the technical problems, but rather because of the political repercussions.

If dev_t is redefined as

        struct devref {
                int             dr_refs;   /* this too is ref counted */
                struct vnode    *dr_vn;
                u_long          dr_v_id;   /* the capability # of the vnode */
                                           /* consumers should check this */
        };
        typedef struct devref *dev_t;

with strict reference counting and a few macros, this could be used to transition from one system to the other. The VM system and others that hash on dev_t could hash on some of the contents of the struct above, and a whole lot of things would eventually become simpler. I think there would be a phase when things got a little 'hairy', but overall it makes a lot more sense than what we have at the moment.

The big problem I see is: how do I do this and keep in touch with the rest of FreeBSD? I have DEVFS/SLICE working as a set of patches, and I'd like to have them committed if I can get someone to look them over. But the next stage cannot really be done as a set of patches. DEVFS/SLICE can be in the code, and if you don't define DEVFS and SLICE you don't get ANY changes to what you are running, but the changes I'd like to see are really too big to be feasible that way.

Is there a way in which such a large project can be approached? (And is there anyone else that thinks that this is the way to go?)

I really am looking for advice from what I consider to be a very talented and experienced group of CS professionals here. SHOULD devices respond directly to VOPs? Should dev_t continue to exist as a 'number' that needs to be interpreted?
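To make the reference-counting and capability-check idea behind the devref structure above more concrete, here is a small user-space C model. It is only a sketch under assumptions: `vnode_stub`, `dev_ref_t`, and all four `devref_*` helper names are hypothetical, `unsigned long` stands in for `u_long`, and nothing here is locked or otherwise safe for kernel use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode: only the capability number matters. */
struct vnode_stub {
    unsigned long v_id;         /* assumed to change when the vnode is recycled */
};

struct devref {
    int                dr_refs; /* reference count on this devref */
    struct vnode_stub *dr_vn;
    unsigned long      dr_v_id; /* capability # seen when the ref was made */
};
typedef struct devref *dev_ref_t;   /* the message would name this dev_t */

/* Create a reference holding one count; returns NULL if allocation fails. */
dev_ref_t devref_alloc(struct vnode_stub *vp)
{
    dev_ref_t d = malloc(sizeof(*d));
    if (d == NULL)
        return NULL;
    d->dr_refs = 1;
    d->dr_vn = vp;
    d->dr_v_id = vp->v_id;
    return d;
}

/* Take an additional reference. */
void devref_hold(dev_ref_t d)
{
    d->dr_refs++;
}

/* Drop a reference; frees the devref and returns 1 when the last one goes. */
int devref_rele(dev_ref_t d)
{
    assert(d->dr_refs > 0);
    if (--d->dr_refs == 0) {
        free(d);
        return 1;
    }
    return 0;
}

/* The check consumers are asked to make: hand back the vnode only while
 * its capability number still matches the one recorded in the devref. */
struct vnode_stub *devref_vnode(dev_ref_t d)
{
    if (d->dr_vn != NULL && d->dr_vn->v_id == d->dr_v_id)
        return d->dr_vn;
    return NULL;
}
```

In this model a consumer that hashes on the device could use the recorded `dr_v_id` (or the vnode pointer) as its key, and a stale devref is detected by `devref_vnode()` returning NULL rather than by dereferencing a recycled vnode.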

I wonder if there is a forum that we could use for a fuller discussion of this sort of thing? I've discussed this sort of thing with (at various times) phk, peter, john, david, jkh, brian, terry, kirk, Bill (jolitz), Mike, cgd, theo, charles and others I forget. No-one has ever said "That's stupid", and most have agreed that it would simplify some aspects of how the kernel gets its work done.

The question is, how do we get everybody to discuss this sort of thing at one time? How do we decide on an approach for such a significant change?

julian