From owner-freebsd-current  Thu Apr  3 18:38:20 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5)
	id SAA08640 for current-outgoing; Thu, 3 Apr 1997 18:38:20 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50])
	by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id SAA08628
	for ; Thu, 3 Apr 1997 18:38:17 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9)
	id TAA18256 for current@freebsd.org; Thu, 3 Apr 1997 19:21:29 -0700
From: Terry Lambert
Message-Id: <199704040221.TAA18256@phaeton.artisoft.com>
Subject: Re: DISCUSS: system open file table
To: current@freebsd.org
Date: Thu, 3 Apr 1997 19:21:23 -0700 (MST)
In-Reply-To: <199704040127.RAA06069@root.com> from "David Greenman" at Apr 3, 97 05:27:00 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> >But... currently, a vnode reference is not the same thing as an open
> >reference.
>
>    Actually, for all practical purposes, it is. Ideally, everything in
> the kernel would do a "VOP_OPEN" (actually, vn_open) for internal file
> I/O (such as coredumps)... and I think we actually do now. There was a
> time in the past where this wasn't the case, however.

Is this new since the Lite2 merge?  My Lite2 tree is not on this machine,
so I can't check very easily.  If it is, I need to back off until I've
had a chance to look at the Lite2 stuff...

Actually, the vnode is returned by VOP_LOOKUP, by way of the namei()
call.  The VOP_OPEN is one of those few "veto" based interfaces that
actually works.  The open calls ufs_open() in ufs_vnops.c, which is
basically there to veto attempts to zap "append only" files.

One problem I have with this is that the VOP_LOOKUP calls the generic
kernel allocation code, but the deallocation code is called by the same
layer that called the VOP_OPEN.  I would really rather that if you call
a VOP to allocate a vnode, you call a VOP to free one.  We can discuss
whether or not the VFS should be consuming a kernel vnode pool management
interface in another thread; if the interface is reflexive, it doesn't
matter, because that consumption is opaque.

If the vnode reference instance *were* the open instance, I'd be OK with
leaving the interface at the VOP_ layer... though it still makes it
difficult to perform an open in the kernel for a file in the FS proper,
because VOP_'s are per FS, which is why we have namei().

The vn_open() solution for this problem is not very nice, because it
assumes that it will be called in a process context... I can't just pass
it a manifest SUSER credential.

The system open file table entry is really just a credential holder in
the kernel case, and it makes it easier to deal with the idea of revoking
a vnode: because the reference is to the system open file table entry
instead of to the vnode, you can revoke the vnode that the entry points
to without notifying the people who referenced it until they go to access
the now invalid vnode.  If they have a vnode pointer instead, they have
to be able to "check it" to see if it's valid, or be notified.  There's
no real clean failure on reference.

So effectively, it's not only a credential holder, it's a handle that can
be invalidated out from under its holder.  This is the same thing that
happens to a user space process in the case of a forcible unmount of an
FS where it has a file open.
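To make the "credential holder plus revocable handle" point concrete,
the system open file table entry (struct file, sys/file.h) looks roughly
like this -- field layout from memory of the 4.4BSD tree, trimmed, so
treat it as a sketch rather than gospel:

	struct file {
		LIST_ENTRY(file) f_list;  /* list of active files */
		short	f_flag;		/* open flags, see fcntl.h */
		short	f_type;		/* DTYPE_VNODE or DTYPE_SOCKET */
		short	f_count;	/* reference count */
		short	f_msgcount;	/* references from message queue */
		struct	ucred *f_cred;	/* the credential "holder" */
		struct	fileops *f_ops;	/* read/write/ioctl/select/close */
		off_t	f_offset;
		caddr_t	f_data;		/* vnode (or socket) pointer */
	};

The f_cred is the credential holder; f_data is the revocable part.  When
the vnode is revoked, vgone()/vclean() point its op vector at the deadfs
vector, so a stale holder of the file table entry gets a clean error on
its next access, instead of having to be notified or to "check" a raw
vnode pointer itself.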
> >Also, for things like a CIFS/SMBFS/AppleTalk/NetWare client, I want
> >to be able to use credentials which are not BSD process credentials,
> >but user credentials.
>
>    I don't think this makes any sense. Process credentials are an
> instance of user credentials in the kernel.

It lets me look up my user credentials indirectly, as root.  This lets
me have a "password cache", either "unlocked by user credentials" or
stored in a session manager.  This lets me have separate "alien"
credentials from some other process, but use the same connection to the
server for multiple user sessions.

I know this works for CIFS Kerberos tickets.  I admit, I think that an
SMBFS (Samba, etc.) client would need a server connection per user; on
the other hand, it could virtualize these (say, a maximum pool of 10
active connections) and, using the credential lookup, make another
connection on the requesting user's behalf after discarding the tail of
the LRU list of 10.

For NetWare, which handles multiple sessions over a single connection
for OS/2 and NT clients, it should work on one connection (though
sessions might want to be pooled).  It may also be that the session
ticket was supplied by NDS or some other directory server (LDAP?  X.500?)
and not be a Kerberos ticket at all; so we can't just "handle it all
the same".

> >I want to make the distinction between a cached reference and a real
> >reference, as well, both for the file I/O in a kernel thread, and
> >to get around some of the recent problems that come from VOP_LOCK
> >handling two types of locks.
>
>    Hmmm. I agree with one thing: the current kludge of having vnodes
> with a "0" reference count + vnode generation count in the cache seems
> very wrong to me. It's done this way because of the handsprings one
> must do for NFS (and presumably any other "stateless" filesystem, which
> can't hold an "open" instance)...

Yes, and it's complicated by a relatively high turnover, though this
would probably tail off a lot if the vnode were FS associative instead
of in a global pool.  The SVR4 solution for the name cache (which has
similar problems) is to flush the cache by vnode (or by VFS, when an FS
is unmounted).

The NFS problem is less of an issue if the VFS handles cookie generation
a bit more intelligently, and doesn't use the vp to do it.  This is also
a "vp is FS associative" argument... the NFS file handle lookup is done
using the VFS OP FHTOVP to invoke a per FS "*_fhtovp" function, so the
NFS wiring is all already there.

> >Finally, there is the issue of taking faults in kernel mode without
> >a process context to sleep on. I'd like to see the sleeping moved to
> >the address of the field in the system open file table, so that the
> >sleep handle doesn't have to know what kind of caller is making the
> >call.
>
>    Hmmm. Yes, I can see how this would be useful, but on the other
> hand, you have to have some saved state (whether that is a kernel
> thread or a process context), and any notion of kernel threads in
> FreeBSD (which I think is highly unlikely to ever occur) is going to
> have to deal with sleeping anyway... so I don't see a problem here.
> (Note: don't confuse this statement with kernel support for user
> threads, which IS very likely to occur in FreeBSD's near future).

For kernel threading, the idea would be to allocate a context for the
call; this would include a kernel stack, etc.
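To make the sleep-channel idea concrete: tsleep() and wakeup() only care
about an address, so if the blocking point is a field in the system open
file table entry, the waker doesn't need to know what kind of caller is
waiting.  A minimal sketch, assuming the current tsleep() interface;
fo_block()/fo_unblock() are made-up names for illustration, not existing
kernel functions:

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/file.h>

	/*
	 * Hypothetical: block a caller on the open file table entry
	 * itself.  The sleep channel is the address of a field in
	 * struct file, so the same code serves any kind of caller.
	 */
	static int
	fo_block(fp)
		struct file *fp;
	{
		/* PRIBIO priority, no timeout. */
		return (tsleep((caddr_t)&fp->f_data, PRIBIO, "foblk", 0));
	}

	static void
	fo_unblock(fp)
		struct file *fp;
	{
		wakeup((caddr_t)&fp->f_data);
	}

The catch, per David's point above, is that tsleep() still saves its
state in curproc; the per-call context (kernel stack, etc.) is what would
supply that state for a caller that isn't a process.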
It's roughly what you would need to support an async call gate for system
calls (an "asyscall" instead of syscall, operating off the same sysent[]
table, with another flag for "CALL_CAN_BLOCK"), to support call
conversion for a full user space POSIX threading implementation.

I actually think there was a kernel threading implementation posted to
the SMP list a while back -- I know that one was done for FreeBSD, in
any case, so I can probably dig it out from somewhere, even if it wasn't
the SMP list.

But I agree that supporting a future kernel threading implementation
isn't the primary reason for doing this.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.