Date: Tue, 5 Sep 2000 18:02:19 +0700 (ALMST)
From: Boris Popov <bp@butya.kz>
To: freebsd-fs@freebsd.org
Cc: dillon@freebsd.org, semenu@freebsd.org, tegge@freebsd.org
Subject: CFR: nullfs, vm_objects and locks... (patch)
Message-ID: <Pine.BSF.4.10.10009051705530.79991-100000@lion.butya.kz>
Hello,

I've spent the last few days trying to make nullfs really functional and stable. There are many issues with the current nullfs code, but below I'll try to outline the most annoying ones.

The first one is the inability to handle the mmap() operation. This comes from the VM/vnode_pager design, where each vm_object is associated with a single vnode and vice versa. Looking at the problem in general, one may note that stackable filesystems may either have a separate vm_object per layer or not have one at all. Since nullfs essentially maps its vnodes to the underlying filesystem, it is reasonable to map all operations to the underlying vnode.

This goal can be reached in (at least) two ways:

1. Create a special vm pager (call it null_pager) which maps all operations to the underlying vm_object.
2. Give each filesystem control over vm_object handling (e.g., creation, destruction and access).

I've played with both variants, and the latter seems simpler to implement. To do this we need three additional VOPs:

    VOP_CREATEVOBJECT(struct vnode *vp);
    VOP_DESTROYVOBJECT(struct vnode *vp);
    VOP_GETVOBJECT(struct vnode *vp, struct vm_object **obj);

The first operation lets a filesystem create its own vm object, the second destroys it, and the third returns a reference to the real vm_object. The rest of the VFS/BIO/VM code should be adapted not to access the v_object field directly, but to use the VOP_GETVOBJECT() call instead. In this way each layer can easily achieve cache coherency without any additional code. For example, nullfs just returns a reference to the underlying vm_object and the rest of the code flows as usual. Importantly, this approach gives full coherency no matter which layer modified the data.

The second big issue for vnode-based stacks of filesystems is vnode state synchronization and a tightly related problem, vnode locking. This is a really weird problem, but the NetBSD folks seem to have solved the locking part of it. So I'll just explain their approach, because it works and works fine.
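The delegation idea behind VOP_GETVOBJECT() can be illustrated with a small user-space model. This is a sketch under stated assumptions, not code from the patch: the structures are toy stand-ins for the kernel ones, the operation vector is reduced to a single function pointer, the EINVAL return is an assumed error code, and v_lower, ufs_getvobject(), null_getvobject() and demo_shared_object() are hypothetical names. It shows a nullfs vnode that owns no vm_object and simply returns the lower vnode's, so data written through either layer lands in the same object.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures (not the real FreeBSD
 * definitions). Only v_object and the VOP name come from the
 * proposal; everything else is hypothetical. */
struct vm_object {
    char pages[64];                 /* stand-in for cached file pages */
};

struct vnode;

/* Per-filesystem operation vector, reduced to the one new VOP. */
struct vnodeops {
    int (*vop_getvobject)(struct vnode *vp, struct vm_object **objpp);
};

struct vnode {
    const struct vnodeops *v_op;
    struct vm_object *v_object;     /* owned by the bottom layer only */
    struct vnode *v_lower;          /* hypothetical: nullfs lower vnode */
};

/* Generic code calls this instead of reading vp->v_object. */
static int VOP_GETVOBJECT(struct vnode *vp, struct vm_object **objpp)
{
    return vp->v_op->vop_getvobject(vp, objpp);
}

/* Bottom filesystem (think UFS): hand out the object it owns. */
static int ufs_getvobject(struct vnode *vp, struct vm_object **objpp)
{
    if (vp->v_object == NULL)
        return EINVAL;              /* assumed error code */
    *objpp = vp->v_object;
    return 0;
}

/* nullfs: keeps no object of its own, just asks the layer below. */
static int null_getvobject(struct vnode *vp, struct vm_object **objpp)
{
    assert(vp->v_lower != NULL);
    return VOP_GETVOBJECT(vp->v_lower, objpp);
}

static const struct vnodeops ufs_vnodeops = { ufs_getvobject };
static const struct vnodeops null_vnodeops = { null_getvobject };

/* Modify data through the nullfs vnode, then check that the lower
 * vnode sees the same object and the same bytes. Returns 1 if so. */
static int demo_shared_object(void)
{
    struct vm_object obj = { { 0 } };
    struct vnode lower = { &ufs_vnodeops, &obj, NULL };
    struct vnode upper = { &null_vnodeops, NULL, &lower };
    struct vm_object *up = NULL, *down = NULL;

    if (VOP_GETVOBJECT(&upper, &up) != 0 ||
        VOP_GETVOBJECT(&lower, &down) != 0)
        return 0;
    up->pages[0] = 'x';             /* write via the upper layer */
    return up == down && down->pages[0] == 'x';
}
```

Because null_getvobject() forwards recursively, the same model would also cover several nullfs layers stacked on top of each other: every layer ends up returning the bottom vnode's object.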
At the moment the vnode locking mechanism is not in good shape due to a lack of unification. nullfs requires strict synchronization of locking state across all layers, so each underlying filesystem should provide access to its vnode lock structure and be able to share it. Currently, each filesystem can either allocate its own lock structure or let VFS allocate one, which leads to inconsistent handling of vnode.v_vnlock. The simple solution is to embed the lock structure in the vnode and make vnode.v_vnlock initially point to it. Any filesystem above can pick up the pointer and assign it to the v_vnlock field of its own vnode. If the underlying filesystem doesn't provide any lock structure, then bad times are coming and each layer has to maintain its own lock state.

As I said before, proper vnode locking is tightly related to state synchronization and requires VFS changes to provide the necessary information. This information is very important in the VOP_LOOKUP() functions, because that is where all sorts of deadlocks can occur. It is provided via a new flag (PDIRUNLOCK) in the namei structure, which the underlying filesystem must carefully maintain whenever it locks or unlocks the parent vnode. I've adapted both the VFS and UFS code to handle it, following how NetBSD did this.

Ok, I think that is enough for an introduction, and here is a link to the patch against recent -CURRENT:

http://www.butya.kz/~bp/n9sys.diff

With this patch I was able to do a 'make -j4 world' on top of nullfs mounts, as well as run some mmap()-related tests.

Comments are welcome.

P.S. Two hours ago Sheldon Hearn told me that Tor Egge and Semen Ustimenko had worked together on the nullfs problem, but since the discussion was private I didn't know anything about it and probably stepped on their toes with my recent cleanup commit :(

--
Boris Popov
http://www.butya.kz/~bp/
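The lock-sharing scheme described above (a lock embedded in every vnode, v_vnlock pointing at it, and nullfs adopting the lower vnode's pointer) can be modeled in the same toy style. This is an illustrative sketch, not the kernel's lockmgr or the code in the patch: struct lock here is only a "held" flag, and vnode_init(), null_share_lock(), toy_trylock(), toy_unlock(), demo_shared_lock() and the v_lock/v_lower fields are hypothetical names. It shows that once the nullfs vnode's v_vnlock points at the lower vnode's lock, locking either layer locks both.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: not the kernel's lockmgr, just an "is held" flag. */
struct lock {
    int lk_held;
};

/* Only v_vnlock is named in the proposal; the embedded v_lock and
 * the v_lower field are hypothetical stand-ins. */
struct vnode {
    struct lock v_lock;             /* lock embedded in every vnode */
    struct lock *v_vnlock;          /* lock actually used for this vnode */
    struct vnode *v_lower;          /* lower vnode for a nullfs node */
};

/* Every vnode starts out pointing v_vnlock at its own embedded lock. */
static void vnode_init(struct vnode *vp, struct vnode *lower)
{
    vp->v_lock.lk_held = 0;
    vp->v_vnlock = &vp->v_lock;
    vp->v_lower = lower;
}

/* nullfs: adopt the lower layer's lock if it exports one; otherwise
 * keep the private lock and track lock state per layer. */
static void null_share_lock(struct vnode *vp)
{
    if (vp->v_lower != NULL && vp->v_lower->v_vnlock != NULL)
        vp->v_vnlock = vp->v_lower->v_vnlock;
}

/* Hypothetical helpers: take/release the lock a vnode points at. */
static int toy_trylock(struct vnode *vp)
{
    if (vp->v_vnlock->lk_held)
        return -1;                  /* someone already holds it */
    vp->v_vnlock->lk_held = 1;
    return 0;
}

static void toy_unlock(struct vnode *vp)
{
    assert(vp->v_vnlock->lk_held);
    vp->v_vnlock->lk_held = 0;
}

/* Lock through the nullfs vnode and check that the lower vnode is
 * locked too, because both now share one lock. Returns 1 if so. */
static int demo_shared_lock(void)
{
    struct vnode lower, upper;
    int ok;

    vnode_init(&lower, NULL);
    vnode_init(&upper, &lower);
    null_share_lock(&upper);

    if (toy_trylock(&upper) != 0)
        return 0;
    ok = (toy_trylock(&lower) != 0);        /* must fail: lock is shared */
    toy_unlock(&upper);
    ok = ok && (toy_trylock(&lower) == 0);  /* free again after unlock */
    toy_unlock(&lower);
    return ok;
}
```

Sharing the pointer rather than copying lock state is what keeps the layers consistent: there is only one lock, so no layer can disagree with another about whether the vnode is held.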