Date: Wed, 18 Jun 2003 13:53:29 +0200
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To: Dmitry Sivachenko <demon@FreeBSD.org>
Cc: arch@FreeBSD.org
Subject: Re: cvs commit: src/sys/fs/nullfs null.h null_subr.c null_vnops.c
Message-ID: <39081.1055937209@critter.freebsd.dk>
In-Reply-To: Your message of "Wed, 18 Jun 2003 15:22:26 +0400." <20030618112226.GA42606@fling-wing.demos.su>
In message <20030618112226.GA42606@fling-wing.demos.su>, Dmitry Sivachenko writes:

[I've moved this to arch@]

>> The main problems with nullfs seem to be locking and trying to create
>> clones of the lower vnode (wrt. the VM system and special files).
>> Once kern/51583
>
> BTW, what is the reason for creating these clone vnodes?
> Why can't we simply return the original vnode?

This is a question in the same caliber as a kid asking mom where babies come from :-)

Back in history, when vnodes first appeared as part of stacking filesystems, there was no merged vm/buffer cache. There were also some suboptimal design "decisions" made in the VFS implementation, made to expedite the implementation but introducing issues which "could be cleaned up later".

NFS added a few interesting wrinkles to the vnode area, mostly because it does not follow the model implicitly assumed in the VFS layering. The buffer cache expects a disk device behind all buffers; that took some hacking too.

Then we got a semi-merged vm/buffer cache. Semi, because it was never finished, so it became some sort of hybrid almost but not quite entirely unlike either state. A few filesystems got VOP_GETPAGES; none of them got VOP_PUTPAGES as far as I recall.

Then we got softupdates and snapshots, which due to shortcomings in the vm/buf area could not be implemented in the architecturally obvious way, but instead had to put fingers into specfs and the buffer cache to get the job done.

All of this has tangled the simple component formerly known as the buffer cache up in so many ways that it is very hard for anybody to make heads or tails of it any more.

So I am tempted to answer your question with: "Because it is all a mess".

A number of us heavy-duty people have started to say rude things and make menacing gestures with our flow-diagram templates in the general direction of the buffer cache, but any real solution is unlikely to happen until we are talking 6-current.
The cleanup would probably be easier to perform if we could ditch the stuff and layers which have been glued on and reduce the code to its core functionality first, and this may indeed be what we have to do. But considering the list of the stuff we are talking about, it is unlikely to be a politically feasible path to take:

    vinum        -- abuses geteblk(), should be a GEOM class.
    raidframe    -- abuses geteblk(), should be a GEOM class.
    cluster code -- must be rewritten.
    snapshots    -- must be untangled from the bio path.
    softupdates  -- ditto.
    unionfs      -- does not correctly layer VOP_STRATEGY.
    nullfs       -- maybe the same problem.
    swap_pager   -- abuses a bogus vnode.

I am hoping that we may be able to carve a path by changing the bio structure to operate on vm pages rather than KVM-mapped byte arrays (most disk device drivers don't care for things being mapped; they use bus-master DMA and only need the physical location).

Next, giving buffers a set of object methods could maybe avoid the detour around VOP_BMAP and VOP_STRATEGY, thereby possibly making it possible for softupdates and snapshots to be implemented entirely inside UFS/FFS.

I have a couple of other ideas I want to explore as well, one of them being not doing I/O via VCHR vnodes, but either at the fdesc level (when from userland) or via a dedicated API (for disk I/O from buf/vm).

But I have only just started seriously investigating how all this can be done, and as I said, it is a royal mess, so it will take time no matter what I and others find.

With that said, I will also add that I will take an incredibly dim view of anybody who tries to add more gunk in this area, and that I am perfectly willing to derail unionfs and nullfs (or pretty much anything else on the list above) if that is what it takes to clean up the buffer cache. Any of those facilities can be reintroduced later on in a cleaner fashion.

I agree that nullfs and unionfs are useful technologies, but if they have to be reimplemented to fit our kernel, then so be it.
--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.