Date: Sat, 25 Feb 2012 20:46:31 +0100
From: Pawel Jakub Dawidek
To: Attilio Rao
Cc: Konstantin Belousov, arch@freebsd.org
Subject: Re: Prefaulting for i/o buffers
Message-ID: <20120225194630.GI1344@garage.freebsd.pl>
References: <20120203193719.GB3283@deviant.kiev.zoral.com.ua>
 <20120225151334.GH1344@garage.freebsd.pl>

On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
> On 25 February 2012 16:13, Pawel Jakub Dawidek wrote:
> > My personal opinion about rangelocks and many other VFS features we
> > currently have is that they are a good idea in theory, but in practice
> > they tend to overcomplicate VFS.
> >
> > I'm of the opinion that we should move as much stuff as we can to
> > individual file systems.
> > We try to implement everything in VFS itself in the hope that this
> > will simplify the file systems we have. It then turns out that only
> > one file system really uses this stuff (most of the time it is UFS),
> > and this is a PITA for all the other file systems as well as for
> > maintaining VFS. VFS has become so complicated over the years that
> > there are maybe only a few people who can understand it, and every
> > single change to VFS carries a huge risk of breaking some unrelated
> > parts.
>
> I think this is questionable, for the following reasons:
> - If the problem is file system writers having trouble understanding
> the necessary locking, we should really provide cleaner and more
> complete documentation. One would think the same of our VM subsystem,
> but at least in that case there are plenty of comments that help in
> understanding how to deal with vm_object and vm_page locking during
> their lifetimes.

Documentation is not the answer here. If the code is that complex, it
is harder to learn no matter how good the documentation is. That makes
fewer people willing to learn it in the first place, and it makes the
code buggier, because there are more edge and special cases you can
forget about.

> - Our primitives may be more complicated than the
> 'all-in-the-filesystem' ones, but at least they offer a complete and
> centralized view of the resources we have allocated in the whole
> system, and they allow building better policies about how to manage
> them. One problem I see here is that those policies are not fully
> implemented, not tuned, or have simply become outdated, which removes
> one of the greatest benefits we get from making vnodes so generic.

Again, this is only a nice theory that is far from reality. You will
never be able to control all the resources allocated by file systems.

> About the things I mentioned myself:
> - As long as the same path now has both range-locking and vnode
> locking, I don't see it as a good idea to keep the two separate
> forever.
> Merging them seems to me an important evolution (not only shrinking
> the number of primitives themselves, but also introducing less
> overhead and likely revamped scalability for vnodes, though I think
> this needs deep investigation).
> - About ZFS rangelocks absorbing the VFS ones, I think this is a minor
> point, but still, if you think it can be done efficiently and without
> losing performance, I don't see why not do that. You already wrote
> rangelocks for ZFS, so you have gained a lot of experience in this
> area and can comment on the fallout, etc., but I don't see a good
> reason not to do it, unless it is just too difficult. This is not
> about generalizing a new mechanism; it is about using a general
> mechanism in a specific implementation, if possible.

I did not implement rangelocking for ZFS. It came with ZFS when I
ported it. As long as we want to merge changes from upstream (which is
now illumos), we don't want to make huge changes just for the sake of
proving that this is a general-purpose mechanism used by more than one
file system.

Attilio, don't get me wrong. In 99% of cases it is good to make code
more general, universal and reusable, but we can't ignore reality.
There are reasons why file systems like XFS, ReiserFS and others were
never fully ported. I'm not saying VFS complexity was the only reason,
but I'm sure it was one of them. Our VFS is very UFS-centric. We make
so many assumptions that sound fine only for UFS. I saw plenty of those
while working on ZFS, like:

- "Every file system needs a cache. Let's make it general, so that all
  file systems can use it!" Well, for VFS each file system is a
  separate entity, which is not the case for ZFS. ZFS caches a block
  only once, even when that block is shared by one file system, 10
  clones and 100 snapshots, all of which are separate mount points from
  the VFS perspective. The same block would be cached 111 times by the
  buffer cache.

- "rmdir(2) on a mountpoint is a bad idea, let's deny it at the VFS
  level."
  It is a bad idea, indeed, but in ZFS it is a nice way to remove a
  snapshot: by rmdir'ing its .zfs/snapshot/ directory.

- No one implemented rangelocking in VFS, so no file system can use it,
  even if the given file system has all the code to do it.

etc.

I'm also sure it would be way easier for Jeff to make VFS MP-safe if it
were less complex.

When looking at the big picture, it would be nice to have all this
general stuff like rangelocking, quota, buffer cache, etc. as some kind
of libraries for file systems to use, rather than something that is
mandatory. If I develop a file system for FreeBSD only and I don't want
to reinvent the wheel, I can use those libraries. If I port a file
system to FreeBSD, or develop a file system that doesn't really need
those libraries, I'm not forced to use them.

All this might make a good working group subject at the BSDCan
devsummit. We could cross swords there :)

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl