From: Gleb Kurtsou
To: Pawel Jakub Dawidek
Cc: Attilio Rao, Konstantin Belousov, arch@freebsd.org
Date: Thu, 1 Mar 2012 13:16:24 +0200
Subject: Re: Prefaulting for i/o buffers
Message-ID: <20120301111624.GB30991@reks>
In-Reply-To: <20120225194630.GI1344@garage.freebsd.pl>

On (25/02/2012 20:46), Pawel Jakub Dawidek wrote:
> On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
> > On 25 February 2012 16:13, Pawel Jakub Dawidek wrote:
> > > My personal opinion about rangelocks and many other VFS features we
> > > currently have is that they are a good idea in theory, but in practice
> > > they tend to overcomplicate VFS.
> > >
> > > I'm of the opinion that we should move as much stuff as we can to
> > > individual file systems. We try to implement everything in VFS itself
> > > in the hope that this will simplify the file systems we have. It then
> > > turns out only one file system is really using this stuff (most of the
> > > time it is UFS) and this is a PITA for all the other file systems as
> > > well as for maintaining VFS. VFS has become so complicated over the
> > > years that there are maybe only a few people who can understand it,
> > > and every single change to VFS is a huge risk of potentially breaking
> > > some unrelated parts.
> >
> > I think this is questionable for the following reasons:
> > - If the problem is filesystem writers having trouble
> >   understanding the necessary locking we should really provide cleaner
> >   and more complete documentation.
> >   One would think the same of our VM
> >   subsystem, but at least in that case there are plenty of comments that
> >   help in understanding how to deal with vm_object, vm_pages locking
> >   during their lifetimes.
>
> Documentation is not the answer here. If the code is so complex, it is
> harder to learn no matter how good the documentation is. It makes fewer
> people willing to learn it in the first place and it makes the code more
> buggy, because there are more edge/special cases you can forget about.
>
> > - Our primitives may be more complicated than the
> >   'all-in-the-filesystem' ones, but at least they offer a complete and
> >   centralized view over the resources we have allocated in the whole
> >   system and they allow building better policies about how to manage
> >   them. One problem I see here is that those policies are not fully
> >   implemented or tuned, or have just become outdated, removing one of
> >   the biggest benefits that we get by making vnodes so generic.
>
> Again, this is only nice theory that is far from being the reality.
> You will never be able to have control of all the resources allocated by
> file systems.
>
> > About the things I mentioned myself:
> > - As long as the same path now has both range-locking and vnode
> >   locking, I don't see it as a good idea to keep both separate forever.
> >   Merging them seems to me an important evolution (not only helping
> >   shrink the number of primitives themselves but also introducing
> >   less overhead and likely revamped scalability for vnodes, but I think
> >   this needs a deep investigation).
> > - About ZFS rangelocks absorbing the VFS ones, I think this is a minor
> >   point, but still, if you think it can be done efficiently and without
> >   losing performance I don't see why not do that. You already wrote
> >   rangelocks for ZFS, so you have gained a lot of experience in this
> >   area and can comment on fallout, etc., but I don't see a good reason
> >   not to do that, unless it is just too difficult.
> >   This is not about
> >   generalizing a new mechanism, it is using a general mechanism in a
> >   specific implementation, if possible.
>
> I did not implement rangelocking for ZFS. It came with ZFS when I ported
> it. As long as we want to merge changes from upstream (which is now
> IllumOS) we don't want to make huge changes just for the sake of proving
> that this is a general-purpose mechanism used by more than one file system.
>
> Attilio, don't get me wrong. In 99% of cases it is good to make code more
> general and more universal and reusable, but we can't ignore reality.
>
> There are reasons why file systems like XFS, ReiserFS and others were
> never fully ported. I'm not saying VFS complexity was the only reason,
> but I'm sure it was one of them.
>
> Our VFS is very UFS-centric. We make so many assumptions that sound
> fine only for UFS. I saw plenty of those while working on ZFS, like:
>
> - "Every file system needs cache. Let's make it general, so that all file
>   systems can use it!" Well, for VFS each file system is a separate
>   entity, which is not the case for ZFS. ZFS can cache only once a block
>   that is used by one file system, 10 clones and 100 snapshots,
>   which are all separate mount points from the VFS perspective.
>   The same block would be cached 111 times by the buffer cache.

Hmm. But this one is optional. Use vop_cachedlookup (or call
cache_enter() on your own) and add a number of cache_purge() calls. It's
pretty much the library-like design you describe below.

>
> - "rmdir(2) on a mountpoint is a bad idea, let's deny it at VFS level."
>   It is a bad idea, indeed, but in ZFS it is a nice way to remove a
>   snapshot by rmdiring .zfs/snapshot/ directory.
>
> - No one implemented rangelocking in VFS, so no file system can use it.
>   Even if the given file system has all the code to do it.
>
> etc.
>
> I'm also sure it would be way easier for Jeff to make VFS MP-safe if it
> was less complex.

Everybody agrees that VFS needs more care.
But there haven't been many concrete suggestions, or at least there is no
VFS TODO list.

> When looking at the big picture, it would be nice to have all this
> general stuff like rangelocking, quota, buffer cache, etc. as some kind
> of libraries for file systems to use and not something that is
> mandatory. If I develop a file system for FreeBSD only and I don't want
> to reinvent the wheel, I can use those libraries. If I port a file system
> to FreeBSD or develop a file system that doesn't really need those
> libraries I'm not forced to use them.

Are you aware of a real example of a "libraries for file systems" VFS? It
sounds very interesting, but I'm afraid it's going to look good only in
theory. E.g. locking at the file system level (Darwin, DragonFly BSD) looks
rather messy (IMHO) and more likely to be bug-prone. On the other hand,
Linux has an optional per-file-system rename lock that makes the VOP_RENAME
implementation much easier, while ours is tremendously difficult to get
right.

> All this might make a good working group subject at BSDCan devsummit.
> We could cross swords there:)

Unfortunately, I'm afraid I won't make it there either, and I will most
likely miss EuroBSD/MeetBSD 2012 in Warsaw as well.

I have a number of fresh ideas about the namecache I'd love to discuss. What
do you think about organising a preliminary group meeting on fs@ or arch@? :)

>
> --
> Pawel Jakub Dawidek                       http://www.wheelsystems.com
> FreeBSD committer                         http://www.FreeBSD.org
> Am I Evil? Yes, I Am!                     http://tupytaj.pl