Date: Thu, 1 Mar 2012 15:23:21 +0000
From: Attilio Rao <attilio@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: arch@freebsd.org, Gleb Kurtsou <gleb.kurtsou@gmail.com>, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: Prefaulting for i/o buffers
Message-ID: <CAJ-FndCoKO9ejs%2BtAjVDMfeg18n4rYxTD8qPZgCXdccdKqV%2B8A@mail.gmail.com>
In-Reply-To: <20120301151642.GY55074@deviant.kiev.zoral.com.ua>
References: <20120225151334.GH1344@garage.freebsd.pl>
 <CAJ-FndBBKHrpB1MNJTXx8gkFXR2d-O6k5-HJeOAyv2DznpN-QQ@mail.gmail.com>
 <20120225194630.GI1344@garage.freebsd.pl>
 <20120301111624.GB30991@reks>
 <20120301141247.GE1336@garage.freebsd.pl>
 <CAJ-FndCSPHLGqkeTC6qiitap_zjgLki%2B8HWta-UxReVvntA9=g@mail.gmail.com>
 <20120301144708.GV55074@deviant.kiev.zoral.com.ua>
 <CAJ-FndAKs-PK7odTMmh2bSkHvTddbUuO=Espzf8sZReT8KhbxQ@mail.gmail.com>
 <20120301150125.GX55074@deviant.kiev.zoral.com.ua>
 <CAJ-FndA=ETSTLCxG1=6G4D0ypaqQB7pDiC=VO==gDyz1BrRWFA@mail.gmail.com>
 <20120301151642.GY55074@deviant.kiev.zoral.com.ua>
2012/3/1, Konstantin Belousov <kostikbel@gmail.com>:
> On Thu, Mar 01, 2012 at 03:11:16PM +0000, Attilio Rao wrote:
>> 2012/3/1, Konstantin Belousov <kostikbel@gmail.com>:
>> > On Thu, Mar 01, 2012 at 02:50:40PM +0000, Attilio Rao wrote:
>> >> 2012/3/1, Konstantin Belousov <kostikbel@gmail.com>:
>> >> > On Thu, Mar 01, 2012 at 02:32:33PM +0000, Attilio Rao wrote:
>> >> >> 2012/3/1, Pawel Jakub Dawidek <pjd@freebsd.org>:
>> >> >> > On Thu, Mar 01, 2012 at 01:16:24PM +0200, Gleb Kurtsou wrote:
>> >> >> >> On (25/02/2012 20:46), Pawel Jakub Dawidek wrote:
>> >> >> >> > - "Every file system needs cache. Let's make it general, so
>> >> >> >> > that all file systems can use it!" Well, for VFS each file
>> >> >> >> > system is a separate entity, which is not the case for ZFS.
>> >> >> >> > ZFS caches a block only once, even when it is shared by one
>> >> >> >> > file system, 10 clones and 100 snapshots, which are all
>> >> >> >> > separate mount points from the VFS perspective. The same
>> >> >> >> > block would be cached 111 times by the buffer cache.
>> >> >> >>
>> >> >> >> Hmm. But this one is optional. Use vop_cachedlookup (or call
>> >> >> >> cache_entry() on your own), add a number of cache_prune calls.
>> >> >> >> It's pretty much the library-like design you describe below.
>> >> >> >
>> >> >> > Yes, namecache is already library-like, but I was talking about
>> >> >> > the buffer cache. I managed to bypass it eventually with
>> >> >> > suggestions from ups@, but for a long time I was sure it wasn't
>> >> >> > possible at all.
>> >> >>
>> >> >> Can you please clarify this? I really don't understand what you
>> >> >> mean.
>> >> >>
>> >> >> >
>> >> >> >> Everybody agrees that VFS needs more care. But there haven't
>> >> >> >> been many concrete suggestions, or at least there is no VFS
>> >> >> >> TODO list.
>> >> >> >
>> >> >> > Everybody agrees on that, true, but we disagree on the direction
>> >> >> > in which we should move our VFS, i.e. making it more lightweight
>> >> >> > vs. more heavyweight.
>> >> >>
>> >> >> All I'm saying (and Gleb too) is that I don't see any benefit in
>> >> >> replicating the whole vnode lifecycle at the inode level and in
>> >> >> the filesystem-specific implementation.
>> >> >> I don't see a simplification in the work to do, and I don't think
>> >> >> this is going to be simpler for a single specific filesystem (not
>> >> >> to mention the legacy support, which means re-implementing inode
>> >> >> handling for every filesystem we have now); we just lose
>> >> >> generality.
>> >> >>
>> >> >> If you want a good example of a VFS primitive that was really
>> >> >> UFS-centric and was mistakenly made generic, it is vn_start_write()
>> >> >> and its siblings. I guess it was introduced just to cater to UFS
>> >> >> snapshot creation, and then it poisoned other consumers.
>> >> >
>> >> > vn_start_write() has nothing to do with filesystem code at all.
>> >> > It is purely a VFS layer operation, which shall not be called from
>> >> > fs code at all. vn_start_secondary_write() is sometimes useful for
>> >> > the filesystem itself.
>> >> >
>> >> > Suspension (not snapshotting) is very useful and helps avoid some
>> >> > nasty issues with unmounts, remounts or guaranteed syncing of the
>> >> > filesystem. The fact that only UFS utilizes this functionality
>> >> > just shows that other filesystem implementors do not care about
>> >> > this correctness, or that other filesystems are not maintained.
>> >>
>> >> I'm sure that when I looked into it, only UFS suspension was being
>> >> touched by it, and it was introduced back in the days when
>> >> snapshotting was sanitized.
>> >>
>> >> So what are the races it is supposed to fix that other filesystems
>> >> don't care about?
>> >
>> > You cannot reliably sync the filesystem when other writers are active.
>> > So, for instance, a loop over vnodes fsyncing them in the unmount code
>> > may never terminate. The same is true for rw->ro remounts.
>> >
>> > One possible solution there is to suspend writers. If the unmount is
>> > successful, a writer gets a failure from the vn_start_write() call,
>> > while it proceeds normally if the unmount is terminated or not
>> > started at all.
>>
>> I don't think we implement that right now, IIRC, but it is an
>> interesting idea.
>
> What don't we implement right now? Take a look at r183074 (Sep 2008).

Ah, sorry, I actually looked into it before 2008 (and that also reminds
me why I stopped working on removing that primitive from VFS and making
it a UFS-specific one) :)

However, why can't we make a fix like that directly in
domount()/dounmount() for every R/W filesystem?

Attilio

--
Peace can only be achieved by understanding - A. Einstein