Date:      Thu, 1 Mar 2012 17:01:25 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        arch@freebsd.org, Gleb Kurtsou <gleb.kurtsou@gmail.com>, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject:   Re: Prefaulting for i/o buffers
Message-ID:  <20120301150125.GX55074@deviant.kiev.zoral.com.ua>
In-Reply-To: <CAJ-FndAKs-PK7odTMmh2bSkHvTddbUuO=Espzf8sZReT8KhbxQ@mail.gmail.com>
References:  <20120203193719.GB3283@deviant.kiev.zoral.com.ua> <CAJ-FndABi21GfcCRTZizCPc_Mnxm1EY271BiXcYt9SD_zXFpXw@mail.gmail.com> <20120225151334.GH1344@garage.freebsd.pl> <CAJ-FndBBKHrpB1MNJTXx8gkFXR2d-O6k5-HJeOAyv2DznpN-QQ@mail.gmail.com> <20120225194630.GI1344@garage.freebsd.pl> <20120301111624.GB30991@reks> <20120301141247.GE1336@garage.freebsd.pl> <CAJ-FndCSPHLGqkeTC6qiitap_zjgLki+8HWta-UxReVvntA9=g@mail.gmail.com> <20120301144708.GV55074@deviant.kiev.zoral.com.ua> <CAJ-FndAKs-PK7odTMmh2bSkHvTddbUuO=Espzf8sZReT8KhbxQ@mail.gmail.com>

On Thu, Mar 01, 2012 at 02:50:40PM +0000, Attilio Rao wrote:
> 2012/3/1, Konstantin Belousov <kostikbel@gmail.com>:
> > On Thu, Mar 01, 2012 at 02:32:33PM +0000, Attilio Rao wrote:
> >> 2012/3/1, Pawel Jakub Dawidek <pjd@freebsd.org>:
> >> > On Thu, Mar 01, 2012 at 01:16:24PM +0200, Gleb Kurtsou wrote:
> >> >> On (25/02/2012 20:46), Pawel Jakub Dawidek wrote:
> >> >> > - "Every file system needs cache. Let's make it general, so that =
all
> >> >> > file
> >> >> >   systems can use it!" Well, for VFS each file system is a separa=
te
> >> >> >   entity, which is not the case for ZFS. ZFS can cache one block =
only
> >> >> >   once that is used by one file system, 10 clones and 100 snapsho=
ts,
> >> >> >   which all are separate mount points from VFS perspective.
> >> >> >   The same block would be cached 111 times by the buffer cache.
> >> >>
> >> >> Hmm. But this one is optional. Use vop_cachedlookup (or call
> >> >> cache_enter() on your own) and add a number of cache_purge() calls.
> >> >> It's pretty much the library-like design you describe below.
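
For concreteness, the library-style use described above looks roughly
like this. A hedged sketch only: "myfs" and myfs_lookup() are made-up
names, not an existing filesystem. Pointing vop_lookup at
vfs_cache_lookup() makes the namecache front the filesystem's own
lookup, which then runs only on a cache miss:

	#include <sys/param.h>
	#include <sys/namei.h>
	#include <sys/vnode.h>

	static vop_cachedlookup_t	myfs_lookup;

	static struct vop_vector myfs_vnodeops = {
		.vop_default =		&default_vnodeops,
		.vop_lookup =		vfs_cache_lookup,
		.vop_cachedlookup =	myfs_lookup,
	};

	static int
	myfs_lookup(struct vop_cachedlookup_args *ap)
	{
		/* ... resolve ap->a_cnp in ap->a_dvp, set *ap->a_vpp ... */

		/* Record the result in the namecache when asked to. */
		if ((ap->a_cnp->cn_flags & MAKEENTRY) != 0)
			cache_enter(ap->a_dvp, *ap->a_vpp, ap->a_cnp);
		return (0);
	}

On remove or rename the filesystem would then call cache_purge(vp) to
drop the stale entries.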
> >> >
> >> > Yes, the namecache is already library-like, but I was talking about
> >> > the buffer cache. I managed to bypass it eventually with suggestions
> >> > from ups@, but for a long time I was sure it wasn't possible at all.
> >>
> >> Can you please clarify this, as I really don't understand what you mean?
> >>
> >> >
> >> >> Everybody agrees that VFS needs more care. But there haven't been
> >> >> many concrete suggestions, or at least there is no VFS TODO list.
> >> >
> >> > Everybody agrees on that, true, but we disagree on the direction we
> >> > should move our VFS in, i.e. make it more light-weight vs. more
> >> > heavy-weight.
> >>
> >> All I'm saying (and Gleb too) is that I don't see any benefit in
> >> replicating the whole vnode lifecycle at the inode level, in each
> >> filesystem-specific implementation.
> >> I don't see a simplification in the work to do, and I don't think this
> >> is going to be simpler for any single filesystem (not to mention the
> >> legacy support, which means re-implementing inode handling for every
> >> filesystem we have now); we just lose generality.
> >>
> >> If you want a good example of a VFS primitive that was really
> >> UFS-centric and was mistakenly made generic, it is vn_start_write()
> >> and its siblings. I guess it was introduced just to cater to UFS
> >> snapshot creation and then it poisoned other consumers.
> >
> > vn_start_write() has nothing to do with filesystem code at all.
> > It is a purely VFS-layer operation, which shall not be called from fs
> > code. vn_start_secondary_write() is sometimes useful for the
> > filesystem itself.
> >
> > Suspension (not snapshotting) is very useful and makes it possible to
> > avoid some nasty issues with unmounts, remounts, or guaranteed syncing
> > of the filesystem. The fact that only UFS utilizes this functionality
> > just shows that other filesystem implementors do not care about this
> > correctness, or that other filesystems are not maintained.
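
For reference, the consumer-side pattern looks roughly like this (a
minimal sketch modeled on the write path in vfs_vnops.c, with error
handling trimmed; a filesystem writing on its own behalf would use
the vn_start_secondary_write()/vn_finished_secondary_write() pair
instead):

	struct mount *mp;
	int error;

	/* Blocks while the fs is suspended; fails if it goes away. */
	error = vn_start_write(vp, &mp, V_WAIT | PCATCH);
	if (error != 0)
		return (error);
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
	error = VOP_WRITE(vp, uio, ioflag, cred);
	VOP_UNLOCK(vp, 0);
	/* Allow a pending suspension to proceed once we are done. */
	vn_finished_write(mp);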
>
> I'm sure that when I looked into it, only UFS suspension was touched
> by it, and it was introduced back in the days when snapshotting was
> sanitized.
>
> So what are the races it is supposed to fix that other filesystems
> don't care about?

You cannot reliably sync the filesystem while other writers are active.
So, for instance, a loop over the vnodes fsyncing them in the unmount
code may never terminate. The same is true for rw->ro remounts.

One possible solution there is to suspend the writers. If the unmount
is successful, the writer gets a failure from the vn_start_write()
call, while it proceeds normally if the unmount is aborted or never
started at all.
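
The suspending side then looks roughly like this (again only a sketch,
using the vfs_write_suspend()/vfs_write_resume() pair; an actual
unmount would tear the mount down instead of resuming):

	int error;

	/*
	 * Stop new writers at vn_start_write() and wait for in-flight
	 * ones to drain, so that the sync pass below is guaranteed to
	 * terminate.
	 */
	error = vfs_write_suspend(mp);
	if (error != 0)
		return (error);
	/* ... fsync the vnodes, remount rw->ro, etc. ... */
	vfs_write_resume(mp);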

Another (proper) example of suspension use is gjournal.


