Date:      Fri, 23 Jun 2006 21:38:57 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Eric Anderson <anderson@centtech.com>
Cc:        freebsd-fs@freebsd.org, freebsd-current@freebsd.org, freebsd-geom@freebsd.org
Subject:   Re: Journaling UFS with gjournal.
Message-ID:  <20060623193857.GC40269@garage.freebsd.pl>
In-Reply-To: <449C06C6.9070801@centtech.com>
References:  <20060619131101.GD1130@garage.freebsd.pl> <449C06C6.9070801@centtech.com>


--XWOWbaMNXpFDWE00
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 23, 2006 at 10:20:38AM -0500, Eric Anderson wrote:
+> Pawel Jakub Dawidek wrote:
+> >Hello.
+> >For the last few months I have been working on the gjournal project.
+> >To avoid confusion right away, I want to note that this project is not
+> >related to the gjournal project Ivan Voras worked on during the
+> >last SoC (2005).
+> >The lack of a journaled file system has been FreeBSD's Achilles' heel
+> >for many years. We do have many file systems, but none with journaling:
+> >- ext2fs (journaling is in ext3fs),
+> >- XFS (read-only),
+> >- ReiserFS (read-only),
+> >- HFS+ (read-write, but without journaling),
+> >- NTFS (read-only).
+> >GJournal was designed to journal GEOM providers, so it actually works
+> >below the file system layer, but it has hooks that allow it to work with
+> >file systems. In other words, gjournal is not file system-dependent;
+> >it can probably work with any file system given minimal knowledge
+> >about it. I implemented only UFS support.
+> >The patches are here:
+> >	http://people.freebsd.org/~pjd/patches/gjournal.patch (for HEAD)
+> >	http://people.freebsd.org/~pjd/patches/gjournal6.patch (for RELENG_6)
+> >To patch your sources you need to:
+> >	# cd /usr/src
+> >	# mkdir sbin/geom/class/journal sys/geom/journal sys/modules/geom/geom_journal
+> >	# patch < /path/to/gjournal.patch
+> >Add 'options UFS_GJOURNAL' to your kernel configuration file and
+> >recompile kernel and world.
+> >How it works (in short): you may define one or two providers for
+> >gjournal to use. If one provider is given, it is used for both data
+> >and journal. If two providers are given, one is used for data
+> >and the other for the journal.
+> >Every few seconds (the interval is configurable) the journal is terminated
+> >and marked as consistent, and gjournal starts copying data from it to the
+> >data provider. At the same time, new data are stored in a new journal.
+>
+> I'm not sure this is happening the way you describe exactly.  On my laptop, while rsyncing my /home partition to a newly created external disk (400G), I see 20MB/s writing
+> to the journaled UFS2 device (/dev/label/backup.journal) passing through to the journal device (/dev/label/journal), then it switches to no writes to the journaled UFS2
+> device (/dev/label/backup.journal) (my rsync pauses) while the journaled device (/dev/label/backup) writes at 20MB/s for about 3-10 seconds.

When it is time for a journal switch, we cannot switch journals while we
are still copying data from the inactive journal, so we wait.
You can tune this a bit using these two sysctls:

kern.geom.journal.parallel_flushes - Number of flush I/O requests sent
				     in parallel
kern.geom.journal.parallel_copies - Number of copy I/O requests sent
				    in parallel

By default those are equal. You may increase the second one or decrease
the first one to tell gjournal to focus more on copying data from the
inactive journal, so that when the journal switch time arrives, it
doesn't have to wait.
Before you do it, please consult the kern.geom.journal.stats.wait_for_copy
sysctl variable, which tells you how many times a journal switch was
delayed because the inactive journal had not been fully copied yet.

Additional waiting happens because a lot of data is held only in memory,
and when I call file system synchronization, all of it goes to the
gjournal provider at once.

None of the modes in which UFS can operate (sync, async and SU) is
optimal for gjournal. The optimal mode for gjournal would be something
like: send a write request immediately and don't wait for an answer.
GJournal will take care of reordering write requests to get optimal
throughput, and this will allow for a more balanced load.
For example, SU sends write requests in bursts, which is bad for gjournal.

+> >Let's call the moment in which the journal is terminated a "journal switch".
+> >Journal switch looks as follows:
+> >1. Start a journal switch on timeout or when we run out of cache.
+> >   Don't perform a journal switch if there were no write requests.
+> >2. If we have file system, synchronize it.
+> >3. Mark file system as clean.
+> >4. Block all write requests to the file system.
+> >5. Terminate the journal.
+> >6. If necessary, wait until copying of the previous journal is
+> >   finished.
+>
+> Seems like this is the point we are busy in.
+>
+> >7. Send BIO_FLUSH request (if the given provider supports it).
+> >8. Mark new journal position on the journal provider.
+> >9. Unblock write requests.
+> >10. Start copying data from the terminated journal to the data provider.
+>
+> And it seems that 10 is happening earlier on..

Step 10 actually happens after the journal switch. It is the moment when
the active journal has been turned into the inactive journal and the copy
starts.

Don't take this order too strictly; I mainly wanted to show which steps
are performed.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

--XWOWbaMNXpFDWE00
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQFEnENRForvXbEpPzQRAl+vAKC1c+ophEYLProOjQ1373BDyoaFKwCdH15u
Gb6918+pzKh34atzxPrxhnQ=
=opPZ
-----END PGP SIGNATURE-----

--XWOWbaMNXpFDWE00--


