Date:      Mon, 12 Feb 2001 18:49:40 -0600
From:      Russell Cattelan <cattelan@thebarn.com>
To:        Zhiui Zhang <zzhang@cs.binghamton.edu>
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: Design a journalled file system
Message-ID:  <3A8884A3.FEDB15FE@thebarn.com>
References:  <Pine.SOL.4.21.0102121917080.7164-100000@onyx>

Zhiui Zhang wrote:

> On Mon, 12 Feb 2001, Terry Lambert wrote:
>
> > > It seems to me that I have failed to explain my point again. So an example
> > > may help. Suppose I have a bitmap block buffer.  One transaction allocates
> > > some blocks from it, the other transaction frees some blocks into it. If
> > > the bitmap block buffer is not locked for the duration of a transaction,
> > > then it could contain modifications made by both transactions. The atomicity
> > > is violated unless you can make the two transactions merge into one later.
> > > On the other hand, if it is locked for a transaction and that transaction
> > > blocks for some other I/O, then performance will suffer (no one can use
> > > the bitmap block buffer for a while).
> >
> > Russell is right, for XFS, and for most Journalled FS's, where the
> > validity marking on the journal entry (as being the most recent)
> > is the most important thing.  All transactions are written as if
> > by way of a write-through cache of the modification data.
> >
> > In other words, in his world, there's no such conflict between
> > concurrent operations.
>
> I think I have a better understanding this time. Each transaction's log entry
> only logs changes *made by itself* using logical logging (instead of
> physical logging; in physical logging, the entire bitmap block will be
> logged, potentially including modifications made by others). From time to
> time, the filesystem will force a sync operation that writes the metadata
> in-place to free log space.  All transactions must be finished to reach
> such a sync point. IBM JFS seems to do logging this way.

Yes... From what I understand of JFS, it does logging very similarly to XFS.
Note: in addition to periodic FS syncing, log space, and subsequently metadata,
can be pushed out to disk during periods of high transaction activity;
this is called "tail pushing".
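
As a rough illustration of the logical logging Zhihui describes above
(a sketch only, not XFS or JFS code; the record format and the names
apply_record and replay are made up for the example): each transaction
logs only its own bit changes, so two transactions can touch the same
bitmap block without either log entry capturing the other's changes,
and recovery replays only the committed ones.

```python
# Hypothetical sketch of logical logging against a shared bitmap block.
# The record format and function names are assumptions for illustration.

def apply_record(bitmap, record):
    """Apply one logical log record (op, bit) to a bitmap, in place."""
    op, bit = record
    if op == "alloc":
        bitmap[bit] = 1
    elif op == "free":
        bitmap[bit] = 0
    else:
        raise ValueError(f"unknown op {op!r}")


def replay(on_disk_bitmap, log):
    """Rebuild the bitmap after a crash from committed transactions only."""
    bitmap = list(on_disk_bitmap)  # work on a copy of the on-disk state
    for txn in log:
        if txn["committed"]:
            for record in txn["records"]:
                apply_record(bitmap, record)
    return bitmap


# Two transactions touched the same bitmap block before the crash.
on_disk = [1, 1, 0, 0]
log = [
    {"id": 1, "committed": True, "records": [("alloc", 2)]},
    {"id": 2, "committed": False, "records": [("free", 0)]},
]
recovered = replay(on_disk, log)
```

Here transaction 2 never committed, so its free of bit 0 is simply not
replayed, while transaction 1's allocation of bit 2 is. With physical
logging of the whole block, transaction 1's entry could have carried
transaction 2's uncommitted change along with it.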

XFS logging has been designed and optimized for speed.
The log guarantees that the FS is consistent after a crash; it
does not guarantee that no data or transactions have been lost.
Synchronous transactions can be turned on to decrease the
amount of lost information, but that significantly hurts performance.
99% of the time the log is just expensive, slow overhead;
async logging reduces the performance hit but does allow
transactions to be lost.
Always a compromise :-)
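
To make the trade-off concrete, here is a toy model (purely
illustrative; the Log class and its methods are invented for this
example and have nothing to do with the real XFS log code). A
synchronous log writes to disk on every commit; an asynchronous one
buffers commits in memory and writes them out on a periodic flush, so
a crash loses whatever is still buffered.

```python
# Toy model of synchronous vs. asynchronous log commits.
# All names here are hypothetical; this is not how XFS is implemented.

class Log:
    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.in_memory = []    # committed, but not yet written to disk
        self.on_disk = []      # survives a crash
        self.disk_writes = 0   # number of log writes issued

    def commit(self, txn):
        self.in_memory.append(txn)
        if self.synchronous:
            self.flush()       # pay for a disk write on every commit

    def flush(self):
        if self.in_memory:
            self.on_disk.extend(self.in_memory)
            self.in_memory.clear()
            self.disk_writes += 1

    def crash(self):
        """Simulate a crash: return the transactions that were lost."""
        lost = list(self.in_memory)
        self.in_memory.clear()
        return lost


def run(synchronous):
    log = Log(synchronous)
    for i in range(1, 6):
        log.commit(f"txn{i}")
        if i == 3:
            log.flush()        # stand-in for a periodic flush
    writes = log.disk_writes
    lost = log.crash()
    return writes, lost, log.on_disk


sync_writes, sync_lost, sync_disk = run(synchronous=True)
async_writes, async_lost, async_disk = run(synchronous=False)
```

In this model the synchronous run issues one log write per transaction
and loses nothing, while the asynchronous run issues a single write and
loses the transactions committed after the last flush. In both cases
what is on disk is a consistent prefix of the committed transactions.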

>
>
> -Zhihui
>
> > Per a previous post, Soft Updates is all about "unless you can
> > make the two transactions merge into one later".
> >
> > Specifically, if you have a disk block, it's 512b.  An inode on
> > disk is 128b.  This means 4 inodes per block.
> >
> > Similarly, a directory entry block is 512b.  A given block will
> > contain between 1 and 16 directory entries, each of which may
> > be in the process of being manipulated.
> >
> > And so on.
> >
> > Soft Updates keeps a list of modifications to conflicted blocks,
> > in core, and actually makes a copy of the conflicted block, and
> > backs out transaction state, when committing partial transactions.
> > It does this by maintaining a state conflict domain dependency
> > list (which is why Soft Updates are sometimes called Soft
> > Dependencies instead).
> >
> >
> > Practically, for a design, you can generally reduce the domains
> > of conflict by increasing your object sizes to 512b.  This lets
> > you have things like ACL and immediate file support in inodes,
> > which you can then bill as a feature.
> >
> > For the directory entry blocks, the conflict is already somewhat
> > mitigated by the fact that, for anyone iterating the directory, you
> > make a copy of the block -- it is a snapshot, not the actual
> > directory contents, that you are iterating.  The NFS "cookie" code
> > for iteration restart is really a kludge; it could have just as
> > easily worked around the difference between on-disk and wire and
> > user space directory entries within a given block, by separating
> > the code into "copy FS-sided unit into snapshot" and "copy data
> > from snapshot into representation buffer" VOPs (I've suggested
> > this many times, and provided the code several times).
> >
> > The bottom line is that bitmaps only matter if you implement
> > using bitmaps.  For inescapable conflicts (like the "last
> > modified" or "time of last update" in superblock data, which
> > you must have for recovery following a crash), the easiest method
> > to work around the problem is to log superblocks as well, and
> > then iterate to the "most recent valid" during recovery.
> >
> > Ideally, you probably _do_ want to incorporate Soft Updates
> > technology, since it lets you avoid artificial stalls when you
> > enter into an unavoidable conflict (XFS stalls and drains at
> > those points), but it's not immediately necessary (just don't
> > design against it as a future optimization).
> >
> > I really, really urge FS designers to go back to first principles
> > when examining problems, and to consider FSs as transactions to
> > be applied to persistent state data as a result of events.  If
> > you do that, then protecting the integrity of the persistent
> > state becomes obvious and easy.
> >
> >
> > Actually, this really brings home the license point for XFS,
> > since it should be obvious that it could benefit from soft
> > updates, which it won't get without paying something (like
> > access to its sources in a useful fashion for the BSD community).
> >
> > Yes, I'm still looking for a commercial license that prohibits
> > making XFS a stand-alone product, but still allows it to be used
> > in a commercial setting.  The Sun License on the original SLPv1,
> > but fails to grant in perpetuity.  It may be that SGI's lawyer
> > will have to do lawyering to work out one that satisfies them.
> >
> > Hopefully SGI will learn the HP JetSend and the Sun JINI and the
> > Net/1 & Net/2 TCP/IP lesson: if you want something to be standard,
> > you can't control it, and if you control it, it won't be standard.
> >
> > Note: my March 1st offer stands.  I have yet to hear how to get
> > the unencumbered (SGI-only) GPL code... the clock's ticking.
> >
> >
> >                                       Terry Lambert
> >                                       terry@lambert.org

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.
