FreeBSD Mail Archives

Date:      Mon, 12 Feb 2001 14:50:26 -0600
From:      Russell Cattelan <cattelan@thebarn.com>
To:        Zhiui Zhang <zzhang@cs.binghamton.edu>
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: Design a journalled file system
Message-ID:  <3A884C91.56037FFE@thebarn.com>
References:  <Pine.SOL.4.21.0102121516200.13995-100000@opal>



Zhiui Zhang wrote:

> On Mon, 12 Feb 2001, Russell Cattelan wrote:
>
> > > Another difficulty is that if several transactions are in progress at the
> > > same time, we must remember which metadata buffers are modified by which
> > > transactions. When we copy/rename the buffer, we must inform those
> > > transactions of the fact that we did the copy/rename.  The buffers modified
> > > by one transaction must be flushed at the same time.
>
> Thanks for your reply. I mean if a transaction locks down all the metadata
> (e.g., bitmap blocks) it modified until it commits, then there is no
> problem (but this reduces concurrency). Otherwise, the same metadata
> blocks can contain modifications done by more than one transaction.

This really isn't a problem... meta data buffers have to be "pinned" but not
necessarily locked. A meta data buffer can be modified many times without
having to be written out to disk. Take the super block, for example: it
gets flushed out to disk occasionally, but since it is modified so often,
most changes never get flushed individually. A log of each of those changes
will be in every transaction that touched the super block, but the super
block itself doesn't have to be written out every time.
The primary goal is to have a consistent file system, not to be able
to roll back every change that happens.
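
A minimal sketch of the pinning idea described above, in C. Everything here
is hypothetical and for illustration only: struct meta_buf, struct txn and
the three functions are invented names, not the FreeBSD struct buf or any
XFS interface. The assumption is that a buffer is pinned while some
transaction that touched it has not yet committed its log records, and that
a pinned buffer is simply skipped by write-back rather than locked against
further modification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TOUCHED 8   /* arbitrary limit for this sketch */

/* Hypothetical in-core metadata buffer (not the FreeBSD struct buf). */
struct meta_buf {
    int  pin_count;     /* touching transactions not yet committed */
    bool dirty;         /* in-core copy differs from the disk copy */
    int  disk_writes;   /* times the buffer itself was written out */
};

/* Hypothetical transaction: remembers what it touched and logged. */
struct txn {
    struct meta_buf *touched[MAX_TOUCHED];
    size_t           ntouched;
    int              log_records;
};

/* Modify a buffer under a transaction: pin it and log the change.
 * No exclusive lock is held afterwards, so other transactions may
 * modify the same buffer before this one commits. */
bool txn_modify(struct txn *t, struct meta_buf *b)
{
    if (t->ntouched == MAX_TOUCHED)
        return false;
    b->pin_count++;
    b->dirty = true;
    t->touched[t->ntouched++] = b;
    t->log_records++;
    return true;
}

/* Commit: the log records are assumed to be on disk at this point,
 * so every buffer this transaction touched can be unpinned. */
void txn_commit(struct txn *t)
{
    for (size_t i = 0; i < t->ntouched; i++) {
        assert(t->touched[i]->pin_count > 0);
        t->touched[i]->pin_count--;
    }
    t->ntouched = 0;
}

/* Write the buffer back only if it is dirty and no longer pinned.
 * Returns true if a write was issued. */
bool buf_try_flush(struct meta_buf *b)
{
    if (b->pin_count > 0 || !b->dirty)
        return false;
    b->disk_writes++;
    b->dirty = false;
    return true;
}
```

With two transactions modifying the same super block buffer, both carry a
log record for their change, yet the buffer is written once, and only after
both have committed.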

> I do
> not know how XFS solves this problem.  Since XFS uses B+ tree, I guess
> that locking can be done in a hierarchical way easily to avoid deadlock.
> But in FFS, the bitmap blocks have no relationship with each other. Locking
> the bitmap blocks in FFS in arbitrary order can cause deadlock, I guess.
>
> IBM JFS seems to use incore log implemented as page cache. XFS has
> pagebuf.  I expect that is something similar to IBM's page cache.
>
> > Hmm I'm not sure what the problem is here.
> > A transaction log entry will log all changes necessary to complete
> > that transaction, even if it involves multiple meta data objects, which it
> > almost always does.
> > In the event of a crash and subsequent replay of the log, the recovery code
> > will make sure all the meta data on the disk is consistent with the log.
> > If one meta data write happened but the other one didn't, the recovery
> > code only updates the one that didn't complete.
> >
> > What is the size of the disk block container on BSD buf_t's?
> > If they are 64-bit we shouldn't have a problem... simply use absolute disk
> > addressing for meta data items.
> > Why would we need to copy a meta data buf_t?
> >
>
> In sys/buf.h of FreeBSD, it has:
>
>    daddr_t b_lblkno;               /* Logical block number. */
>    daddr_t b_blkno;                /* Underlying physical block number. */
>
> Both are 32-bit integers. I am not sure why they are not 64-bit. Maybe it has
> something to do with the merged buffer cache.

Ok, good, so we have a spot to store the absolute block number... good.
Assuming these are in units of 512 bytes, this will work up to 2TB.
Linux has the same 2TB limit problem right now...
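
The 2TB figure can be checked with a little arithmetic. The sketch below
assumes 512-byte units, as stated above; addressable_bytes and SECTOR_SIZE
are hypothetical names used only for this example. Note that 2TB corresponds
to using all 32 bits of the block number. If the field is a signed type,
only 31 bits are available for non-negative block numbers and the limit is
1TB.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumption: block numbers count 512-byte units */

/* Bytes addressable with block numbers that have `bits` usable bits.
 * Valid for bits <= 54, so that the product still fits in 64 bits. */
uint64_t addressable_bytes(unsigned bits)
{
    return ((uint64_t)1 << bits) * SECTOR_SIZE;
}
```

addressable_bytes(32) gives 2^41 bytes (2TB) and addressable_bytes(31) gives
2^40 bytes (1TB).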

>
> -Zhihui

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.





To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
