Date:      Wed, 05 May 2010 00:04:49 +0300
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        freebsd-fs@FreeBSD.org
Cc:        Jeff Roberson <jeff@FreeBSD.org>
Subject:   Re: SUJ: fsck_ufs: Sparse journal inode
Message-ID:  <86r5lr5qfi.fsf@kopusha.onet>
In-Reply-To: <86bpcwluev.fsf@kopusha.onet> (Mikolaj Golub's message of "Mon, 03 May 2010 21:19:52 +0300")
References:  <86bpcwluev.fsf@kopusha.onet>

On Mon, 03 May 2010 21:19:52 +0300 Mikolaj Golub wrote:

> Hi,
>
> Experimenting with journaled soft-updates on HAST I observed the error when
> fscking fs on the secondary after primary "crash":
>
> # fsck -y -t ufs /dev/hast/tank
> ** /dev/hast/tank
>
> USE JOURNAL?? yes
>
> ** SU+J Recovering /dev/hast/tank
> ** Reading 33554384 byte journal from inode 4.
> fsck_ufs: Sparse journal inode 4 (blocks = 16376, numfrags = 16383).
>
> (The text between the parentheses is a local modification to the fsck code to
> output some useful values).
>
> So to recover I needed to run fsck and type "no" when prompted "USE
> JOURNAL?". But I am looking for a way to script automatic recovering from this
> situation. Currently the only way I have found is to disable journal, run
> fsck, mount fs somewhere temporary, remove .sujournal, unmount, enable
> journal. Is this really so complicated or may I just miss something?
>
> BTW, I used to observe this error on every "crash" test. And "blocks" value was
> always the same: 16376. So I changed journal size to 16376 * 2048 = 33538048.
> It looks like after this the issue has gone.

Actually, it is tunefs that creates the sparse journal :-)

When creating a journal, tunefs allocates size/fs_bsize blocks
(journal_alloc(size)). But if the journal size is not a multiple of fs_bsize,
no block is allocated for the tail fragments and we end up with a sparse file.

Steps to reproduce:

Choose a journal size of the form (blocksize * N) + fragsize + something,
e.g. 4198400 (2048*2048 + 2*2048).

[root@hasta ~]# newfs /dev/$dev
/dev/md0: 10.0MB (20480 sectors) block size 16384, fragment size 2048
        using 4 cylinder groups of 2.52MB, 161 blks, 384 inodes.
super-block backups (for fsck -b #) at:
 160, 5312, 10464, 15616
[root@hasta ~]# tunefs -j enable -S 4198400 /dev/$dev
Using inode 4 in cg 0 for 4198400 byte journal
tunefs: soft updates journaling set
[root@hasta ~]# fsck -f -t ufs /dev/$dev
** /dev/md0

USE JOURNAL?? [yn] y

** SU+J Recovering /dev/md0
** Reading 4198400 byte journal from inode 4.
fsck_ufs: Sparse journal inode 4.

Note that the size must leave a tail of at least one full fragment, because
the code has:

        blocks = ino_visit(jip, sujino, suj_add_block, 0);
        if (blocks != numfrags(fs, DIP(jip, di_size)))
                errx(1, "Sparse journal inode %d.\n", sujino);

With only a partial fragment in the tail, numfrags() returns a value equal to
blocks, so the check passes.

BTW, I am not sure this check would be correct even if tunefs allocated the
tail fragments. As far as I can see, indir_visit() adds the following for
every block it finds:

  (*frags) += fs->fs_frag;

so the same would happen for the tail block, and ino_visit() in the code above
would return more than numfrags().

-- 
Mikolaj Golub


