Date:      Fri, 14 May 2010 16:11:04 -1000 (HST)
From:      Jeff Roberson <jroberson@jroberson.net>
To:        Mikolaj Golub <to.my.trociny@gmail.com>
Cc:        freebsd-fs@FreeBSD.org, Jeff Roberson <jeff@FreeBSD.org>
Subject:   Re: SUJ: fsck_ufs: Sparse journal inode
Message-ID:  <alpine.BSF.2.00.1005141610120.1398@desktop>
In-Reply-To: <86r5lr5qfi.fsf@kopusha.onet>
References:  <86bpcwluev.fsf@kopusha.onet> <86r5lr5qfi.fsf@kopusha.onet>

On Wed, 5 May 2010, Mikolaj Golub wrote:

> On Mon, 03 May 2010 21:19:52 +0300 Mikolaj Golub wrote:
>
>> Hi,
>>
>> Experimenting with journaled soft updates on HAST, I observed the following
>> error when running fsck on the secondary after a primary "crash":
>>
>> # fsck -y -t ufs /dev/hast/tank
>> ** /dev/hast/tank
>>
>> USE JOURNAL?? yes
>>
>> ** SU+J Recovering /dev/hast/tank
>> ** Reading 33554384 byte journal from inode 4.
>> fsck_ufs: Sparse journal inode 4 (blocks = 16376, numfrags = 16383).
>>
>> (The text between the parentheses is a local modification to the fsck code to
>> output some useful values).
>>
>> So to recover I needed to run fsck and answer "no" when prompted "USE
>> JOURNAL?". But I am looking for a way to script automatic recovery from this
>> situation. Currently the only way I have found is to disable the journal, run
>> fsck, mount the fs somewhere temporary, remove .sujournal, unmount, and
>> re-enable the journal. Is it really this complicated, or am I missing
>> something?

I will add an option to skip the journal.

>>
>> BTW, I observed this error on every "crash" test, and the "blocks" value was
>> always the same: 16376. So I changed the journal size to 16376 * 2048 = 33538048.
>> After this change the issue seems to have gone away.
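
The size chosen above amounts to rounding the requested journal size down to a whole number of file system blocks. A minimal sketch of that arithmetic follows. The helper name is hypothetical (it is not part of tunefs), and the 16384-byte block size is an assumption inferred from 16376 = 2047 * 8 fragments of 2048 bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of tunefs: return the largest size that is
 * <= requested and is a whole multiple of the file system block size.
 */
static int64_t
journal_size_round_down(int64_t requested, int64_t bsize)
{
	assert(requested >= 0 && bsize > 0);
	return requested - (requested % bsize);
}
```

With the assumed 16384-byte block size, journal_size_round_down(33554384, 16384) gives 33538048, the same value as 16376 * 2048.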
>
> Actually, it is tunefs that creates the sparse journal :-)
>
> When creating a journal, tunefs allocates size/fs_bsize blocks
> (journal_alloc(size)). But if the journal size is not a multiple of fs_bsize,
> no block is allocated for the tail fragments and we end up with a sparse file.

You are absolutely right.  This only happens on filesystems smaller than 
16GB when we hit the max journal size.  I will resolve it immediately.
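
The mismatch can be sketched with plain integer arithmetic. This is only an illustration, not the tunefs or fsck_ufs code: the function names are hypothetical, and numfrags() is assumed to truncate (size divided by the fragment size, rounded down), which is consistent with the 16383 reported for the 33554384-byte journal.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the allocation described above: only size / bsize
 * whole blocks are allocated, each covering bsize / fsize fragments.
 */
static int64_t
frags_allocated(int64_t size, int64_t bsize, int64_t fsize)
{
	assert(bsize > 0 && fsize > 0 && bsize % fsize == 0);
	return (size / bsize) * (bsize / fsize);
}

/* Assumed stand-in for numfrags(): whole fragments in size bytes. */
static int64_t
frags_expected(int64_t size, int64_t fsize)
{
	assert(fsize > 0);
	return size / fsize;
}

/* Nonzero when the two counts differ, i.e. the journal would look sparse. */
static int
journal_looks_sparse(int64_t size, int64_t bsize, int64_t fsize)
{
	return frags_allocated(size, bsize, fsize) != frags_expected(size, fsize);
}
```

For a 33554384-byte journal with 16384-byte blocks and 2048-byte fragments this yields 16376 allocated fragments against 16383 expected, matching the values printed by the modified fsck. A size that is an exact multiple of the block size, such as 33538048, gives equal counts.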

Thanks,
Jeff

>
> Steps to reproduce:
>
> Choose a journal size: (blocksize * N) + fragsize + something. E.g. 4198400
> (2048*2048 + 2*2048).
>
> [root@hasta ~]# newfs /dev/$dev
> /dev/md0: 10.0MB (20480 sectors) block size 16384, fragment size 2048
>        using 4 cylinder groups of 2.52MB, 161 blks, 384 inodes.
> super-block backups (for fsck -b #) at:
> 160, 5312, 10464, 15616
> [root@hasta ~]# tunefs -j enable -S 4198400 /dev/$dev
> Using inode 4 in cg 0 for 4198400 byte journal
> tunefs: soft updates journaling set
> [root@hasta ~]# fsck -f -t ufs /dev/$dev
> ** /dev/md0
>
> USE JOURNAL?? [yn] y
>
> ** SU+J Recovering /dev/md0
> ** Reading 4198400 byte journal from inode 4.
> fsck_ufs: Sparse journal inode 4.
>
> Note that the size must be chosen so that the tail contains at least one full
> fragment, because in the code we have:
>
>        blocks = ino_visit(jip, sujino, suj_add_block, 0);
>        if (blocks != numfrags(fs, DIP(jip, di_size)))
>                errx(1, "Sparse journal inode %d.\n", sujino);
>
> With only a single partial fragment in the tail, numfrags() returns a value
> equal to blocks.
>
> BTW, I am not sure this check would be correct even if tunefs allocated the
> tail fragments. As far as I can see, indir_visit() adds the following for
> every block it finds:
>
>  (*frags) += fs->fs_frag;
>
> so the same would apply to the tail block, and ino_visit() in the code above
> would return more than numfrags().
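
The concern above can be illustrated with a small sketch. It models only the reading given in the message (every visited block, including a fully allocated tail block, adds fs_frag fragments) and is not the actual fsck_ufs code. The function names are hypothetical, and numfrags() is again assumed to truncate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: the tail is allocated as one more full block and the
 * visitor adds bsize / fsize fragments for every block, tail included.
 */
static int64_t
frags_counted_per_block(int64_t size, int64_t bsize, int64_t fsize)
{
	assert(size >= 0 && bsize > 0 && fsize > 0 && bsize % fsize == 0);
	int64_t nblocks = (size + bsize - 1) / bsize;	/* round up */
	return nblocks * (bsize / fsize);
}

/* Assumed stand-in for numfrags(): whole fragments in size bytes. */
static int64_t
frags_in_size(int64_t size, int64_t fsize)
{
	assert(fsize > 0);
	return size / fsize;
}
```

For the 4198400-byte journal from the reproduction steps (16384-byte blocks, 2048-byte fragments), this model counts 257 * 8 = 2056 fragments while the size corresponds to 2050, so under these assumptions the comparison would still fail.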
>
> -- 
> Mikolaj Golub
>


