Date:      Fri, 09 Sep 2011 17:31:49 +0700
From:      Eugene Grosbein <egrosbein@rdtc.ru>
To:        FreeBSD Stable <freebsd-stable@freebsd.org>
Cc:        pjd@freebsd.org
Subject:   Re: gmirror+gjournal often makes inconsistens file systems
Message-ID:  <4E69EB15.50808@rdtc.ru>
In-Reply-To: <4E69A152.6090408@rdtc.ru>
References:  <4E69A152.6090408@rdtc.ru>

Dear Pawel Jakub,

09.09.2011 12:17, Eugene Grosbein writes:
> Hi!
> 
> For a long time I have been seeing the same UFS2 file system problems on several 8.2 systems
> running gmirror+gjournal+async. After an unclean shutdown, kernel panic or power failure,
> gjournal lets fsck skip its checks, and that is why I use it.
> 
> But quite often my /var partition (and sometimes others) is still severely damaged,
> and running with such a /var mounted read-write leads to further panics, hangs and so on.
> 
> For example, I have one such 8.2-STABLE system with the ad4 and ad6 drives combined into /dev/mirror/gm0.
> I have just removed ad6 from the mirror, run fsck -y manually on all of its file systems,
> shut the machine down cleanly again and booted it from ad6,
> leaving the mirror on ad4 neither mounted nor checked.
> 
> Then I ran fsck -y /dev/mirror/gm0.journals1e (/var on the mirrored drive)
> and got LOTS of serious errors on a presumably clean file system.
> Of course, I had seen the same errors while checking ad6 after it was removed from the running mirror.
> I have gmirror's auto-sync feature turned ON. I've tried turning it OFF, but that only
> increases the frequency of such damage that is not fixed after a reboot.
> 
> It seems that gjournal cannot handle system crashes reliably, can it?
> I basically run it without any manual tuning. I've also tried tuning it, without luck:
> it works nicely when there are no unclean shutdowns, but dealing with them is its whole purpose.
> 
> # fsck -t ffs -y /dev/mirror/gm0.journals1e
> ** /dev/mirror/gm0.journals1e
> ** Last Mounted on /var
> ** Phase 1 - Check Blocks and Sizes
> 3955872 DUP I=989242
> 3955873 DUP I=989242
> 3955874 DUP I=989242
> 3955875 DUP I=989242
> 3955876 DUP I=989242
> 3955877 DUP I=989242
> 3955878 DUP I=989242
> 3955879 DUP I=989242
> 3955880 DUP I=989242
> 3955881 DUP I=989242
> 3955882 DUP I=989242
> EXCESSIVE DUP BLKS I=989242
> CONTINUE? yes
> 
> INCORRECT BLOCK COUNT I=989242 (448 should be 424)
> CORRECT? yes
> 
> 3955888 DUP I=989289
> 3955889 DUP I=989289
> 3955890 DUP I=989289
> 3955891 DUP I=989289
> 3955892 DUP I=989289
> 3955893 DUP I=989289
> 3955894 DUP I=989289
> 3955895 DUP I=989289
> ** Phase 1b - Rescan For More DUPS
> 3955872 DUP I=989242
> 3955873 DUP I=989242
> 3955874 DUP I=989242
> 3955875 DUP I=989242
> 3955876 DUP I=989242
> 3955877 DUP I=989242
> 3955878 DUP I=989242
> 3955879 DUP I=989242
> 3955880 DUP I=989242
> 3955881 DUP I=989242
> 3955888 DUP I=989242
> 3955889 DUP I=989242
> 3955890 DUP I=989242
> 3955891 DUP I=989242
> 3955892 DUP I=989242
> 3955893 DUP I=989242
> 3955894 DUP I=989242
> 3955895 DUP I=989242
> ** Phase 2 - Check Pathnames
> DUP/BAD  I=989289  OWNER=root MODE=100640
> SIZE=14367 MTIME=Sep  9 11:30 2011 
> FILE=/log/kernel.log
> 
> REMOVE? yes
> 
> DUP/BAD  I=989242  OWNER=root MODE=100640
> SIZE=202631 MTIME=Sep  8 19:52 2011 
> FILE=/log/mpd.log.0
> 
> REMOVE? yes
> 
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> UNREF FILE I=376866  OWNER=root MODE=140666
> SIZE=0 MTIME=Sep  5 12:27 2011 
> CLEAR? yes
> 
> UNREF FILE I=376868  OWNER=root MODE=140666
> SIZE=0 MTIME=Sep  7 20:30 2011
> CLEAR? yes
> 
> UNREF FILE I=376869  OWNER=root MODE=140666
> SIZE=0 MTIME=Sep  8 11:17 2011
> CLEAR? yes
> 
> UNREF FILE I=376870  OWNER=root MODE=140666
> SIZE=0 MTIME=Sep  8 12:11 2011
> CLEAR? yes
> 
> BAD/DUP FILE I=989242  OWNER=root MODE=100640
> SIZE=202631 MTIME=Sep  8 19:52 2011
> CLEAR? yes
> 
> UNREF FILE  I=989259  OWNER=root MODE=100640
> SIZE=648 MTIME=Aug 27 00:00 2011
> RECONNECT? yes
> 
> BAD/DUP FILE I=989289  OWNER=root MODE=100640
> SIZE=14367 MTIME=Sep  9 11:30 2011
> CLEAR? yes
> LINK COUNT FILE I=989293  OWNER=root MODE=100640
> SIZE=961 MTIME=Sep  9 11:26 2011  COUNT 1 SHOULD BE 2
> ADJUST? yes
> 
> UNREF FILE  I=989327  OWNER=root MODE=100640
> SIZE=114 MTIME=Aug 27 00:00 2011
> RECONNECT? yes
> 
> ** Phase 5 - Check Cyl groups
> FREE BLK COUNT(S) WRONG IN SUPERBLK
> SALVAGE? yes
> 
> SUMMARY INFORMATION BAD
> SALVAGE? yes
> 
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? yes
> 
> 1188 files, 90007 used, 4987072 free (360 frags, 623339 blocks, 0.0% fragmentation)
> 
> ***** FILE SYSTEM IS CLEAN *****
> 
> ***** FILE SYSTEM WAS MODIFIED *****
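
A transcript like the one above is easier to compare between runs when the duplicate-block reports are counted per inode. The following is only a minimal sketch: the function name summarize_dups is hypothetical, and it assumes the fsck output was saved to a text file (with or without the "> " quote prefix). It counts every "NNN DUP I=MMM" line, so blocks reported again in Phase 1b are counted a second time.

```shell
#!/bin/sh
# Hypothetical helper: summarize "<block> DUP I=<inode>" lines from a saved
# fsck transcript. Prints "<inode> <number of DUP reports>" per inode,
# sorted by inode number. Reads the named files, or stdin if none are given.
summarize_dups() {
    # Strip a leading "> " mail quote prefix, if present.
    sed 's/^> *//' "$@" |
    awk '$1 ~ /^[0-9]+$/ && $2 == "DUP" && $3 ~ /^I=[0-9]+$/ {
             sub(/^I=/, "", $3)   # keep only the inode number
             count[$3]++
         }
         END { for (ino in count) print ino, count[ino] }' |
    sort -n
}
```

It could be used as `summarize_dups fsck-var.log`, where fsck-var.log is an assumed file name. Lines such as "EXCESSIVE DUP BLKS" or "DUP/BAD" are deliberately ignored, because they do not name a single block.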

Could you please explain whether this layout is supported?
physical drive - geom_mirror - geom_journal - geom_part_mbr - geom_part_bsd - journalled UFS2

If it is not, mounting such a UFS2 file system should produce a warning, shouldn't it?
Currently there are no warnings.
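
To make the layering concrete, each level of the stack could be inspected roughly as follows. This is a sketch under assumptions, not a record of what was actually run: the helper names run and inspect_stack are hypothetical, and the provider names are derived from the /dev/mirror/gm0.journals1e device mentioned above. Setting DRY_RUN only prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: walk the stack from the mirror up to the file system.
# Assumes the layout gmirror -> gjournal -> MBR slice s1 -> BSD partition e.
inspect_stack() {
    mirror=${1:-gm0}
    journal="mirror/${mirror}.journal"
    run gmirror status "$mirror"          # state of the mirror components
    run gjournal status                   # journal providers and consumers
    run gpart show "$journal"             # MBR on top of the journal
    run gpart show "${journal}s1"         # BSD label inside slice s1
    run tunefs -p "/dev/${journal}s1e"    # file system flags for /var
}
```

Running `DRY_RUN=1; inspect_stack gm0` lists the five commands without touching the system; without DRY_RUN they need to be run as root on the affected machine.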

Eugene Grosbein


