Message-ID: <4E69EB15.50808@rdtc.ru>
Date: Fri, 09 Sep 2011 17:31:49 +0700
From: Eugene Grosbein
To: FreeBSD Stable
Cc: pjd@freebsd.org
In-Reply-To: <4E69A152.6090408@rdtc.ru>
Subject: Re: gmirror+gjournal often makes inconsistens file systems

Dear Pawel Jakub,

09.09.2011 12:17, Eugene Grosbein writes:
> Hi!
>
> For a long time I have been seeing the same UFS2 filesystem problems on several 8.2 systems
> running on gmirror+gjournal+async. In case of an unclean shutdown, kernel panic or power failure,
> gjournal makes fsck skip its checks, and that's why I use it.
>
> But quite often my /var partition (and sometimes others) still has severe damage in it,
> and running with such a /var mounted read-write leads to further panics or hangs and so on.
>
> For example, I have one such 8.2-STABLE system with drives ad4 and ad6 combined into /dev/mirror/gm0.
> I have just removed ad6 from the mirror, run fsck -y manually on all its filesystems,
> shut the machine down cleanly again, and booted it from ad6,
> leaving the mirror with ad4 neither mounted nor checked.
>
> Then I ran fsck -y /dev/mirror/gm0.journals1e (/var on the mirrored drive)
> and got LOTS of bad errors on a presumably clean file system.
> Of course, I saw the same errors while checking ad6 after it was removed from the running mirror.
> I have the gmirror auto-sync feature turned ON. I tried turning it OFF, but that only
> increased the frequency of such damage persisting after a reboot.
>
> It seems that gjournal cannot handle system crashes reliably, can it?
> I basically run it without any manual tuning. I've also tried tuning it, without luck:
> it works nicely when there are no unclean shutdowns, but it is there to deal with them in the first place.
>
> # fsck -t ffs -y /dev/mirror/gm0.journals1e
> ** /dev/mirror/gm0.journals1e
> ** Last Mounted on /var
> ** Phase 1 - Check Blocks and Sizes
> 3955872 DUP I=989242
> 3955873 DUP I=989242
> 3955874 DUP I=989242
> 3955875 DUP I=989242
> 3955876 DUP I=989242
> 3955877 DUP I=989242
> 3955878 DUP I=989242
> 3955879 DUP I=989242
> 3955880 DUP I=989242
> 3955881 DUP I=989242
> 3955882 DUP I=989242
> EXCESSIVE DUP BLKS I=989242
> CONTINUE? yes
>
> INCORRECT BLOCK COUNT I=989242 (448 should be 424)
> CORRECT? yes
>
> 3955888 DUP I=989289
> 3955889 DUP I=989289
> 3955890 DUP I=989289
> 3955891 DUP I=989289
> 3955892 DUP I=989289
> 3955893 DUP I=989289
> 3955894 DUP I=989289
> 3955895 DUP I=989289
> ** Phase 1b - Rescan For More DUPS
> 3955872 DUP I=989242
> 3955873 DUP I=989242
> 3955874 DUP I=989242
> 3955875 DUP I=989242
> 3955876 DUP I=989242
> 3955877 DUP I=989242
> 3955878 DUP I=989242
> 3955879 DUP I=989242
> 3955880 DUP I=989242
> 3955881 DUP I=989242
> 3955888 DUP I=989242
> 3955889 DUP I=989242
> 3955890 DUP I=989242
> 3955891 DUP I=989242
> 3955892 DUP I=989242
> 3955893 DUP I=989242
> 3955894 DUP I=989242
> 3955895 DUP I=989242
> ** Phase 2 - Check Pathnames
> DUP/BAD I=989289 OWNER=root MODE=100640
> SIZE=14367 MTIME=Sep 9 11:30 2011
> FILE=/log/kernel.log
>
> REMOVE? yes
>
> DUP/BAD I=989242 OWNER=root MODE=100640
> SIZE=202631 MTIME=Sep 8 19:52 2011
> FILE=/log/mpd.log.0
>
> REMOVE? yes
>
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> UNREF FILE I=376866 OWNER=root MODE=140666
> SIZE=0 MTIME=Sep 5 12:27 2011
> CLEAR? yes
>
> UNREF FILE I=376868 OWNER=root MODE=140666
>
> UNREF FILE I=376868 OWNER=root MODE=140666
> SIZE=0 MTIME=Sep 7 20:30 2011
> CLEAR? yes
>
> UNREF FILE I=376869 OWNER=root MODE=140666
> SIZE=0 MTIME=Sep 8 11:17 2011
> CLEAR? yes
>
> UNREF FILE I=376870 OWNER=root MODE=140666
> SIZE=0 MTIME=Sep 8 12:11 2011
> CLEAR? yes
>
> BAD/DUP FILE I=989242 OWNER=root MODE=100640
> SIZE=202631 MTIME=Sep 8 19:52 2011
> CLEAR? yes
>
> UNREF FILE I=989259 OWNER=root MODE=100640
> SIZE=648 MTIME=Aug 27 00:00 2011
> RECONNECT? yes
>
> BAD/DUP FILE I=989289 OWNER=root MODE=100640
> SIZE=14367 MTIME=Sep 9 11:30 2011
> CLEAR? yes
> LINK COUNT FILE I=989293 OWNER=root MODE=100640
> SIZE=961 MTIME=Sep 9 11:26 2011 COUNT 1 SHOULD BE 2
> ADJUST? yes
>
> UNREF FILE I=989327 OWNER=root MODE=100640
> SIZE=114 MTIME=Aug 27 00:00 2011
> RECONNECT? yes
>
> ** Phase 5 - Check Cyl groups
> FREE BLK COUNT(S) WRONG IN SUPERBLK
> SALVAGE? yes
>
> SUMMARY INFORMATION BAD
> SALVAGE? yes
>
> BLK(S) MISSING IN BIT MAPS
> SALVAGE? yes
>
> 1188 files, 90007 used, 4987072 free (360 frags, 623339 blocks, 0.0% fragmentation)
>
> ***** FILE SYSTEM IS CLEAN *****
>
> ***** FILE SYSTEM WAS MODIFIED *****

Could you please explain whether such partitioning is supported?

physical drive - geom_mirror - geom_journal - geom_part_mbr - geom_part_bsd - journalled UFS2

If not, mounting such a UFS2 file system should warn us, shouldn't it? There are no warnings now.

Eugene Grosbein