From owner-freebsd-current@FreeBSD.ORG Thu Apr 26 11:55:25 2007
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 68AA216A400;
    Thu, 26 Apr 2007 11:55:25 +0000 (UTC)
    (envelope-from anderson@freebsd.org)
Received: from mh2.centtech.com (moat3.centtech.com [64.129.166.50])
    by mx1.freebsd.org (Postfix) with ESMTP id 44B7C13C489;
    Thu, 26 Apr 2007 11:55:25 +0000 (UTC)
    (envelope-from anderson@freebsd.org)
Received: from neutrino.centtech.com (neutrino.centtech.com [10.177.171.220])
    by mh2.centtech.com (8.13.8/8.13.8) with ESMTP id l3QBtKtT020869;
    Thu, 26 Apr 2007 06:55:20 -0500 (CDT)
    (envelope-from anderson@freebsd.org)
Message-ID: <46309328.2010606@freebsd.org>
Date: Thu, 26 Apr 2007 06:55:20 -0500
From: Eric Anderson
User-Agent: Thunderbird 2.0.0.0 (X11/20070420)
MIME-Version: 1.0
To: Rong-en Fan
References: <6eb82e0704251124o53b7bc1aq9836a20ee06fcd11@mail.gmail.com>
    <462FA330.2020605@freebsd.org>
    <6eb82e0704260132o3ace4048ld29e3f110dd06c14@mail.gmail.com>
In-Reply-To: <6eb82e0704260132o3ace4048ld29e3f110dd06c14@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.88.4/3164/Thu Apr 26 04:13:10 2007 on mh2.centtech.com
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.3 required=8.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.6
X-Spam-Checker-Version: SpamAssassin 3.1.6 (2006-10-03) on mh2.centtech.com
Cc: FreeBSD Current, pjd@freebsd.org
Subject: Re: panic: Journal overflow
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 26 Apr 2007 11:55:25 -0000

On 04/26/07 03:32, Rong-en Fan wrote:
> On 4/26/07, Eric Anderson wrote:
>> On 04/25/07 13:24, Rong-en Fan wrote:
>>> This is a i386 current SMP box as of Apr 14 or 15. Got
>>> a panic with geom journal.
>>>
>>> panic: Journal overflow (joffset=3246704015360 active=3246705852416 inactive=324
>>> cpuid = 2
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper(c06b3e2c,e5321944,c04fc86e,c06c70c8,2,...) at db_trace_sel
>>> kdb_backtrace(c06c70c8,2,c06ad0d2,e5321950,5,...) at kdb_backtrace+0x2f
>>> panic(c06ad0d2,eea3b800,2f3,eebfc000,2f3,...) at panic+0x11f
>>> g_journal_check_overflow(c4fc5c00,cb030a00,eb,95428000,eb,...) at g_journal_chec
>>> g_journal_flush(c4fc5c00,0,eb,95428000,eb,...) at g_journal_flush+0x60d
>>> g_journal_add_current(c4fc5c00,c9cb0948,ca8e818c,c4fc5c00,e5321cbc,...) at g_jou
>>> g_journal_release_delayed(c4fc5c00,0,ca8e818c,c4fdadc0,2,...) at g_journal_relea
>>> g_journal_flush_send(c4fc5c00,c8521c60,205d0000,1b2,205d4000,...) at g_journal_f
>>> g_journal_worker(c4fc5c00,e5321d38,0,0,0,...) at g_journal_worker+0x7f7
>>> fork_exit(c04b5813,c4fc5c00,e5321d38) at fork_exit+0x83
>>> fork_trampoline() at fork_trampoline+0x8
>>> --- trap 0, eip = 0, esp = 0xe5321d70, ebp = 0 ---
>>>
>>> Sorry, that I don't have core dump available. Will set up next time.
>>> After the panic, the system hangs at single user prompt:
>>>
>>> Trying to mount root from ufs:/dev/da0s1a
>>> WARNING: / was not properly dismounted
>>> Loading configuration files.
>>> kernel dumps on /dev/da0s1b
>>> Entropy harvesting: interrupts ethernet point_to_point kickstart.
>>> swapon: adding /dev/da0s1b as swap device
>>> Starting file system checks:
>>> /dev/da0s1a: 1704 files, 56880 used, 196935 free (631 frags, 24538 blocks, 0.2%
>>> Can't stat /dev/concat/data.journal: No such file or directory
>>> /dev/da0s1f: 237 files, 1208 used, 11877856 free (64 frags, 1484724 blocks, 0.0%
>>> GEOM_JOURNAL: Journal concat/data consistent.
>>> /dev/da0s1e: UNREF FILE I=548536 OWNER=root MODE=100644
>>> /dev/da0s1e: SIZE=717466 MTIME=Apr 4 07:58 2007 (CLEARED)
>>> /dev/da0s1e: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
>>> /dev/da0s1e: SUMMARY INFORMATION BAD (SALVAGED)
>>> /dev/da0s1e: BLK(S) MISSING IN BIT MAPS (SALVAGED)
>>> /dev/da0s1e: 190105 files, 763850 used, 2281197 free (36557 frags, 280580 blocks
>>> /dev/da0s1d: 17507 files, 40158 used, 972857 free (641 frags, 121527 blocks, 0.1
>>> THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
>>> ufs: /dev/concat/data.journal (/data)
>>> Unknown error; help!
>>> Enter full pathname of shell or RETURN for /bin/sh:
>>> # mount -a
>>> WARNING: R/W mount of /backup denied. Filesystem is not clean - run fsck
>>> mount: /dev/concat/data.journal : Operation not permitted
>>> # fsck_ffs -p /dev/concat/data.journal
>>>
>>> I need to issue 'fsck_ffs -p' myself... any idea about why this happens?
>>>
>>> The geom journal setup:
>>>
>>> Geom name: gjournal 68372861
>>> ID: 68372861
>>> Providers:
>>> 1. Name: concat/data.journal
>>>    Mediasize: 3246137539072 (3.0T)
>>>    Sectorsize: 512
>>>    Mode: r1w1e1
>>> Consumers:
>>> 1. Name: concat/data
>>>    Mediasize: 3247211281408 (3.0T)
>>>    Sectorsize: 512
>>>    Mode: r1w1e1
>>>    Jend: 3247211280896
>>>    Jstart: 3246137539072
>>>    Role: Data,Journal
>>>
>>> The gconcat consists two scsi disk (actually, it's raid) da0 and da1.
>>> Oh no, it panics with journal overflow again while writing this message :(
>>>
>>> The data.journal is shared by nfs, and there are two boxes that are
>>> doing a tar writing operation on this partition.
>> You need to change your journal switch and cache switch times, to
>> something like this:
>>
>> kern.geom.journal.force_switch=50
>> kern.geom.journal.cache.switch=75
>>
>> Try that and see if that eases your pain a bit.
>
> This does not help :(
> In the past two hours, I tried tuning this two sysctls a bit.
> The result is panic over 10 times :(

How low did you try them?
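
For reference, the numbers quoted above can be checked with a little arithmetic. The sketch below only subtracts values copied from the quoted `gjournal list` output and panic line. The variable names are made up for illustration, and the reading of `joffset` as the current write position and `active` as the start of the active journal is an assumption, not something confirmed in this thread. The `inactive` value is cut off in the panic message, so it is left out.

```python
# Values copied verbatim from the quoted gjournal list output.
JSTART = 3246137539072          # Jstart of consumer concat/data
JEND = 3247211280896            # Jend of consumer concat/data
CONSUMER_SIZE = 3247211281408   # Mediasize of concat/data
PROVIDER_SIZE = 3246137539072   # Mediasize of concat/data.journal
SECTOR = 512                    # Sectorsize

# Values copied from the panic message (inactive= is truncated there).
JOFFSET = 3246704015360
ACTIVE = 3246705852416

# Size of the journal area on the shared data+journal consumer.
journal_bytes = JEND - JSTART
# Space left on the consumer after the journal area ends.
tail_bytes = CONSUMER_SIZE - JEND
# Distance between joffset and active at the time of the panic.
# Assumption: this is the remaining room before the writer reaches
# the active journal; the thread itself does not state this.
gap_bytes = ACTIVE - JOFFSET

print("journal area:", journal_bytes, "bytes =", journal_bytes / 2**30, "GiB")
print("tail after journal:", tail_bytes, "bytes =", tail_bytes // SECTOR, "sector(s)")
print("data area equals provider size:", PROVIDER_SIZE == JSTART)
print("active - joffset:", gap_bytes, "bytes =", gap_bytes // SECTOR, "sectors")
```

If that reading is right, the journal area is small relative to what two NFS clients running tar can write between journal switches, which would fit with lowering the switch thresholds not being enough on its own.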

> I have to remove gjournal on concat/data. I do this following
>
> gjournal stop concat/data
> gjournal clear concat/data
> tunefs -J disable /dev/concat/data
>
> If I want to turn on gjournal someday, can I do it without
> recreate the filesystem?

Without doing any research, I would say 'sure', because I can't think of
a reason why not. I think just relabeling it, then turning journaling
back on via tunefs, would do it.

Eric