From owner-freebsd-stable@FreeBSD.ORG  Thu Aug 31 19:01:36 2006
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9A84816A4DA
	for <freebsd-stable@freebsd.org>; Thu, 31 Aug 2006 19:01:36 +0000 (UTC)
	(envelope-from kramer@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 329F543D45
	for <freebsd-stable@freebsd.org>; Thu, 31 Aug 2006 19:01:35 +0000 (GMT)
	(envelope-from kramer@centtech.com)
Received: from [10.177.171.221] (roddick.centtech.com [10.177.171.221])
	by mh1.centtech.com (8.13.1/8.13.1) with ESMTP id k7VJ1YrA034169
	for <freebsd-stable@freebsd.org>; Thu, 31 Aug 2006 14:01:34 -0500 (CDT)
	(envelope-from kramer@centtech.com)
Message-ID: <44F7320E.6040608@centtech.com>
Date: Thu, 31 Aug 2006 14:01:34 -0500
From: Kevin Kramer <kramer@centtech.com>
User-Agent: Thunderbird 1.5.0.5 (X11/20060802)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.87.1/1782/Thu Aug 31 11:54:15 2006 on
	mh1.centtech.com
X-Virus-Status: Clean
Subject: gjournal questions
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: kramer@centtech.com
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 31 Aug 2006 19:01:36 -0000

Pavel,

We are running 6.1-STABLE, with kernel and world rebuilt on 8/28 at 2 p.m. CST with these patches:

gjournal6_20060808.patch
vfs_subr.c.3.patch

The backend RAID presents four LUNs, configured as follows:
da1 - 8G
da2 - ~897G
da3 - 8G
da4 - ~897G

da2 and da4 were partitioned in FreeBSD, and then we ran the following:

gjournal label -v /dev/da2 /dev/da1
gjournal label -v /dev/da4 /dev/da3
newfs -U -L "scr09" /dev/da2.journal
newfs -U -L "scr10" /dev/da4.journal

So there is one 8G journal for each data device.

Now that the server is under load, I'm seeing NFS "not responding" messages
on my clients. The messages coincide with the gjournal suspend/copy
operation, which causes my clients to hang or report "no such file or directory".

We copied 137G to /scr10 and the copy just finished. Could this be the
journal still copying the remains of those writes?

Here is the time correlation, first from the server (donkey):

Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002798s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 14.030198s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:55:33 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000013s
Aug 31 13:55:44 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000013s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.000010s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.000009s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.000007s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002302s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr10: 0.029769s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr10: 0.035259s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002756s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 10.182759s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:14 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000012s
Aug 31 13:56:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000011s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.000010s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.000009s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.000007s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002364s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
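
To pick the slow operations out of the server log without reading every
line, a rough sh/awk sketch like the one below should do. It assumes syslog
writes each GEOM_JOURNAL message on a single line in the format above (the
wrapping in this post comes from the mail client) and that every timing
message ends in a value such as 10.109732s. The name slow_journal_ops is
just made up for this example.

```sh
# Hypothetical helper: print GEOM_JOURNAL timing messages slower than a
# threshold in seconds (first argument, default 1). Reads the log on stdin.
# Assumes one message per line, e.g.
#   Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s
slow_journal_ops() {
    awk -v limit="${1:-1}" '
        # Only GEOM_JOURNAL lines whose last field looks like "<number>s"
        /GEOM_JOURNAL/ && $NF ~ /^[0-9.]+s$/ {
            t = $NF
            sub(/s$/, "", t)               # strip the trailing "s" unit
            if (t + 0 > limit + 0)
                # time of day, then the message from GEOM_JOURNAL onward
                print $3, substr($0, index($0, "GEOM_JOURNAL"))
        }'
}

# Usage (log path is an assumption): slow_journal_ops 1 < /var/log/messages
```

With a one-second threshold, the excerpt above would come back with three
entries: the 14.03s and 10.18s entire switch times and the 10.11s suspend
of /scr10.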

And from the syslog server (client messages):

Aug 31 13:55:23 <user.notice> bowltest4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:23 <user.notice> bowltest4 kernel: nfs: server donkey OK
Aug 31 13:55:23 <user.notice> laybox32 kernel: nfs: server donkey OK
Aug 31 13:55:29 <user.notice> b-115-4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:29 <user.notice> b-115-4 kernel: nfs: server donkey OK
Aug 31 13:55:56 <user.notice> b-116-16 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:56 <user.notice> b-204-40 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 <user.notice> b-116-16 kernel: nfs: server donkey OK
Aug 31 13:55:57 <user.notice> lic2 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 <user.notice> b-204-40 kernel: nfs: server donkey OK
Aug 31 13:55:57 <user.notice> lic2 kernel: nfs: server donkey OK
Aug 31 13:55:57 <user.notice> laybox29 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 <user.notice> laybox26 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 <user.notice> laybox19 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 <user.notice> laybox37 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:00 <user.notice> laybox19 kernel: nfs: server donkey OK
Aug 31 13:56:00 <user.notice> laybox26 kernel: nfs: server donkey OK
Aug 31 13:56:00 <user.notice> laybox37 kernel: nfs: server donkey OK
Aug 31 13:56:00 <user.notice> laybox29 kernel: nfs: server donkey OK
Aug 31 13:56:05 <daemon.info> ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state not responding
Aug 31 13:56:05 <daemon.info> ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state ok
Aug 31 13:56:36 <user.notice> b-116-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:36 <user.notice> b-116-17 kernel: nfs: server donkey OK
Aug 31 13:56:40 <user.notice> b-210-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 <user.notice> b-204-41 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 <user.notice> laybox17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:44 <user.notice> b-204-38 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:44 <user.notice> b-204-38 kernel: nfs: server donkey OK
Aug 31 13:56:44 <user.notice> bowltest3 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:46 <user.notice> b-210-17 kernel: nfs: server donkey OK
Aug 31 13:56:46 <user.notice> laybox17 kernel: nfs: server donkey OK
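
Similarly, a rough sketch for turning the client messages into per-host
outage lengths, pairing each "not responding" with the next "OK" from the
same host. It assumes the syslog-server line format shown above (client
hostname in the fifth field), that all messages fall on the same day, and
that input arrives on stdin; nfs_outages is a hypothetical helper name.

```sh
# Hypothetical helper: pair "not responding" / "OK" kernel messages per
# client and print "<host> <seconds>s". Assumes lines like
#   Aug 31 13:55:56 <user.notice> b-116-16 kernel: nfs: server donkey OK
# all from the same day (HH:MM:SS in field 3, host in field 5).
nfs_outages() {
    awk '
        # Convert HH:MM:SS to seconds since midnight
        function secs(hms,    p) {
            split(hms, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /nfs: server .* not responding/ { down[$5] = secs($3); next }
        /nfs: server .* OK$/ && ($5 in down) {
            print $5, (secs($3) - down[$5]) "s"
            delete down[$5]
        }
        # Clients that never logged a recovery in the input
        END { for (h in down) print h, "no OK seen" }'
}

# Usage (log path is an assumption): nfs_outages < /var/log/all.log
```

Against the excerpt above, the paired gaps come out between 0 and 6
seconds (b-210-17 is the longest at 6s), while b-204-41 and bowltest3
would be reported as "no OK seen" because their recovery falls outside
the excerpt. The amd lines are ignored since they do not match the
"nfs: server" pattern.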

Are the journal devices not large enough? Is there a formula for sizing
them? Also, can I unmount the data device, remove journaling, and mount it
as a regular device? What are the steps? Thanks, and sorry for the
long-winded posting.


------------------------------

Kevin Kramer
Sr. Systems Administrator
512.418.5725
Centaur Technology, Inc.
www.centtech.com