From owner-freebsd-stable@FreeBSD.ORG Thu Aug 31 19:01:36 2006
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9A84816A4DA
	for ; Thu, 31 Aug 2006 19:01:36 +0000 (UTC)
	(envelope-from kramer@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 329F543D45
	for ; Thu, 31 Aug 2006 19:01:35 +0000 (GMT)
	(envelope-from kramer@centtech.com)
Received: from [10.177.171.221] (roddick.centtech.com [10.177.171.221])
	by mh1.centtech.com (8.13.1/8.13.1) with ESMTP id k7VJ1YrA034169
	for ; Thu, 31 Aug 2006 14:01:34 -0500 (CDT)
	(envelope-from kramer@centtech.com)
Message-ID: <44F7320E.6040608@centtech.com>
Date: Thu, 31 Aug 2006 14:01:34 -0500
From: Kevin Kramer
User-Agent: Thunderbird 1.5.0.5 (X11/20060802)
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.87.1/1782/Thu Aug 31 11:54:15 2006 on mh1.centtech.com
X-Virus-Status: Clean
Subject: gjournal questions
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: kramer@centtech.com
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 31 Aug 2006 19:01:36 -0000

Pavel,

running 6.1-stable, rebuilt kernel/world as of 8/28 @ 2p CST w/ these patches

gjournal6_20060808.patch
vfs_subr.c.3.patch

the backend RAID presents 4 luns, this is how we config'd it.

da1 - 8G
da2 - ~897G
da3 - 8G
da4 - ~897G

da2/4 have been partitioned in FreeBSD, then we did the following

gjournal label -v /dev/da2 /dev/da1
gjournal label -v /dev/da4 /dev/da3
newfs -U -L "scr09" /dev/da2.journal
newfs -U -L "scr10" /dev/da4.journal

so one 8G journal for each data device.

now that the server is under load i'm seeing NFS not responding messages
on my clients. the message corresponds to the gjournal suspend/copy
operation, causing my clients to hang or give "no such file or directory".
we copied 137G to /scr10 and it just finished, could this be some remains
of writes from the journal?

here is the time correlation

Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002798s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 14.030198s
Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:55:33 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000013s
Aug 31 13:55:44 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000013s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.000010s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.000009s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.000007s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002302s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr10: 0.029769s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr10: 0.035259s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002756s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 10.182759s
Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.
Aug 31 13:56:14 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000012s
Aug 31 13:56:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 0.000011s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Msync time of /scr09: 0.000010s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Sync time of /scr09: 0.000009s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr09: 0.000007s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Starting copy of journal.
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Switch time of da2: 0.002364s
Aug 31 13:56:46 donkey kernel: GEOM_JOURNAL[1]: Data has been copied.

from syslog server

Aug 31 13:55:23 bowltest4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:23 bowltest4 kernel: nfs: server donkey OK
Aug 31 13:55:23 laybox32 kernel: nfs: server donkey OK
Aug 31 13:55:29 b-115-4 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:29 b-115-4 kernel: nfs: server donkey OK
Aug 31 13:55:56 b-116-16 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:56 b-204-40 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 b-116-16 kernel: nfs: server donkey OK
Aug 31 13:55:57 lic2 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 b-204-40 kernel: nfs: server donkey OK
Aug 31 13:55:57 lic2 kernel: nfs: server donkey OK
Aug 31 13:55:57 laybox29 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:57 laybox26 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 laybox19 kernel: nfs: server donkey not responding, still trying
Aug 31 13:55:58 laybox37 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:00 laybox19 kernel: nfs: server donkey OK
Aug 31 13:56:00 laybox26 kernel: nfs: server donkey OK
Aug 31 13:56:00 laybox37 kernel: nfs: server donkey OK
Aug 31 13:56:00 laybox29 kernel: nfs: server donkey OK
Aug 31 13:56:05 ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state not responding
Aug 31 13:56:05 ws-119-8 amd[2640]: file server donkey20.centtech.com, type nfs, state ok
Aug 31 13:56:36 b-116-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:36 b-116-17 kernel: nfs: server donkey OK
Aug 31 13:56:40 b-210-17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 b-204-41 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:41 laybox17 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:44 b-204-38 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:44 b-204-38 kernel: nfs: server donkey OK
Aug 31 13:56:44 bowltest3 kernel: nfs: server donkey not responding, still trying
Aug 31 13:56:46 b-210-17 kernel: nfs: server donkey OK
Aug 31 13:56:46 laybox17 kernel: nfs: server donkey OK

are the journal devices not large enough? is there a formula for sizing?
sorry this is long.

can i umount the data device, remove journaling and mount as a regular
device? what are those steps?

thanks and sorry for the long-winded posting..

------------------------------
Kevin Kramer
Sr. Systems Administrator
512.418.5725
Centaur Technology, Inc.
www.centtech.com
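
A rough way to line up the two logs above, sketched in Python. This is only
an illustration and was not run on the server. The function names, the 1 s
threshold, the 5 s slack and the year 2006 (taken from the mail date, since
syslog stamps carry no year) are all assumptions. It also assumes gjournal
prints each timing when the operation finishes, so the stall window is taken
to end at the log stamp.

```python
import re
from datetime import datetime, timedelta

# GEOM_JOURNAL timing lines, e.g. "... Entire switch time: 14.030198s"
# or "... Suspend time of /scr10: 10.109732s".
TIMING = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: GEOM_JOURNAL\[\d+\]: "
    r"(.+?) time(?: of (\S+))?: (\d+\.\d+)s$"
)
# Client-side NFS timeout lines.
NFS = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: nfs: server \S+ not responding"
)


def parse_stamp(stamp, year=2006):
    # Syslog stamps have no year; 2006 is assumed from the mail date.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def slow_stalls(server_lines, threshold=1.0):
    """Return (end_time, label, seconds) for timings at or over threshold."""
    stalls = []
    for line in server_lines:
        m = TIMING.match(line.strip())
        if m and float(m.group(4)) >= threshold:
            label = m.group(2) + " time"
            if m.group(3):
                label += " of " + m.group(3)
            stalls.append((parse_stamp(m.group(1)), label, float(m.group(4))))
    return stalls


def timeouts_near(stalls, client_lines, slack=5.0):
    """Pair each stall with the client hosts that timed out in its window."""
    timeouts = []
    for line in client_lines:
        m = NFS.match(line.strip())
        if m:
            timeouts.append((parse_stamp(m.group(1)), m.group(2)))
    report = []
    for end, label, seconds in stalls:
        start = end - timedelta(seconds=seconds + slack)
        stop = end + timedelta(seconds=slack)
        hosts = sorted({host for when, host in timeouts if start <= when <= stop})
        report.append((end.strftime("%H:%M:%S"), label, seconds, hosts))
    return report


# A few lines copied from the logs above.
server = [
    "Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Switch time of da4: 0.002798s",
    "Aug 31 13:55:24 donkey kernel: GEOM_JOURNAL[1]: Entire switch time: 14.030198s",
    "Aug 31 13:56:04 donkey kernel: GEOM_JOURNAL[1]: Suspend time of /scr10: 10.109732s",
]
clients = [
    "Aug 31 13:55:23 bowltest4 kernel: nfs: server donkey not responding, still trying",
    "Aug 31 13:55:29 b-115-4 kernel: nfs: server donkey not responding, still trying",
    "Aug 31 13:55:56 b-116-16 kernel: nfs: server donkey not responding, still trying",
    "Aug 31 13:56:36 b-116-17 kernel: nfs: server donkey not responding, still trying",
]
for row in timeouts_near(slow_stalls(server), clients):
    print(row)
```

With the sample lines, the 13:55 and 13:56:04 stalls each pick up client
timeouts, while the 13:56:36 timeout matches no stall in this excerpt. A
timeout with no matching stall may just mean the server log was cut off
before the next switch was reported.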