From owner-freebsd-hackers@FreeBSD.ORG  Mon Feb 13 15:04:31 2012
Message-ID: <4F3922A8.2090808@softhammer.net>
Date: Mon, 13 Feb 2012 09:48:08 -0500
From: Stephen Sanders <ssanders@softhammer.net>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
	rv:10.0) Gecko/20120129 Thunderbird/10.0
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Odd RAID Performance Issue
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>

We have an application that logs data to one very large RAID 6 array
and updates/accesses a database on a second, smaller RAID 5 array.

Both arrays are connected to the same PCIe 3ware RAID controller.  The
system has two six-core 3 GHz processors and 24 GB of RAM.  The system is
running FreeBSD 8.1.

The averaged read/write rate to the database is 2 MB/s, while the averaged
write rate to the data logging array is 300 MB/s.  Writes to the logging
array are somewhat bursty.

The problem we're encountering is that the disk subsystem appears to
'pause' periodically.  It looks as if this is the result of disk read/write
operations on the database array taking a very long time to complete
(up to 8 seconds).

When the disk read operation takes such a long time, it appears that the
system starts to run out of memory due to bio block buffering.  Most
processes end up in either getblk() or waithighrunning().
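
For a rough sense of scale, here is a back-of-the-envelope sketch in Python that uses only the figures quoted above. It assumes the logger keeps producing data at the averaged rate while the disks are paused, and it uses decimal units (1 GB = 1000 MB). The function name is made up for illustration. Since the writes are bursty, the real backlog during a burst could be larger.

```python
# Estimate how much log data accumulates while the disk subsystem is paused.
# Inputs are the figures from this message; decimal units are an assumption.

LOG_WRITE_RATE_MB_S = 300   # averaged write rate to the logging array
RAM_GB = 24                 # installed memory


def backlog_mb(stall_seconds, rate_mb_s=LOG_WRITE_RATE_MB_S):
    """Data (MB) produced by the logger during a stall of the given length."""
    return stall_seconds * rate_mb_s


for stall in (4, 8):
    mb = backlog_mb(stall)
    share = mb / (RAM_GB * 1000) * 100
    print(f"{stall} s stall: {mb} MB pending ({share:.0f}% of RAM)")
```

At the averaged rate, an 8 second pause corresponds to roughly 2.4 GB of pending log data, which is about a tenth of the machine's memory.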

We've instrumented g_vfs_strategy() and bufdone_finish() using DTrace.
The indication from this effort is that a number of reads and writes are
taking 4-8 seconds.
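
The measurement amounts to pairing a start timestamp with a completion timestamp for each buffer. Below is a minimal Python sketch of that pairing, not our actual DTrace script. The `Event` record, its field names, and the sample trace are hypothetical and only illustrate the idea.

```python
from collections import namedtuple

# Hypothetical trace record: kind is "start" (request issued) or "done"
# (request completed), buf identifies the buffer, ts_ns is a timestamp in ns.
Event = namedtuple("Event", "kind buf ts_ns")


def slow_ios(events, threshold_s=1.0):
    """Pair start/done events per buffer and return (buf, seconds) for slow I/Os."""
    started = {}
    slow = []
    for ev in events:
        if ev.kind == "start":
            started[ev.buf] = ev.ts_ns
        elif ev.kind == "done" and ev.buf in started:
            elapsed = (ev.ts_ns - started.pop(ev.buf)) / 1e9
            if elapsed >= threshold_s:
                slow.append((ev.buf, elapsed))
    return slow


# Made-up sample: one request finishes in 5 ms, another takes 6.5 s.
trace = [
    Event("start", 0xA0, 0),
    Event("start", 0xB0, 1_000_000),
    Event("done", 0xB0, 6_000_000),
    Event("done", 0xA0, 6_500_000_000),
]
print(slow_ios(trace))
```

Only requests at or above the threshold are reported, so the millisecond-range completions drop out and the multi-second outliers remain.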

So far, it looks as if the disk driver and hardware are OK, as read/write
operations at that level appear to complete in the millisecond range.  We
believe that our instrumentation is pointing to something between the VFS
layer and CAM as the culprit.
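
The reasoning can be sketched as a per-request breakdown: if the time spent inside the driver is small but the end-to-end time is large, the remainder was spent above the driver. The Python below is illustrative only. The four timestamps, the function name, and the sample values are assumptions, not measurements from our systems.

```python
def split_latency_ms(vfs_start, drv_start, drv_done, vfs_done):
    """Split one request's latency (integer ms) into three stages.

    All four arguments are hypothetical timestamps in milliseconds:
    request issued at the VFS layer, request handed to the driver,
    driver completion, and completion seen back at the VFS layer.
    """
    return {
        "queued_above_driver": drv_start - vfs_start,
        "device_service": drv_done - drv_start,
        "completion_path": vfs_done - drv_done,
    }


# Made-up example: 6 s end to end, of which the device accounts for 4 ms.
parts = split_latency_ms(0, 6000, 6004, 6005)
for stage, ms in parts.items():
    print(f"{stage}: {ms} ms")
```

With numbers shaped like these, nearly all of the delay sits in the first stage, which is the pattern that would implicate the layers above the driver rather than the hardware.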

We see the same behavior on FreeBSD 8.2 but have not yet tried
FreeBSD 9.

This behavior is not limited to a single system; it occurs on a
couple of systems.

Does this sound familiar to anyone out there?

Thanks