From owner-freebsd-hackers@FreeBSD.ORG Tue Mar 27 13:40:11 2012
Message-ID: <4F71C333.9010506@softhammer.net>
Date: Tue, 27 Mar 2012 09:40:03 -0400
From: Steve Sanders <ssanders@softhammer.net>
To: freebsd-hackers@freebsd.org
References: <4F3922A8.2090808@softhammer.net>
Subject: Re: Odd RAID Performance Issue
List-Id: Technical Discussions relating to FreeBSD

Thanks for all of the suggestions. We do tune the logging UFS partition to use 64K blocks.

We found a solution that makes this problem go away. We've modified CAM so that if a controller has two or more disks attached, it divides the number of I/O slots on the card between the disks. A twa card has 252 slots available, and CAM splits these between the two 'disks' attached to the controller, each disk getting 126 slots (a rough sketch of the idea is below). The queue depths reported by iostat get ridiculously long (~1000), but we no longer consume memory from the runningspace buffers. Since we never cross the vfs.hirunningspace mark, the system does not pause.

I'm now wondering what causes runningspace usage. Could someone point me to the code where we end up allocating from the running space buffers?

An interesting side effect of this change is that it has made a mess of iostat's reports. 'iostat -x 1' now shows the database disk as 150-200% busy and very often shows service times of 15 seconds. Given that the data looks good and the system isn't pausing, a 15-second operation time seems unlikely. I believe this to be an effect of a large number of NCQ operations terminating within the 1-second sampling window, so the summed operation durations add up to much more than the 1-second window itself.

Not realistic, but illustrative: imagine five 1-second NCQ operations all terminating in the same 1-second window. The current code will calculate the total duration as 5 seconds, and dividing by 1 second yields 500%.
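To spell that arithmetic out, here is a toy illustration in C; the variable names are mine and not taken from the devstat code:

    #include <stdio.h>

    /*
     * Toy illustration of how summing the durations of overlapping NCQ
     * operations over a fixed sampling window can exceed 100% busy.
     * All names here are illustrative, not from the devstat code.
     */
    int
    main(void)
    {
            double window = 1.0;            /* sampling interval, seconds */
            int ncq_ops = 5;                /* overlapping ops completing in the window */
            double op_duration = 1.0;       /* each op spent 1 second "in flight" */

            /* Overlapping time gets counted once per operation. */
            double total = ncq_ops * op_duration;   /* 5.0 seconds */
            printf("%%busy = %.0f%%\n", total / window * 100.0);    /* prints 500% */
            return (0);
    }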
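For completeness, the cam change mentioned above boils down to something like the following. This is a simplified sketch with made-up names (controller_slots, disks_on_controller); the actual patch is against the CAM queueing code:

    /*
     * Sketch: split a controller's command slots evenly among the
     * disks attached to it, instead of letting one disk claim them all.
     */
    static int
    slots_for_disk(int controller_slots, int disks_on_controller)
    {
            if (disks_on_controller < 2)
                    return (controller_slots);      /* sole disk keeps every slot */
            /* e.g. twa: 252 slots / 2 disks = 126 slots per disk */
            return (controller_slots / disks_on_controller);
    }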
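On my own runningspace question, the closest I've gotten is the accounting in sys/kern/vfs_bio.c. The following is paraphrased from my reading of the 8.x code, not a verbatim excerpt, so take the details with a grain of salt:

    /*
     * Paraphrase of the runningbufspace accounting in sys/kern/vfs_bio.c.
     * Every in-flight write is charged against the global counter, and
     * async writers stall in "wdrain" once it crosses vfs.hirunningspace.
     */
    static void
    bufwrite_sketch(struct buf *bp)
    {
            /* Charge this write against the global in-flight total. */
            bp->b_runningbufspace = bp->b_bufsize;
            atomic_add_long(&runningbufspace, bp->b_runningbufspace);

            /* ... hand the buffer to the driver ... */

            /*
             * Throttle: sleep ("wdrain") until completions call
             * runningbufwakeup() and drain the total back down.
             */
            if (runningbufspace > hirunningspace)
                    waitrunningbufspace();
    }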
Thanks

On 02/13/2012 11:51 AM, Ivan Voras wrote:
> On 13/02/2012 15:48, Stephen Sanders wrote:
>> We've an application that logs data on one very large raid6 array
>> and updates/accesses a database on another smaller raid5 array.
> You would be better off with RAID10 for a database (or anything which
> does random IO).
>
>> Both arrays are connected to the same PCIe 3ware RAID controller. The
>> system has 2 six core 3Ghz processors and 24 GB of RAM. The system is
>> running FreeBSD 8.1.
> Did you do any additional OS tuning? Do you use UFS or ZFS?
>
>> The problem we're encountering is that the disk subsystem appears to
>> 'pause' periodically. It looks as if this is a result of disk read/write
>> operations from the database array taking a very long time to complete
>> (up to 8 sec).
> You should be able to monitor this with "iostat -x 1" (or whatever
> number of seconds instead of "1") - the last three columns should tell
> you if the device(s) are extraordinarily busy, and the r/s and w/s
> columns should tell you what the real IOPS rate is. You should probably
> post a sample output from this command when the problem appears.