From: Stephen Sanders <ssanders@softhammer.net>
Date: Tue, 27 Mar 2012 16:48:56 -0400
To: freebsd-hackers@freebsd.org
Subject: Re: Odd RAID Performance Issue

A bit of a lapse on my part on the running space usage question. One of the
test systems has four g_up/g_down threads running, hence the better
runningbufspace usage. biodone() gets called a lot more often, so the buffer
usage is not backing up.

It also appears that devstat_start_transaction() / devstat_end_transaction()
are getting called from g_up/g_down. It seems like that could subject some of
the counter updates to thread scheduling effects, for example g_up() running a
lot more often than g_down(), so that start_count ends up less than end_count.
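To make that concern concrete, here is a minimal user-space sketch. It is not
the devstat code; the thread and variable names are made up, and the only
assumption is that one thread bumps a start counter, another bumps an end
counter, and a reader samples both without any synchronization.

        /*
         * Illustration only, not devstat: a "dispatch" thread bumps
         * start_count, a "completion" thread bumps end_count, and an
         * unsynchronized reader takes a two-step snapshot.  Because the
         * two reads are not atomic as a pair, the snapshot can show
         * end_count ahead of start_count, i.e. a negative number of
         * outstanding operations.
         */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <unistd.h>

        static atomic_ulong start_count;        /* ops handed to the driver */
        static atomic_ulong end_count;          /* ops completed */
        static atomic_int stop;

        static void *
        dispatch(void *arg)                     /* stand-in for the g_down side */
        {
                (void)arg;
                while (!atomic_load(&stop))
                        atomic_fetch_add(&start_count, 1);
                return (NULL);
        }

        static void *
        complete(void *arg)                     /* stand-in for the g_up side */
        {
                (void)arg;
                while (!atomic_load(&stop))
                        if (atomic_load(&end_count) < atomic_load(&start_count))
                                atomic_fetch_add(&end_count, 1);
                return (NULL);
        }

        int
        main(void)
        {
                pthread_t up, down;
                int i;

                pthread_create(&down, NULL, dispatch, NULL);
                pthread_create(&up, NULL, complete, NULL);
                for (i = 0; i < 5; i++) {
                        unsigned long s = atomic_load(&start_count);
                        usleep(1000);           /* reader gets descheduled here */
                        unsigned long e = atomic_load(&end_count);
                        printf("start %lu end %lu outstanding %ld\n",
                            s, e, (long)s - (long)e);
                        sleep(1);
                }
                atomic_store(&stop, 1);
                pthread_join(up, NULL);
                pthread_join(down, NULL);
                return (0);
        }

Built with "cc -pthread", the snapshot routinely reports a negative number of
outstanding operations, which is the same kind of skew a stats reader would
see if the two counters are updated from independently scheduled threads.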
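On the 150-200% busy and multi-second service times mentioned in the quoted
message below, here is the back-of-the-envelope arithmetic I have in mind. The
numbers are invented; the only assumption is that per-operation durations get
summed and then divided by the sampling interval.

        /*
         * Invented numbers, arithmetic illustration only: if five queued
         * (NCQ) operations that were each outstanding for about 1 second
         * all complete inside the same 1 second sampling window, summing
         * their durations and dividing by the window reports 500% busy,
         * and the per-operation times (which include queue wait) look
         * like multi-second "service" times.
         */
        #include <stdio.h>

        int
        main(void)
        {
                const double window = 1.0;      /* sampling interval, seconds */
                const int completed = 5;        /* ops completing in the window */
                const double outstanding = 1.0; /* time each op was in flight */

                double total_busy = completed * outstanding;    /* 5.0 seconds */
                double busy_pct = total_busy / window * 100.0;  /* 500% */

                printf("%.0f%% busy over a %.0f second window\n",
                    busy_pct, window);
                return (0);
        }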
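For reference, the slot-splitting change described in the quoted message below
amounts to nothing more than this division. The function is only a sketch of
the arithmetic; split_openings and the standalone harness are mine, not the
actual CAM patch.

        /*
         * Sketch of the slot-splitting arithmetic only, not the actual CAM
         * change: divide a controller's command slots evenly across the
         * disks attached to it, giving any remainder to the first disks.
         */
        #include <stdio.h>

        static void
        split_openings(int controller_slots, int ndisks, int *per_disk)
        {
                int base = controller_slots / ndisks;
                int extra = controller_slots % ndisks;
                int i;

                for (i = 0; i < ndisks; i++)
                        per_disk[i] = base + (i < extra ? 1 : 0);
        }

        int
        main(void)
        {
                int openings[2];

                split_openings(252, 2, openings);       /* twa example: 126 each */
                printf("disk0 %d slots, disk1 %d slots\n",
                    openings[0], openings[1]);
                return (0);
        }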
On 3/27/2012 9:40 AM, Steve Sanders wrote:
> Thanks for all of the suggestions. We do tune the logging UFS partition to
> have 64K blocks.
>
> We found a solution that makes this problem go away.
>
> We've modified the CAM layer so that if a controller has 2 or more disks
> attached, it divides the number of I/O slots on the card between the disks.
> So a twa card has 252 slots available, and CAM splits this between the two
> 'disks' attached to the controller, each disk getting 126 slots.
>
> The queue depths reported by iostat get ridiculously long (~1000), but we do
> not end up using memory from the runningspace buffers. Since we don't go
> over the vfs.hirunningspace mark, the system does not pause.
>
> I'm now wondering what causes runningspace usage. Could someone point me to
> the code where blocks end up being counted against runningbufspace (and the
> vfs.hirunningspace limit)?
>
> An interesting side effect of this has been to make a mess of iostat's
> reports. 'iostat -x 1' now shows the database disk as 150-200% used and very
> often shows service times of 15 seconds. Given that the data looks good and
> the system isn't pausing, a 15 second operation time seems unlikely.
>
> I believe this to be an effect of a large number of NCQ operations
> terminating in the 1 second elapsed-time window, so the operation durations
> add up to much more than the 1 second window.
>
> Not realistic but illustrative: imagine five 1-second NCQ operations
> terminating in the 1 second window. The current code will calculate the
> duration as 5 seconds, and dividing by 1 yields 500%.
>
> Thanks
>
> On 02/13/2012 11:51 AM, Ivan Voras wrote:
>> On 13/02/2012 15:48, Stephen Sanders wrote:
>>> We've an application that logs data on one very large RAID6 array and
>>> updates/accesses a database on another, smaller RAID5 array.
>> You would be better off with RAID10 for a database (or anything that does
>> random I/O).
>>
>>> Both arrays are connected to the same PCIe 3ware RAID controller. The
>>> system has two six-core 3 GHz processors and 24 GB of RAM, and is running
>>> FreeBSD 8.1.
>> Did you do any additional OS tuning? Do you use UFS or ZFS?
>>
>>> The problem we're encountering is that the disk subsystem appears to
>>> 'pause' periodically. It looks as if this is a result of disk read/write
>>> operations on the database array taking a very long time to complete (up
>>> to 8 seconds).
>> You should be able to monitor this with "iostat -x 1" (or whatever number
>> of seconds instead of "1"). The last three columns should tell you whether
>> the device(s) are extraordinarily busy, and the r/s and w/s columns should
>> tell you the real IOPS rate. You should probably post a sample of this
>> command's output when the problem appears.
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"