From owner-freebsd-stable@FreeBSD.ORG  Sat Apr  7 05:08:46 2012
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 26C7A106566B
	for <freebsd-stable@freebsd.org>; Sat,  7 Apr 2012 05:08:45 +0000 (UTC)
	(envelope-from matt.thyer@gmail.com)
Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id A76278FC0C
	for <freebsd-stable@freebsd.org>; Sat,  7 Apr 2012 05:08:44 +0000 (UTC)
Received: by wgbds12 with SMTP id ds12so2654462wgb.31
	for <freebsd-stable@freebsd.org>; Fri, 06 Apr 2012 22:08:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=w55uABsLGooFu130tLQA3YldRCiBUqBZghWNFuEpBQM=;
	b=dj3jKAvBDu7OeSM9bWtKSnaLdU5YPQ0XFcgHwOQoEpCqQ9jvevS3OR0ELBtc+c3KHq
	F/QT+saKAEGfYJZ2Kv9PrmlAiTDwsc1xqTsWxTgc67+9IasVdeYZZrBX/VGMW7+tz04p
	iiKmHG6vwlG5jZwrR6nvdFUfdOjFcZ+OohBhTe+zad+FmogFjB16zYounfWpo/+F7akg
	gOwdhFm13yuHK2qge1NZoe5k0kG+xoWH1HFH44FwpgQbjjWl33/6i8Lx4LB74ZmYpLnv
	wQse+kvI4CazvSwyTtBdHbJEHUFnfKlrhJi81lCt9PsipqAVMyhKrlAY63NENK8PyGoO
	OUZg==
MIME-Version: 1.0
Received: by 10.216.132.229 with SMTP id o79mr210949wei.64.1333775323639; Fri,
	06 Apr 2012 22:08:43 -0700 (PDT)
Received: by 10.216.190.219 with HTTP; Fri, 6 Apr 2012 22:08:43 -0700 (PDT)
In-Reply-To: <CACM2+-5ujs-kz9SNV7jwEU4Df5un5H=+qY+3Z0WSxj=7v8f8uQ@mail.gmail.com>
References: <CACM2+-46zHafjZo0O1dNNvEJm+2sUcYboBWwhJ8NxVhXyvpBZQ@mail.gmail.com>
	<4F6A67C0.7000909@sentex.net>
	<CACM2+-6BubOF1uWtXcBQyZqeiuqhvXazL9564CwN9uhJVNe2_w@mail.gmail.com>
	<4F6B3B46.4060105@sentex.net>
	<CACM2+-6J+kaV3aCudDNztDFefo602kTJZ-S99HB+jhxe-tu5XA@mail.gmail.com>
	<CACM2+-7DrENh5ZBs-6vkSrH1szMqnYKi2acmzByAOgXdZvHZMg@mail.gmail.com>
	<CAOjFWZ5nhT5NPLnn_Bh7oDa0fC5KbfDE1aHA_e+X9iebxkLEaA@mail.gmail.com>
	<CACM2+-5ujs-kz9SNV7jwEU4Df5un5H=+qY+3Z0WSxj=7v8f8uQ@mail.gmail.com>
Date: Sat, 7 Apr 2012 14:38:43 +0930
Message-ID: <CACM2+-4_3p3-q74=PJMNbD65fiijf-9KYfzUTdEhsawwAQYRRw@mail.gmail.com>
From: Matt Thyer <matt.thyer@gmail.com>
To: Freddie Cash <fjwcash@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-stable@freebsd.org
Subject: Re: 157k interrupts per second causing 60% CPU load on idle system
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 Apr 2012 05:08:46 -0000

On 7 April 2012 14:31, Matt Thyer <matt.thyer@gmail.com> wrote:

> On 5 April 2012 01:18, Freddie Cash <fjwcash@gmail.com> wrote:
>
>> On Wed, Apr 4, 2012 at 5:19 AM, Matt Thyer <matt.thyer@gmail.com> wrote:
>> > So it seems that both the old and new mps driver have a problem with the
>> > Western Digital WD20EARX SATA 3 drive on a SuperMicro AOC-USAS2-L8i (SAS
>> > 6G) controller (flashed with -IT firmware).
>>
>> I wouldn't say the driver has a problem with that specific drive.
>> More that it might have a problem with a mixed SATA2/SATA3 setup.
>>
> Sorry, that's what I meant to say, but it now seems that the 157K
> interrupts per second are probably not due to the SuperMicro AOC-USAS2-L8i.
>
> Since moving the SATA 3 disk to the onboard Intel SATA 2 controller I'm no
> longer having that disk evicted from the raidz2 pool with write errors and
> I thought that the high interrupt rate issue had also been solved but it's
> back again.
>
> This is on 8-STABLE at revision 230921 (before the new driver hit
> 8-STABLE).
>
> So now I need to go back to trying to determine what the cause is.
>
> I'll stop posting in this thread as I don't think it's anything to do with
> either the old or new version of this driver.
>

Oops... wrong thread. I thought I was replying in -CURRENT.

So on to the root cause.

vmstat -i has shown that the issue was on irq 16.
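
One caveat: the "rate" column of vmstat -i is an average since boot, so an
intermittent storm can be hidden by a long uptime. A rough sketch of a way
around that is to take two snapshots a known number of seconds apart and
compare the totals. The snapshot text, device names and counts below are
made-up placeholders (not output from this machine), and the function names
are hypothetical.

```python
def parse_vmstat_i(text):
    """Return {interrupt source: total count} from `vmstat -i` style text."""
    totals = {}
    for line in text.splitlines():
        # Source names may contain spaces, so split the two numeric
        # columns off from the right-hand side.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header or blank line
        name = parts[0].strip()
        if name == "Total":
            continue  # summary row, not an interrupt source
        totals[name] = int(parts[1])
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source seen in both snapshots."""
    a, b = parse_vmstat_i(before), parse_vmstat_i(after)
    return {name: (b[name] - a[name]) / seconds for name in b if name in a}


# Placeholder snapshots, assumed to be taken 10 seconds apart.
BEFORE = """\
interrupt                          total       rate
irq16: mps0 uhci0+               1000000        500
irq256: em0                        20000         10
cpu0: timer                      4000000       2000
Total                            5020000       2510
"""
AFTER = """\
interrupt                          total       rate
irq16: mps0 uhci0+               2570000       1280
irq256: em0                        20100         10
cpu0: timer                      4020000       2000
Total                            6610100       3290
"""

rates = interrupt_rates(BEFORE, AFTER, 10)
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {rate:10.0f}/s")
```

On a live system the two strings would come from running vmstat -i twice;
here they are inlined so the parsing can be checked without one.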

Unfortunately there seem to be a lot of things on irq 16:

$  dmesg | grep "irq 16"
pcib1: <PCI-PCI bridge> irq 16 at device 1.0 on pci0
mps0: <LSI SAS2008> port 0xee00-0xeeff mem 0xfbdfc000-0xfbdfffff,0xfbd80000-0xfbdbffff irq 16 at device 0.0 on pci1
vgapci0: <VGA-compatible display> port 0xff00-0xff07 mem 0xfb400000-0xfb7fffff,0xe0000000-0xefffffff irq 16 at device 2.0 on pci0
uhci0: <UHCI (generic) USB controller> port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
atapci0: <JMicron JMB368 UDMA133 controller> port 0xdf00-0xdf07,0xde00-0xde03,0xdd00-0xdd07,0xdc00-0xdc03,0xdb00-0xdb0f irq 16 at device 0.0 on pci3
pcib1: <PCI-PCI bridge> irq 16 at device 1.0 on pci0
mps0: <LSI SAS2008> port 0xee00-0xeeff mem 0xfbdfc000-0xfbdfffff,0xfbd80000-0xfbdbffff irq 16 at device 0.0 on pci1
vgapci0: <VGA-compatible display> port 0xff00-0xff07 mem 0xfb400000-0xfb7fffff,0xe0000000-0xefffffff irq 16 at device 2.0 on pci0
uhci0: <UHCI (generic) USB controller> port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
atapci0: <JMicron JMB368 UDMA133 controller> port 0xdf00-0xdf07,0xde00-0xde03,0xdd00-0xdd07,0xdc00-0xdc03,0xdb00-0xdb0f irq 16 at device 0.0 on pci3
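
Since the attach messages show up more than once in the dmesg buffer, a
small script can collapse them into one list of devices per IRQ. This is
only a sketch: it assumes each attach message is on a single line (as dmesg
prints it, not as a mail client wraps it), the regular expression is my own
guess at the message layout, and the function name is hypothetical. The
sample lines are three of the entries above, repeated.

```python
import re

# Matches lines such as:
#   mps0: <LSI SAS2008> port ... irq 16 at device 0.0 on pci1
ATTACH_RE = re.compile(r"^(\w+): <([^>]*)>.*?\birq (\d+) at device")


def devices_by_irq(dmesg_text):
    """Return {irq: {device name: description}}, ignoring repeated lines."""
    shared = {}
    for line in dmesg_text.splitlines():
        m = ATTACH_RE.match(line)
        if m:
            name, desc, irq = m.group(1), m.group(2), int(m.group(3))
            # Keying on the device name drops the duplicate entries.
            shared.setdefault(irq, {})[name] = desc
    return shared


SAMPLE = """\
pcib1: <PCI-PCI bridge> irq 16 at device 1.0 on pci0
mps0: <LSI SAS2008> port 0xee00-0xeeff mem 0xfbdfc000-0xfbdfffff,0xfbd80000-0xfbdbffff irq 16 at device 0.0 on pci1
uhci0: <UHCI (generic) USB controller> port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
pcib1: <PCI-PCI bridge> irq 16 at device 1.0 on pci0
mps0: <LSI SAS2008> port 0xee00-0xeeff mem 0xfbdfc000-0xfbdfffff,0xfbd80000-0xfbdbffff irq 16 at device 0.0 on pci1
uhci0: <UHCI (generic) USB controller> port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
"""

for irq, devices in devices_by_irq(SAMPLE).items():
    print(f"irq {irq}: {', '.join(devices)}")
```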

Any idea how to isolate which bit of hardware could be triggering the
interrupts?

Unfortunately, the only device I could remove is the SuperMicro
AOC-USAS2-L8i (so yes, I could eliminate that).

My biggest problem right now is not knowing how to trigger the issue.

At this stage I'm going to upgrade to 9-STABLE and see if it returns.