From: Matt Thyer
To: Freddie Cash
Cc: freebsd-stable@freebsd.org
Date: Sat, 7 Apr 2012 14:38:43 +0930
Subject: Re: 157k interrupts per second causing 60% CPU load on idle system

On 7 April 2012 14:31, Matt Thyer wrote:

> On 5 April 2012 01:18, Freddie Cash wrote:
>
>> On Wed, Apr 4, 2012 at 5:19 AM, Matt Thyer wrote:
>> > So it seems that both the old and new mps driver have a problem with the
>> > Western Digital WD20EARX SATA 3 drive on a SuperMicro AOC-USAS2-L8i (SAS
>> > 6G) controller (flashed with -IT firmware).
>>
>> I wouldn't say the driver has a problem with that specific drive.
>> More that it might have a problem with a mixed SATA2/SATA3 setup.
>
> Sorry, that's what I meant to say but it now seems that the 157K
> interrupts per second is probably not due to the SuperMicro AOC-USAS2-L8i.
>
> Since moving the SATA 3 disk to the onboard Intel SATA 2 controller I'm no
> longer having that disk evicted from the raidz2 pool with write errors and
> I thought that the high interrupt rate issue had also been solved but it's
> back again.
>
> This is on 8-STABLE at revision 230921 (before the new driver hit
> 8-STABLE).
>
> So now I need to go back to trying to determine what the cause is.
>
> I'll stop posting in this thread as I don't think it's anything to do with
> either the old or new version of this driver.

Oops... wrong thread, I thought I was replying in -CURRENT.

So on to the root cause.

vmstat -i has shown that the issue was on irq 16.
Unfortunately there seem to be a lot of things on irq 16:

$ dmesg | grep "irq 16"
pcib1: irq 16 at device 1.0 on pci0
mps0: port 0xee00-0xeeff mem 0xfbdfc000-0xfbdfffff,0xfbd80000-0xfbdbffff irq 16 at device 0.0 on pci1
vgapci0: port 0xff00-0xff07 mem 0xfb400000-0xfb7fffff,0xe0000000-0xefffffff irq 16 at device 2.0 on pci0
uhci0: port 0xfe00-0xfe1f irq 16 at device 26.0 on pci0
pcib2: irq 16 at device 28.0 on pci0
pcib3: irq 16 at device 28.4 on pci0
atapci0: port 0xdf00-0xdf07,0xde00-0xde03,0xdd00-0xdd07,0xdc00-0xdc03,0xdb00-0xdb0f irq 16 at device 0.0 on pci3

Any idea how to isolate which bit of hardware could be triggering the interrupts?

Unfortunately the only device I could remove would be the SuperMicro AOC-USAS2-L8i (so yes, I could eliminate that).

My biggest problem right now is not knowing how to trigger the issue.

At this stage I'm going to upgrade to 9-STABLE and see if it returns.
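
Since the hard part is not knowing when the interrupt storm starts, one option is to log the per-source interrupt rate over short intervals. The rate column of vmstat -i is averaged since boot, so a burst is easier to see from the difference between two samples of the total column. The following is only a rough sketch under some assumptions: Python 3.7 or later is installed, vmstat -i prints "interrupt total rate" columns, and the helper names, the 5 second interval and the 10000/s threshold are arbitrary choices of mine. It will not separate devices that share irq 16; it only timestamps when the rate jumps so that it can be matched against whatever the system was doing.

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Return {source name: cumulative interrupt count} from `vmstat -i` text."""
    totals = {}
    for line in text.splitlines():
        # Source names can contain spaces, so split the two numeric
        # columns (total, rate) off the right-hand side.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, _rate = parts
        if not total.isdigit():
            continue  # header line
        totals[name.strip()] = int(total)
    return totals


def rates(before, after, seconds):
    """Interrupts per second for each source over the sampling interval."""
    return {name: (after[name] - before[name]) / seconds
            for name in after if name in before}


def sample():
    """Run `vmstat -i` once and parse its output."""
    out = subprocess.run(["vmstat", "-i"], capture_output=True,
                         text=True, check=True)
    return parse_vmstat_i(out.stdout)


if __name__ == "__main__":
    INTERVAL = 5        # seconds between samples (arbitrary)
    THRESHOLD = 10000   # interrupts/s worth logging (arbitrary)
    prev = sample()
    while True:
        time.sleep(INTERVAL)
        cur = sample()
        for name, rate in sorted(rates(prev, cur, INTERVAL).items()):
            if rate >= THRESHOLD:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} {name} {rate:.0f}/s", flush=True)
        prev = cur
```

Left running with its output redirected to a file, this should give timestamps to compare against cron jobs, scrubs or other activity on the box.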