From: "Kevin Oberman"
To: Kris Kennaway
Cc: stable@freebsd.org
Date: Mon, 02 Jan 2006 10:36:55 -0800
Subject: Re: Odd performance problems after upgrade from 4.11 to 6.0-Stable
In-reply-to: Your message of "Wed, 14 Dec 2005 19:52:03 EST." <20051215005203.GA89670@xor.obsecurity.org>
Message-Id: <20060102183655.40E225D07@ptavv.es.net>
List-Id: Production branch of FreeBSD source code

> Date: Wed, 14 Dec 2005 19:52:03 -0500
> From: Kris Kennaway
>
> On Wed, Dec 14, 2005 at 04:45:47PM -0800, Kevin Oberman wrote:
> > > Date: Wed, 14 Dec 2005 19:34:04 -0500
> > > From: Kris Kennaway
> > >
> > > On Wed, Dec 14, 2005 at 04:26:18PM -0800, Kevin Oberman wrote:
> > >
> > > > I am attaching a dmesg. I do have a few drivers (uhci, pcm, psm,
> > > > atkbd0 and ichsmb) that are still marked as GIANT-LOCKED, but I'm not
> > > > using the USB very often. And I'm not using pcm or ichsmb during the
> > > > dump, either. I think everyone has the mouse and keyboard under GIANT,
> > > > but I can't really see those as a problem, either.
> > >
> > > A bunch of things are sharing interrupts with USB. Disable it and see
> > > if that helps. Also check vmstat -i to see if some device is
> > > storming. If not, turn on MUTEX_PROFILING(9) in your kernel and run
> > > the dump (or something faster that also exhibits the problem), then
> > > look for what is contending with Giant.
> >
> > Yes, it may be time for MUTEX_PROFILING. I had already looked at
> > interrupts. My kernel is sans APIC so I didn't really think that
> > interrupts were a problem, and I see:
> > interrupt                          total       rate
> > irq0: clk                      207037779       1000
> > irq1: atkbd0                       50208          0
> > irq6: fdc0                             9          0
> > irq8: rtc                       26498038        128
> > irq10: pcm0 ichsmb0                    2          0
> > irq11: xl0 uhci0                18076067         87
> > irq12: psm0                       869500          4
> > irq13: npx0                            1          0
> > irq14: ata0                     10423468         50
> > irq15: ata1                          112          0
> > Total                          262955184       1270
> >
> > Clearly no storms and nothing looks obviously broken. USB and the
> > network card share an IRQ, but the USB is not connected to anything and
> > I would not think that it is generating many interrupts. The network
> > IS being used and I'm not seeing all that many interrupts on IRQ11.
>
> Whenever there is an interrupt on irq11 from the NIC, *both* drivers
> will wake up to process it. uhci0 will need to acquire Giant. If
> something else is also trying to acquire Giant (bufdaemon), then they
> will serialize, degrading performance. This may not be the cause
> since there are only a few interrupts, but MUTEX_PROFILING will tell
> you.

Well, with the holidays and such, this has taken a while, but here is an
update.

I have removed USB support from the kernel. I hardly ever use it on this
system, so that was an obvious step. No improvement at all.

# vmstat -i
interrupt                          total       rate
irq0: clk                      319818027       1000
irq1: atkbd0                       15443          0
irq6: fdc0                            11          0
irq8: rtc                       40932392        128
irq10: pcm0 ichsmb0               125545          0
irq11: xl0                       3616426         11
irq12: psm0                       281380          0
irq13: npx0                            1          0
irq14: ata0                      8756176         27
irq15: ata1                          144          0
Total                          373545545       1168

There is now only one shared interrupt, and both IRQ 10 devices should
have been totally quiescent during my test run.

The test was building a glimpse index of my inbox. CPU was at about 20%.
System interactive response was terrible. It took about two minutes just
to log in. Starting Gnome takes roughly forever (about 10 minutes).

I collected mutex stats for just about 3 minutes and found nothing
surprising, but I may not know what to look for. No single entry shows a
total time of over 3.1 seconds. The total time for all of them is 28
seconds. The sum of all Giant lock times was only 4.65 seconds and the
largest of these was in kern_sysctl.c, so I expect it was the profiling
itself that ate 3.1 of those 4.65 seconds.

I am attaching a spreadsheet with the profile data in case anyone wants
to look at it. (Probably the mail system will strip it, so let me know
if I should post it.)

Still totally baffled and still feeling the pain.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
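
For anyone who wants to repeat the interrupt check on their own box, here
is a rough Python sketch of it. It parses text in the `vmstat -i` layout
shown above, lists IRQ lines shared by more than one device, and flags any
line whose rate exceeds a threshold. The function names and the 5000/s
storm threshold are my own assumptions for illustration; they are not part
of vmstat or anything suggested in this thread, so pick a threshold that
makes sense for your hardware.

```python
import re

# Sample input: the post-USB-removal "vmstat -i" output quoted in this message.
SAMPLE = """\
interrupt                          total       rate
irq0: clk                      319818027       1000
irq1: atkbd0                       15443          0
irq6: fdc0                            11          0
irq8: rtc                       40932392        128
irq10: pcm0 ichsmb0               125545          0
irq11: xl0                       3616426         11
irq12: psm0                       281380          0
irq13: npx0                            1          0
irq14: ata0                      8756176         27
irq15: ata1                          144          0
Total                          373545545       1168
"""

# One data row: "irqN: dev [dev ...]  total  rate"
ROW = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return a list of (irq, [devices], total, rate) tuples.

    The header and "Total" lines do not match ROW and are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            irq, devices, total, rate = m.groups()
            rows.append((irq, devices.split(), int(total), int(rate)))
    return rows


def shared_irqs(rows):
    """Map each IRQ used by more than one device to its device list."""
    return {irq: devs for irq, devs, _, _ in rows if len(devs) > 1}


def storming(rows, threshold=5000):
    """List IRQs whose rate exceeds the (assumed, arbitrary) threshold."""
    return [irq for irq, _, _, rate in rows if rate > threshold]


if __name__ == "__main__":
    rows = parse_vmstat_i(SAMPLE)
    print("shared:", shared_irqs(rows))
    print("storming:", storming(rows))
```

To run it against a live system, feed it the captured output of
`vmstat -i` in place of SAMPLE.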