Date: Wed, 2 May 2007 13:14:14 -0400
From: John Baldwin <jhb@freebsd.org>
To: Nate Lawson <nate@root.org>
Cc: cvs-src@freebsd.org, Darren Reed <darrenr@hub.freebsd.org>,
    src-committers@freebsd.org, cvs-all@freebsd.org
Subject: Re: cvs commit: src/sys/kern kern_intr.c src/sys/sys interrupt.h
Message-ID: <200705021314.15733.jhb@freebsd.org>
In-Reply-To: <4638BE29.1020505@root.org>
References: <200705020615.l426FDo7015874@repoman.freebsd.org>
    <4638BAC9.7000603@root.org> <4638BE29.1020505@root.org>

On Wednesday 02 May 2007 12:36:57 pm Nate Lawson wrote:
> Nate Lawson wrote:
> > John Baldwin wrote:
> >> On Wednesday 02 May 2007 03:07:07 am Darren Reed wrote:
> >>> On Wed, May 02, 2007 at 06:15:13AM +0000, Nate Lawson wrote:
> >>>> njl         2007-05-02 06:15:13 UTC
> >>>>
> >>>>   FreeBSD src repository
> >>>>
> >>>>   Modified files:        (Branch: RELENG_6)
> >>>>     sys/kern             kern_intr.c
> >>>>     sys/sys              interrupt.h
> >>>>   Log:
> >>>>   MFC: rate-check the interrupt storm message and bump the counter 500 -> 1000
> >>> Is this number, "500" or "1000" somehow "magical" for modern hardware?
> >>>
> >>> If I had a 500MHZ, 1GHz, 1.5GHz, 2GHz, 2.5GHz machines, each with the
> >>> appropriate architecture, what would the correct value for this be?
> >>> Is i always 1000 or should it be calculated?
> >> It's a SWAG and tunable for machines where it doesn't work.  In practice the
> >> old setting seemed to be a bit too trigger-happy as I know my printer always
> >> triggered it, for example.
> >>
> >
> > There's more to it than just your Ghz number.  It's a counter of the
> > number of times an interrupt has triggered while the previous one was
> > being serviced.  The faster your kernel, the lower the number could be.
> >
> > I have a slow early SMP Celeron system with a dc(4) adapter with 4 ports
> > sharing an irq with my ata.  At 3 am, the nightly script kicks off
> > enough IO that it triggers a bug in my dc(4) card that causes it to mask
> > the interrupt too long.  Then, the irq storm suppression logic kicked
> > in, causing ata to timeout the request.  The drive is on a mirror so I'd
> > lose half the mirror, then rebuild in the morning.  With this value
> > bumped, I don't have that problem any more but the real issue is why
> > dc(4) is being so quirky under heavy shared irq load.
> >
> > This is on 6.x btw.  Is there any reason why our retries is so low?
> > sys/dev/ata/ata-disk.c:    request->retries = 2;

At work we up the timeout from 5 to 30, but we leave retries at 2.

> Note that I still got a timeout but it succeeded without error.  I think
> this is a combination of the dc(4) and highpoint hpt366 driver
> interaction.  dc(4) is probably holding Giant or something too long and
> ata is being too sensitive to the slow hw.

Neither dc(4) nor ata(4) hold Giant, FWIW.

-- 
John Baldwin
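
For readers following the thread, the counter described above (how many times an interrupt fires again while the previous one is still being serviced) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in sys/kern/kern_intr.c: the names storm_state, storm_check, STORM_THRESHOLD and WARN_INTERVAL are hypothetical, the one-second warning interval is assumed, and only the value 1000 comes from the commit log.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names; this is not the FreeBSD implementation. */
#define STORM_THRESHOLD 1000    /* the bumped value from the commit log */
#define WARN_INTERVAL   1       /* assumed: at most one warning per second */

struct storm_state {
    int  count;      /* consecutive re-triggers seen so far */
    long last_warn;  /* time of the last warning in seconds, -1 if never */
};

/*
 * Call once per pass of the handler loop.  pending_again is true when the
 * interrupt fired again while the previous one was being serviced.
 * Returns true when the source should be throttled.
 */
bool
storm_check(struct storm_state *st, bool pending_again, long now)
{
    if (!pending_again) {
        /* The line went quiet, so the run of re-triggers is over. */
        st->count = 0;
        return false;
    }
    if (st->count < STORM_THRESHOLD) {
        st->count++;
        return false;
    }
    /* Threshold reached: rate-check the message, then throttle. */
    if (st->last_warn < 0 || now - st->last_warn >= WARN_INTERVAL) {
        printf("interrupt storm detected; throttling interrupt source\n");
        st->last_warn = now;
    }
    return true;
}
```

In this sketch a faster handler loop reaches the threshold in less wall-clock time, which is one way to see why the right value is not simply a function of CPU clock speed.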