From: Oleg Sharoiko
To: John Baldwin
Cc: freebsd-scsi@freebsd.org, Andrey Beresovsky
Date: Tue, 28 Mar 2006 19:45:29 +0400 (MSD)
Subject: Re: Boot hangs on ips0: resetting adapter, this may take up to 5 minutes
Message-ID: <20060328185449.F763@brain.cc.rsu.ru>
In-Reply-To: <200603271607.09550.jhb@freebsd.org>

On Mon, 27 Mar 2006, John Baldwin wrote:

JB>Which device is not getting interrupts and hanging? You said all your
JB>SCSI cards work fine with bge0 (IRQ16) is not in the kernel, yes? What
JB>if you disable just the devices on IRQ16 (bge and usb) do all of your
JB>various SCSI cards work fine in that case?
Yes, as soon as I remove bge from the kernel, all other devices work fine. I couldn't figure out which particular device is mis-routed, because when the interrupt storm happens the system becomes unresponsive and it's impossible to tell which devices work and which do not. I tried to track this with KTR, but I only managed to get a trace for the good case. When I enabled KTR for the interrupt storm case the behaviour changed and I only got repeated clock interrupts, with no storm on bge.

The interrupt storm only happens on a single CPU when PREEMPTION is in effect. Without PREEMPTION, a kernel with bge works. With an SMP kernel I can kldload if_bge once the 2nd CPU has already been initialized, though I haven't run many tests and this last setup is probably unstable.

By the way, shouldn't ithread_execute_handlers detect the interrupt storm condition? As far as I can see it has the corresponding code, but in my case the storm is not detected.

The BIOS has a page "PCI interrupt routing" which currently contains:

    Planar USB IRQ       [Auto Configure]    Current Interrupt Assigned 10
    SCSI INTA IRQ        [Auto Configure]    Current Interrupt Assigned 11
    SCSI INTB IRQ        [Auto Configure]    Current Interrupt Assigned 10
    Planar Video IRQ     [Auto Configure]    Current Interrupt Assigned 11
    Planar Ethernet IRQ  [Auto Configure]    Current Interrupt Assigned 11
    Slot1 INTA IRQ       [No IRQ reqested]   Current Interrupt Assigned No IRQ reqested
    Slot2 INTA IRQ       [No IRQ reqested]   Current Interrupt Assigned No IRQ reqested
    Slot3 INTA IRQ       [No IRQ reqested]   Current Interrupt Assigned No IRQ reqested
    Slot4 INTA IRQ       [Auto Configure]    Current Interrupt Assigned 11
    Slot5 INTA IRQ       [No IRQ reqested]   Current Interrupt Assigned No IRQ reqested
    Slot6 INTA IRQ       [No IRQ reqested]   Current Interrupt Assigned No IRQ reqested

In this menu Slot4 INTA is linked with Planar Ethernet, and they change together whenever I alter one of them.
I suppose these values correspond to 'irq=' in the dmesg output:

    found-> vendor=0x14e4, dev=0x1659, revid=0x11
            bus=1, slot=0, func=0
            class=02-00-00, hdrtype=0x00, mfdev=0
            cmdreg=0x0146, statreg=0x0010, cachelnsz=8 (dwords)
            lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
            intpin=a, irq=11
            powerspec 2  supports D0 D3  current D0
            MSI supports 8 messages, 64 bit
            map[10]: type 1, range 64, base d0100000, size 16, enabled
    pcib1: (null) requested memory range 0xd0100000-0xd010ffff: good

But I don't understand the next lines:

    pcib0: matched entry for 0.2.INTA
    pcib0: slot 2 INTA hardwired to IRQ 16
    pcib1: slot 0 INTA is routed to irq 16

What does 'hardwired' mean, and why is the IRQ number different from the irq=11 above? I'm probably asking dumb questions; that's because I only know two things for sure about interrupts: there are 4 interrupt lines on the PCI bus (A-D) which devices use to trigger interrupts, and there are interrupt handlers. How signals from these lines are delivered to the proper handlers, what the role of the APICs is, and how these things work together - all of this is covered by darkness for me.

-- 
Oleg Sharoiko.
Software and Network Engineer
Computer Center of Rostov State University.