Message-ID: <45306837.4010100@samsco.org>
Date: Fri, 13 Oct 2006 22:31:51 -0600
From: Scott Long
To: Mike Tancsa
Cc: freebsd-stable@freebsd.org, Kris Kennaway
Subject: Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

Mike Tancsa wrote:
> At 10:34 PM 10/5/2006, Kris Kennaway wrote:
>
>> Based on successful testing on a machine with shared em interrupt, the
>> following patch should work around the problem *in that case*.
>>
>> Note that this patch will not help you if you are not using the em
>> driver, or if you are seeing the problem with non-shared em interrupt
>> (I have investigated one such outlier, which seems to be a problem with
>> a particular model of em hardware and not a generic problem with the
>> driver).
>>
>> Please let Scott and I know whether or not this patch works for you
>> (in addition to the information previously requested, if you have not
>> already sent it). Unfortunately it is only a workaround, but it
>> points to an underlying problem with fast interrupt handlers on a
>> shared irq that can be studied separately.
>
> I ran into an em0 timeout on a box I just started testing. The patch
> seems to fix the issue.
> (before the patch)
> Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
> Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
> Oct 13 21:42:58 am64 kernel: em0: link state changed to UP
>
> dmesg with patch
>
> Copyright (c) 1992-2006 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 6.2-PRERELEASE #2: Fri Oct 13 22:28:38 EDT 2006
> mdtancsa@am64.sentex.ca:/usr/obj/usr/src/sys/up
> ACPI APIC Table:
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.71-MHz K8-class CPU)
> Origin = "GenuineIntel" Id = 0xf43 Stepping = 3
>
> Features=0xbfebfbff
>
> Features2=0x649d>
> AMD Features=0x20000800
> Logical CPUs per core: 2
> real memory = 3481198592 (3319 MB)
> avail memory = 3360186368 (3204 MB)
> ioapic0 irqs 0-23 on motherboard
> ioapic1 irqs 24-47 on motherboard
> ioapic2 irqs 48-71 on motherboard
> kbd1 at kbdmux0
> acpi0: on motherboard
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> acpi0: Power Button (fixed)
> acpi0: reservation of 500, 10 (4) failed
> acpi0: reservation of 560, 20 (4) failed
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> cpu0: on acpi0
> acpi_throttle0: on cpu0
> pcib0: port 0xcf8-0xcff on acpi0
> pci0: on pcib0
> pci0: at device 2.0 (no driver attached)
> pcib1: irq 16 at device 28.0 on pci0
> pci2: on pcib1
> pcib2: at device 0.0 on pci2
> pci4: on pcib2
> pcib3: at device 0.2 on pci2
> pci3: on pcib3
> 3ware device driver for 9000 series storage controllers, version: 3.60.02.012
> twa0: <3ware 9000 series Storage Controller> port 0xef80-0xefbf mem 0xfebff000-0xfebfffff irq 53 at device 2.0 on pci3
> twa0: [GIANT-LOCKED]
> twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024
> uhci0: port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0
> uhci0: [GIANT-LOCKED]
> usb0: on uhci0
> usb0: USB revision 1.0
> uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhci1: port 0xcc80-0xcc9f irq 19 at device 29.1 on pci0
> uhci1: [GIANT-LOCKED]
> usb1: on uhci1
> usb1: USB revision 1.0
> uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> uhci2: port 0xcd00-0xcd1f irq 18 at device 29.2 on pci0
> uhci2: [GIANT-LOCKED]
> usb2: on uhci2
> usb2: USB revision 1.0
> uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub2: 2 ports with 2 removable, self powered
> ehci0: mem 0xfe9ff800-0xfe9ffbff irq 23 at device 29.7 on pci0
> ehci0: [GIANT-LOCKED]
> usb3: EHCI version 1.0
> usb3: companion controllers, 2 ports each: usb0 usb1 usb2
> usb3: on ehci0
> usb3: USB revision 2.0
> uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
> uhub3: 6 ports with 6 removable, self powered
> pcib4: at device 30.0 on pci0
> pci1: on pcib4
> em0: port 0xdf80-0xdfbf mem 0xfeae0000-0xfeafffff irq 18 at device 3.0 on pci1
> em0: Ethernet address: 00:0e:0c:4b:15:eb
> isab0: at device 31.0 on pci0
> isa0: on isab0
> atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 31.1 on pci0
> ata0: on atapci0
> ata1: on atapci0
> atapci1: port 0xcf80-0xcf87,0xcf00-0xcf03,0xce80-0xce87,0xce00-0xce03,0xcd80-0xcd8f mem 0xfe9ffc00-0xfe9fffff irq 19 at device 31.2 on pci0
> ata2: on atapci1
> ata3: on atapci1
> pci0: at device 31.3 (no driver attached)
> acpi_button0: on acpi0
> atkbdc0: port 0x60,0x64 irq 1 on acpi0
> atkbd0: flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: [FAST]
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> orm0: at iomem 0xc9800-0xcafff,0xcb000-0xcbfff,0xcc000-0xccfff,0xdc000-0xdffff on isa0
> ppc0: cannot reserve I/O port range
> sc0: at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounter "TSC" frequency 2992709460 Hz quality 800
> Timecounters tick every 1.000 msec
> ad0: 38166MB at ata0-master UDMA100
> acd0: DVDR at ata0-slave UDMA33
> da0 at twa0 bus 0 target 0 lun 0
> da0: Fixed Direct Access SCSI-3 device
> da0: 100.000MB/s transfers
> da0: 152566MB (312455168 512 byte sectors: 255H 63S/T 19449C)
> Trying to mount root from ufs:/dev/ad0s1a
> [am64]# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           4          0
> irq6: fdc0                             9          0
> irq14: ata0                         6274          1
> irq18: em0 uhci2                  127128         25
> irq53: twa0                       188226         37
> cpu0: timer                      9911543       1999
> Total                           10233184       2064
> [am64]#
>
> em0@pci1:3:0: class=0x020000 card=0x34448086 chip=0x10768086 rev=0x05 hdr=0x00
> vendor = 'Intel Corporation'
> device = '82547EI Gigabit Ethernet Controller'
> class = network
> subclass = ethernet
>
> The Intel board has the latest BIOS update as well, HTT disabled in the
> BIOS. If helpful, I can hook this box up to the netperf cluster which
> has remote power and serial console access (including to the BIOS)
>
> ---Mike

Mike,

I have a new patch that I hope addresses the actual bug, instead of
shuffling the timing. Would you be willing to test it? I can't
guarantee that it's safe for production use yet, though. It seems to
work, but it might set your dog on fire too.

Scott
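
For anyone wondering whether their own box falls into the shared-interrupt case discussed in this thread, here is a minimal sketch that scans `vmstat -i` output for an IRQ line that em shares with another device. It assumes the output layout quoted above (an `irqN:` label, device names, then total and rate columns); the function name `shared_irqs` and its `driver_prefix` parameter are hypothetical, made up for illustration, and are not part of any FreeBSD tool.

```python
import re

# Sample copied from the `vmstat -i` output quoted in this message.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           4          0
irq6: fdc0                             9          0
irq14: ata0                         6274          1
irq18: em0 uhci2                  127128         25
irq53: twa0                       188226         37
cpu0: timer                      9911543       1999
Total                           10233184       2064
"""

# Assumed line layout: "irqN: dev [dev ...] <total> <rate>".
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_text, driver_prefix="em"):
    """Return {irq: [devices]} for IRQ lines that a device named
    <driver_prefix><unit> shares with at least one other device.

    Hypothetical helper for illustration only.
    """
    unit_name = re.compile(re.escape(driver_prefix) + r"\d+")
    shared = {}
    for line in vmstat_text.splitlines():
        match = IRQ_LINE.match(line)
        if not match:
            continue  # header, cpu timer, and Total lines are skipped
        devices = match.group(2).split()
        if len(devices) > 1 and any(unit_name.fullmatch(d) for d in devices):
            shared[match.group(1)] = devices
    return shared


if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On the sample above, the only line reported is irq18, where em0 sits alongside uhci2. An empty result would suggest the interrupt is not shared, which is the case the workaround patch is not expected to help.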