Date:      Fri, 13 Oct 2006 22:31:51 -0600
From:      Scott Long <scottl@samsco.org>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-stable@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
Message-ID:  <45306837.4010100@samsco.org>
In-Reply-To: <7.0.1.0.0.20061014002001.124d6120@sentex.net>
References:  <45244053.6030706@samsco.org> <20061005200552.GA80162@xor.obsecurity.org> <20061006023424.GA86250@xor.obsecurity.org> <7.0.1.0.0.20061014002001.124d6120@sentex.net>

Mike Tancsa wrote:
> At 10:34 PM 10/5/2006, Kris Kennaway wrote:
> 
>> Based on successful testing on a machine with shared em interrupt, the
>> following patch should work around the problem *in that case*.
>>
>> Note that this patch will not help you if you are not using the em
>> driver, or if you are seeing the problem with non-shared em interrupt
>> (I have investigated one such outlier, which seems to be a problem with
>> a particular model of em hardware and not a generic problem with the
>> driver).
>>
>> Please let Scott and me know whether or not this patch works for you
>> (in addition to the information previously requested, if you have not
>> already sent it).  Unfortunately it is only a workaround, but it
>> points to an underlying problem with fast interrupt handlers on a
>> shared irq that can be studied separately.
> 
> I ran into an em0 timeout on a box I just started testing. The patch
> seems to fix the issue.
> (before the patch)
> Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
> Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
> Oct 13 21:42:58 am64 kernel: em0: link state changed to UP
> 
> dmesg with patch
> 
> Copyright (c) 1992-2006 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 6.2-PRERELEASE #2: Fri Oct 13 22:28:38 EDT 2006
>     mdtancsa@am64.sentex.ca:/usr/obj/usr/src/sys/up
> ACPI APIC Table: <A M I  OEMAPIC >
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.71-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0x649d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,<b14>>
>   AMD Features=0x20000800<SYSCALL,LM>
>   Logical CPUs per core: 2
> real memory  = 3481198592 (3319 MB)
> avail memory = 3360186368 (3204 MB)
> ioapic0 <Version 2.0> irqs 0-23 on motherboard
> ioapic1 <Version 2.0> irqs 24-47 on motherboard
> ioapic2 <Version 2.0> irqs 48-71 on motherboard
> kbd1 at kbdmux0
> acpi0: <A M I 7221BK1E> on motherboard
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> acpi0: Power Button (fixed)
> acpi0: reservation of 500, 10 (4) failed
> acpi0: reservation of 560, 20 (4) failed
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> cpu0: <ACPI CPU> on acpi0
> acpi_throttle0: <ACPI CPU Throttling> on cpu0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pci0: <display, VGA> at device 2.0 (no driver attached)
> pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
> pci2: <ACPI PCI bus> on pcib1
> pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci2
> pci4: <ACPI PCI bus> on pcib2
> pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci2
> pci3: <ACPI PCI bus> on pcib3
> 3ware device driver for 9000 series storage controllers, version: 3.60.02.012
> twa0: <3ware 9000 series Storage Controller> port 0xef80-0xefbf mem 0xfebff000-0xfebfffff irq 53 at device 2.0 on pci3
> twa0: [GIANT-LOCKED]
> twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024
> uhci0: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A> port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0
> uhci0: [GIANT-LOCKED]
> usb0: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A> on uhci0
> usb0: USB revision 1.0
> uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhci1: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B> port 0xcc80-0xcc9f irq 19 at device 29.1 on pci0
> uhci1: [GIANT-LOCKED]
> usb1: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B> on uhci1
> usb1: USB revision 1.0
> uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> uhci2: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C> port 0xcd00-0xcd1f irq 18 at device 29.2 on pci0
> uhci2: [GIANT-LOCKED]
> usb2: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C> on uhci2
> usb2: USB revision 1.0
> uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub2: 2 ports with 2 removable, self powered
> ehci0: <Intel 82801FB (ICH6) USB 2.0 controller> mem 0xfe9ff800-0xfe9ffbff irq 23 at device 29.7 on pci0
> ehci0: [GIANT-LOCKED]
> usb3: EHCI version 1.0
> usb3: companion controllers, 2 ports each: usb0 usb1 usb2
> usb3: <Intel 82801FB (ICH6) USB 2.0 controller> on ehci0
> usb3: USB revision 2.0
> uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
> uhub3: 6 ports with 6 removable, self powered
> pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
> pci1: <ACPI PCI bus> on pcib4
> em0: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xdf80-0xdfbf mem 0xfeae0000-0xfeafffff irq 18 at device 3.0 on pci1
> em0: Ethernet address: 00:0e:0c:4b:15:eb
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <Intel ICH6 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 31.1 on pci0
> ata0: <ATA channel 0> on atapci0
> ata1: <ATA channel 1> on atapci0
> atapci1: <Intel ICH6 SATA150 controller> port 0xcf80-0xcf87,0xcf00-0xcf03,0xce80-0xce87,0xce00-0xce03,0xcd80-0xcd8f mem 0xfe9ffc00-0xfe9fffff irq 19 at device 31.2 on pci0
> ata2: <ATA channel 0> on atapci1
> ata3: <ATA channel 1> on atapci1
> pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
> acpi_button0: <Power Button> on acpi0
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: [FAST]
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> orm0: <ISA Option ROMs> at iomem 0xc9800-0xcafff,0xcb000-0xcbfff,0xcc000-0xccfff,0xdc000-0xdffff on isa0
> ppc0: cannot reserve I/O port range
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounter "TSC" frequency 2992709460 Hz quality 800
> Timecounters tick every 1.000 msec
> ad0: 38166MB <Seagate ST340014A 3.06> at ata0-master UDMA100
> acd0: DVDR <AOPEN 8X8 DVD Dual AAN/1.4A> at ata0-slave UDMA33
> da0 at twa0 bus 0 target 0 lun 0
> da0: <AMCC 9550SX-4LP DISK 3.01> Fixed Direct Access SCSI-3 device
> da0: 100.000MB/s transfers
> da0: 152566MB (312455168 512 byte sectors: 255H 63S/T 19449C)
> Trying to mount root from ufs:/dev/ad0s1a
> [am64]# vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           4          0
> irq6: fdc0                             9          0
> irq14: ata0                         6274          1
> irq18: em0 uhci2                  127128         25
> irq53: twa0                       188226         37
> cpu0: timer                      9911543       1999
> Total                           10233184       2064
> [am64]#
> 
> em0@pci1:3:0:   class=0x020000 card=0x34448086 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82547EI Gigabit Ethernet Controller'
>     class    = network
>     subclass = ethernet
> 
> The Intel board has the latest BIOS update as well, HTT disabled in the 
> BIOS.  If helpful, I can hook this box up to the netperf cluster which 
> has remote power and serial console access (including to the BIOS)
> 
>         ---Mike

Mike,

I have a new patch that I hope addresses the actual bug, instead of 
shuffling the timing.  Would you be willing to test it?  I can't 
guarantee that it's safe for production use yet, though.  It seems
to work, but it might set your dog on fire too.

Scott



