Date:      Thu, 02 Feb 2006 11:00:45 -0800
From:      Julian Elischer <julian@elischer.org>
To:        Scott Long <scottl@samsco.org>
Cc:        Michal Mertl <mime@traveller.cz>, freebsd-current@freebsd.org
Subject:   Re: em(4) stops forwarding
Message-ID:  <43E256DD.1030504@elischer.org>
In-Reply-To: <43E2184B.3040606@samsco.org>
References:  <1138813174.1358.34.camel@genius.i.cz>	 <43E0FE09.50804@samsco.org>	<1138875351.1807.12.camel@genius.i.cz>		<43E203F9.9060307@samsco.org> <1138890130.9192.3.camel@genius.i.cz> <43E2184B.3040606@samsco.org>

Scott Long wrote:

> Michal Mertl wrote:
>
>> Scott Long wrote:
>>
>>> Michal Mertl wrote:
>>>
>>>> Scott Long wrote:
>>>>
>>>>
>>>>> Michal Mertl wrote:
>>>>>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've been running CURRENT for a long time and never experienced a
>>>>>> problem with the built-in em(4) card before. Recently (I first
>>>>>> noticed it on Jan 24) the card has stopped working several times.
>>>>>> Nothing gets into the log file. Carrier is still detected properly
>>>>>> but no data is exchanged. Ifconfig up/down doesn't help but
>>>>>> kldunload/load does. When I run tcpdump I don't see any packets
>>>>>> coming in but I see some going out.
>>>>>>
>>>>>> Can someone suggest what to look at when it happens the next time? I
>>>>>> have DDB compiled in. I will try to sniff the wire using another
>>>>>> machine next time to see if the card sends out anything.
>>>>>>
>>>>>> The command 'pciconf -lv' says this about the card:
>>>>>> em0@pci2:1:0:   class=0x020000 card=0x05491014 chip=0x101e8086 rev=0x03 hdr=0x00
>>>>>>   vendor   = 'Intel Corporation'
>>>>>>   device   = '82540EP Gigabit Ethernet Controller (Mobile)'
>>>>>>   class    = network
>>>>>>   subclass = ethernet
>>>>>>
>>>>>> The dmesg:
>>>>>> em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x8000-0x803f mem 0xc0220000-0xc023ffff,0xc0200000-0xc020ffff irq 11 at device 1.0 on pci2
>>>>>> em0: Ethernet address: 00:0d:60:cd:ae:e2
>>>>>> em0: [FAST]
>>>>>>
>>>>>> The interrupt is shared since the machine is a notebook. I don't
>>>>>> know if it was just a coincidence, but I think that it happened at
>>>>>> the same time as my USB mouse stopped working; the USB controller
>>>>>> is on the same irq.
>>>>>>
>>>>>> Michal
>>>>>>
>>>>>
>>>>> What is sharing the interrupt?
>>>>
>>>>
>>>>
>>>> vgapci0, ipw0, ehci0, uhci0-2. I don't think vgapci0 and ipw0 are
>>>> really using the interrupt when I use em0.
>>>>
>>>>
>>>
>>> Ouch.  For now, edit /sys/dev/em/if_em.c and add the following line 
>>> to the top of the file:
>>>
>>> #define NO_EM_FASTINTR
>>
>>
>>
>> Do you know the reason for the problem? Wouldn't it be better if I used
>> the stock driver and got some information for you when it doesn't work? I
>> use the machine as my workstation so it isn't such a big problem when it
>> loses the network.
>>
>
> The problem is that the drivers that are sharing the interrupt,
> particularly the USB ones, can spend a very long time waiting on
> locks to service the interrupt.  During that time, the interrupt pin is
> masked and interrupts from all of the shared devices don't get
> delivered. So even though the if_em driver has a very fast interrupt
> handler, it still has to wait on the USB drivers.  During that wait, a
> burst of network traffic might come into the card, filling its buffers
> and triggering an overflow.  This would be especially likely to happen
> while the kernel is flushing out filesystem i/o.  In theory the
> interrupt service latency shouldn't be any different whether the if_em
> driver is fast or not, but there might be coincidental timing issues
> that I don't understand.  That's why I'd like you to set the #define in
> the driver to revert it to its classic behaviour and see if the
> problem persists.  If it doesn't, then I'll have to rethink some of the
> changes that I made to it.
>
> Scott
>
>>
>>> Also, does your kernel config include the apic device?
>>
>>
>>
>> Yes, it does. But I believe that the chipset doesn't have it and the
>> CPU doesn't support it either.
>>
>> Michal
>>

The big "workaround" that may save lives would be "storm detection":
detect that an interrupt is not being reset, say, by noticing that the
same interrupt has fired 32 times without ever giving the system a
chance to even get out of the interrupt handling layer, and then on
detection call the interrupt routines of EVERY DRIVER that is
registered for interrupts.
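
A rough user-space sketch of that storm detection idea, not kernel code:
it only models the counting and the "call everybody" fallback. All of the
names here (intr_table, intr_register, intr_fired, intr_layer_exited,
count_call) and the table sizes are made up for illustration; only the
threshold of 32 comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define STORM_THRESHOLD 32  /* the "32 times" figure from the text */
#define MAX_HANDLERS    16  /* arbitrary limit for this sketch */
#define MAX_IRQS        16  /* arbitrary limit for this sketch */

typedef void (*intr_handler_t)(void *arg);

/* Hypothetical table of every registered handler, across all IRQs. */
struct intr_table {
    intr_handler_t handler[MAX_HANDLERS];
    void          *arg[MAX_HANDLERS];
    size_t         count;
    unsigned       streak[MAX_IRQS]; /* back-to-back fires per IRQ */
};

/* Add a driver's interrupt routine; returns 0, or -1 if the table is full. */
int
intr_register(struct intr_table *t, intr_handler_t h, void *arg)
{
    assert(t != NULL && h != NULL);
    if (t->count == MAX_HANDLERS)
        return -1;
    t->handler[t->count] = h;
    t->arg[t->count] = arg;
    t->count++;
    return 0;
}

/* Call when the system manages to leave the interrupt handling layer. */
void
intr_layer_exited(struct intr_table *t)
{
    for (size_t i = 0; i < MAX_IRQS; i++)
        t->streak[i] = 0;
}

/*
 * Call each time `irq` fires.  Returns 1 if this fire completed a storm
 * and every registered routine was called, 0 otherwise.
 */
int
intr_fired(struct intr_table *t, unsigned irq)
{
    assert(t != NULL && irq < MAX_IRQS);
    if (++t->streak[irq] < STORM_THRESHOLD)
        return 0;
    for (size_t i = 0; i < t->count; i++)
        t->handler[i](t->arg[i]);
    t->streak[irq] = 0;
    return 1;
}

/* Stand-in for a driver's interrupt routine: counts its invocations. */
void
count_call(void *arg)
{
    ++*(unsigned *)arg;
}
```

The point of calling every routine, rather than only those on the
storming line, is that the device that actually needs service may be
registered on a different interrupt than the one that is firing.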

I have done something similar on one of our machines, where the
redirected interrupt is being sent to the interrupt used by em4 when
em0 gets delayed.

My solution for this embedded hardware is a hack: when em4 gets an
interrupt and finds nothing of its own to service, it goes and services
all the other em devices as well.
(Remember, this is for a particular hardware config, so I can use
non-general solutions.)

Another way to achieve this would be to have a special driver that you
register on the 'target' interrupt and that, when run, calls the correct
interrupt handlers :-)
You could have a kernel module that you compile with the correct two
interrupts in it, and loading it would do the trick.
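
A minimal sketch of what such a forwarding "driver" could look like,
again as plain user-space C rather than a real kernel module. The names
(redirect_softc, redirect_add, redirect_intr, fake_em_intr) are
hypothetical. In a real FreeBSD module redirect_intr would be the
routine attached to the target interrupt, for example through
bus_setup_intr(9), and the forwarded handlers would be the real drivers'
interrupt routines.

```c
#include <assert.h>
#include <stddef.h>

#define REDIRECT_MAX 4  /* arbitrary limit for this sketch */

typedef void (*intr_handler_t)(void *arg);

/* Hypothetical per-instance state: the handlers to forward to. */
struct redirect_softc {
    intr_handler_t handler[REDIRECT_MAX];
    void          *arg[REDIRECT_MAX];
    size_t         count;
};

/* Record a handler to forward to; returns 0, or -1 if the list is full. */
int
redirect_add(struct redirect_softc *sc, intr_handler_t h, void *arg)
{
    assert(sc != NULL && h != NULL);
    if (sc->count == REDIRECT_MAX)
        return -1;
    sc->handler[sc->count] = h;
    sc->arg[sc->count] = arg;
    sc->count++;
    return 0;
}

/* The routine registered on the 'target' interrupt: forward to all. */
void
redirect_intr(void *arg)
{
    struct redirect_softc *sc = arg;

    for (size_t i = 0; i < sc->count; i++)
        sc->handler[i](sc->arg[i]);
}

/* Stand-in for a real driver's interrupt routine: counts its calls. */
void
fake_em_intr(void *arg)
{
    ++*(unsigned *)arg;
}
```

This only works if the forwarded handlers tolerate being called when
their device has nothing pending, which handlers on shared interrupt
lines have to cope with anyway.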






