From owner-freebsd-current@FreeBSD.ORG  Fri Feb 24 16:52:19 2006
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4B4AA16A420
	for <freebsd-current@freebsd.org>; Fri, 24 Feb 2006 16:52:19 +0000 (GMT)
	(envelope-from mime@traveller.cz)
Received: from ss.eunet.cz (ss.eunet.cz [193.85.228.13])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C93B743D5F
	for <freebsd-current@freebsd.org>; Fri, 24 Feb 2006 16:52:11 +0000 (GMT)
	(envelope-from mime@traveller.cz)
Received: from localhost.i.cz (ss.eunet.cz [193.85.228.13])
	by ss.eunet.cz (8.13.3/8.13.1) with ESMTP id k1OGq6ec098220
	(version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO);
	Fri, 24 Feb 2006 17:52:09 +0100 (CET)
	(envelope-from mime@traveller.cz)
From: Michal Mertl <mime@traveller.cz>
To: Scott Long <scottl@samsco.org>
In-Reply-To: <43E2184B.3040606@samsco.org>
References: <1138813174.1358.34.camel@genius.i.cz>
	<43E0FE09.50804@samsco.org> <1138875351.1807.12.camel@genius.i.cz>
	<43E203F9.9060307@samsco.org> <1138890130.9192.3.camel@genius.i.cz>
	<43E2184B.3040606@samsco.org>
Content-Type: text/plain
Date: Fri, 24 Feb 2006 17:51:55 +0100
Message-Id: <1140799915.867.17.camel@genius.i.cz>
Mime-Version: 1.0
X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
Cc: freebsd-current@freebsd.org
Subject: Re: em(4) stops forwarding
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 24 Feb 2006 16:52:19 -0000

Scott Long wrote:
> Michal Mertl wrote:
> 
> > Scott Long wrote:
> > 
> >>Michal Mertl wrote:
> >>
> >>>Scott Long wrote:
> >>>
> >>>
> >>>>Michal Mertl wrote:
> >>>>
> >>>>
> >>>>>Hello,
> >>>>>
> >>>>>I've been running CURRENT for a long time and never experienced a problem
> >>>>>with the built-in em(4) card before. Recently (I first noticed it on Jan
> >>>>>24) the card has stopped working several times. Nothing gets into the
> >>>>>log file. Carrier is still detected properly but no data is exchanged.
> >>>>>Ifconfig up/down doesn't help but kldunload/load does. When I run
> >>>>>tcpdump I don't see any packet coming in but I see some outgoing.
> >>>>>
> >>>>>Can someone suggest what to look at when it happens the next time? I
> >>>>>have DDB compiled in. I will try to sniff the wire using another machine
> >>>>>next time to see if the card sends out anything.
> >>>>>
> >>>>>The command 'pciconf -lv' says this about the card:
> >>>>>em0@pci2:1:0:   class=0x020000 card=0x05491014 chip=0x101e8086 rev=0x03 hdr=0x00
> >>>>>   vendor   = 'Intel Corporation'
> >>>>>   device   = '82540EP Gigabit Ethernet Controller (Mobile)'
> >>>>>   class    = network
> >>>>>   subclass = ethernet
> >>>>>
> >>>>>The dmesg:
> >>>>>em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x8000-0x803f mem 0xc0220000-0xc023ffff,0xc0200000-0xc020ffff irq 11 at device 1.0 on pci2
> >>>>>em0: Ethernet address: 00:0d:60:cd:ae:e2
> >>>>>em0: [FAST]
> >>>>>
> >>>>>The interrupt is shared since the machine is a notebook. I don't know if
> >>>>>it was just a coincidence but I think that it happened at the same time
> >>>>>as my USB mouse stopped working - the USB controller is on the same irq.
> >>>>>
> >>>>>Michal
> >>>>>
> >>>>
> >>>>What is sharing the interrupt?
> >>>
> >>>
> >>>vgapci0, ipw0, ehci0, uhci0-2. I don't think vgapci0 and ipw0 are really
> >>>using the interrupt when I use em0.
> >>>
> >>>
> >>
> >>Ouch.  For now, edit /sys/dev/em/if_em.c and add the following line to 
> >>the top of the file:
> >>
> >>#define NO_EM_FASTINTR
> > 
> > 
> > Do you know the reason for the problem? Wouldn't it be better if I used
> > the stock driver and got some information for you when it doesn't work? I
> > use the machine as my workstation so it isn't such a big problem when it
> > loses the network.
> > 
> 
> The problem is that the drivers that are sharing the interrupt,
> particularly the USB ones, can spend a very very long time waiting on
> locks to service the interrupt.  During that time, the interrupt pin is
> masked and interrupts from all of the shared devices don't get
> delivered. So even though the if_em driver has a very fast interrupt
> handler, it still has to wait on the USB drivers.  During that wait, a
> burst of network traffic might come into the card, filling its buffers
> and triggering an overflow.  This would be especially likely to happen
> while the kernel is flushing out filesystem i/o.  In theory the
> interrupt service latency shouldn't be any different whether the if_em
> driver is fast or not, but there might be coincidental timing issues
> that I don't understand.  That's why I'd like you to set the #define in
> the driver to revert it to its classic behaviour and see if the
> problem persists.  If it doesn't, then I'll have to rethink some of the
> changes that I made to it.
> 
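
To make the overflow scenario described above concrete, here is a toy
model in C. It is only a sketch under stated assumptions: struct rx_ring,
simulate_drops() and all parameter names are hypothetical and are not
taken from if_em.c. It shows how packets get dropped when the handler
cannot run for several time steps, for example because a shared
interrupt line stays masked.

```c
#include <stddef.h>

/* Hypothetical toy model of a receive ring; not code from if_em.c. */
struct rx_ring {
    size_t capacity;  /* descriptors the card can fill before overflowing */
    size_t used;      /* descriptors currently waiting for the driver */
    size_t dropped;   /* packets lost because the ring was full */
};

/* A packet arrives from the wire; it is dropped if no descriptor is free. */
static void rx_ring_receive(struct rx_ring *r)
{
    if (r->used < r->capacity)
        r->used++;
    else
        r->dropped++;
}

/* The driver's handler runs and drains everything that is queued. */
static void rx_ring_service(struct rx_ring *r)
{
    r->used = 0;
}

/*
 * Simulate `ticks` time steps with `pkts_per_tick` arrivals in each step.
 * The handler only runs every `service_interval` ticks, which stands in
 * for the time the shared interrupt line stays masked. An interval of 0
 * means the handler never runs. Returns the number of dropped packets.
 */
static size_t simulate_drops(size_t capacity, size_t pkts_per_tick,
                             size_t ticks, size_t service_interval)
{
    struct rx_ring r = { capacity, 0, 0 };

    for (size_t t = 1; t <= ticks; t++) {
        for (size_t i = 0; i < pkts_per_tick; i++)
            rx_ring_receive(&r);
        if (service_interval != 0 && t % service_interval == 0)
            rx_ring_service(&r);
    }
    return r.dropped;
}
```

With a 256-entry ring and 100 packets per tick, servicing on every tick
loses nothing, while servicing only every fifth tick loses packets in
each interval. The numbers are arbitrary and only illustrate the
mechanism.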

I thought I should let you know whether I still experience the em lock-up.
The answer is, unfortunately, that it hasn't happened again, either with
NO_EM_FASTINTR defined or without it.
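
For anyone following along, this is roughly how a compile-time switch
such as NO_EM_FASTINTR can select between two handler paths. It is a
minimal sketch, not the actual driver code: the EM_FAST_INTR macro and
the em_uses_fast_intr() function are names assumed for illustration
only.

```c
/*
 * Hypothetical sketch of a compile-time switch. EM_FAST_INTR and
 * em_uses_fast_intr() are illustrative names, not taken from if_em.c.
 */

/* Uncomment the next line to fall back to the classic handler path. */
/* #define NO_EM_FASTINTR */

#ifndef NO_EM_FASTINTR
#define EM_FAST_INTR 1
#endif

/* Returns 1 when the fast interrupt path was compiled in, 0 otherwise. */
static int em_uses_fast_intr(void)
{
#ifdef EM_FAST_INTR
    return 1;
#else
    return 0;
#endif
}
```

Because the choice is made by the preprocessor, the define has to appear
before the first place the macro is tested, which is why it goes at the
top of the file, and the module has to be rebuilt for it to take effect.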

> Scott
> 
> > 
> >>Also, does your kernel config include the apic device?
> > 
> > 
> > Yes, it does. But I believe that the chipset doesn't have one and the
> > CPU doesn't support it either.
> > 
> > Michal
> > 
> 
>