Date:      Sat, 12 Aug 2006 10:06:39 +0200 (CEST)
From:      Daniel Ryslink <daniel.ryslink@col.cz>
To:        Landon Fuller <landonf@opendarwin.org>
Cc:        freebsd-net@freebsd.org, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: Problems with em interfaces on FreeBSD 6.1
Message-ID:  <20060812100354.I43868@k2.vol.cz>
In-Reply-To: <B0C441A8-22F7-45C8-B847-B0BDA7DE7779@opendarwin.org>
References:  <20060811100536.V80282@k2.vol.cz> <20060811111240.GD96644@FreeBSD.org> <20060811133531.D80282@k2.vol.cz> <20060811125825.GH96644@cell.sick.ru> <2a41acea0608110922h4bed63b1ke09f91b610819805@mail.gmail.com> <B0C441A8-22F7-45C8-B847-B0BDA7DE7779@opendarwin.org>


Hello,

Our machine is not running in SMP mode; it has a single CPU with
hyperthreading switched off.

I would also like to point out that a similar problem occurred on yet another
machine running as a web server with FreeBSD 6.1 and the em driver. Also,
hardware problems are unlikely, since we have already tried three different
servers and have also replaced the Ethernet cables along the whole route to
the uplink switch.

Best Regards
Daniel Ryslink

On Fri, 11 Aug 2006, Landon Fuller wrote:

>
> On Aug 11, 2006, at 09:22, Jack Vogel wrote:
>
>> On 8/11/06, Gleb Smirnoff <glebius@freebsd.org> wrote:
>>>  Daniel,
>>> 
>>> On Fri, Aug 11, 2006 at 01:42:32PM +0200, Daniel Ryslink wrote:
>>> D> We have started to use the em driver only recently, after the upgrade to
>>> D> gigabit connectivity (100 Mbit NICs from Intel used the fxp driver).
>>> D>
>>> D> As for the frequency of the incidents, here is a grep of the messages:
>>> D>
>>> D> ~~~~~~~~~~~~ <slash> ~~~~~~~~~~~~~~
>>> D> Aug  4 22:35:23 b2 kernel: em0: watchdog timeout -- resetting
>>> D> Aug  5 00:09:20 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  5 06:08:59 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  6 12:38:16 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  6 20:39:47 b2 kernel: em0: watchdog timeout -- resetting
>>> D> Aug  7 18:37:29 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  8 07:27:48 b2 kernel: em0: watchdog timeout -- resetting
>>> D> Aug  8 09:38:17 b2 kernel: em0: watchdog timeout -- resetting
>>> D> Aug  8 12:54:54 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  8 22:41:17 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  9 05:17:24 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  9 10:56:10 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug  9 20:10:06 b2 kernel: em1: watchdog timeout -- resetting
>>> D> Aug 11 08:41:44 b2 kernel: em0: watchdog timeout -- resetting
>>> D> Aug 11 10:35:43 b2 kernel: em0: watchdog timeout -- resetting
>>> D> ~~~~~~~~~~~~ <slash> ~~~~~~~~~~~~~~
>>> D>
>>> D> The driver used is version 3.2.18 (I wanted to load the Intel 6.1.4
>>> D> driver as a module, but I found out that I had made a mistake and
>>> D> accidentally loaded the old 3.2.18 driver).
>>> D>
>>> D> I have a dilemma now: which new driver should I try? The 6.0.5 driver
>>> D> submitted to the current FreeBSD 6.1 branch (modified by you, I believe,
>>> D> on 8 August), or the newest driver from Intel, 6.1.4? Do you think
>>> D> either of these drivers could solve my problems?
>>> 
>>> I'm not sure whether the new driver will solve your problems. You should
>>> try 6.1-STABLE, which has 6.0.5 in it. The difference between 6.1.4 and
>>> 6.0.5 is quite small; I doubt that 6.1.4 is worth a try in your case.
>> 
>> Gleb is right: the differences between my 6.0.5 and 6.1.4 drivers are minor
>> and don't seem to have anything to do with your problem.
>> 
>> I am happy Gleb got my code merged into the tip of STABLE, and I would take
>> that driver code if I were you; it will become 6.2 before long :)
>> 
>> Watchdogs happen when transmit cleanup fails. Your instances are
>> pretty widely separated; perhaps this is some external network
>> problem?
>
> We saw this issue here on SMP systems running 6.1; I've been meaning to set 
> up a reproduction case in the lab and dig into the issue further.
> Disabling the mpsafe network stack (debug.mpsafenet=0) is our temporary 
> work-around; rwatson mentioned that this has the effect of forcing the 
> interrupt handler for if_em to not run in parallel with the transmit code, 
> which is likely what caused the problem to disappear.
>
> -landonf
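For anyone wanting to try Landon's workaround, a minimal sketch follows. It assumes the FreeBSD 6.x behaviour in which `debug.mpsafenet` is a boot-time loader tunable (read-only once the system is up), so it is set in `/boot/loader.conf` and takes effect after a reboot:

```
# /boot/loader.conf
# Run the network stack under the Giant lock instead of the MPSAFE path
# (the temporary workaround described above; on SMP this gives up
# parallelism in the network stack).
debug.mpsafenet="0"
```

After rebooting, `sysctl debug.mpsafenet` should report 0.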


