From: Scott Long
Date: Thu, 19 Oct 2006 01:10:02 -0600
To: Bruce Evans
Cc: Kip Macy, freebsd-net, Jack Vogel, Kris Kennaway
Subject: Re: em network issues
Message-ID: <453724CA.8070609@samsco.org>
In-Reply-To: <20061019164814.L76712@delplex.bde.org>

Bruce Evans wrote:
> On Thu, 19 Oct 2006, Scott Long wrote:
>
>> Bruce Evans wrote:
>
>>>>> On Wed, 18 Oct 2006, Kris Kennaway wrote:
>>>>>> I have been working with someone's system that has em shared with
>>>>>> fxp, and a simple fetch over the em (e.g. of a 10 GB file of
>>>>>> zeroes) is enough to produce watchdog timeouts after a few seconds.
>>>>>
>>>>> em_intr_fast() has no locking whatsoever.  I would be very surprised
>>>>> if it even seemed to work for SMP.  For UP, masking of CPU interrupts
>>>>> (as is automatic in fast interrupt handlers) might provide sufficient
>>>>> locking, ...
>>>
>>> I barely noticed the point about it being shared.  With sharing, and
>>> probably especially with fast and normal interrupt handlers sharing an
>>> IRQ, locking is more needed.  There are many possibilities for races.
>>> One likely one is:
>>> - em interrupt task running.  Device interrupts are disabled, so the
>>>   task thinks it won't be interfered with by the em interrupt handler.
>>
>> What interference are you talking about?  em_intr_fast changes no state
>> in the driver softc (aside from the silly bookkeeping).  It only reads
>> from one register, and writes to no registers or shared memory.
>
> It disables interrupts.  To do that, it calls em_disable_intr().  The
> hardware is simple enough for em_disable_intr() not to have to make
> many state changes, but it certainly has to make at least 1 to work.
> It uses several layers of macros which I think ends up doing a write
> to 1 register in bus space.
>
>>> - shared fxp interrupt.  The em interrupt handler is called.  Without
>>>   any explicit synchronization, bad things may happen and apparently do.
>>>   In the UP case, there is some implicit synchronization which may help
>>>   but is hard to understand.
>>
>> Can you be more specific as to the 'bad things'?
>
> Not very.  Maybe interrupts don't get reenabled as intended.  Then the
> symptoms get mutated by watchdog timeouts.
>
> Bruce

Then yes, I'm already thinking of a better way to do the interrupt
enable/disable thing.  I am still very surprised that the hardware cannot
be silenced by doing a read and/or write of a status register, like most
other hardware.  If that were possible, this would be a very simple
problem.

Scott
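
The pattern argued about in this thread (a fast handler that reads one
register, masks the device with one register write, and defers the real
work to a task) can be sketched as a toy model.  This is not the em(4)
source: struct toy_nic, toy_intr_fast(), toy_task() and toy_demo() are
hypothetical names, the "registers" are plain struct fields, and the
interleaving is replayed by hand on one thread.  It only illustrates that
the handler does change device state, and that on a shared IRQ it can do
so while the task is partway through its run.  It does not claim to
reproduce the watchdog timeouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file for an imaginary NIC; not the em(4) layout. */
struct toy_nic {
    uint32_t cause;        /* pending interrupt causes */
    bool     intr_enabled; /* device-level interrupt mask */
    bool     task_queued;  /* deferred task waiting to run */
    unsigned mask_writes;  /* times the handler wrote the mask */
};

/* Fast handler: one register read, one register write, then defer. */
bool toy_intr_fast(struct toy_nic *nic)
{
    uint32_t cause = nic->cause;   /* the single register read */

    if (cause == 0)
        return false;              /* not ours: the IRQ's other device */
    nic->intr_enabled = false;     /* the single register write */
    nic->mask_writes++;
    nic->task_queued = true;       /* hand the real work to the task */
    return true;
}

/* Deferred task body: service the device, then unmask it. */
void toy_task(struct toy_nic *nic)
{
    nic->cause = 0;
    nic->intr_enabled = true;
}

/* Replays one interleaving by hand; returns the number of mask writes. */
unsigned toy_demo(void)
{
    struct toy_nic nic = { .cause = 0, .intr_enabled = true };
    bool claimed;

    /* The other device interrupts while ours is idle: handler declines. */
    claimed = toy_intr_fast(&nic);
    assert(!claimed && nic.intr_enabled);

    /* Our device raises a cause: handler masks it and queues the task. */
    nic.cause = 0x1;
    claimed = toy_intr_fast(&nic);
    assert(claimed && !nic.intr_enabled && nic.task_queued);

    /* Task is dequeued and running; the shared IRQ fires again. */
    nic.task_queued = false;
    claimed = toy_intr_fast(&nic);
    assert(claimed && nic.task_queued);   /* state touched mid-task */

    /* Task finishes and unmasks, unaware of the second handler call. */
    toy_task(&nic);
    assert(nic.intr_enabled && nic.task_queued);

    return nic.mask_writes;
}
```

Calling toy_demo() from a main() replays the sequence.  In a real driver
the two paths can run concurrently on different CPUs, so the mask write
and the task's unmask may land in either order; the model deliberately
leaves that ordering question out.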