Date:      Sat, 15 Jul 2006 19:54:05 GMT
From:      Guillaume Ballet <asqyzeron@gmail.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   i386/100356: Fix: "Non-maskable interrupt while in kernel mode with" a TI firewire controler
Message-ID:  <200607151954.k6FJs5Gh028258@www.freebsd.org>
Resent-Message-ID: <200607152000.k6FK0YSv057867@freefall.freebsd.org>

>Number:         100356
>Category:       i386
>Synopsis:       Fix: "Non-maskable interrupt while in kernel mode with" a TI firewire controler
>Confidential:   no
>Severity:       critical
>Priority:       low
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jul 15 20:00:33 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Guillaume Ballet
>Release:        All of them since 5.2 at least
>Organization:
>Environment:
GENERIC - and any kernel including the firewire driver
>Description:
At boot time, when the firewire driver is initialized on a machine with a TI controller (0x104c, 0x8032 at least), the following error message appears.

RAM parity error, likely hardware failure.
Fatal trap 19: non-maskable interrupt trap while in kernel mode.
instruction pointer = 0x20:0xc0528586
stack pointer       = 0x28:0xc10209c4
code segment        = base 0x0, limit 0xfffff, type 0x1b
                   = DPL0, pres 1, def32 1, gran 1
processor eflags    = interrupt enabled, IOPL = 0
current process     = 0 (swapper)
trap number         = 19
panic : non-maskable interrupt trap

This happens because the controller and/or the PCI bus does not react quickly enough to the first OWRITE call (see the code below, from sys/dev/firewire/fwohci.c).

312         OWRITE(sc, FWOHCI_INTSTATCLR, OHCI_INT_REG_FAIL);
313         fun = PHYDEV_RDCMD | (addr << PHYDEV_REGADDR);
314         OWRITE(sc, OHCI_PHYACCESS, fun);
315         for ( i = 0 ; i < MAX_RETRY ; i ++ ){
316                 fun = OREAD(sc, OHCI_PHYACCESS);
317                 if ((fun & PHYDEV_RDCMD) == 0 && (fun & PHYDEV_RDDONE) != 0)
318                         break;
319                 DELAY(100);
320         }

When performing the second OWRITE, an uninitialized value makes its way to eax, and at the instruction:

<fwphy_rddata+156>:  mov    0xec(%eax),%eax

the access fails. The debugger showed eax = 0xffffffff.

A read error on the PCI bus is wrongly interpreted as an ISA NMI error, and the kernel crashes.

The problem seems to happen only when reading the speed register, which points to a slow update on the PCI bus at initialization time.
>How-To-Repeat:
Insert any FreeBSD install CD into the drive at boot time. Wait. Enjoy :P
>Fix:
Fixing the problem is fairly simple: give the bus or the controller more time to configure the right port. This is done by altering the code as follows:

OWRITE(sc, FWOHCI_INTSTATCLR, OHCI_INT_REG_FAIL);
/* Give the bus/controller time to settle before reading the speed register. */
if (addr == FW_PHY_SPD_REG)
        DELAY(500);
fun = PHYDEV_RDCMD | (addr << PHYDEV_REGADDR);
OWRITE(sc, OHCI_PHYACCESS, fun);
for (i = 0; i < MAX_RETRY; i++) {
        fun = OREAD(sc, OHCI_PHYACCESS);
        if ((fun & PHYDEV_RDCMD) == 0 && (fun & PHYDEV_RDDONE) != 0)
                break;
        DELAY(100);     /* unchanged from the original loop */
}

which, given as a diff on sys/dev/firewire/fwohci.c is:
313a314,315
>	if (addr == FW_PHY_SPD_REG)
>		DELAY(500);

It has been tested on several machines and works fine. Of course, it would be more elegant to write dedicated code for determining the speed than to add an if to a function that does not need it most of the time. I prefer to let the maintainer of this file decide what is best.
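
For illustration only, the "dedicated code" alternative could look roughly like the user-space sketch below. It is not driver code and has not been run against real hardware: struct fake_ohci, sim_delay(), sim_write_phyaccess(), sim_read_phyaccess(), phy_read() and phy_read_speed() are hypothetical names, the macro values are placeholders chosen for the simulation rather than copied from the driver headers, and the "command is lost if issued too early" behaviour is an assumption made only so that the example can run. Note also that the wrapper delays before the whole read, whereas the patch above delays between the two OWRITE calls; whether that is sufficient on the affected controllers is untested.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for this simulation only (assumptions, not taken
 * from the driver headers). */
#define MAX_RETRY       100
#define PHYDEV_RDCMD    (1u << 15)
#define PHYDEV_RDDONE   (1u << 31)
#define PHYDEV_REGADDR  8
#define PHYDEV_RDDATA   16
#define FW_PHY_SPD_REG  0x02

/* Hypothetical stand-in for the controller state. */
struct fake_ohci {
        uint32_t phyaccess;     /* simulated PHY access register */
        unsigned settle_us;     /* time the PHY needs before accepting commands */
        unsigned elapsed_us;    /* simulated time spent in sim_delay() */
        uint8_t  phy_regs[16];  /* simulated PHY register file */
};

static void sim_delay(struct fake_ohci *sc, unsigned us)
{
        sc->elapsed_us += us;
}

/* Assumed model: a read command issued before the settle time never completes. */
static void sim_write_phyaccess(struct fake_ohci *sc, uint32_t val)
{
        unsigned addr = (val >> PHYDEV_REGADDR) & 0xf;

        if (sc->elapsed_us < sc->settle_us) {
                sc->phyaccess = val;    /* RDCMD stays set, RDDONE never appears */
                return;
        }
        sc->phyaccess = PHYDEV_RDDONE |
            ((uint32_t)sc->phy_regs[addr] << PHYDEV_RDDATA);
}

static uint32_t sim_read_phyaccess(const struct fake_ohci *sc)
{
        return sc->phyaccess;
}

/* Generic PHY read: same polling structure as the loop quoted above,
 * with no speed-specific special case. Returns 0 on success, -1 on timeout. */
static int phy_read(struct fake_ohci *sc, unsigned addr, uint8_t *out)
{
        uint32_t fun;
        int i;

        assert(addr < 16);
        fun = PHYDEV_RDCMD | ((uint32_t)addr << PHYDEV_REGADDR);
        sim_write_phyaccess(sc, fun);
        for (i = 0; i < MAX_RETRY; i++) {
                fun = sim_read_phyaccess(sc);
                if ((fun & PHYDEV_RDCMD) == 0 && (fun & PHYDEV_RDDONE) != 0) {
                        *out = (uint8_t)((fun >> PHYDEV_RDDATA) & 0xff);
                        return 0;
                }
                sim_delay(sc, 100);
        }
        return -1;
}

/* Speed-specific wrapper: the extra settle delay lives here only. */
static int phy_read_speed(struct fake_ohci *sc, uint8_t *out)
{
        sim_delay(sc, 500);
        return phy_read(sc, FW_PHY_SPD_REG, out);
}
```

With this split, only the caller that determines the speed would go through phy_read_speed(); every other register read would use phy_read() and pay no extra delay or comparison.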
>Release-Note:
>Audit-Trail:
>Unformatted:


