Date:      Mon, 22 Jan 2007 10:42:34 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        "Jack Vogel" <jfvogel@gmail.com>
Cc:        freebsd-current@freebsd.org, Mark Atkinson <atkin901@yahoo.com>
Subject:   Re: another msi blacklist candidate?
Message-ID:  <200701221042.35008.jhb@freebsd.org>
In-Reply-To: <2a41acea0701212106t31b5478di8817cfda25637945@mail.gmail.com>
References:  <eoqo83$m7j$1@sea.gmane.org> <45B3A821.3060605@samsco.org> <2a41acea0701212106t31b5478di8817cfda25637945@mail.gmail.com>

On Monday 22 January 2007 00:06, Jack Vogel wrote:
> On 1/21/07, Scott Long <scottl@samsco.org> wrote:
> > Jack Vogel wrote:
> > > On 1/20/07, Scott Long <scottl@samsco.org> wrote:
> > >> Jack Vogel wrote:
> > >> > On 1/20/07, John Baldwin <jhb@freebsd.org> wrote:
> > >> >> On Friday 19 January 2007 13:55, Jack Vogel wrote:
> > >> >> > On 1/19/07, Mark Atkinson <atkin901@yahoo.com> wrote:
> > >> >> > > I upgraded a box to -current yesterday with the following pci card
> > >> >> > > in it, (this is the msi disabled verbose boot below) but upon bootup,
> > >> >> > > any heavy network activity caused watchdog timeouts and resets.
> > >> >> > > Disabling msi via the two tunables fixed the problem.
> > >> >> > >
> > >> >> > > What info do you need on this problem?
> > >> >> > >
> > >> >> > > found-> vendor=0x8086, dev=0x1076, revid=0x00
> > >> >> > >         bus=4, slot=2, func=0
> > >> >> > >         class=02-00-00, hdrtype=0x00, mfdev=0
> > >> >> > >         cmdreg=0x0117, statreg=0x0230, cachelnsz=16 (dwords)
> > >> >> > >         lattimer=0x40 (1920 ns), mingnt=0xff (63750 ns), maxlat=0x00 (0 ns)
> > >> >> > >         intpin=a, irq=10
> > >> >> > >         powerspec 2  supports D0 D3  current D0
> > >> >> > >         MSI supports 1 message, 64 bit
> > >> >> > >         map[10]: type 1, range 32, base 0xdf9c0000, size 17, enabled
> > >> >> > > pcib4: requested memory range 0xdf9c0000-0xdf9dffff: good
> > >> >> > >         map[14]: type 1, range 32, base 0xdf9e0000, size 17, enabled
> > >> >> > > pcib4: requested memory range 0xdf9e0000-0xdf9fffff: good
> > >> >> > >         map[18]: type 4, range 32, base 0xdcc0, size  6, enabled
> > >> >> > > pcib4: requested I/O range 0xdcc0-0xdcff: in range
> > >> >> > > pcib4: matched entry for 4.2.INTA
> > >> >> > > pcib4: slot 2 INTA hardwired to IRQ 18
> > >> >> > > em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xdcc0-0xdcff mem 0xdf9c0000-0xdf9dffff,0xdf9e0000-0xdf9fffff irq 18 at device 2.0 on pci4
> > >> >> > > em0: Reserved 0x20000 bytes for rid 0x10 type 3 at 0xdf9c0000
> > >> >> > > em0: Reserved 0x40 bytes for rid 0x18 type 4 at 0xdcc0
> > >> >> > > em0: bpf attached
> > >> >> > > em0: Ethernet address: 00:0e:0c:6e:a1:39
> > >> >> > > em0: [FAST]
> > >> >> >
> > >> >> > Talked about this internally, and the advice here is that the em
> > >> >> > driver change so that only PCI-E adapters can use MSI; this would
> > >> >> > eliminate the need to blacklist in the kernel PCI code.
> > >> >>
> > >> >> It's not em(4) that is the problem, but the system, and I'd rather we
> > >> >> fix it generically rather than in each driver.  Maybe we should disable
> > >> >> MSI for non-PCIe systems?
> > >> >
> > >> > Depends what that means.  Say a system HAS PCI-E, but also a PCI and/or
> > >> > a PCI-X slot: will MSI be unavailable in those slots?  That's what I
> > >> > would prefer.
> > >> >
> > >> > Jack
> > >>
> > >> Are you saying that MSI should only be available to PCIe devices?  That
> > >> will break legitimate PCI-X devices.
> > >
> > > True, the question is how many of those devices are problematic and need
> > > blacklisting anyway? I don't have a feel for this, do you Scott?
> > >
> > > Jack
> >
> > It's up to the driver writers to keep tabs on their peripherals.  If the
> > Intel 12345 PCI-X NIC can't do MSI but the Intel 23456 PCI-X NIC can,
> > then it's up to the driver to know that.  Chipset support is the
> > responsibility of the OS, and that's where it gets more difficult
> > because MSI is still fairly immature on the x86/x64 platform.
> >
> > Scott
> >
> 
> LOL, this conversation started because I said I was going to disallow
> some adapters from MSI and John said it should be in the OS not
> all the drivers :)
> 
> I'm happy to do it the way I planned at first anyway :)

Umm, is there an erratum where non-PCI-e em(4) adapters don't support MSI?
If not, then don't disable MSI.  Unless that is the case, the more likely
cause is that the OP's chipset doesn't support MSI _at all_, and that type
of issue should be in the OS.  If you know from errata that non-PCI-e em(4) 
parts can't do MSI, then that is something you should handle in the driver.

Maybe it would help if I explained how MSI works on x86:

By default, the local APIC is mapped at 0xfee00000 on x86 machines (both ia32
and amd64) so writes by the CPU to that range of physical addresses never make 
it out of the CPU and onto the bus.  As a result, that bit of address space 
is essentially dead to the rest of the system (there could be RAM backing it, 
but if you DMA'd data into it, the CPU could never access it).  Thus, for MSI 
on x86, the chipset reuses that "dead" address space.  There is a device in the
chipset (probably in the MCH on Intel; it's in the HT-PCI bridges on AMD) that
listens for writes to that address range and generates an APIC message that
is sent to the appropriate CPU(s) to trigger an interrupt.  All that magic
is in the chipset; it's not in the CPU, and it's not in any of the PCI devices
sending messages (they just do normal memory writes).  Assuming non-PCI-e
em(4) parts are capable of correctly generating a memory write (:-P) they 
should work fine so long as something in the chipset is "listening" for the 
messages to convert them into APIC interrupt messages.  Older chipsets don't 
have anything listening.  Typically what happens then is that the device gets 
a target abort on the memory write and asserts SERR#.  On some systems this 
is just ignored (and you get watchdog timeouts, etc.) and on other systems 
this triggers an NMI.  However, none of this is em(4) specific.
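
To make the "normal memory write" concrete, here is a minimal sketch of what
a device is programmed to write for an x86 MSI.  It assumes the simplest case
(physical destination mode, fixed delivery, edge-triggered), where the
destination APIC ID sits in bits 19:12 of the address and the vector in bits
7:0 of the data.  The function and macro names are made up for illustration;
they are not FreeBSD kernel APIs.

```c
#include <stdint.h>

/* Base of the local APIC window that an MSI-capable chipset listens on. */
#define MSI_X86_ADDR_BASE	0xfee00000u

/*
 * Address the device writes to: the APIC window base with the destination
 * APIC ID in bits 19:12 (physical destination mode, no redirection hint).
 */
static uint32_t
msi_x86_addr(uint8_t dest_apic_id)
{
	return (MSI_X86_ADDR_BASE | ((uint32_t)dest_apic_id << 12));
}

/*
 * Data the device writes: the interrupt vector in bits 7:0.  Fixed delivery
 * mode and edge trigger leave every other bit zero.
 */
static uint16_t
msi_x86_data(uint8_t vector)
{
	return ((uint16_t)vector);
}

/*
 * What the "listener" in the chipset effectively checks: does this memory
 * write land in the 1MB window starting at 0xfee00000?
 */
static int
msi_x86_addr_is_msi(uint32_t addr)
{
	return ((addr & 0xfff00000u) == MSI_X86_ADDR_BASE);
}
```

If nothing in the chipset performs the equivalent of msi_x86_addr_is_msi(),
the write has no target, which is the target-abort case described above.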

My suggestion is that we should blacklist non-PCI-e chipsets by default
(maybe with explicit whitelisting for non-PCI-e ones that do work).  In a
working PCI-e system, if you have a non-PCI-e em(4) part that can do MSI,
its parent bridge is a PCIe-PCI bridge, and the chipset has
something "listening" for writes to 0xfee00000, then the memory write that
MSI triggers is going to propagate up to the chipset and work just as well as
for a PCI-e device in the system.
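
As a rough sketch of that policy (not actual kernel code), the decision could
look something like the following.  The structures, the has_pcie flag, and
the whitelist are all hypothetical; how the OS would really detect a PCI-e
chipset or store the whitelist is left open here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the host chipset; not a FreeBSD structure. */
struct chipset_id {
	uint16_t vendor;
	uint16_t device;
	bool	 has_pcie;	/* assumed: set if the chipset is PCI-e */
};

/* Hypothetical entry for a non-PCI-e chipset known to handle MSI. */
struct msi_whitelist_entry {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Allow MSI system-wide if the chipset is PCI-e, or if it is a non-PCI-e
 * chipset that has been explicitly whitelisted.  Everything else is
 * blacklisted by default.
 */
static bool
chipset_msi_allowed(const struct chipset_id *cs,
    const struct msi_whitelist_entry *wl, size_t nwl)
{
	size_t i;

	if (cs->has_pcie)
		return (true);
	for (i = 0; i < nwl; i++) {
		if (wl[i].vendor == cs->vendor && wl[i].device == cs->device)
			return (true);
	}
	return (false);
}
```

Note that the check is keyed on the chipset, not on the device asking for
MSI, so a PCI or PCI-X em(4) behind a PCIe-PCI bridge is still allowed.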

In short, you should only disable MSI for devices in the driver if you know
the breakage is in the actual device itself, not as a poor way to try to
handle broken chipsets.

-- 
John Baldwin


