From: John Baldwin <jhb@freebsd.org>
To: "Jack Vogel"
Cc: freebsd-current@freebsd.org, Mark Atkinson
Date: Mon, 22 Jan 2007 10:42:34 -0500
Subject: Re: another msi blacklist candidate?
Message-Id: <200701221042.35008.jhb@freebsd.org>

On Monday 22 January 2007 00:06, Jack Vogel wrote:
> On 1/21/07, Scott Long wrote:
> > Jack Vogel wrote:
> > > On 1/20/07, Scott Long wrote:
> > >> Jack Vogel wrote:
> > >> > On 1/20/07, John Baldwin wrote:
> > >> >> On Friday 19 January 2007 13:55, Jack Vogel wrote:
> > >> >> > On 1/19/07, Mark Atkinson wrote:
> > >> >> > > I upgraded a box to -current yesterday with the following pci card in it,
> > >> >> > > (this is the msi disabled verbose boot below) but upon bootup, any heavy
> > >> >> > > network activity caused watchdog timeouts and resets. Disabling msi via
> > >> >> > > the two tunables fixed the problem.
> > >> >> > >
> > >> >> > > What info do you need on this problem?
> > >> >> > >
> > >> >> > > found-> vendor=0x8086, dev=0x1076, revid=0x00
> > >> >> > > bus=4, slot=2, func=0
> > >> >> > > class=02-00-00, hdrtype=0x00, mfdev=0
> > >> >> > > cmdreg=0x0117, statreg=0x0230, cachelnsz=16 (dwords)
> > >> >> > > lattimer=0x40 (1920 ns), mingnt=0xff (63750 ns), maxlat=0x00 (0 ns)
> > >> >> > > intpin=a, irq=10
> > >> >> > > powerspec 2 supports D0 D3 current D0
> > >> >> > > MSI supports 1 message, 64 bit
> > >> >> > > map[10]: type 1, range 32, base 0xdf9c0000, size 17, enabled
> > >> >> > > pcib4: requested memory range 0xdf9c0000-0xdf9dffff: good
> > >> >> > > map[14]: type 1, range 32, base 0xdf9e0000, size 17, enabled
> > >> >> > > pcib4: requested memory range 0xdf9e0000-0xdf9fffff: good
> > >> >> > > map[18]: type 4, range 32, base 0xdcc0, size 6, enabled
> > >> >> > > pcib4: requested I/O range 0xdcc0-0xdcff: in range
> > >> >> > > pcib4: matched entry for 4.2.INTA
> > >> >> > > pcib4: slot 2 INTA hardwired to IRQ 18
> > >> >> > > em0: port 0xdcc0-0xdcff mem 0xdf9c0000-0xdf9dffff,0xdf9e0000-0xdf9fffff irq 18 at device 2.0 on pci4
> > >> >> > > em0: Reserved 0x20000 bytes for rid 0x10 type 3 at 0xdf9c0000
> > >> >> > > em0: Reserved 0x40 bytes for rid 0x18 type 4 at 0xdcc0
> > >> >> > > em0: bpf attached
> > >> >> > > em0: Ethernet address: 00:0e:0c:6e:a1:39
> > >> >> > > em0: [FAST]
> > >> >> >
> > >> >> > Talked about this internally, and the advise here is that the em driver change
> > >> >> > so that only PCI-E adapters can use MSI, this would eliminate the need to
> > >> >> > blacklist in the kernel PCI code.
> > >> >>
> > >> >> It's not em(4) that is the problem, but the system, and I'd rather we fix it
> > >> >> generically rather than in each driver. Maybe we should disable MSI for non-PCIe
> > >> >> systems?
> > >> >
> > >> > Depends what that means, say a system HAS PCI-E, but also a PCI and/or
> > >> > a PCI-X slot will MSI be unavailable in those slots, that's what I would
> > >> > prefer.
> > >> >
> > >> > Jack
> > >>
> > >> Are you saying that MSI should only be available to PCIe devices? That
> > >> will break legitimate PCI-X devices.
> > >
> > > True, the question is how many of those devices are problematic and need
> > > blacklisting anyway? I don't have a feel for this, do you Scott?
> > >
> > > Jack
> >
> > It's up to the driver writers to keep tabs on their peripherals. If the
> > Intel 12345 PCI-X NIC can't do MSI but the Intel 23456 PCI-X NIC can,
> > then it's up to the driver to know that. Chipset support is the
> > responsibility of the OS, and that's where it gets more difficult
> > because MSI is still fairly immature on the x86/x64 platform.
> >
> > Scott
> >
>
> LOL, this conversation started because I said I was going to disallow
> some adapters from MSI and John said it should be in the OS not
> all the drivers :)
>
> I'm happy to do it the way I planned at first anyway :)

Umm, is there an erratum saying that non-PCI-e em(4) adapters don't
support MSI? If not, then don't disable MSI. Unless that is the case, the
more likely cause is that the OP's chipset doesn't support MSI _at all_,
and that type of issue should be handled in the OS. If you know from
errata that non-PCI-e em(4) parts can't do MSI, then that is something you
should handle in the driver.

Maybe it would help if I explained how MSI works on x86:

By default, the local APIC is mapped at 0xfee00000 on x86 machines (both
ia32 and amd64), so writes by the CPU to that range of physical addresses
never make it out of the CPU and onto the bus. As a result, that bit of
address space is essentially dead to the rest of the system (there could
be RAM backing it, but if you DMA'd data into it, the CPU could never
access it). Thus, for MSI on x86, the chipset reuses that "dead" address
space. It has a device in the chipset (probably in the MCH on Intel, it's
in HT-PCI bridges on AMD) that listens for writes to that address range
and generates an APIC message that is sent to the appropriate CPU(s) to
trigger an interrupt. All that magic is in the chipset. It's not in the
CPU, and it's not in any of the PCI devices sending messages (they just do
normal memory writes).

Assuming non-PCI-e em(4) parts are capable of correctly generating a
memory write (:-P), they should work fine so long as something in the
chipset is "listening" for the messages to convert them into APIC
interrupt messages. Older chipsets don't have anything listening.
Typically what happens then is that the device gets a target abort on the
memory write and asserts SERR#. On some systems this is just ignored (and
you get watchdog timeouts, etc.) and on other systems this triggers an
NMI.

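To make the "normal memory write" concrete, here is a minimal C sketch of
the address/data pair an x86 MSI write carries. It assumes the standard x86
MSI message layout (destination APIC ID in bits 12-19 of the address,
vector in the low byte of the data, fixed delivery, edge trigger). The
names msi_msg, msi_compose and msi_addr_in_window are made up for this
illustration and are not FreeBSD kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default physical base of the local APIC window on x86 (from the text). */
#define MSI_X86_ADDR_BASE 0xfee00000u
/* Assumption: the chipset decodes the 1 MB window 0xfee00000-0xfeefffff. */
#define MSI_X86_ADDR_MASK 0xfff00000u

/* Hypothetical container for what the device is programmed to write. */
struct msi_msg {
	uint32_t addr;	/* address the device writes to */
	uint32_t data;	/* value the device writes */
};

/*
 * Build the address/data pair for a fixed-delivery, edge-triggered
 * interrupt aimed at one CPU, identified by its 8-bit APIC ID.
 */
static struct msi_msg
msi_compose(uint8_t apic_id, uint8_t vector)
{
	struct msi_msg m;

	assert(vector >= 16);	/* APIC vectors below 16 are reserved */
	m.addr = MSI_X86_ADDR_BASE | ((uint32_t)apic_id << 12);
	m.data = vector;	/* delivery mode 0 (fixed), edge trigger */
	return (m);
}

/*
 * True if a write to addr lands in the window the chipset has to be
 * "listening" on for the write to become an APIC interrupt message.
 */
static bool
msi_addr_in_window(uint32_t addr)
{
	return ((addr & MSI_X86_ADDR_MASK) == MSI_X86_ADDR_BASE);
}
```

From the device's side this is an ordinary 32-bit store of m.data to
m.addr. Whether it becomes an interrupt depends entirely on whether the
chipset decodes that window.
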
However, none of this is em(4) specific. My suggestion is that we should
blacklist non-PCI-e chipsets by default (maybe with explicit whitelisting
for non-PCI-e ones that do work). In a working PCI-e system, if you have a
non-PCI-e em(4) part that can do MSI, its parent bridge is a PCIe-PCI
bridge, and the chipset has something "listening" for writes to
0xfee00000, then the memory write that MSI triggers is going to propagate
up to the chipset and work just as well as it would for a PCI-e device in
the system.

In short, you should only disable devices in the driver if you know they
are broken in the actual device itself, not as a poor way to try to handle
broken chipsets.

-- 
John Baldwin
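
The split of responsibilities suggested above can be sketched in C as a
single policy check. This is only an illustration under stated
assumptions: the struct, the function names and the whitelist entry are
hypothetical placeholders, not the actual FreeBSD PCI code and not real
chipset IDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PCI ID pair identifying the host chipset. */
struct chipset_id {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical whitelist of non-PCI-e chipsets known to handle MSI. */
static const struct chipset_id msi_whitelist[] = {
	{ 0xffff, 0x0001 },	/* placeholder entry, not a real chipset */
};

static bool
chipset_whitelisted(struct chipset_id id)
{
	size_t i;

	for (i = 0; i < sizeof(msi_whitelist) / sizeof(msi_whitelist[0]); i++) {
		if (msi_whitelist[i].vendor == id.vendor &&
		    msi_whitelist[i].device == id.device)
			return (true);
	}
	return (false);
}

/*
 * device_msi_broken is what a driver would set from its own errata.
 * Everything else is a system-wide decision made by the OS.
 */
static bool
msi_allowed(bool system_has_pcie, struct chipset_id chipset,
    bool device_msi_broken)
{
	if (device_msi_broken)
		return (false);		/* driver-level knowledge */
	if (system_has_pcie)
		return (true);		/* chipset assumed to be listening */
	return (chipset_whitelisted(chipset));	/* default: blacklisted */
}
```

A driver would only ever influence the device_msi_broken argument. The
PCI-e test and the whitelist lookup stay in one place in the OS.
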