From: Scott Long
Date: Wed, 11 Jan 2012 11:38:42 -0700
To: Ian Lepore
Cc: FreeBSD current, Luigi Rizzo
Subject: Re: memory barriers in bus_dmamap_sync() ?

On Jan 11, 2012, at 10:10 AM, Ian Lepore wrote:

> On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
>>
>> Where barriers _are_ needed is in interrupt handlers, and I can
>> discuss that if you're interested.
>>
>> Scott
>>
>
> I'd be interested in hearing about that (and in general I'm loving the
> details coming out in your explanations -- thanks!).
>
> -- Ian

Well, I unfortunately wasn't as clear as I should have been. Interrupt
handlers need bus barriers, not CPU cache/instruction barriers. This is
because the interrupt signal can arrive at the CPU before data and
control words have finished being DMA'd up from the controller. Also,
many controllers require an acknowledgement write to be performed before
leaving the interrupt handler, so the driver needs to do a bus barrier
to ensure that the write flushes. But these are two different topics,
so let me start with the interrupt handler.

Legacy interrupts in PCI are carried on discrete pins and are level
triggered. When the device wants to signal an interrupt, it asserts the
pin. That assertion is seen at the IOAPIC on the host bridge and
converted to an interrupt message, which is then sent immediately to the
CPU's lAPIC. This all happens very, very quickly. Meanwhile, the
interrupt condition could have been predicated on the device DMA'ing
bytes up to host memory, and those DMA writes could have gotten stalled
and buffered on the way up the PCI topology. The end result is often
that the driver interrupt handler runs before those writes have hit host
memory.
To fix this, drivers do a read of a card register as the first
step in the interrupt handler, even if the read is just a dummy and the
result is thrown away. Thanks to PCI ordering, the read will ensure
that any pending writes from the card have flushed all the way up, and
everything will be coherent by the time the read completes.

MSI and MSI-X interrupts on modern PCI and PCIe fix this. These
interrupts are sent as byte messages that are DMA'd to the host bridge.
Since they are in-band data, they are subject to the same ordering rules
as all other data on the bus, and thus ordering for them is implicit.
When the MSI message reaches the host bridge, it's converted into an
lAPIC message just like before. However, the driver doesn't need to do
a flushing read because it knows that the MSI message was the last write
on the bus, therefore everything prior to it has arrived and everything
is coherent. Since reads are expensive in PCI, this saves a
considerable amount of time in the driver. Unfortunately, it adds
non-deterministic latency to the interrupt, since the MSI message is
in-band and has no way to force priority flushing on a busy bus. So
while MSI/MSI-X save some time in the interrupt handler, they actually
make the overall latency situation potentially worse (thanks Intel!).

The acknowledgement write issue is a little more straightforward. If
the card requires an acknowledgement write from the driver to know that
the interrupt has been serviced (so that it'll then know to de-assert
the interrupt line), that write has to be flushed to the hardware before
the interrupt handler completes. Otherwise, the write could get
stalled, the interrupt could remain asserted, and the interrupt could
erroneously re-trigger on the host CPU. I've seen cases where this
devolves into the card getting out of sync with the driver to the point
that interrupts get missed.
Also, this gets a little weird sometimes with
buggy MSI hacks in both device and PCI bridge hardware.

Scott
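
The two ordering problems described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not driver code: `ToyPciLink`, `intx_handler`, `ack_interrupt`, and the register names `STATUS` and `INT_ACK` are all hypothetical and appear nowhere in the original message. The model assumes posted writes queue up in each direction and that a register read cannot complete until the writes ahead of it have drained, which is the PCI ordering behaviour the flushing read relies on. It also assumes the acknowledgement write is flushed with a read-back, which is one common way to get the effect of the bus barrier mentioned above.

```python
from collections import deque


class ToyPciLink:
    """Hypothetical toy model of one PCI path between a device and the host.

    Posted writes sit in a queue per direction until something drains them.
    A register read drains both queues before it returns.
    """

    def __init__(self):
        self.host_memory = {}                          # what the CPU can see
        self.device_regs = {"STATUS": 0, "INT_ACK": 0}  # what the card can see
        self._upstream = deque()    # device -> host DMA writes in flight
        self._downstream = deque()  # host -> device register writes in flight

    def device_dma_write(self, addr, value):
        # The card posts a DMA write; it may stall in a bridge buffer.
        self._upstream.append((addr, value))

    def host_reg_write(self, reg, value):
        # The driver posts a register write; it may also stall on the way down.
        self._downstream.append((reg, value))

    def host_reg_read(self, reg):
        # The read request pushes earlier host writes down to the device ...
        while self._downstream:
            r, v = self._downstream.popleft()
            self.device_regs[r] = v
        # ... and the read completion cannot pass earlier device writes.
        while self._upstream:
            a, v = self._upstream.popleft()
            self.host_memory[a] = v
        return self.device_regs[reg]


def intx_handler(link, addr, flush_first):
    """Legacy (pin) interrupt handler: optionally do the dummy flushing read."""
    if flush_first:
        link.host_reg_read("STATUS")  # result deliberately thrown away
    # None means the DMA'd data has not reached host memory yet.
    return link.host_memory.get(addr)


def ack_interrupt(link, flush_after):
    """Write the acknowledgement, optionally flushing it with a read-back.

    Returns True if the card has actually seen the acknowledgement.
    """
    link.host_reg_write("INT_ACK", 1)
    if flush_after:
        link.host_reg_read("STATUS")
    return link.device_regs["INT_ACK"] == 1
```

In this model, a handler that skips the dummy read finds nothing at the DMA address, while one that reads a register first sees the data. Likewise, an acknowledgement that is never flushed has not reached the card by the time `ack_interrupt` returns. The MSI case is not modelled here: the point made above is that the MSI message is itself an in-band write, so it arrives behind the DMA data without any extra read.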