Date:      Wed, 11 Jan 2012 11:38:42 -0700
From:      Scott Long <scottl@samsco.org>
To:        Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc:        FreeBSD current <freebsd-current@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>
Subject:   Re: memory barriers in bus_dmamap_sync() ?
Message-ID:  <3E27CFAB-DCB3-49E4-9C2A-DD8449B15D64@samsco.org>
In-Reply-To: <1326301842.2419.80.camel@revolution.hippie.lan>
References:  <20120110213719.GA92799@onelab2.iet.unipi.it> <CAJ-VmomdQ5ZWBf_h1xJhppO8WsinvK7RJiDSgDrYKpo%2BJ8eGYQ@mail.gmail.com> <20120110224100.GB93082@onelab2.iet.unipi.it> <201201111005.28610.jhb@freebsd.org> <20120111162944.GB2266@onelab2.iet.unipi.it> <4E8FCE8E-DDCB-4B38-9BFD-2A67BF03D50F@samsco.org> <1326301842.2419.80.camel@revolution.hippie.lan>

On Jan 11, 2012, at 10:10 AM, Ian Lepore wrote:

> On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
>>
>> Where barriers _are_ needed is in interrupt handlers, and I can
>> discuss that if you're interested.
>>
>> Scott
>>
>
> I'd be interested in hearing about that (and in general I'm loving the
> details coming out in your explanations -- thanks!).
>
> -- Ian
>

Well, unfortunately I wasn't as clear as I should have been.  Interrupt
handlers need bus barriers, not CPU cache/instruction barriers.  This is
because the interrupt signal can arrive at the CPU before the data and
control words have finished being DMA'd up from the controller.  Also,
many controllers require an acknowledgement write to be performed before
leaving the interrupt handler, so the driver needs to do a bus barrier
to ensure that the write flushes.  These are two different topics, so
let me start with the interrupt handler.

Legacy interrupts in PCI are carried on discrete pins and are level
triggered.  When the device wants to signal an interrupt, it asserts the
pin.  That assertion is seen by the IOAPIC on the host bridge and
converted to an interrupt message, which is then sent immediately to the
CPU's lAPIC.  This all happens very quickly.  Meanwhile, the interrupt
condition could have been predicated on the device DMA'ing bytes up to
host memory, and those DMA writes could have been stalled and buffered
on the way up the PCI topology.  The end result is often that the
driver's interrupt handler runs before those writes have reached host
memory.  To fix this, drivers do a read of a card register as the first
step in the interrupt handler, even if the read is just a dummy and the
result is thrown away.  Thanks to PCI ordering rules (a read completion
cannot pass posted writes travelling in the same direction), the read
ensures that any pending writes from the card have flushed all the way
up, and everything will be coherent by the time the read completes.
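To make the ordering argument concrete, here is a toy C simulation; it is not kernel code and not any FreeBSD API. Device DMA writes sit in a hypothetical bridge FIFO, the legacy interrupt bypasses that FIFO, and a CPU register read drains the FIFO before it completes. Every name (toy_bus, dev_dma_write, cpu_read_reg, intx_handler, run_intx) is made up for illustration. In a real FreeBSD driver the dummy read would typically be something like a bus_read_4() of a device status register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model (hypothetical, not a FreeBSD API). Device-to-host DMA writes
 * are posted: they sit in a bridge FIFO until something drains it. A
 * legacy INTx interrupt travels on a separate pin, so it reaches the CPU
 * without waiting for that FIFO. A CPU read of a device register cannot
 * complete until the posted writes ahead of its completion have drained.
 */
#define FIFO_DEPTH 16

struct posted_write {
	size_t   addr;    /* index into host_mem */
	uint32_t value;
};

struct toy_bus {
	struct posted_write fifo[FIFO_DEPTH];
	int      count;        /* DMA writes still in flight to host memory */
	uint32_t host_mem[8];  /* host RAM as the CPU sees it */
	uint32_t status_reg;   /* some device register the driver can read */
};

/* Device side: DMA a word toward host memory; it is only buffered. */
static void dev_dma_write(struct toy_bus *b, size_t addr, uint32_t value)
{
	assert(b->count < FIFO_DEPTH && addr < 8);
	b->fifo[b->count].addr = addr;
	b->fifo[b->count].value = value;
	b->count++;
}

/* A register read's completion pushes all earlier posted writes ahead of it. */
static uint32_t cpu_read_reg(struct toy_bus *b)
{
	for (int i = 0; i < b->count; i++)
		b->host_mem[b->fifo[i].addr] = b->fifo[i].value;
	b->count = 0;
	return b->status_reg;
}

/*
 * Legacy interrupt handler. With flush_read set, it starts with a dummy
 * register read whose value is thrown away. Returns the completion word
 * the handler found in host memory.
 */
static uint32_t intx_handler(struct toy_bus *b, int flush_read)
{
	if (flush_read)
		(void)cpu_read_reg(b);
	return b->host_mem[0];
}

/* The device DMAs a completion word, then asserts INTx immediately. */
static uint32_t run_intx(int flush_read)
{
	struct toy_bus b;

	memset(&b, 0, sizeof(b));
	dev_dma_write(&b, 0, 0xC0FFEEu);  /* still sitting in the bridge */
	return intx_handler(&b, flush_read);
}
```

In this model run_intx(0) returns the stale 0 still in host memory, while run_intx(1) returns 0xC0FFEE because the dummy read pushed the posted DMA write out first.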

MSI and MSI-X interrupts on modern PCI and PCIe fix this.  These
interrupts are sent as in-band memory write messages to the host bridge.
Since they are in-band data, they are subject to the same ordering rules
as all other data on the bus, so ordering for them is implicit.  When
the MSI message reaches the host bridge, it's converted into an lAPIC
message just like before.  However, the driver doesn't need to do a
flushing read, because it knows that the MSI message was the last write
the device issued; everything prior to it has therefore arrived and is
coherent.  Since reads are expensive in PCI, this saves a considerable
amount of time in the driver.  Unfortunately, it adds non-deterministic
latency to the interrupt, since the MSI message is in-band and has no
way to force priority flushing on a busy bus.  So while MSI/MSI-X save
some time in the interrupt handler, they can actually make the overall
latency situation worse (thanks Intel!).
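The same kind of toy model, adapted for MSI, shows both halves of that trade-off. This is again a hypothetical simulation, not real hardware behaviour or a FreeBSD API: the MSI is just another posted write queued behind the data, so the handler sees the completion word without any dummy read, but the interrupt also waits for everything queued ahead of it. The busy_writes parameter is an invented stand-in for unrelated traffic on a busy bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model (hypothetical). Device-to-host posted writes retire in order
 * from one FIFO. An MSI is just another posted write, to an address the
 * host bridge treats as "interrupt", so it cannot overtake data queued
 * ahead of it.
 */
#define FIFO_DEPTH 16
#define MEM_WORDS  8
#define MSI_ADDR   ((size_t)-1)  /* marks the MSI write in this model */

struct posted_write {
	size_t   addr;
	uint32_t value;
};

struct toy_bus {
	struct posted_write fifo[FIFO_DEPTH];
	int      count;
	uint32_t host_mem[MEM_WORDS];
};

struct msi_result {
	int      drained;  /* writes retired before the MSI was delivered */
	uint32_t seen;     /* completion word the handler read */
};

static void dev_post(struct toy_bus *b, size_t addr, uint32_t value)
{
	assert(b->count < FIFO_DEPTH);
	b->fifo[b->count].addr = addr;
	b->fifo[b->count].value = value;
	b->count++;
}

/* Retire writes in order; the handler runs only when the MSI arrives. */
static struct msi_result deliver_msi(struct toy_bus *b)
{
	struct msi_result r = { -1, 0 };

	for (int i = 0; i < b->count; i++) {
		if (b->fifo[i].addr == MSI_ADDR) {
			/* Everything posted before the MSI is already in RAM,
			 * so the handler needs no dummy read. */
			r.drained = i;
			r.seen = b->host_mem[0];
			return r;
		}
		b->host_mem[b->fifo[i].addr] = b->fifo[i].value;
	}
	return r;  /* no MSI was posted */
}

/* busy_writes: unrelated traffic already queued ahead (hypothetical). */
static struct msi_result run_msi(int busy_writes)
{
	struct toy_bus b;

	memset(&b, 0, sizeof(b));
	for (int i = 0; i < busy_writes; i++)
		dev_post(&b, 1 + (size_t)i % (MEM_WORDS - 1), (uint32_t)i);
	dev_post(&b, 0, 0xC0FFEEu);   /* completion word */
	dev_post(&b, MSI_ADDR, 1u);   /* MSI, ordered behind the data */
	return deliver_msi(&b);
}
```

Here run_msi(0).drained is 1 and run_msi(10).drained is 11, while .seen is 0xC0FFEE in both cases. The growing drained count is the model's crude version of the extra, uncontrollable latency on a busy bus.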

The acknowledgement write issue is a little more straightforward.  If
the card requires an acknowledgement write from the driver to know that
the interrupt has been serviced (so that it'll then know to de-assert
the interrupt line), that write has to be flushed to the hardware before
the interrupt handler completes.  Otherwise, the write could get
stalled, the interrupt could remain asserted, and the interrupt could
erroneously re-trigger on the host CPU.  I've seen cases where this
devolves into the card getting out of sync with the driver to the point
that interrupts get missed.  Also, this sometimes gets a little weird
with buggy MSI hacks in both device and PCI bridge hardware.
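A toy sketch of the acknowledgement problem, under the same caveats: the register indices, the single-slot posted-write buffer and every function name are hypothetical. The ack is a posted write that may not have reached the device when the handler returns; reading back any register from the same device forces it out, which is one common way to get the flush described above. In a FreeBSD driver this would typically look like a bus_write_4() to the ack register followed by a bus_read_4() whose result is discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model (hypothetical device, not real registers). CPU writes to the
 * device are posted and may sit in a bridge buffer. The device keeps its
 * level-triggered interrupt line asserted until the acknowledgement write
 * actually reaches it. A CPU read from the device cannot complete until
 * the earlier posted write has been delivered.
 */
enum { REG_STATUS, REG_ACK, NREGS };  /* hypothetical register indices */

struct toy_dev {
	uint32_t regs[NREGS];
	bool     irq_asserted;
	bool     ack_posted;   /* ack write still buffered in the bridge */
	uint32_t ack_value;
};

/* CPU posts the ack write; it has not reached the device yet. */
static void cpu_write_ack(struct toy_dev *d, uint32_t value)
{
	assert(!d->ack_posted);  /* this model buffers one write at a time */
	d->ack_posted = true;
	d->ack_value = value;
}

/* A read is ordered behind the posted write, so the ack lands first. */
static uint32_t cpu_read_reg(struct toy_dev *d, int reg)
{
	if (d->ack_posted) {
		d->regs[REG_ACK] = d->ack_value;
		d->irq_asserted = false;     /* device de-asserts its line */
		d->ack_posted = false;
	}
	return d->regs[reg];
}

/*
 * Tail of an interrupt handler. Returns true if the line is still asserted
 * when the handler returns, i.e. the interrupt would fire again spuriously.
 */
static bool handler_leaves_irq_asserted(bool read_back)
{
	struct toy_dev d;

	memset(&d, 0, sizeof(d));
	d.irq_asserted = true;
	/* ... the real work of servicing the interrupt would go here ... */
	cpu_write_ack(&d, 1u);
	if (read_back)
		(void)cpu_read_reg(&d, REG_STATUS);  /* flush the ack */
	return d.irq_asserted;
}
```

In this model handler_leaves_irq_asserted(false) returns true (the line is still up, so the interrupt would re-trigger), while handler_leaves_irq_asserted(true) returns false.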

Scott
