Date:      Sat, 19 Sep 2015 23:14:20 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc:        Alexey Dokuchaev <danfe@FreeBSD.org>, "freebsd-sparc64@freebsd.org" <freebsd-sparc64@freebsd.org>
Subject:   Re: PCI range checking under qemu-system-sparc64
Message-ID:  <20150919211420.GK18789@alchemy.franken.de>
In-Reply-To: <55FBB662.4080708@ilande.co.uk>
References:  <55EDFE00.9090109@ilande.co.uk> <20150913022143.GA7862@alchemy.franken.de> <20150913103940.GA60101@FreeBSD.org> <20150913180126.GC7862@alchemy.franken.de> <55F89861.1030107@ilande.co.uk> <20150916031030.GA6711@FreeBSD.org> <55F9C2B8.7030605@ilande.co.uk> <20150916211914.GD18789@alchemy.franken.de> <20150917082817.GA71811@FreeBSD.org> <55FBB662.4080708@ilande.co.uk>

On Fri, Sep 18, 2015 at 07:59:46AM +0100, Mark Cave-Ayland wrote:
> On 17/09/15 09:28, Alexey Dokuchaev wrote:
> 
> > On Wed, Sep 16, 2015 at 11:19:15PM +0200, Marius Strobl wrote:
> >> [...]
> >> Which suggest that the next thing to investigate is the CMD646
> >> emulation. Is there a particular reason why QEMU emulates a
> >> CMD646U rather than a plain CMD646 as found in the real sun4u
> >> machines of the USIIe/i era?
> >>
> >> Alexey, does building the port with CDROM_DMA disabled make
> >> a difference?
> > 
> > Ironically I had it already disabled prior to your question; but I've
> > rebuilt the port enabling it for completeness' sake.  It did not make
> > a difference.
> > 
> > Then I've disabled all CAM/ATA stuff (scbus, ata, umass, etc.) in the
> > kernel config and that's what I see now (this is with CDROM_DMA=on):
> 
> What does the CAM/ATA stuff do here? Does this mean it may not
> necessarily be an interrupt issue if you can get to mounting the root fs
> with CDROM_DMA=on?

I think we are looking at multiple issues here. First off, based
on the interrupt-map provided by OpenBIOS, we route the intpin of
the ATA controller to INO 20 (which uses the interrupt mapping
register at offset 0xc28). If I additionally enable the code for
debugging interrupt routing problems, which clears all interrupts
handled by the Sabre and then enables all INOs by writing the
valid bit, the Sabre IGN and the CPU module ID to every one of
its interrupt mapping registers, I see 5 interrupts for INO 20
at the time the kernel hangs under QEMU. I don't get any
stray interrupts for INOs that we're not actively routing.
Based on what Alexey wrote, he doesn't see interrupts for INO 20,
i.e. the ATA controller, with a kernel not including the debug
code. That doesn't make sense to me. Without the debug code, we
still enable the INOs that drivers request when they attach to
devices, in the same way as described above, just not
unconditionally for all interrupts handled by the Sabre. So
there should be no difference for correctly routed interrupts.
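
To illustrate the enable step, here is a minimal sketch of the value
written to a mapping register. It is not the kernel code. The bit
positions (valid bit 31, CPU module ID at bit 26, IGN at bit 6) are an
assumption about the Sabre register layout, the function name is
hypothetical, and the IGN of 0x1f is inferred from the vec2004 counter
in the QEMU output further down.

```python
# Assumed layout of a Sabre interrupt mapping register (not stated in this mail).
INTMAP_V = 1 << 31        # valid (enable) bit
INTMAP_TID_SHIFT = 26     # CPU module (target) ID field, 5 bits wide
INTMAP_IGN_SHIFT = 6      # interrupt group number field, 5 bits wide
FIELD_MASK = 0x1F


def intmap_enable_value(ign, cpu_mid):
    """Compose valid bit, IGN and CPU module ID into one register value."""
    return (INTMAP_V
            | ((cpu_mid & FIELD_MASK) << INTMAP_TID_SHIFT)
            | ((ign & FIELD_MASK) << INTMAP_IGN_SHIFT))


# Hypothetical usage: IGN 0x1f, interrupts targeted at CPU module 0.
print(hex(intmap_enable_value(0x1F, 0)))
```

The INO itself is not part of the written value in this sketch, since
each INO has its own mapping register (0xc28 for INO 20).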

Second, even when I see interrupts for the ATA controller,
enumeration of storage devices hangs somehow. On a real
machine when booting verbose, this looks like (mainly output
from ata_generic_reset()):
<...>
IPsec: Initialized Security Association Processing.
lo0: bpf attached
ata2: reset tp1 mask=00 ostat0=ff ostat1=ff
ata3: reset tp1 mask=03 ostat0=50 ostat1=00
ata3: stat0=0x80 err=0x80 lsb=0x80 msb=0x80
ata3: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata3: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
ata3: reset tp2 stat0=50 stat1=00 devices=0x20001
GEOM: new disk cd0
cd0 at ata3 bus 0 scbus1 target 1 lun 0
cd0: <TEAC CD-224E 1.7A> Removable CD-ROM SCSI device
cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
<...>

With QEMU:
<...>
IPsec: Initialized Security Association Processing.
lo0: bpf attached
ata2: reset tp1 mask=03 ostat0=00 ostat1=00
ata2: stat0=0x00 err=0x00 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=00 stat1=00 devices=0x0
ata3: reset tp1 mask=03 ostat0=50 ostat1=00
ata3: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata3: stat1=0x00 err=0x00 lsb=0xff msb=0xff
ata3: reset tp2 stat0=00 stat1=00 devices=0x10000
<hang>
<...>
db> show intrcnt
pil3: ithrd             5
vec2004: atapci0        5
QEMU: Terminated
<...>
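
As a cross-check that the atapci0 counter above belongs to INO 20,
the vector number can be split into its parts. This is a sketch under
two assumptions: that the number in "vec2004" is decimal, and that a
vector is the IGN shifted left by 6 bits ORed with a 6-bit INO. The
function name is hypothetical.

```python
INO_BITS = 6     # assumed width of the INO field
IGN_MASK = 0x1F  # assumed 5-bit interrupt group number


def split_vector(vec):
    """Split an interrupt vector number into (IGN, INO)."""
    ino = vec & ((1 << INO_BITS) - 1)
    ign = (vec >> INO_BITS) & IGN_MASK
    return ign, ino


ign, ino = split_vector(2004)
print(f"IGN={ign:#x} INO={ino}")  # prints "IGN=0x1f INO=20"
```

Under these assumptions the five counted interrupts are indeed the
ones routed to INO 20.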

So after a reset, real iron additionally sets the READY and
SERVICE bits in the ATA status register but that doesn't
explain the hang. Code that is waiting for READY to get set
also uses a timeout.
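
For reading the reset output above, the status and "devices" values
can be decoded as sketched below. The status bit names follow the
usual ATA register layout. The device mask constants are an
assumption modelled on the FreeBSD ata(4) driver, and the helper
names are hypothetical.

```python
# ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BUSY"), (0x40, "READY"), (0x20, "DWF/DMA"), (0x10, "DSC/SERVICE"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "INDEX"), (0x01, "ERROR"),
]

# Assumed meaning of the "devices=" mask printed after the reset.
ATA_DEVICE_BITS = [
    (0x00001, "ATA master"), (0x00002, "ATA slave"),
    (0x10000, "ATAPI master"), (0x20000, "ATAPI slave"),
]


def decode(value, table):
    """Return the names of all bits from table that are set in value."""
    return [name for bit, name in table if value & bit]


for stat in (0x50, 0x80, 0x00):
    print(f"stat={stat:#04x}:", decode(stat, ATA_STATUS_BITS))
for devs in (0x20001, 0x10000):
    print(f"devices={devs:#x}:", decode(devs, ATA_DEVICE_BITS))
```

With this decoding, 0x50 on the real machine is READY plus
DSC/SERVICE, while the 0x00 seen under QEMU has neither bit set.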

I patched QEMU to identify the ATA controller as a plain
CMD646 so that UDMA isn't tried in the first place. I
also limited the ATA mode used by the kernel to PIO4 at
maximum. Neither made a difference, i.e. the hang still
occurs.

Another striking thing is that the interrupt statistics
show no CPU tick interrupts. I'm unsure at which point
the kernel actually enables them but they really should
have been engaged at the time the hang occurs, otherwise
scheduling won't work properly, which also might be an
explanation for the hang, i.e. the kernel might never
switch to the CAM thread(s).

All in all, interrupts in QEMU seem buggy in one way
or another.

Marius



