Date: Sat, 19 Sep 2015 23:14:20 +0200
From: Marius Strobl <marius@alchemy.franken.de>
To: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Alexey Dokuchaev <danfe@FreeBSD.org>, "freebsd-sparc64@freebsd.org" <freebsd-sparc64@freebsd.org>
Subject: Re: PCI range checking under qemu-system-sparc64
Message-ID: <20150919211420.GK18789@alchemy.franken.de>
In-Reply-To: <55FBB662.4080708@ilande.co.uk>
References: <55EDFE00.9090109@ilande.co.uk> <20150913022143.GA7862@alchemy.franken.de> <20150913103940.GA60101@FreeBSD.org> <20150913180126.GC7862@alchemy.franken.de> <55F89861.1030107@ilande.co.uk> <20150916031030.GA6711@FreeBSD.org> <55F9C2B8.7030605@ilande.co.uk> <20150916211914.GD18789@alchemy.franken.de> <20150917082817.GA71811@FreeBSD.org> <55FBB662.4080708@ilande.co.uk>

On Fri, Sep 18, 2015 at 07:59:46AM +0100, Mark Cave-Ayland wrote:
> On 17/09/15 09:28, Alexey Dokuchaev wrote:
>
> > On Wed, Sep 16, 2015 at 11:19:15PM +0200, Marius Strobl wrote:
> >> [...]
> >> Which suggest that the next thing to investigate is the CMD646
> >> emulation. Is there a particular reason why QEMU emulates a
> >> CMD646U rather than a plain CMD646 as found in the real sun4u
> >> machines of the USIIe/i era?
> >>
> >> Alexey, does building the port with CDROM_DMA disabled make
> >> a difference?
> >
> > Ironically I had it already disabled prior to your question; but I've
> > rebuilt the port enabling it for completeness' sake. It did not make
> > a difference.
> >
> > Then I've disabled all CAM/ATA stuff (scbus, ata, umass, etc.) in the
> > kernel config and that's what I see now (this is with CDROM_DMA=on):
>
> What does the CAM/ATA stuff do here? Does this mean it may not
> necessarily be an interrupt issue if you can get to mounting the root fs
> with CDROM_DMA=on?

I think we are looking at multiple issues here.

First off, based on the interrupt-map provided by OpenBIOS, we route
the intpin of the ATA controller to INO 20 (which uses the interrupt
mapping register at offset 0xc28). If I additionally enable the code
for debugging interrupt routing problems, which just clears all
interrupts handled by the Sabre and then enables all INOs in all of its
interrupt mapping registers by writing the valid bit, the Sabre IGN and
the CPU module ID there, I see 5 interrupts for INO 20 at the time the
kernel hangs under QEMU. I don't get any stray interrupts for INOs that
we're not actively routing.

Based on what Alexey wrote, he doesn't see interrupts for INO 20, i.e.
the ATA controller, with a kernel not including the debug code. That
doesn't make sense to me. Without the debug code, we still enable the
INOs for which drivers request them when they attach to devices, in the
same way as described above, just not unconditionally for all
interrupts handled by the Sabre.
So there should be no difference for correctly routed interrupts.

Second, even when I see interrupts for the ATA controller, enumeration
of storage devices hangs somehow. On a real machine when booting
verbose, this looks like (mainly output from ata_generic_reset()):

<...>
IPsec: Initialized Security Association Processing.
lo0: bpf attached
ata2: reset tp1 mask=00 ostat0=ff ostat1=ff
ata3: reset tp1 mask=03 ostat0=50 ostat1=00
ata3: stat0=0x80 err=0x80 lsb=0x80 msb=0x80
ata3: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata3: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
ata3: reset tp2 stat0=50 stat1=00 devices=0x20001
GEOM: new disk cd0
cd0 at ata3 bus 0 scbus1 target 1 lun 0
cd0: <TEAC CD-224E 1.7A> Removable CD-ROM SCSI device
cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
<...>

With QEMU:

<...>
IPsec: Initialized Security Association Processing.
lo0: bpf attached
ata2: reset tp1 mask=03 ostat0=00 ostat1=00
ata2: stat0=0x00 err=0x00 lsb=0x00 msb=0x00
ata2: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
ata2: reset tp2 stat0=00 stat1=00 devices=0x0
ata3: reset tp1 mask=03 ostat0=50 ostat1=00
ata3: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
ata3: stat1=0x00 err=0x00 lsb=0xff msb=0xff
ata3: reset tp2 stat0=00 stat1=00 devices=0x10000
<hang>
<...>
db> show intrcnt
pil3: ithrd 5
vec2004: atapci0 5
QEMU: Terminated
<...>

So after a reset, real iron additionally sets the READY and SERVICE
bits in the ATA status register, but that doesn't explain the hang.
Code that is waiting for READY to get set also uses a timeout.

I patched QEMU to identify the ATA controller as a plain CMD646 one so
the UDMA isn't tried in the first place. I also limited the ATA mode
used by the kernel to PIO4 at maximum. Neither made a difference, i.e.
the hang still occurs.

Another striking thing is that the interrupt statistics show no CPU
tick interrupts.
I'm unsure at which point the kernel actually enables them but they
really should have been engaged at the time the hang occurs, otherwise
scheduling won't work properly, which also might be an explanation for
the hang, i.e. the kernel might never switch to the CAM thread(s).

All in all, interrupts in QEMU seem buggy in one way or another.

Marius
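
For anyone following along with the numbers in the logs above, here is
a small sketch for decoding them. It is not taken from the kernel
sources. The status bit names follow the standard ATA status register
layout (with bit 0x10 labelled SERVICE as in this message), 0x14/0xeb
is the usual ATAPI signature in the lsb/msb registers, and the vector
split assumes that the number after "vec" is decimal and is built as
(IGN << 6) | INO with a 6-bit INO. All function names are made up for
illustration.

```python
# Standard ATA status register bits, most significant first.
# Bit 0x10 is labelled SERVICE here to match the wording of the message.
ATA_STATUS_BITS = {
    0x80: "BUSY",
    0x40: "READY",
    0x20: "DWF",
    0x10: "SERVICE",
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "INDEX",
    0x01: "ERROR",
}

# Assumption: a vector number is (IGN << 6) | INO, i.e. the INO is 6 bits wide.
INO_BITS = 6


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for bit, name in ATA_STATUS_BITS.items() if status & bit]


def is_atapi_signature(lsb, msb):
    """True if the lsb/msb registers hold the ATAPI signature 0x14/0xeb."""
    return lsb == 0x14 and msb == 0xEB


def split_vector(vec):
    """Split an interrupt vector number into an (IGN, INO) pair."""
    return vec >> INO_BITS, vec & ((1 << INO_BITS) - 1)


if __name__ == "__main__":
    print(decode_ata_status(0x50))         # stat0 on real hardware after reset
    print(decode_ata_status(0x00))         # stat0 under QEMU after reset
    print(is_atapi_signature(0x14, 0xEB))  # lsb/msb of the CD-ROM device
    print(split_vector(2004))              # "vec2004" from show intrcnt
```

Under these assumptions, 0x50 decodes to READY plus SERVICE, and
vec2004 splits into IGN 31 and INO 20, which matches the INO the ATA
controller is routed to.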