Date: Sun, 20 Sep 2015 00:05:32 +0100
From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: Alexey Dokuchaev <danfe@FreeBSD.org>, "freebsd-sparc64@freebsd.org" <freebsd-sparc64@freebsd.org>
Subject: Re: PCI range checking under qemu-system-sparc64
Message-ID: <55FDEA3C.1010804@ilande.co.uk>
In-Reply-To: <20150919211420.GK18789@alchemy.franken.de>
References: <55EDFE00.9090109@ilande.co.uk>
 <20150913022143.GA7862@alchemy.franken.de>
 <20150913103940.GA60101@FreeBSD.org>
 <20150913180126.GC7862@alchemy.franken.de>
 <55F89861.1030107@ilande.co.uk>
 <20150916031030.GA6711@FreeBSD.org>
 <55F9C2B8.7030605@ilande.co.uk>
 <20150916211914.GD18789@alchemy.franken.de>
 <20150917082817.GA71811@FreeBSD.org>
 <55FBB662.4080708@ilande.co.uk>
 <20150919211420.GK18789@alchemy.franken.de>

This is a multi-part message in MIME format.
--------------060504080405090800060005
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 19/09/15 22:14, Marius Strobl wrote:

> On Fri, Sep 18, 2015 at 07:59:46AM +0100, Mark Cave-Ayland wrote:
>> On 17/09/15 09:28, Alexey Dokuchaev wrote:
>>
>>> On Wed, Sep 16, 2015 at 11:19:15PM +0200, Marius Strobl wrote:
>>>> [...]
>>>> Which suggest that the next thing to investigate is the CMD646
>>>> emulation. Is there a particular reason why QEMU emulates a
>>>> CMD646U rather than a plain CMD646 as found in the real sun4u
>>>> machines of the USIIe/i era?
>>>>
>>>> Alexey, does building the port with CDROM_DMA disabled make
>>>> a difference?
>>>
>>> Ironically I had it already disabled prior to your question; but I've
>>> rebuilt the port enabling it for completeness' sake. It did not make
>>> a difference.
>>>
>>> Then I've disabled all CAM/ATA stuff (scbus, ata, umass, etc.) in the
>>> kernel config and that's what I see now (this is with CDROM_DMA=on):
>>
>> What does the CAM/ATA stuff do here? Does this mean it may not
>> necessarily be an interrupt issue if you can get to mounting the root fs
>> with CDROM_DMA=on?
>
> I think we are looking at multiple issues here. First off, based
> on the interrupt-map provided by OpenBIOS, we route the intpin of
> the ATA controller to INO 20 (which uses the interrupt mapping
> register at offset 0xc28). If I additionally enable the code for
> debugging interrupt routing problems, which just clears all
> interrupts handle by the Sabre and then in all its interrupt
> mapping registers enables all INOs by writing the valid bit,
> Sabre IGN and CPU module ID there, I see 5 interrupts for INO
> 20 at the time the kernel hangs under QEMU. I don't get any
> stray interrupts for INOs that we're not actively routing.
> Based on what Alexey wrote, he doesn't see interrupts for INO 20,
> i. e. the ATA controller, with a kernel not including the debug
> code. That doesn't make sense to me. Without the debug code, we
> still enable the INOs for which drivers request them when they
> attach to devices the same way as described above, just not
> unconditionally for all interrupts handle by the Sabre. So
> there should be no difference for correctly routed interrupts.
>
> Second, even when I see interrupts for the ATA controller,
> enumeration of storage devices hangs somehow. On a real
> machine when booting verbose, this looks like (mainly output
> from ata_generic_reset():
> <...>
> IPsec: Initialized Security Association Processing.
> lo0: bpf attached
> ata2: reset tp1 mask=00 ostat0=ff ostat1=ff
> ata3: reset tp1 mask=03 ostat0=50 ostat1=00
> ata3: stat0=0x80 err=0x80 lsb=0x80 msb=0x80
> ata3: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> ata3: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata3: reset tp2 stat0=50 stat1=00 devices=0x20001
> GEOM: new disk cd0
> cd0 at ata3 bus 0 scbus1 target 1 lun 0
> cd0: <TEAC CD-224E 1.7A> Removable CD-ROM SCSI device
> cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> <...>
>
> With QEMU:
> <...>
> IPsec: Initialized Security Association Processing.
> lo0: bpf attached
> ata2: reset tp1 mask=03 ostat0=00 ostat1=00
> ata2: stat0=0x00 err=0x00 lsb=0x00 msb=0x00
> ata2: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x0
> ata3: reset tp1 mask=03 ostat0=50 ostat1=00
> ata3: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata3: stat1=0x00 err=0x00 lsb=0xff msb=0xff
> ata3: reset tp2 stat0=00 stat1=00 devices=0x10000
> <hang>
> <...>
> db> show intrcnt
> pil3: ithrd 5
> vec2004: atapci0 5
> QEMU: Terminated
> <...>
>
> So after a reset, real iron additionally sets the READY and
> SERVICE bits in the ATA status register but that doesn't
> explain the hang. Code that is waiting for READY to get set
> also uses a timeout.
>
> I patched QEMU to identify the ATA controller as a plain
> CMD646 one so the UDMA isn't tried in the first place. I
> also limited the ATA mode used by the kernel to PIO4 at
> maximum. Neither made a difference, i. e. the hang still
> occurs.
>
> Another striking thing is that the interrupt statistics
> show to CPU tick interrupts. I'm unsure at which point
> the kernel actually enables them but they really should
> have been engaged at the time the hang occurs, otherwise
> scheduling won't work properly, which also might be an
> explanation for the hang, i. e. the kernel might never
> switch to the CAM thread(s).
>
> All in all, interrupts in QEMU seem buggy in one way
> or another.

Thanks for looking into this in detail Marius - plenty of information to
start debugging this further.

While I don't have any insight on the CPU tick interrupt yet, my initial
feeling is that the ATA hang could be related to the PCI interrupt
clearing issue that I started looking into a while back.

Although it isn't a complete fix, does the attached patch against QEMU
help at all? Otherwise it will require a deeper dive into the QEMU
interrupt emulation.


ATB,

Mark.

--------------060504080405090800060005
Content-Type: text/x-diff; name="apb-no-clear.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="apb-no-clear.patch"

diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 599768e..714e654 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -454,9 +454,10 @@ static void apb_config_writel (void *opaque, hwaddr addr,
         break;
     case 0x1400 ... 0x14ff: /* PCI interrupt clear */
         if (addr & 4) {
-            unsigned int ino = (addr & 0xff) >> 5;
-            if ((s->irq_request / 4) == ino) {
-                pbm_clear_request(s, s->irq_request);
+            unsigned int ino = (addr & 0xff) >> 3;
+            if (s->irq_request == ino) {
+                s->pci_irq_in &= ~(1ULL << ino);
+                s->irq_request = NO_IRQ_REQUEST;
                 pbm_check_irqs(s);
             }
         }
@@ -465,7 +466,8 @@ static void apb_config_writel (void *opaque, hwaddr addr,
         if (addr & 4) {
             unsigned int ino = ((addr & 0xff) >> 3) | 0x20;
             if (s->irq_request == ino) {
-                pbm_clear_request(s, ino);
+                s->pci_irq_in &= ~(1ULL << ino);
+                s->irq_request = NO_IRQ_REQUEST;
                 pbm_check_irqs(s);
             }
         }

--------------060504080405090800060005--
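
For readers following the arithmetic in the attached patch, here is a minimal standalone sketch of the register-offset decoding it changes. Only the mask and shift expressions are taken from the diff; the function names are hypothetical, and the example offset for INO 20 (0x1400 + 20 * 8 + 4) is an assumption inferred from the patched `>> 3` shift, not something stated in the thread. The old `>> 5` decode yields one value per group of four INOs, which is why it was compared against `irq_request / 4`, while the patched decode yields the exact INO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the mask/shift arithmetic is from the patch. */

/* Pre-patch decode of a PCI interrupt clear write: one value per 4 INOs. */
unsigned int pci_clear_group_old(uint64_t addr)
{
    return (unsigned int)((addr & 0xff) >> 5);
}

/* Patched decode: one INO per 8-byte register. */
unsigned int pci_clear_ino_new(uint64_t addr)
{
    return (unsigned int)((addr & 0xff) >> 3);
}

/* OBIO interrupt clear decode (unchanged by the patch): INOs start at 0x20. */
unsigned int obio_clear_ino(uint64_t addr)
{
    return (unsigned int)(((addr & 0xff) >> 3) | 0x20);
}

/* Self-check using INO 20, the ATA controller's INO in this thread. */
void ino_decode_selfcheck(void)
{
    /* Assumed offset of the clear register write for INO 20. */
    uint64_t addr = 0x1400 + 20 * 8 + 4;

    assert(addr & 4);                             /* matches the "addr & 4" guard */
    assert(pci_clear_ino_new(addr) == 20);        /* exact INO after the patch */
    assert(pci_clear_group_old(addr) == 20 / 4);  /* group index before the patch */
    assert(obio_clear_ino(0x1804) == 0x20);       /* first OBIO INO */
}
```

Calling `ino_decode_selfcheck()` from a test harness exercises all three decodes. The sketch says nothing about whether clearing `pci_irq_in` directly is the right fix; it only illustrates how the offset maps to an INO under each version of the code.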