Date:      Sun, 20 Sep 2015 00:05:32 +0100
From:      Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        Alexey Dokuchaev <danfe@FreeBSD.org>,  "freebsd-sparc64@freebsd.org" <freebsd-sparc64@freebsd.org>
Subject:   Re: PCI range checking under qemu-system-sparc64
Message-ID:  <55FDEA3C.1010804@ilande.co.uk>
In-Reply-To: <20150919211420.GK18789@alchemy.franken.de>
References:  <55EDFE00.9090109@ilande.co.uk> <20150913022143.GA7862@alchemy.franken.de> <20150913103940.GA60101@FreeBSD.org> <20150913180126.GC7862@alchemy.franken.de> <55F89861.1030107@ilande.co.uk> <20150916031030.GA6711@FreeBSD.org> <55F9C2B8.7030605@ilande.co.uk> <20150916211914.GD18789@alchemy.franken.de> <20150917082817.GA71811@FreeBSD.org> <55FBB662.4080708@ilande.co.uk> <20150919211420.GK18789@alchemy.franken.de>

This is a multi-part message in MIME format.
--------------060504080405090800060005
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 19/09/15 22:14, Marius Strobl wrote:

> On Fri, Sep 18, 2015 at 07:59:46AM +0100, Mark Cave-Ayland wrote:
>> On 17/09/15 09:28, Alexey Dokuchaev wrote:
>>
>>> On Wed, Sep 16, 2015 at 11:19:15PM +0200, Marius Strobl wrote:
>>>> [...]
>>>> Which suggests that the next thing to investigate is the CMD646
>>>> emulation. Is there a particular reason why QEMU emulates a
>>>> CMD646U rather than a plain CMD646 as found in the real sun4u
>>>> machines of the USIIe/i era?
>>>>
>>>> Alexey, does building the port with CDROM_DMA disabled make
>>>> a difference?
>>>
>>> Ironically I had it already disabled prior to your question; but I've
>>> rebuilt the port enabling it for completeness' sake.  It did not make
>>> a difference.
>>>
>>> Then I've disabled all CAM/ATA stuff (scbus, ata, umass, etc.) in the
>>> kernel config and that's what I see now (this is with CDROM_DMA=on):
>>
>> What does the CAM/ATA stuff do here? Does this mean it may not
>> necessarily be an interrupt issue if you can get to mounting the root fs
>> with CDROM_DMA=on?
> 
> I think we are looking at multiple issues here. First off, based
> on the interrupt-map provided by OpenBIOS, we route the intpin of
> the ATA controller to INO 20 (which uses the interrupt mapping
> register at offset 0xc28). If I additionally enable the code for
> debugging interrupt routing problems, which just clears all
> interrupts handled by the Sabre and then in all its interrupt
> mapping registers enables all INOs by writing the valid bit,
> Sabre IGN and CPU module ID there, I see 5 interrupts for INO
> 20 at the time the kernel hangs under QEMU. I don't get any
> stray interrupts for INOs that we're not actively routing.
> Based on what Alexey wrote, he doesn't see interrupts for INO 20,
> i. e. the ATA controller, with a kernel not including the debug
> code. That doesn't make sense to me. Without the debug code, we
> still enable the INOs for which drivers request them when they
> attach to devices the same way as described above, just not
> unconditionally for all interrupts handled by the Sabre. So
> there should be no difference for correctly routed interrupts.
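To make the INO/IGN relationship above concrete, here is a minimal sketch of how an interrupt number is composed and split. It assumes the usual sun4u layout of a 5-bit IGN above a 6-bit INO, and a Sabre IGN of 0x1f; the function and macro names are made up for illustration and are not the kernel's. Under those assumptions, INO 20 gives vector 2004, which matches the "vec2004: atapci0" line in the intrcnt output further down.

```c
#include <assert.h>

/* Assumed sun4u interrupt number layout: 5-bit IGN above a 6-bit INO. */
#define INO_BITS 6
#define INO_MASK 0x3fu
#define IGN_MASK 0x1fu

/* Hypothetical helper: combine an IGN and an INO into an interrupt vector. */
static unsigned make_vec(unsigned ign, unsigned ino)
{
    assert(ign <= IGN_MASK && ino <= INO_MASK);
    return (ign << INO_BITS) | ino;
}

/* Hypothetical helpers: split a vector back into its two fields. */
static unsigned vec_ign(unsigned vec)
{
    return (vec >> INO_BITS) & IGN_MASK;
}

static unsigned vec_ino(unsigned vec)
{
    return vec & INO_MASK;
}
```

With these, make_vec(0x1f, 20) is 2004, and vec_ino(2004) gives back 20.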
> 
> Second, even when I see interrupts for the ATA controller,
> enumeration of storage devices hangs somehow. On a real
> machine when booting verbose, this looks like (mainly output
> from ata_generic_reset()):
> <...>
> IPsec: Initialized Security Association Processing.
> lo0: bpf attached
> ata2: reset tp1 mask=00 ostat0=ff ostat1=ff
> ata3: reset tp1 mask=03 ostat0=50 ostat1=00
> ata3: stat0=0x80 err=0x80 lsb=0x80 msb=0x80
> ata3: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> ata3: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata3: reset tp2 stat0=50 stat1=00 devices=0x20001
> GEOM: new disk cd0
> cd0 at ata3 bus 0 scbus1 target 1 lun 0
> cd0: <TEAC CD-224E 1.7A> Removable CD-ROM SCSI device
> cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> <...>
> 
> With QEMU:
> <...>
> IPsec: Initialized Security Association Processing.
> lo0: bpf attached
> ata2: reset tp1 mask=03 ostat0=00 ostat1=00
> ata2: stat0=0x00 err=0x00 lsb=0x00 msb=0x00
> ata2: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x0
> ata3: reset tp1 mask=03 ostat0=50 ostat1=00
> ata3: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata3: stat1=0x00 err=0x00 lsb=0xff msb=0xff
> ata3: reset tp2 stat0=00 stat1=00 devices=0x10000
> <hang>
> <...>
> db> show intrcnt
> pil3: ithrd             5
> vec2004: atapci0        5
> QEMU: Terminated
> <...>
> 
> So after a reset, real iron additionally sets the READY and
> SERVICE bits in the ATA status register but that doesn't
> explain the hang. Code that is waiting for READY to get set
> also uses a timeout.
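For anyone following along with the stat0/stat1 values in the two logs, a small decoder sketch may help. The bit values are the standard ATA status register bits (0x10 is SERVICE on ATAPI devices, DSC on older ATA ones); the macro and function names are my own for illustration, and the rarely used 0x04 and 0x02 bits are left out. It also checks for the ATAPI signature 0x14/0xeb that shows up in the lsb/msb fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Standard ATA status register bits (names chosen for this sketch). */
#define ST_ERROR   0x01u
#define ST_DRQ     0x08u
#define ST_SERVICE 0x10u  /* DSC on plain ATA devices */
#define ST_FAULT   0x20u
#define ST_READY   0x40u
#define ST_BUSY    0x80u

/* Write the names of the set bits into buf, separated by spaces. */
static const char *ata_status_str(unsigned st, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } bits[] = {
        { ST_BUSY, "BUSY" }, { ST_READY, "READY" }, { ST_FAULT, "FAULT" },
        { ST_SERVICE, "SERVICE" }, { ST_DRQ, "DRQ" }, { ST_ERROR, "ERROR" },
    };
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if ((st & bits[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* buffer too small: output is truncated here */
        used += (size_t)n;
    }
    return buf;
}

/* ATAPI devices report 0x14/0xeb in the cylinder low/high registers after reset. */
static int is_atapi_signature(unsigned lsb, unsigned msb)
{
    return lsb == 0x14 && msb == 0xeb;
}
```

Fed the values above, 0x50 from the real machine decodes to "READY SERVICE", while the 0x00 QEMU reports has no bits set at all.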
> 
> I patched QEMU to identify the ATA controller as a plain
> CMD646 one so the UDMA isn't tried in the first place. I
> also limited the ATA mode used by the kernel to PIO4 at
> maximum. Neither made a difference, i. e. the hang still
> occurs.
> 
> Another striking thing is that the interrupt statistics
> show no CPU tick interrupts. I'm unsure at which point
> the kernel actually enables them but they really should
> have been engaged at the time the hang occurs, otherwise
> scheduling won't work properly, which also might be an
> explanation for the hang, i. e. the kernel might never
> switch to the CAM thread(s).
> 
> All in all, interrupts in QEMU seem buggy in one way
> or another.

Thanks for looking into this in detail, Marius - plenty of information to
start debugging this further.

While I don't have any insight on the CPU tick interrupt yet, my initial
feeling is that the ATA hang could be related to the PCI interrupt
clearing issue that I started looking into a while back. Although it
isn't a complete fix, does the attached patch against QEMU help at all?
Otherwise it will require a deeper dive into the QEMU interrupt emulation.
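As a rough illustration of the address arithmetic the patch changes, here is a sketch of how a write into the 0x1400 ... 0x14ff clear range maps to an INO. It assumes one 8-byte clear register per PCI INO, which is what the patched shift by 3 implies; the helper names are hypothetical and not taken from apb.c.

```c
#include <assert.h>

/* Start of the "PCI interrupt clear" case in apb_config_writel(). */
#define PCI_INT_CLEAR_BASE 0x1400u

/* Assumed layout: one 8-byte clear register per PCI INO (0..31). */
static unsigned clear_reg_offset(unsigned ino)
{
    assert(ino < 32);
    return PCI_INT_CLEAR_BASE + (ino << 3);
}

/* INO selected by a write, as computed with the patch applied. */
static unsigned ino_patched(unsigned addr)
{
    return (addr & 0xff) >> 3;
}

/* Value computed without the patch: an index for a group of four INOs. */
static unsigned ino_unpatched(unsigned addr)
{
    return (addr & 0xff) >> 5;
}
```

For the ATA controller's INO 20 the register sits at 0x14a0, and the handler only acts on the word at 0x14a4 because of the (addr & 4) check. The patched shift turns that address into 20, whereas the old shift yields 5, which the old code compared against irq_request / 4, so it matched any of the four INOs 20 to 23.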


ATB,

Mark.


--------------060504080405090800060005
Content-Type: text/x-diff;
 name="apb-no-clear.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="apb-no-clear.patch"

diff --git a/hw/pci-host/apb.c b/hw/pci-host/apb.c
index 599768e..714e654 100644
--- a/hw/pci-host/apb.c
+++ b/hw/pci-host/apb.c
@@ -454,9 +454,10 @@ static void apb_config_writel (void *opaque, hwaddr addr,
         break;
     case 0x1400 ... 0x14ff: /* PCI interrupt clear */
         if (addr & 4) {
-            unsigned int ino = (addr & 0xff) >> 5;
-            if ((s->irq_request / 4)  == ino) {
-                pbm_clear_request(s, s->irq_request);
+            unsigned int ino = (addr & 0xff) >> 3;
+            if (s->irq_request == ino) {
+                s->pci_irq_in &= ~(1ULL << ino);
+                s->irq_request = NO_IRQ_REQUEST;
                 pbm_check_irqs(s);
             }
         }
@@ -465,7 +466,8 @@ static void apb_config_writel (void *opaque, hwaddr addr,
         if (addr & 4) {
             unsigned int ino = ((addr & 0xff) >> 3) | 0x20;
             if (s->irq_request == ino) {
-                pbm_clear_request(s, ino);
+                s->pci_irq_in &= ~(1ULL << ino);
+                s->irq_request = NO_IRQ_REQUEST;
                 pbm_check_irqs(s);
             }
         }

--------------060504080405090800060005--


