Date:      Thu, 10 Oct 2024 19:43:28 +0000
From:      Vasily Postnicov <shamaz.mazum@gmail.com>
To:        Peter Grehan <grehan@freebsd.org>
Cc:        freebsd-virtualization@freebsd.org
Subject:   Re: Running Mezzano in bhyve
Message-ID:  <CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ@mail.gmail.com>
In-Reply-To: <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
References:  <CADnZ6B=ex24mbGN3du6UuS84akJZAxTcG5xqt0HB0RN5S262cQ@mail.gmail.com> <17f4077d-647d-4848-9d6f-97f9886ef636@freebsd.org> <CADnZ6BkWd-v=y0L9%2BGiu=ys_Cuk5nm6djApSXYLufYuv=WnQWQ@mail.gmail.com> <CADnZ6B=LwZyiBTvXGek37e23t_e3ub4K%2BE96QaahukPbobkHhg@mail.gmail.com> <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org> <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>


I suspect PCI interrupts are not functioning correctly.

Look at this code:
    ;; Attach interrupt handler.
    (sup:debug-print-line "Handler: " (ahci-irq-handler ahci))
    (sup:irq-attach (sup:platform-irq (pci:pci-intr-line location))
                    (ahci-irq-handler-function ahci)
                    ahci)

and this

(defun pci-intr-line (device)
  (pci-config/8 device +pci-config-intr-line+)) ;; comment by me: the constant is #x3c

I found that "PCI 0x3c" is the PCI Interrupt Line register (Interrupt Pin is
the next byte, 0x3d), so this is the legacy pin-based interrupt path. AFAIK,
legacy interrupt pins are not supported by bhyve, is that correct? If so, I
need either to teach bhyve how to deal with legacy interrupts or to teach
Mezzano to understand MSI. Which would be easier in your opinion?
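
For reference, in the standard PCI configuration header offset 0x3c holds the
Interrupt Line register and offset 0x3d holds the Interrupt Pin register. Below
is a minimal sketch of how the two bytes could be told apart from one 16-bit
read at 0x3c. All names are hypothetical and are not Mezzano's API, and the
sample value #x010b is an assumption: only the line value B appears in the log,
the pin value does not.

```lisp
;; Sketch only: every name here is hypothetical, not taken from Mezzano.
(defconstant +intr-line-offset+ #x3c) ; Interrupt Line: IRQ number set by firmware
(defconstant +intr-pin-offset+  #x3d) ; Interrupt Pin: 0 = none, 1..4 = INTA#..INTD#

(defun decode-intr-word (word)
  "Split a 16-bit little-endian read at offset #x3c into (values line pin)."
  (let ((pin-shift (* 8 (- +intr-pin-offset+ +intr-line-offset+))))
    (values (ldb (byte 8 0) word)
            (ldb (byte 8 pin-shift) word))))

(defun intr-pin-name (pin)
  "Return the conventional INTx name, or NIL when no legacy pin is used."
  (case pin
    (1 "INTA#")
    (2 "INTB#")
    (3 "INTC#")
    (4 "INTD#")
    (t nil)))
```

With the assumed value, (decode-intr-word #x010b) yields line 11 (hex B, as in
"AHCI IRQ is B") and pin 1, which (intr-pin-name 1) maps to "INTA#".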

On Thu, 10 Oct 2024 at 17:12, Vasily Postnicov <shamaz.mazum@gmail.com> wrote:

> I was able to fix panics in both virtio and AHCI. This is what I found:
>
> 1) Virtio had a silly bug: Mezzano looks up the accessor for an I/O port
> at runtime, doing something like (funcall (intern (format nil "~a-~a"
> bus-name slot-name)) ...). Sure enough, the author made an error in the
> name of one of the accessors, so FUNCALL tried to call an unbound symbol,
> hence the page fault.
> 2) AHCI had the following code:
>
> ;; Magic hacks for Intel devices?
> ;; Set port enable bits in Port Control and Status on Intel controllers.
> (when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
>   (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
>                            (ahci-global-register ahci +ahci-register-CAP+))))
>          (pcs (pci:pci-config/16 location #x92)))
>     (setf (pci:pci-config/16 location #x92)
>           (logior pcs (ash #xFF (- (- 8 n-ports)))))))
>
> I checked the value of N-PORTS, it's 20, so (ash #xff (- (- 8 n-ports)))
> is 1044480 which is bigger than 2^16-1. I recompiled bhyve with MAX_PORTS =
> 6 in bhyve/pci_ahci.c and the panic disappeared. Now I have this output:
>
> Detected AHCI ABAR at C1002000
> AHCI IRQ is B
> Host Capabilities FF30FF25
> Global Host Control 80000000
> Interrupt Status 0
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> AHCI HBA version 1.300
> Handler: 0
> Config register: 17
> Port 0
> Waiting for CR/FR to stop.
> Allocated port data at 105C33000
> Command List at 105C33000
> Received FIS at 105C33400
> Command Tabl at 105C33500
> Initializing device on port 0
>  Command List Base Address 5C33000
>  Command List Base Address Upper 32-bits 1
>  FIS Base Address 5C33400
>  FIS Base Address Upper 32-bits 1
>  Interrupt Status 0
>  Interrupt Enable 7D80003F
>  Command and Status 1C017
>  Task File Data 50
>  Signature 101
>  SATA Status (SCR0: SStatus) 133
>  SATA Control (SCR2: SControl) 300
>  SATA Error (SCR1: SError) 0
>  SATA Active (SCR3: SActive) 0
>  Command Issue 0
>  SATA Notification (SCR4: SNotification) 0
>  FIS-based Switching Control 0
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Command completed.
> 105C33600: 28A20040 100000 0 3F
> 105C33610: 0 59564248 4644452D 2D413239
> 105C33620: 382D4136 39433646 0 30300000
> 105C33630: 20203120 42482020 45205956 54415341
> 105C33640: 49532044 20204B20 20202020 20202020
> 105C33650: 20202020 20202020 20202020 80802020
> 105C33660: B000000 4000 60000 0
> 105C33670: 0 0 A00000 70000
> 105C33680: 780003 780078 40200078 0
> 105C33690: 0 1F0000 40010E 0
> 105C336A0: 2803F0 74004068 40684000 4000B400
> 105C336B0: 7F 0 0 0
> 105C336C0: 0 0 A00000 0
> 105C336D0: 10000 6008 0 0
> 105C336E0: 0 0 0 40080000
> 105C336F0: 4008 0 0 0
> 105C33700: 0 0 0 0
> 105C33710: 0 0 0 0
> 105C33720: 0 0 0 0
> 105C33730: 0 0 0 0
> 105C33740: 0 0 0 0
> 105C33750: 10000 0 0 0
> 105C33760: 0 0 0 0
> 105C33770: 0 0 0 0
> 105C33780: 0 0 0 0
> 105C33790: 0 0 0 0
> 105C337A0: 40000000 0 0 0
> 105C337B0: 0 0 0 1020
> 105C337C0: 0 0 0 0
> 105C337D0: 0 0 0 0
> 105C337E0: 0 0 0 0
> 105C337F0: 0 0 0 78A50000
> Features (83): 7400
> Sector size: 200
> Sector count: A00000
> Serial: BHYVE-FD29-AA68-6F9C
> Model: BHYVE SATA DISK
> Registered new R/W disk #<149CAC9> sectors:A00000
> Host Capabilities FF30FF25
> Global Host Control 80000002
> Interrupt Status 1
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
>     40: Unknown capability 10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Detected MBR style parition table on disk #<149CAC9>
> Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
> Registered new R/W disk #<149CCD9> sectors:800
> Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
> Registered new R/W disk #<149CD89> sectors:800
> Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
> Registered new R/W disk #<149CE39> sectors:9FE000
> Looking for paging disk with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on disk #<149CE39>
> Found boot image on disk #<149CE39>!
> BML4 at -7FFFFFEFD000
> Store freelist block is 2
>
> It seems to be booting, but very slowly, with those "TIMEOUT EXPIRED"
> messages. With virtio-blk it is almost the same, except that it hangs
> completely. I'll try to investigate further. Meanwhile, do you have any
> idea why those magic Intel AHCI controller hacks are required and why
> sc->ports can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
>
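
The arithmetic behind the Intel port-enable hack quoted above can be checked in
isolation. This is a minimal sketch, assuming that CAP.NP occupies bits 4:0 of
Host Capabilities and holds the port count minus one, as in the AHCI
specification. The function names are hypothetical and are not taken from
Mezzano.

```lisp
;; Sketch only: function names are hypothetical, not taken from Mezzano.
(defun cap-n-ports (cap)
  "Port count from the AHCI CAP register; NP (assumed bits 4:0) is zero-based."
  (1+ (ldb (byte 5 0) cap)))

(defun pcs-enable-mask (n-ports)
  "Same expression as the quoted driver code: #xFF shifted by (n-ports - 8)."
  (ash #xff (- (- 8 n-ports))))

(defun fits-in-16-bits-p (value)
  "True when VALUE can be stored by a 16-bit configuration-space write."
  (<= 0 value #xffff))
```

For the Host Capabilities value FF30FF25 from the log, (cap-n-ports #xFF30FF25)
gives 6 and (pcs-enable-mask 6) gives #x3F, which fits in 16 bits. With
n-ports = 20 the mask is 1044480, which does not fit. If the 20 was printed in
hexadecimal like the rest of the log (an assumption), n-ports would be 32 and
the mask would be larger still.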

