Date: Thu, 10 Oct 2024 19:43:28 +0000
From: Vasily Postnicov <shamaz.mazum@gmail.com>
To: Peter Grehan <grehan@freebsd.org>
Cc: freebsd-virtualization@freebsd.org
Subject: Re: Running Mezzano in bhyve
Message-ID: <CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ@mail.gmail.com>
In-Reply-To: <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
References: <CADnZ6B=ex24mbGN3du6UuS84akJZAxTcG5xqt0HB0RN5S262cQ@mail.gmail.com>
  <17f4077d-647d-4848-9d6f-97f9886ef636@freebsd.org>
  <CADnZ6BkWd-v=y0L9%2BGiu=ys_Cuk5nm6djApSXYLufYuv=WnQWQ@mail.gmail.com>
  <CADnZ6B=LwZyiBTvXGek37e23t_e3ub4K%2BE96QaahukPbobkHhg@mail.gmail.com>
  <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org>
  <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
[-- Attachment #1 --]
I suspect PCI interrupts are not functioning correctly.
Look at this code:
;; Attach interrupt handler.
(sup:debug-print-line "Handler: " (ahci-irq-handler ahci))
(sup:irq-attach (sup:platform-irq (pci:pci-intr-line location))
                (ahci-irq-handler-function ahci)
                ahci)
and this:
(defun pci-intr-line (device)
  (pci-config/8 device +pci-config-intr-line+)) ;; comment by me: the constant is #x3c
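For reference, in a type 0 PCI configuration header the byte at offset 0x3C is the Interrupt Line register, which firmware or the OS fills in with the IRQ routing and which the device itself does not use, and the byte at 0x3D is the read-only Interrupt Pin register (0 means no legacy pin, 1 to 4 mean INTA# to INTD#). A minimal sketch of decoding both, assuming config space has already been copied into a 256-byte array; cfg_read8 and the other helpers are hypothetical names, not Mezzano or bhyve functions:

```c
#include <assert.h>
#include <stdint.h>

/* Offsets in a type 0 PCI configuration header. */
#define PCI_CFG_INTR_LINE 0x3C /* IRQ routing info written by firmware/OS */
#define PCI_CFG_INTR_PIN  0x3D /* read-only: 0 = none, 1..4 = INTA#..INTD# */

/* Hypothetical helper: reads one byte from a 256-byte config-space copy. */
static uint8_t cfg_read8(const uint8_t cfg[256], unsigned off) {
    assert(off < 256);
    return cfg[off];
}

/* Returns 'A'..'D' for INTA#..INTD#, or 0 if no legacy pin is used. */
static char intx_pin_name(const uint8_t cfg[256]) {
    uint8_t pin = cfg_read8(cfg, PCI_CFG_INTR_PIN);
    return (pin >= 1 && pin <= 4) ? (char)('A' + pin - 1) : 0;
}

/* On x86, an Interrupt Line of 0xFF means "unknown / not connected". */
static int intx_line_valid(const uint8_t cfg[256]) {
    return cfg_read8(cfg, PCI_CFG_INTR_LINE) != 0xFF;
}
```

The "intr: FF" reported for the host bridge in the log further down is presumably that "not connected" value.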
I found that PCI configuration offset 0x3C is the Interrupt Line register,
which is only used for legacy pin-based (INTx) interrupts; the Interrupt Pin
register is the next byte, 0x3D. AFAIK, legacy pin interrupts are not
supported by bhyve; is that correct? If so, I need either to teach bhyve to
handle legacy interrupts or to teach Mezzano to use MSI. Which would be
easier, in your opinion?
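If the MSI route is taken, the first thing the driver would need is to locate the MSI capability. When bit 4 of the Status register (offset 0x06) is set, the byte at 0x34 points to a linked list of capabilities, each starting with a capability ID byte followed by a next-pointer byte; MSI has ID 0x05 and MSI-X has ID 0x11. The sketch below only finds the capability, again over a hypothetical 256-byte config-space copy (pci_find_cap is an illustrative name); programming the message address and data registers is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CFG_STATUS      0x06 /* 16-bit Status register */
#define PCI_STATUS_CAP_LIST 0x10 /* bit 4: capability list present */
#define PCI_CFG_CAP_PTR     0x34 /* offset of the first capability */
#define PCI_CAP_ID_MSI      0x05
#define PCI_CAP_ID_MSIX     0x11

/* Returns the config-space offset of the first capability with the given
   ID, or 0 if the function has no such capability. */
static unsigned pci_find_cap(const uint8_t cfg[256], uint8_t cap_id) {
    uint16_t status =
        (uint16_t)(cfg[PCI_CFG_STATUS] | (cfg[PCI_CFG_STATUS + 1] << 8));
    if (!(status & PCI_STATUS_CAP_LIST))
        return 0;
    unsigned ptr = cfg[PCI_CFG_CAP_PTR] & 0xFC; /* low 2 bits reserved */
    /* Bound the walk: at most 48 dword-aligned entries fit in 0x40..0xFF,
       so a malformed (looping) list cannot hang the driver. */
    for (int guard = 0; ptr >= 0x40 && guard < 48; guard++) {
        assert(ptr + 1 < 256);
        if (cfg[ptr] == cap_id)
            return ptr;
        ptr = cfg[ptr + 1] & 0xFC;
    }
    return 0;
}
```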
On Thu, 10 Oct 2024 at 17:12, Vasily Postnicov <shamaz.mazum@gmail.com> wrote:
> I was able to fix the panics in both virtio and AHCI. This is what I found:
>
> 1) Virtio had a stupid bug: at run time, Mezzano looked up the accessor
> for an I/O port by building its name, doing something like (funcall
> (intern (format nil "~a-~a" bus-name slot-name)) ...). Of course, the
> author misspelled one of the accessor names, so FUNCALL tried to call a
> symbol with no function definition, hence the page fault.
> 2) AHCI had the following code:
>
> ;; Magic hacks for Intel devices?
> ;; Set port enable bits in Port Control and Status on Intel controllers.
> (when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
>   (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
>                            (ahci-global-register ahci +ahci-register-CAP+))))
>          (pcs (pci:pci-config/16 location #x92)))
>     (setf (pci:pci-config/16 location #x92)
>           (logior pcs (ash #xFF (- (- 8 n-ports)))))))
>
> I checked the value of N-PORTS: it's 20, so (ash #xff (- (- 8 n-ports)))
> is 1044480, which is bigger than 2^16-1 and so does not fit in the 16-bit
> register. I recompiled bhyve with MAX_PORTS = 6 in bhyve/pci_ahci.c and
> the panic disappeared. Now I get this output:
>
> Detected AHCI ABAR at C1002000
> AHCI IRQ is B
> Host Capabilities FF30FF25
> Global Host Control 80000000
> Interrupt Status 0
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> AHCI HBA version 1.300
> Handler: 0
> Config register: 17
> Port 0
> Waiting for CR/FR to stop.
> Allocated port data at 105C33000
> Command List at 105C33000
> Received FIS at 105C33400
> Command Tabl at 105C33500
> Initializing device on port 0
> Command List Base Address 5C33000
> Command List Base Address Upper 32-bits 1
> FIS Base Address 5C33400
> FIS Base Address Upper 32-bits 1
> Interrupt Status 0
> Interrupt Enable 7D80003F
> Command and Status 1C017
> Task File Data 50
> Signature 101
> SATA Status (SCR0: SStatus) 133
> SATA Control (SCR2: SControl) 300
> SATA Error (SCR1: SError) 0
> SATA Active (SCR3: SActive) 0
> Command Issue 0
> SATA Notification (SCR4: SNotification) 0
> FIS-based Switching Control 0
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Command completed.
> 105C33600: 28A20040 100000 0 3F
> 105C33610: 0 59564248 4644452D 2D413239
> 105C33620: 382D4136 39433646 0 30300000
> 105C33630: 20203120 42482020 45205956 54415341
> 105C33640: 49532044 20204B20 20202020 20202020
> 105C33650: 20202020 20202020 20202020 80802020
> 105C33660: B000000 4000 60000 0
> 105C33670: 0 0 A00000 70000
> 105C33680: 780003 780078 40200078 0
> 105C33690: 0 1F0000 40010E 0
> 105C336A0: 2803F0 74004068 40684000 4000B400
> 105C336B0: 7F 0 0 0
> 105C336C0: 0 0 A00000 0
> 105C336D0: 10000 6008 0 0
> 105C336E0: 0 0 0 40080000
> 105C336F0: 4008 0 0 0
> 105C33700: 0 0 0 0
> 105C33710: 0 0 0 0
> 105C33720: 0 0 0 0
> 105C33730: 0 0 0 0
> 105C33740: 0 0 0 0
> 105C33750: 10000 0 0 0
> 105C33760: 0 0 0 0
> 105C33770: 0 0 0 0
> 105C33780: 0 0 0 0
> 105C33790: 0 0 0 0
> 105C337A0: 40000000 0 0 0
> 105C337B0: 0 0 0 1020
> 105C337C0: 0 0 0 0
> 105C337D0: 0 0 0 0
> 105C337E0: 0 0 0 0
> 105C337F0: 0 0 0 78A50000
> Features (83): 7400
> Sector size: 200
> Sector count: A00000
> Serial: BHYVE-FD29-AA68-6F9C
> Model: BHYVE SATA DISK
> Registered new R/W disk #<149CAC9> sectors:A00000
> Host Capabilities FF30FF25
> Global Host Control 80000002
> Interrupt Status 1
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
> 40: Unknown capability 10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Detected MBR style parition table on disk #<149CAC9>
> Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
> Registered new R/W disk #<149CCD9> sectors:800
> Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
> Registered new R/W disk #<149CD89> sectors:800
> Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
> Registered new R/W disk #<149CE39> sectors:9FE000
> Looking for paging disk with UUID
> 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on
> disk #<149CE39>
> Found boot image on disk #<149CE39>!
> BML4 at -7FFFFFEFD000
> Store freelist block is 2
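Item 1 above describes building an accessor's name at run time and calling whatever that name resolves to, so a single misspelled name turns into a call through nothing. C has no INTERN or FUNCALL, so the sketch below swaps in a name-keyed table as the closest analogue; every name in it is hypothetical and not taken from Mezzano's virtio driver. The point is the explicit "not found" path: returning NULL (in Lisp, checking FBOUNDP before FUNCALL) turns the typo into a reportable error instead of a page fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned (*reg_accessor)(void);

/* Hypothetical accessors standing in for per-register I/O functions. */
static unsigned common_device_status(void) { return 0x0F; }
static unsigned common_queue_size(void) { return 256; }

struct accessor_entry {
    const char *name;
    reg_accessor fn;
};

static const struct accessor_entry accessors[] = {
    { "common-device-status", common_device_status },
    { "common-queue-size", common_queue_size },
};

/* Builds "<bus>-<slot>" and looks it up; returns NULL instead of crashing
   when no accessor has that name (e.g. because of a typo somewhere). */
static reg_accessor find_accessor(const char *bus, const char *slot) {
    char name[64];
    int n = snprintf(name, sizeof name, "%s-%s", bus, slot);
    assert(n >= 0);
    if ((size_t)n >= sizeof name)
        return NULL; /* truncated name cannot match any entry */
    for (size_t i = 0; i < sizeof accessors / sizeof accessors[0]; i++)
        if (strcmp(accessors[i].name, name) == 0)
            return accessors[i].fn;
    return NULL;
}
```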
>
> It seems to be booting, but very, very slowly, with those "TIMEOUT EXPIRED"
> messages. With virtio-blk it's almost the same, except that it hangs
> completely. I'll try to investigate further. Meanwhile, do you have any
> idea why those magic Intel AHCI controller hacks are needed, and why
> sc->ports can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
>
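The arithmetic behind item 2 can be checked directly. Per the AHCI specification, CAP bits 4:0 (NP) hold the number of ports minus one, so the CAP value FF30FF25 in the log decodes to NP = 5, i.e. 6 ports, matching MAX_PORTS = 6. The expression (ash #xFF (- (- 8 n-ports))) shifts #xFF right by 8 - n-ports, which yields a mask of the low n-ports bits only while n-ports <= 8; past that the count turns positive, ASH shifts left, and the result no longer fits in the 16-bit register at 0x92. The sketch reproduces both cases. cap_num_ports assumes Mezzano's NP constants follow the spec (size 5, position 0), and pcs_port_enable_mask is a hypothetical clamped variant that assumes, without confirmation from this thread, that the port-enable bits live in the low byte of that register.

```c
#include <assert.h>
#include <stdint.h>

/* AHCI CAP.NP: bits 4:0, number of ports minus one (assumed to match
   +ahci-CAP-NP-size+ = 5 and +ahci-CAP-NP-position+ = 0). */
static int cap_num_ports(uint32_t cap) {
    return (int)(cap & 0x1F) + 1;
}

/* Literal translation of (ash #xFF (- (- 8 n-ports))): ASH shifts left
   for a positive count and right for a negative one. */
static int64_t lisp_port_mask(int n_ports) {
    assert(n_ports >= 1 && n_ports <= 32);
    int count = n_ports - 8; /* same as (- (- 8 n-ports)) */
    return count >= 0 ? (int64_t)0xFF << count : (int64_t)0xFF >> -count;
}

/* Hypothetical clamped variant: never sets more than the low 8 bits. */
static uint16_t pcs_port_enable_mask(int n_ports) {
    assert(n_ports >= 1 && n_ports <= 32);
    return n_ports >= 8 ? 0xFF : (uint16_t)(0xFF >> (8 - n_ports));
}
```

Whether such a clamp is right for a particular Intel chipset depends on that chipset's PCS register layout, so its datasheet remains the authority.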
