Date:      Thu, 10 Oct 2024 19:43:28 +0000
From:      Vasily Postnicov <shamaz.mazum@gmail.com>
To:        Peter Grehan <grehan@freebsd.org>
Cc:        freebsd-virtualization@freebsd.org
Subject:   Re: Running Mezzano in bhyve
Message-ID:  <CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ@mail.gmail.com>
In-Reply-To: <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
References:  <CADnZ6B=ex24mbGN3du6UuS84akJZAxTcG5xqt0HB0RN5S262cQ@mail.gmail.com> <17f4077d-647d-4848-9d6f-97f9886ef636@freebsd.org> <CADnZ6BkWd-v=y0L9%2BGiu=ys_Cuk5nm6djApSXYLufYuv=WnQWQ@mail.gmail.com> <CADnZ6B=LwZyiBTvXGek37e23t_e3ub4K%2BE96QaahukPbobkHhg@mail.gmail.com> <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org> <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>

I suspect PCI interrupts are not functioning correctly.

Look at this code:
    ;; Attach interrupt handler.
    (sup:debug-print-line "Handler: " (ahci-irq-handler ahci))
    (sup:irq-attach (sup:platform-irq (pci:pci-intr-line location))
                    (ahci-irq-handler-function ahci)
                    ahci)

and this

(defun pci-intr-line (device)
  (pci-config/8 device +pci-config-intr-line+)) ;; comment by me: the constant is #x3c

I found that PCI config offset 0x3c is the Interrupt Line register, which is
part of the legacy (pin-based, INTx) interrupt mechanism. AFAIK, legacy
interrupts are not supported by bhyve, is that correct? If so, I need either
to teach bhyve how to deal with legacy interrupts or to teach Mezzano to
understand MSI. Which would be easier in your opinion?
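
For reference, here is a small C sketch of the two legacy interrupt fields in the standard PCI type 0 configuration header: the byte at offset 0x3C is Interrupt Line and the adjacent byte at 0x3D is Interrupt Pin. The helper names (cfg_read8, intr_pin_name) and the idea of working on a 256-byte snapshot of config space are assumptions made for illustration; they are not taken from Mezzano or bhyve.

```c
#include <stdint.h>

/* Offsets in the standard PCI type 0 configuration header. */
enum {
    PCI_CFG_INTR_LINE = 0x3C, /* IRQ routing value, filled in by firmware/OS */
    PCI_CFG_INTR_PIN  = 0x3D  /* 0 = no pin, 1..4 = INTA#..INTD# */
};

/* Hypothetical helper: read one byte from a 256-byte config-space snapshot. */
static uint8_t cfg_read8(const uint8_t cfg[256], unsigned off)
{
    return cfg[off & 0xFFu];
}

/* Map an Interrupt Pin register value to a readable name. */
static const char *intr_pin_name(uint8_t pin)
{
    static const char *const names[] = {
        "none", "INTA#", "INTB#", "INTC#", "INTD#"
    };
    return pin <= 4 ? names[pin] : "reserved";
}
```

With a snapshot in which cfg[0x3C] is 0x0B (the value behind "AHCI IRQ is B" in the log quoted below) and cfg[0x3D] is 1, cfg_read8(cfg, PCI_CFG_INTR_LINE) gives 11 and intr_pin_name(cfg_read8(cfg, PCI_CFG_INTR_PIN)) gives "INTA#". The pin value of 1 is assumed here; the log does not show it.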

On Thu, 10 Oct 2024 at 17:12, Vasily Postnicov <shamaz.mazum@gmail.com> wrote:

> I was able to fix panics in both virtio and AHCI. This is what I found:
>
> 1) Virtio had a silly bug: Mezzano looked up an accessor for an I/O port
> at runtime with something like (funcall (intern (format nil "~a-~a"
> bus-name slot-name)) ...). The author made an error in the name of one of
> the accessors, so FUNCALL tried to call an unbound symbol, hence the page
> fault.
> 2) AHCI had the following code:
>
> ;; Magic hacks for Intel devices?
> ;; Set port enable bits in Port Control and Status on Intel controllers.
> (when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
>   (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
>                            (ahci-global-register ahci +ahci-register-CAP+))))
>          (pcs (pci:pci-config/16 location #x92)))
>     (setf (pci:pci-config/16 location #x92)
>           (logior pcs (ash #xFF (- (- 8 n-ports)))))))
>
> I checked the value of N-PORTS: it's 20, so (ash #xff (- (- 8 n-ports)))
> is 1044480, which is bigger than 2^16-1. I recompiled bhyve with
> MAX_PORTS = 6 in bhyve/pci_ahci.c and the panic disappeared. Now I have
> this output:
>
> Detected AHCI ABAR at C1002000
> AHCI IRQ is B
> Host Capabilities FF30FF25
> Global Host Control 80000000
> Interrupt Status 0
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> AHCI HBA version 1.300
> Handler: 0
> Config register: 17
> Port 0
> Waiting for CR/FR to stop.
> Allocated port data at 105C33000
> Command List at 105C33000
> Received FIS at 105C33400
> Command Tabl at 105C33500
> Initializing device on port 0
>  Command List Base Address 5C33000
>  Command List Base Address Upper 32-bits 1
>  FIS Base Address 5C33400
>  FIS Base Address Upper 32-bits 1
>  Interrupt Status 0
>  Interrupt Enable 7D80003F
>  Command and Status 1C017
>  Task File Data 50
>  Signature 101
>  SATA Status (SCR0: SStatus) 133
>  SATA Control (SCR2: SControl) 300
>  SATA Error (SCR1: SError) 0
>  SATA Active (SCR3: SActive) 0
>  Command Issue 0
>  SATA Notification (SCR4: SNotification) 0
>  FIS-based Switching Control 0
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Command completed.
> 105C33600: 28A20040 100000 0 3F
> 105C33610: 0 59564248 4644452D 2D413239
> 105C33620: 382D4136 39433646 0 30300000
> 105C33630: 20203120 42482020 45205956 54415341
> 105C33640: 49532044 20204B20 20202020 20202020
> 105C33650: 20202020 20202020 20202020 80802020
> 105C33660: B000000 4000 60000 0
> 105C33670: 0 0 A00000 70000
> 105C33680: 780003 780078 40200078 0
> 105C33690: 0 1F0000 40010E 0
> 105C336A0: 2803F0 74004068 40684000 4000B400
> 105C336B0: 7F 0 0 0
> 105C336C0: 0 0 A00000 0
> 105C336D0: 10000 6008 0 0
> 105C336E0: 0 0 0 40080000
> 105C336F0: 4008 0 0 0
> 105C33700: 0 0 0 0
> 105C33710: 0 0 0 0
> 105C33720: 0 0 0 0
> 105C33730: 0 0 0 0
> 105C33740: 0 0 0 0
> 105C33750: 10000 0 0 0
> 105C33760: 0 0 0 0
> 105C33770: 0 0 0 0
> 105C33780: 0 0 0 0
> 105C33790: 0 0 0 0
> 105C337A0: 40000000 0 0 0
> 105C337B0: 0 0 0 1020
> 105C337C0: 0 0 0 0
> 105C337D0: 0 0 0 0
> 105C337E0: 0 0 0 0
> 105C337F0: 0 0 0 78A50000
> Features (83): 7400
> Sector size: 200
> Sector count: A00000
> Serial: BHYVE-FD29-AA68-6F9C
> Model: BHYVE SATA DISK
> Registered new R/W disk #<149CAC9> sectors:A00000
> Host Capabilities FF30FF25
> Global Host Control 80000002
> Interrupt Status 1
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
>     40: Unknown capability 10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Detected MBR style parition table on disk #<149CAC9>
> Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
> Registered new R/W disk #<149CCD9> sectors:800
> Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
> Registered new R/W disk #<149CD89> sectors:800
> Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
> Registered new R/W disk #<149CE39> sectors:9FE000
> Looking for paging disk with UUID
> 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on
> disk #<149CE39>
> Found boot image on disk #<149CE39>!
> BML4 at -7FFFFFEFD000
> Store freelist block is 2
>
> It seems to be booting, but very slowly, with those "TIMEOUT EXPIRED"
> messages. With virtio-blk it's almost the same, except that it hangs
> completely. I'll try to investigate further. Meanwhile, can you suggest
> why those magic Intel AHCI controller hacks are required, and why
> sc->ports can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
>
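
The arithmetic in the quoted AHCI code can be checked with a short C sketch. It assumes only what the AHCI specification states, that the NP field (number of ports, zero-based) occupies bits 4:0 of the HBA Capabilities register, and it mirrors the Lisp expression (ash #xFF (- (- 8 n-ports))), where a negative shift count means a right shift. The function names are made up for this illustration.

```c
#include <stdint.h>

/* CAP.NP is bits 4:0 and is zero-based, so add one to get the port count. */
static unsigned ahci_cap_n_ports(uint32_t cap)
{
    return (cap & 0x1Fu) + 1u;
}

/* Mirror of (ash #xFF (- (- 8 n-ports))): shift 0xFF left by n_ports - 8,
 * or right by 8 - n_ports when n_ports is smaller than 8. */
static uint32_t pcs_mask_as_written(unsigned n_ports)
{
    int shift = (int)n_ports - 8;
    return shift >= 0 ? (uint32_t)0xFF << shift
                      : (uint32_t)0xFF >> -shift;
}

/* Does the value fit the 16-bit config register written at offset 0x92? */
static int fits_u16(uint32_t v)
{
    return v <= 0xFFFFu;
}
```

For n_ports = 20, pcs_mask_as_written returns 0xFF000 (1044480), which does not fit in 16 bits, matching the value reported above. For the CAP value FF30FF25 from the log, ahci_cap_n_ports gives 6 and the mask becomes 0x3F, i.e. six low bits set. In general the expression yields a mask of n_ports low bits only while n_ports is at most 8.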



