Date: Thu, 10 Oct 2024 19:43:28 +0000
From: Vasily Postnicov <shamaz.mazum@gmail.com>
To: Peter Grehan <grehan@freebsd.org>
Cc: freebsd-virtualization@freebsd.org
Subject: Re: Running Mezzano in bhyve
Message-ID: <CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ@mail.gmail.com>
In-Reply-To: <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
References: <CADnZ6B=ex24mbGN3du6UuS84akJZAxTcG5xqt0HB0RN5S262cQ@mail.gmail.com> <17f4077d-647d-4848-9d6f-97f9886ef636@freebsd.org> <CADnZ6BkWd-v=y0L9%2BGiu=ys_Cuk5nm6djApSXYLufYuv=WnQWQ@mail.gmail.com> <CADnZ6B=LwZyiBTvXGek37e23t_e3ub4K%2BE96QaahukPbobkHhg@mail.gmail.com> <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org> <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>

I suspect PCI interrupts are not functioning correctly. Look at this code:

    ;; Attach interrupt handler.
    (sup:debug-print-line "Handler: " (ahci-irq-handler ahci))
    (sup:irq-attach (sup:platform-irq (pci:pci-intr-line location))
                    (ahci-irq-handler-function ahci)
                    ahci)

and this:

(defun pci-intr-line (device)
  (pci-config/8 device +pci-config-intr-line+)) ;; comment by me: the constant is #x3c

I found that PCI config offset 0x3c is the Interrupt Line register (the
Interrupt Pin register is the next byte, 0x3d), which is used for legacy
pin-based (INTx) interrupts. AFAIK, legacy interrupt pins are not supported
by bhyve, is that correct? If it's true, I need either to teach bhyve how
to deal with legacy interrupts or to teach Mezzano to understand MSI. What
would be easier in your opinion?

On Thu, 10 Oct 2024 at 17:12, Vasily Postnicov <shamaz.mazum@gmail.com> wrote:

> I was able to fix panics in both virtio and AHCI. This is what I found:
>
> 1) Virtio had a stupid bug: Mezzano tried to find an accessor for
> some IO port at runtime, doing something like (funcall (intern (format
> nil "~a-~a" bus-name slot-name)) ...). Surely, the creator made an error in
> the name of one of the accessors, so FUNCALL tried to call an unbound
> symbol, hence the page fault.
> 2) AHCI had the following code:
>
> ;; Magic hacks for Intel devices?
> ;; Set port enable bits in Port Control and Status on Intel controllers.
> (when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
>   (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
>                            (ahci-global-register ahci +ahci-register-CAP+))))
>          (pcs (pci:pci-config/16 location #x92)))
>     (setf (pci:pci-config/16 location #x92)
>           (logior pcs (ash #xFF (- (- 8 n-ports)))))))
>
> I checked the value of N-PORTS, it's 20, so (ash #xff (- (- 8 n-ports)))
> is 1044480, which is bigger than 2^16-1, the largest value a 16-bit
> config register can hold.
> I recompiled bhyve with MAX_PORTS = 6 in bhyve/pci_ahci.c and the panic
> disappeared. Now I have this output:
>
> Detected AHCI ABAR at C1002000
> AHCI IRQ is B
> Host Capabilities FF30FF25
> Global Host Control 80000000
> Interrupt Status 0
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> AHCI HBA version 1.300
> Handler: 0
> Config register: 17
> Port 0
> Waiting for CR/FR to stop.
> Allocated port data at 105C33000
> Command List at 105C33000
> Received FIS at 105C33400
> Command Tabl at 105C33500
> Initializing device on port 0
>  Command List Base Address 5C33000
>  Command List Base Address Upper 32-bits 1
>  FIS Base Address 5C33400
>  FIS Base Address Upper 32-bits 1
>  Interrupt Status 0
>  Interrupt Enable 7D80003F
>  Command and Status 1C017
>  Task File Data 50
>  Signature 101
>  SATA Status (SCR0: SStatus) 133
>  SATA Control (SCR2: SControl) 300
>  SATA Error (SCR1: SError) 0
>  SATA Active (SCR3: SActive) 0
>  Command Issue 0
>  SATA Notification (SCR4: SNotification) 0
>  FIS-based Switching Control 0
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Command completed.
> 105C33600: 28A20040 100000 0 3F
> 105C33610: 0 59564248 4644452D 2D413239
> 105C33620: 382D4136 39433646 0 30300000
> 105C33630: 20203120 42482020 45205956 54415341
> 105C33640: 49532044 20204B20 20202020 20202020
> 105C33650: 20202020 20202020 20202020 80802020
> 105C33660: B000000 4000 60000 0
> 105C33670: 0 0 A00000 70000
> 105C33680: 780003 780078 40200078 0
> 105C33690: 0 1F0000 40010E 0
> 105C336A0: 2803F0 74004068 40684000 4000B400
> 105C336B0: 7F 0 0 0
> 105C336C0: 0 0 A00000 0
> 105C336D0: 10000 6008 0 0
> 105C336E0: 0 0 0 40080000
> 105C336F0: 4008 0 0 0
> 105C33700: 0 0 0 0
> 105C33710: 0 0 0 0
> 105C33720: 0 0 0 0
> 105C33730: 0 0 0 0
> 105C33740: 0 0 0 0
> 105C33750: 10000 0 0 0
> 105C33760: 0 0 0 0
> 105C33770: 0 0 0 0
> 105C33780: 0 0 0 0
> 105C33790: 0 0 0 0
> 105C337A0: 40000000 0 0 0
> 105C337B0: 0 0 0 1020
> 105C337C0: 0 0 0 0
> 105C337D0: 0 0 0 0
> 105C337E0: 0 0 0 0
> 105C337F0: 0 0 0 78A50000
> Features (83): 7400
> Sector size: 200
> Sector count: A00000
> Serial: BHYVE-FD29-AA68-6F9C
> Model: BHYVE SATA DISK
> Registered new R/W disk #<149CAC9> sectors:A00000
> Host Capabilities FF30FF25
> Global Host Control 80000002
> Interrupt Status 1
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
>     40: Unknown capability 10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Detected MBR style parition table on disk #<149CAC9>
> Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
> Registered new R/W disk #<149CCD9> sectors:800
> Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
> Registered new R/W disk #<149CD89> sectors:800
> Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
> Registered new R/W disk #<149CE39> sectors:9FE000
> Looking for paging disk with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on disk #<149CE39>
> Found boot image on disk #<149CE39>!
> BML4 at -7FFFFFEFD000
> Store freelist block is 2
>
> It seems to be booting, but very slowly, with those "TIMEOUT EXPIRED"
> messages. For virtio-blk it's almost the same, with the exception that it
> hangs completely. I'll try to investigate further. Meanwhile, can you
> suggest why those magic Intel AHCI controller hacks are required, and
> why sc->ports can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ>