Date: Tue, 15 Oct 2024 05:15:32 +0000
From: Vasily Postnicov <shamaz.mazum@gmail.com>
To: Peter Grehan <grehan@freebsd.org>
Cc: freebsd-virtualization@freebsd.org
Subject: Re: Running Mezzano in bhyve
Message-ID: <CADnZ6BmFty3XKdM4t0vnuBX8%2BrnUSyApW9yvVKnN_s8abCJkOg@mail.gmail.com>
In-Reply-To: <106b8500-a0ef-4095-af20-8c0f110ea739@freebsd.org>
References: <CADnZ6B=ex24mbGN3du6UuS84akJZAxTcG5xqt0HB0RN5S262cQ@mail.gmail.com>
 <17f4077d-647d-4848-9d6f-97f9886ef636@freebsd.org>
 <CADnZ6BkWd-v=y0L9%2BGiu=ys_Cuk5nm6djApSXYLufYuv=WnQWQ@mail.gmail.com>
 <CADnZ6B=LwZyiBTvXGek37e23t_e3ub4K%2BE96QaahukPbobkHhg@mail.gmail.com>
 <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org>
 <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
 <CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ@mail.gmail.com>
 <e395fc30-0582-4d51-b1b3-cf5157bdd3a9@freebsd.org>
 <CADnZ6BmjGzHygqJSNY=wpuy-6Z4YiAMpt-gBx0f%2Bi%2BrXBfBvaQ@mail.gmail.com>
 <106b8500-a0ef-4095-af20-8c0f110ea739@freebsd.org>

Regarding items 3) and 4):

3) Indeed, bhyve does not explicitly forbid writing to 0x3c. I meant the
following. The interrupt line is set in pci_emul.c in bhyve:

 pci_set_cfgdata8(pi, PCIR_INTLINE, pirq_irq(ii->ii_pirq_pin));

Bhyve asserts interrupts with pci_irq_assert in amd64/pci_irq.c. We need
this line:

 vm_isa_assert_irq(pi->pi_vmctx, pirq->reg & PIRQ_IRQ, pi->pi_lintr.ioapic_irq);

pirq->reg & PIRQ_IRQ is literally the same as pirq_irq(ii->ii_pirq_pin).
Now, if something (e.g. UEFI firmware, bootloader) writes to PCIR_INTLINE,
bhyve will still send interrupts with the number that was there before the
write, while the OS will expect an interrupt with the new number. I treat
this as a bug in bhyve (but it affects nobody, because newer OSes do not
use the 8259 interrupt controller).

4) It is commenting out the lock that makes the difference. I commented out
pci_generate_msi just in case, because it is not needed for Mezzano but
runs protected by the mutex, which is now gone.
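
To make the mismatch from item 3 concrete, here is a toy model in C. It is only a sketch under stated assumptions: every name in it (toy_dev, toy_cfg_write8 and so on) is hypothetical and none of it is bhyve code. It only mirrors the behaviour described above, where the IRQ that gets asserted is fixed at initialisation while the guest-visible register at 0x3c can be rewritten later.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not bhyve's. */
#define TOY_INTLINE_OFF 0x3c	/* PCI config offset of the Interrupt Line register */

struct toy_dev {
	uint8_t cfg[256];	/* guest-visible config space */
	uint8_t routed_irq;	/* IRQ the model actually asserts */
};

/* Both copies of the IRQ number start out equal, as at device init. */
static void
toy_init(struct toy_dev *d, uint8_t irq)
{
	memset(d, 0, sizeof(*d));
	d->cfg[TOY_INTLINE_OFF] = irq;
	d->routed_irq = irq;
}

/* A guest config write updates the register but not the routing. */
static void
toy_cfg_write8(struct toy_dev *d, uint8_t off, uint8_t val)
{
	d->cfg[off] = val;
}

/* IRQ a PIC-based guest would wait for: whatever it reads back at 0x3c. */
static uint8_t
toy_expected_irq(const struct toy_dev *d)
{
	return (d->cfg[TOY_INTLINE_OFF]);
}

/* IRQ the model raises: the value fixed at init time. */
static uint8_t
toy_asserted_irq(const struct toy_dev *d)
{
	return (d->routed_irq);
}

/* Returns 1 if rewriting the register leaves the two values out of sync. */
static int
toy_demo_mismatch(void)
{
	struct toy_dev d;

	toy_init(&d, 5);
	assert(toy_expected_irq(&d) == toy_asserted_irq(&d));
	toy_cfg_write8(&d, TOY_INTLINE_OFF, 11);
	return (toy_expected_irq(&d) != toy_asserted_irq(&d));
}
```

In this model toy_demo_mismatch() returns 1: after the write the guest would wait on IRQ 11 while the device still raises IRQ 5. Whether a real fix should update the routing on such a write or ignore the write is left open here.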
This is a backtrace and thread list from when bhyve hangs if the mutex is
not commented out:

(lldb) bt
* thread #1, name = 'mevent', stop reason = signal SIGSTOP
  * frame #0: 0x000011adeaa37e2a libthr.so.3`_umtx_op_err at _umtx_op_err.S:38
    frame #1: 0x000011adeaa479c0 libthr.so.3`__thr_umutex_lock(mtx=0x0000378ecca00888, id=101223) at thr_umtx.c:79:3
    frame #2: 0x000011adeaa40eea libthr.so.3`mutex_lock_sleep(curthread=0x0000378ecc412000, m=0x0000378ecca00888, abstime=0x0000000000000000) at thr_mutex.c:699:9
    frame #3: 0x000011adeaa3ed8f libthr.so.3`__Tthr_mutex_lock [inlined] mutex_lock_common(m=0x0000378ecca00888, abstime=0x0000000000000000, cvattach=false, rb_onlist=false) at thr_mutex.c:733:9
    frame #4: 0x000011adeaa3ed4d libthr.so.3`__Tthr_mutex_lock(mutex=<unavailable>) at thr_mutex.c:752:9
    frame #5: 0x000011a5c43e7b06 bhyve`vi_interrupt(vs=0x0000378ecc4b8000, isr='\x01', msix_idx=65535) at virtio.h:358:3
    frame #6: 0x000011a5c43e6c86 bhyve`vq_interrupt(vs=0x0000378ecc4b8000, vq=0x0000378ecc4b8038) at virtio.h:376:2
    frame #7: 0x000011a5c43e6c44 bhyve`vq_endchains(vq=0x0000378ecc4b8038, used_all_avail=0) at virtio.c:512:3
    frame #8: 0x000011a5c43db348 bhyve`pci_vtnet_rx(sc=0x0000378ecc4b8000) at pci_virtio_net.c:271:4
    frame #9: 0x000011a5c43dab53 bhyve`pci_vtnet_rx_callback(fd=6, type=EVF_READ, param=0x0000378ecc4b8000) at pci_virtio_net.c:403:2
    frame #10: 0x000011a5c43bb9f8 bhyve`mevent_handle(kev=0x000011ade4451200, numev=1) at mevent.c:273:3
    frame #11: 0x000011a5c43bb5d7 bhyve`mevent_dispatch at mevent.c:549:3
    frame #12: 0x000011a5c43aed4b bhyve`main(argc=1, argv=0x000011ade4453418) at bhyverun.c:1052:2
    frame #13: 0x000011adec6c1a6a libc.so.7`__libc_start1(argc=24, argv=0x000011ade4453360, env=0x000011ade4453428, cleanup=<unavailable>, mainX=(bhyve`main at bhyverun.c:694)) at libc_start1.c:157:7
    frame #14: 0x000011a5c43a80cd bhyve`_start at crt1_s.S:83

(lldb) frame select 5
frame #5: 0x000011a5c43e7b06 bhyve`vi_interrupt(vs=0x0000378ecc4b8000, isr='\x01', msix_idx=65535) at virtio.h:358:3
   355 	if (pci_msix_enabled(vs->vs_pi))
   356 		pci_generate_msix(vs->vs_pi, msix_idx);
   357 	else {
-> 358 		VS_LOCK(vs);
   359 		vs->vs_isr |= isr;
   360 		pci_generate_msi(vs->vs_pi, 0);
   361 	#ifdef __amd64__

(lldb) thread list
Process 3185 stopped
* thread #1: tid = 101223, 0x000011adeaa37e2a libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'mevent', stop reason = signal SIGSTOP
  thread #2: tid = 101868, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-0', stop reason = signal SIGSTOP
  thread #3: tid = 101869, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-1', stop reason = signal SIGSTOP
  thread #4: tid = 101870, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-2', stop reason = signal SIGSTOP
  thread #5: tid = 101871, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-3', stop reason = signal SIGSTOP
  thread #6: tid = 101872, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-4', stop reason = signal SIGSTOP
  thread #7: tid = 101873, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-5', stop reason = signal SIGSTOP
  thread #8: tid = 101874, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-6', stop reason = signal SIGSTOP
  thread #9: tid = 101875, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-7', stop reason = signal SIGSTOP
  thread #10: tid = 101876, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'vtnet-5:0 tx', stop reason = signal SIGSTOP
  thread #11: tid = 101877, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'hda-audio-output', stop reason = signal SIGSTOP
  thread #12: tid = 101878, 0x000011adec7752ea libc.so.7`__sys_accept at _accept.S:4, name = 'rfb', stop reason = signal SIGSTOP
  thread #13: tid = 101879, 0x000011adec7726aa libc.so.7`__sys_ioctl at ioctl.S:4, name = 'vcpu 0', stop reason = signal SIGSTOP
  thread #14: tid = 101880, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'vcpu 1', stop reason = signal SIGSTOP

I think implementing IOAPIC support in Mezzano is indeed the best option,
but I have little experience with it. I'll see what I can do.

Mon, 14 Oct 2024 at 22:52, Peter Grehan <grehan@freebsd.org> wrote:

> > 1) The problem with PIT. Can be solved as you proposed or by
> > patching Mezzano.
>
>   The bhyve patch would be the best option for that: it's useful for
> other older o/s's (DOS).
>
> > 2) Mezzano assumes that Intel AHCI controllers report no more than 6
> > ports. Can be solved by patching Mezzano or defining MAX_PORTS to be
> > 6 in usr.sbin/bhyve/pci_ahci.c
>
>   A Mezzano patch would be best for that. The bhyve man page has an
> example with 8 disks attached so reducing the limit to 6 could hit
> existing users.
>
> > 3) According to
> > https://wiki.osdev.org/PCI#Message_Signaled_Interrupts, interrupt
> > line config register must be RW. Bhyve does not support writing to
> > it. I do not know a correct fix, this [1] workaround helps, however.
>
>   Bhyve does support writing to that - your patch disables that, and my
> guess is that when Mezzano sees this as zero (i.e. invalid) it then looks
> for the irq line via the ACPI MADT (or other means).
>
>   A quick look at Mezzano shows that it is still using the 8259 PIC for
> interrupts. At the minimum it should be using the IOAPIC, or excessive
> interrupt sharing will result, and possibly incorrect behaviour when
> this happens. I think IOAPIC support could be added without a large
> amount of effort, compared to e.g. MSI/MSI-x.
>
> > 4) Finally, I had a random deadlock in interrupt handling for the
> > virtio-net device. Likewise, I do not know how to fix it correctly,
> > but this [2] patch helped.
>
>   Hmmm that seems strange: MSI interrupts aren't generated if they
> haven't been setup/enabled by a guest. Commenting out the lock/unlock
> code would seem to indicate a larger bug in play. Would it be possible to
> get some tracing on that segment of code e.g. a dtrace log ?
>
> > Do you have any ideas how to make proper patches for bhyve from
> > these workarounds?
>
>   The first one can be put in a phab diff, which I'll do. I think there's
> still some more work involved for the others.
>
> later,
>
> Peter.