Date: Tue, 15 Oct 2024 05:15:32 +0000
From: Vasily Postnicov <shamaz.mazum@gmail.com>
To: Peter Grehan <grehan@freebsd.org>
Cc: freebsd-virtualization@freebsd.org
Subject: Re: Running Mezzano in bhyve
Message-ID: <CADnZ6BmFty3XKdM4t0vnuBX8%2BrnUSyApW9yvVKnN_s8abCJkOg@mail.gmail.com>
In-Reply-To: <106b8500-a0ef-4095-af20-8c0f110ea739@freebsd.org>
References: <CADnZ6B=ex24mbGN3du6UuS84akJZAxTcG5xqt0HB0RN5S262cQ@mail.gmail.com>
 <17f4077d-647d-4848-9d6f-97f9886ef636@freebsd.org>
 <CADnZ6BkWd-v=y0L9%2BGiu=ys_Cuk5nm6djApSXYLufYuv=WnQWQ@mail.gmail.com>
 <CADnZ6B=LwZyiBTvXGek37e23t_e3ub4K%2BE96QaahukPbobkHhg@mail.gmail.com>
 <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org>
 <CADnZ6BkKh5V9_Y%2BTGrGpc=vTW2q81pdWJn8MUVvWNOiV35nBFw@mail.gmail.com>
 <CADnZ6BkHkNBD5LaEZCeSy7QnfquwB-Wv3sYu4S=P58ZyVGrDQQ@mail.gmail.com>
 <e395fc30-0582-4d51-b1b3-cf5157bdd3a9@freebsd.org>
 <CADnZ6BmjGzHygqJSNY=wpuy-6Z4YiAMpt-gBx0f%2Bi%2BrXBfBvaQ@mail.gmail.com>
 <106b8500-a0ef-4095-af20-8c0f110ea739@freebsd.org>
Regarding items 3) and 4):
3) Indeed, bhyve does not explicitly forbid writing to 0x3c. What I meant
is the following. The interrupt line is set in pci_emul.c in bhyve:

    pci_set_cfgdata8(pi, PCIR_INTLINE, pirq_irq(ii->ii_pirq_pin));

Bhyve asserts interrupts with pci_irq_assert in amd64/pci_irq.c. The line
that matters there is:

    vm_isa_assert_irq(pi->pi_vmctx, pirq->reg & PIRQ_IRQ,
        pi->pi_lintr.ioapic_irq);

pirq->reg & PIRQ_IRQ is literally the same as pirq_irq(ii->ii_pirq_pin).
Now, if something (e.g. UEFI firmware or a bootloader) writes to
PCIR_INTLINE, bhyve will still send interrupts with the number that was
there before the write, while the OS will expect an interrupt with the new
number. I treat this as a bug in bhyve (but it affects nobody, because
newer OSes do not use the 8259 interrupt controller).
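
To make the mismatch in 3) concrete, here is a minimal toy model in C. It is
not bhyve code: struct toy_pci_dev, the toy_* functions and the TOY_PIRQ_IRQ
mask are all made up for illustration, and the model assumes only what is
described above, namely that a guest write changes the config byte while the
delivered IRQ keeps coming from the routing register.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not bhyve code: every name below is hypothetical. */
#define TOY_PIRQ_IRQ 0x0f	/* assumed: low bits of the routing register hold the IRQ */

struct toy_pci_dev {
	uint8_t cfg_intline;	/* what the guest reads back from config offset 0x3c */
	uint8_t pirq_reg;	/* routing register used to pick the delivered IRQ */
};

/* Device setup: both views start out with the same IRQ number. */
static void
toy_init(struct toy_pci_dev *d, uint8_t irq)
{
	assert(irq <= TOY_PIRQ_IRQ);
	d->pirq_reg = irq;
	d->cfg_intline = irq;
}

/* Guest write to the interrupt line register: only the config byte changes. */
static void
toy_cfg_write_intline(struct toy_pci_dev *d, uint8_t irq)
{
	d->cfg_intline = irq;
}

/* IRQ the model would raise: taken from the routing register, not from 0x3c. */
static uint8_t
toy_asserted_irq(const struct toy_pci_dev *d)
{
	return (d->pirq_reg & TOY_PIRQ_IRQ);
}

/* Returns 1 if the guest-visible line and the delivered IRQ disagree. */
static int
toy_intline_is_stale(const struct toy_pci_dev *d)
{
	return (d->cfg_intline != toy_asserted_irq(d));
}
```

In this model, initialising a device with IRQ 11 and then writing 5 to the
interrupt line leaves toy_asserted_irq() at 11 while the guest reads back 5,
which is the situation described above.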

4) It is commenting out the lock that makes the difference. I also
commented out pci_generate_msi just in case: Mezzano does not need it, and
it would otherwise run without the mutex that used to protect it.
Here are a backtrace and a thread list taken when bhyve hangs with the
mutex left in place:

(lldb) bt
* thread #1, name = 'mevent', stop reason = signal SIGSTOP
  * frame #0: 0x000011adeaa37e2a libthr.so.3`_umtx_op_err at _umtx_op_err.S:38
    frame #1: 0x000011adeaa479c0 libthr.so.3`__thr_umutex_lock(mtx=0x0000378ecca00888, id=101223) at thr_umtx.c:79:3
    frame #2: 0x000011adeaa40eea libthr.so.3`mutex_lock_sleep(curthread=0x0000378ecc412000, m=0x0000378ecca00888, abstime=0x0000000000000000) at thr_mutex.c:699:9
    frame #3: 0x000011adeaa3ed8f libthr.so.3`__Tthr_mutex_lock [inlined] mutex_lock_common(m=0x0000378ecca00888, abstime=0x0000000000000000, cvattach=false, rb_onlist=false) at thr_mutex.c:733:9
    frame #4: 0x000011adeaa3ed4d libthr.so.3`__Tthr_mutex_lock(mutex=<unavailable>) at thr_mutex.c:752:9
    frame #5: 0x000011a5c43e7b06 bhyve`vi_interrupt(vs=0x0000378ecc4b8000, isr='\x01', msix_idx=65535) at virtio.h:358:3
    frame #6: 0x000011a5c43e6c86 bhyve`vq_interrupt(vs=0x0000378ecc4b8000, vq=0x0000378ecc4b8038) at virtio.h:376:2
    frame #7: 0x000011a5c43e6c44 bhyve`vq_endchains(vq=0x0000378ecc4b8038, used_all_avail=0) at virtio.c:512:3
    frame #8: 0x000011a5c43db348 bhyve`pci_vtnet_rx(sc=0x0000378ecc4b8000) at pci_virtio_net.c:271:4
    frame #9: 0x000011a5c43dab53 bhyve`pci_vtnet_rx_callback(fd=6, type=EVF_READ, param=0x0000378ecc4b8000) at pci_virtio_net.c:403:2
    frame #10: 0x000011a5c43bb9f8 bhyve`mevent_handle(kev=0x000011ade4451200, numev=1) at mevent.c:273:3
    frame #11: 0x000011a5c43bb5d7 bhyve`mevent_dispatch at mevent.c:549:3
    frame #12: 0x000011a5c43aed4b bhyve`main(argc=1, argv=0x000011ade4453418) at bhyverun.c:1052:2
    frame #13: 0x000011adec6c1a6a libc.so.7`__libc_start1(argc=24, argv=0x000011ade4453360, env=0x000011ade4453428, cleanup=<unavailable>, mainX=(bhyve`main at bhyverun.c:694)) at libc_start1.c:157:7
    frame #14: 0x000011a5c43a80cd bhyve`_start at crt1_s.S:83

(lldb) frame select 5
frame #5: 0x000011a5c43e7b06 bhyve`vi_interrupt(vs=0x0000378ecc4b8000, isr='\x01', msix_idx=65535) at virtio.h:358:3
   355          if (pci_msix_enabled(vs->vs_pi))
   356                  pci_generate_msix(vs->vs_pi, msix_idx);
   357          else {
-> 358                  VS_LOCK(vs);
   359                  vs->vs_isr |= isr;
   360                  pci_generate_msi(vs->vs_pi, 0);
   361  #ifdef __amd64__

(lldb) thread list
Process 3185 stopped
* thread #1: tid = 101223, 0x000011adeaa37e2a libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'mevent', stop reason = signal SIGSTOP
  thread #2: tid = 101868, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-0', stop reason = signal SIGSTOP
  thread #3: tid = 101869, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-1', stop reason = signal SIGSTOP
  thread #4: tid = 101870, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-2', stop reason = signal SIGSTOP
  thread #5: tid = 101871, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-3', stop reason = signal SIGSTOP
  thread #6: tid = 101872, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-4', stop reason = signal SIGSTOP
  thread #7: tid = 101873, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-5', stop reason = signal SIGSTOP
  thread #8: tid = 101874, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-6', stop reason = signal SIGSTOP
  thread #9: tid = 101875, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'blk-3:0-7', stop reason = signal SIGSTOP
  thread #10: tid = 101876, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'vtnet-5:0 tx', stop reason = signal SIGSTOP
  thread #11: tid = 101877, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'hda-audio-output', stop reason = signal SIGSTOP
  thread #12: tid = 101878, 0x000011adec7752ea libc.so.7`__sys_accept at _accept.S:4, name = 'rfb', stop reason = signal SIGSTOP
  thread #13: tid = 101879, 0x000011adec7726aa libc.so.7`__sys_ioctl at ioctl.S:4, name = 'vcpu 0', stop reason = signal SIGSTOP
  thread #14: tid = 101880, 0x000011adeaa37e2c libthr.so.3`_umtx_op_err at _umtx_op_err.S:38, name = 'vcpu 1', stop reason = signal SIGSTOP

I agree that implementing IOAPIC support in Mezzano is the best option,
but I have little experience with it. I'll see what I can do.
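
For reference, a minimal sketch of what programming one I/O APIC
redirection entry involves, written in C for illustration. The register
offsets and bit positions are the standard I/O APIC ones; struct toy_ioapic
and the toy_* functions are hypothetical stand-ins that replace the
memory-mapped register window with an array so the sketch runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Standard I/O APIC layout: an index register and a data window. */
#define IOAPIC_IOREGSEL		0x00
#define IOAPIC_IOWIN		0x10
#define IOAPIC_REDTBL0		0x10		/* first redirection table register */
#define IOAPIC_RTE_ACTIVE_LOW	(1u << 13)	/* pin polarity */
#define IOAPIC_RTE_LEVEL	(1u << 15)	/* trigger mode */
#define IOAPIC_RTE_MASKED	(1u << 16)	/* set: interrupt is masked */

/* Hypothetical stand-in for the MMIO window. */
struct toy_ioapic {
	uint32_t regsel;
	uint32_t regs[0x40];
};

static void
toy_ioapic_write(struct toy_ioapic *io, uint32_t reg, uint32_t val)
{
	assert(reg < 0x40);
	io->regsel = reg;		/* on hardware: store to base + IOAPIC_IOREGSEL */
	io->regs[io->regsel] = val;	/* on hardware: store to base + IOAPIC_IOWIN */
}

/*
 * Route one input pin to a vector on one CPU using fixed delivery.
 * Each entry is 64 bits wide and occupies two consecutive registers.
 */
static void
toy_ioapic_route(struct toy_ioapic *io, unsigned pin, uint8_t vector,
    uint8_t apic_id, uint32_t flags)
{
	uint32_t lo = IOAPIC_REDTBL0 + 2 * pin;

	/* Mask the entry while it changes, then write both halves. */
	toy_ioapic_write(io, lo, IOAPIC_RTE_MASKED);
	toy_ioapic_write(io, lo + 1, (uint32_t)apic_id << 24);
	toy_ioapic_write(io, lo, vector | flags);	/* mask bit clear: entry is live */
}
```

PCI INTx pins are normally level-triggered and active-low, so a call for
such a pin would pass IOAPIC_RTE_LEVEL | IOAPIC_RTE_ACTIVE_LOW as flags,
while an ISA-style edge-triggered pin would pass 0.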

On Mon, 14 Oct 2024 at 22:52, Peter Grehan <grehan@freebsd.org> wrote:
> > 1) The problem with PIT. Can be solved as you proposed or by
> > patching Mezzano.
>
> The bhyve patch would be the best option for that: it's useful for
> other older o/s's (DOS).
>
> > 2) Mezzano assumes that Intel AHCI controllers report no more than 6
> > ports. Can be solved by patching Mezzano or defining MAX_PORTS to be
> > 6 in usr.sbin/bhyve/pci_ahci.c
>
> A Mezzano patch would be best for that. The bhyve man page has an
> example with 8 disks attached so reducing the limit to 6 could hit
> existing users.
>
> > 3) According to
> > https://wiki.osdev.org/PCI#Message_Signaled_Interrupts, interrupt
> > line config register must be RW. Bhyve does not support writing to
> > it. I do not know a correct fix, this [1] workaround helps, however.
>
> Bhyve does support writing to that - your patch disables that, and my
> guess is that when Mezzano sees this as zero (ie invalid) it then looks
> for the irq line via the ACPI MADT (or other means).
>
> A quick look at Mezzano shows that it is still using the 8259 PIC for
> interrupts. At the minimum it should be using the IOAPIC, or excessive
> interrupt sharing will result, and possibly incorrect behaviour when
> this happens. I think IOAPIC support could be added without a large
> amount of effort, compared to e.g. MSI/MSI-x.
>
> > 4) Finally, I had a random deadlock in interrupt handling for the
> > virtio-net device. Likewise, I do not know how to fix it correctly,
> > but this [2] patch helped.
>
> Hmmm that seems strange: MSI interrupts aren't generated if they
> haven't been setup/enabled by a guest. Commenting out the lock/unlock
> code would seem to indicate a larger bug in play. Would it be possible to
> get some tracing on that segment of code, e.g. a dtrace log?
>
> > Do you have any ideas how to make proper patches for bhyve from
> > these workarounds?
>
> The first one can be put in a phab diff, which I'll do. I think there's
> still some more work involved for the others.
>
> later,
>
> Peter.
