Date: Sat, 30 Dec 2017 00:22:21 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: freebsd-virtualization@freebsd.org
Subject: bhyve/amd: interrupt delivered when it shouldn't be?
Message-ID: <42c22179-ae42-e4bb-e77d-a1d49fe634ed@FreeBSD.org>

First, about the setup.
It's a FreeBSD/amd64 head guest on a FreeBSD/amd64 head host.
The hardware is AMD.
The hypervisor is bhyve.

Under a specific load that involves a lot of page faults and IPIs, I see the guest system getting stuck. This is pretty consistent. Typically I find a thread spinning on smp_ipi_mtx, and the owner of the mutex appears to be in mi_switch() -> sched_switch().

The debugging data that I have is somewhat flaky, but it seems that the owner is typically in this code path:

    smp_targeted_tlb_shootdown -> ipi_send_cpu -> native_lapic_ipi_raw

smp_targeted_tlb_shootdown holds smp_ipi_mtx.
native_lapic_ipi_raw, in this setup, performs the following manipulations:

    saveintr = intr_disable();
    ...
    intr_restore(saveintr);

Interrupts are already disabled when this function is entered, because smp_ipi_mtx is a spinlock and our spinlock implementation disables interrupts. So, intr_restore() in this case should be a NOP (BTW, it's implemented via popf).

But what I see suggests that at this point a Local APIC timer interrupt gets delivered to the thread. And that causes all the mess, as the thread holding the spinlock gets preempted.

Does this ring a bell for anyone?
Is there any suspect code?
It seems that we set the v_intr_masking bit, so rFLAGS / eFLAGS should be completely virtualized.
So, maybe a hardware issue?

Thank you!
-- 
Andriy Gapon
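
The argument that intr_restore() should be a no-op inside the spinlock section can be illustrated with a small userspace model. This is only a sketch: every model_* name is hypothetical, the flags register is an ordinary variable, and it assumes that taking the spinlock saves the flags and clears IF the same way intr_disable() does. It shows the expected bookkeeping only and says nothing about what bhyve or the AMD hardware actually does.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSL_I 0x200u  /* interrupt-enable flag (IF), bit 9 of rFLAGS */

/* Hypothetical stand-in for the CPU's rFLAGS; interrupts start enabled. */
static uint64_t model_rflags = PSL_I;

/* Models intr_disable(): save the current flags, then clear IF. */
static uint64_t model_intr_disable(void)
{
    uint64_t saved = model_rflags;
    model_rflags &= ~(uint64_t)PSL_I;
    return saved;
}

/* Models intr_restore(): reload the saved flags, as popf would. */
static void model_intr_restore(uint64_t saved)
{
    model_rflags = saved;
}

static bool model_intr_enabled(void)
{
    return (model_rflags & PSL_I) != 0;
}

/*
 * Nested sequence from the message: the outer save/clear stands in for
 * acquiring smp_ipi_mtx (assumption), the inner pair for the
 * intr_disable()/intr_restore() calls in native_lapic_ipi_raw.
 * Returns true if IF is still clear right after the inner restore.
 */
static bool model_nested_restore_keeps_if_clear(void)
{
    uint64_t spin_saved = model_intr_disable();  /* spinlock acquired */
    uint64_t saveintr = model_intr_disable();    /* inner disable */
    model_intr_restore(saveintr);                /* inner restore */
    bool still_disabled = !model_intr_enabled();
    model_intr_restore(spin_saved);              /* spinlock released */
    return still_disabled;
}
```

In this model the inner restore reloads a flags value that already has IF clear, so interrupts stay masked until the outer restore. An interrupt arriving right after the inner restore would therefore not be explained by the flag bookkeeping alone.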