Date: Thu, 28 Jul 2022 21:58:10 +0000 From: bugzilla-noreply@freebsd.org To: virtualization@FreeBSD.org Subject: [Bug 265487] bhyve drops the PCI passthrough device in a VM if the host is suspended/resumed Message-ID: <bug-265487-27103@https.bugs.freebsd.org/bugzilla/>
next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D265487 Bug ID: 265487 Summary: bhyve drops the PCI passthrough device in a VM if the host is suspended/resumed Product: Base System Version: 13.1-RELEASE Hardware: Any OS: Any Status: New Severity: Affects Only Me Priority: --- Component: bhyve Assignee: virtualization@FreeBSD.org Reporter: jon@xyinn.org Hello, I've been debugging an issue with the creator if wifibox ( P=C3=81LI G=C3=A1bor J=C3=A1nos) that I've been experiencing on my Thinkpad= X1 Carbon Gen 7 on 13.1-STABLE (n251926-488f9d85278, but I also reproduced it in 13.1-RELEASE - n250148-fc952ac2212 - while seeing if this was a regression, it is not). My particular scenario is using 'wifibox' to pass my wifi card to an Alpine Linux VM using bhyve. Most of the details and logs during our debugging have been posted in this bug report: https://github.com/pgj/freebsd-wifibox/issues/31 It seems that when I shut down my laptop (host) while the bhyve VM is runni= ng, it would non-deterministically (but very soon, around 1-5 times of doing suspend/resume) would resume the system successfully, but the wifi would st= op working. Upon further investigation, we noticed that the wireless interface= and the entire PCI card was completely missing from within the VM. The only way= to "fix" it was to reboot. Upon further experimentation, I noticed that if I stopped the VM, then suspended/resumed, and then started the VM again, the PCI card would be successfully available again. This workaround so far hasn't failed for me (= but I'm still testing it throughout the day). My suspicion is that when bhyve h= as passed down a PCI device and the host is suspended, the resources for that handle aren't being released correctly and the machine may be in an inconsistent state in regards to this specific passthrough device. To post some logs, we can see the following before and after inside the bhy= ve VM: Before ----------- wifibox:~# lspci 00:1f.0 Class 0601: 8086:7000 00:04.2 Class 0100: 1af4:1009 00:04.0 Class 0100: 1af4:1001 00:00.0 Class 0600: 1275:1275 00:04.3 Class 0100: 1af4:1009 00:06.0 Class 0280: 8086:02f0 00:04.1 Class 0100: 1af4:1009 00:05.0 Class 0200: 8086:100f wifibox:~# ifconfig eth0 Link encap:Ethernet HWaddr 00:A0:98:8A:05:71=20=20 inet addr:10.1.0.1 Bcast:0.0.0.0 Mask:255.0.0.0 inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:20396 errors:0 dropped:0 overruns:0 frame:0 TX packets:33557 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000=20 RX bytes:4197622 (4.0 MiB) TX bytes:27342203 (26.0 MiB) lo Link encap:Local Loopback=20=20 inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000=20 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) wlan0 Link encap:Ethernet HWaddr F8:E4:E3:EB:35:02=20=20 inet addr:192.168.1.139 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::fae4:e3ff:feeb:3502/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:34620 errors:0 dropped:0 overruns:0 frame:0 TX packets:20133 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000=20 RX bytes:27437272 (26.1 MiB) TX bytes:4448962 (4.2 MiB) wifibox:~# dmesg | grep iwlwifi [ 0.869316] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A [ 0.869318] iwlwifi 0000:00:06.0: PCI INT A: not connected [ 0.870101] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41 [ 0.926465] iwlwifi 0000:00:06.0: api flags index 2 larger than supporte= d by driver [ 0.926473] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37 [ 0.926601] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 QuZ-a0-jf-b0-66.ucode op_mode iwlmvm [ 0.946738] iwlwifi 0000:00:06.0: Detected Intel(R) Wireless-AC 9560 160= MHz, REV=3D0x354 [ 1.061975] iwlwifi 0000:00:06.0: Detected RF JF, rfid=3D0x105110 [ 1.118696] iwlwifi 0000:00:06.0: base HW address: f8:e4:e3:eb:35:02 [ 5.207325] iwlwifi 0000:00:06.0: Unhandled alg: 0x3f0707 After ------ wifibox:~# lspci 00:1f.0 Class 0601: 8086:7000 00:04.2 Class 0100: 1af4:1009 00:04.0 Class 0100: 1af4:1001 00:00.0 Class 0600: 1275:1275 00:04.3 Class 0100: 1af4:1009 00:04.1 Class 0100: 1af4:1009 00:05.0 Class 0200: 8086:100f wifibox:~# ifconfig eth0 Link encap:Ethernet HWaddr 00:A0:98:8A:05:71=20=20 inet addr:10.1.0.1 Bcast:0.0.0.0 Mask:255.0.0.0 inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:145 errors:0 dropped:0 overruns:0 frame:0 TX packets:17 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000=20 RX bytes:143406 (140.0 KiB) TX bytes:1594 (1.5 KiB) lo Link encap:Local Loopback=20=20 inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000=20 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) wifibox:~# dmesg | grep iwlwifi <nothing> Here we can see that the '00:06.0 Class 0280: 8086:02f0' has completely dissapeared. Using my "stop wifibox before sleep to release the PCI resource + start it = back up later" approach in this way: root@leslie:~ # cat /usr/local/etc/devd/wifibox.conf # This is a `devd(8)` configuration file to run the resume action of # wifibox on the ACPI resume event. Review the contents and create a # copy of it without the `.sample` extension to use it. Restart the # `devd` service once the file has been created. notify 11 { match "system" "ACPI"; match "subsystem" "Suspend"; action "logger 'Stopping wifibox before suspend' && /usr/local/sbin/wifibox stop && /etc/rc.suspend acpi $notify"; }; notify 11 { match "system" "ACPI"; match "subsystem" "Resume"; action "/etc/rc.resume acpi $notify && logger 'Starting wifibox aft= er resume and getting IP via DHCP' && /usr/local/sbin/wifibox start && /sbin/dhclient wifibox0"; }; we can see the following in the logs (correct flow on the host): bridge0: Ethernet address: 58:9c:fc:10:ff:99 bridge0: changing name to 'wifibox0' tap0: Ethernet address: 58:9c:fc:10:c0:3f tap0: promiscuous mode enabled wifibox0: link state changed to DOWN ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 tap0: link state changed to UP wifibox0: link state changed to UP lo0: link state changed to UP ... tap0: link state changed to DOWN wifibox0: link state changed to DOWN ppt0: detached pci0: <network> at device 20.3 (no driver attached) ue0: link state changed to UP tap0: promiscuous mode disabled ... ppt0: detached pci0: <network> at device 20.3 (no driver attached) ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 bridge0: Ethernet address: 58:9c:fc:10:ff:99 bridge0: changing name to 'wifibox0' tap0: Ethernet address: 58:9c:fc:10:c0:3f tap0: promiscuous mode enabled wifibox0: link state changed to DOWN tap0: link state changed to UP wifibox0: link state changed to UP tap0: link state changed to DOWN wifibox0: link state changed to DOWN ppt0: detached pci0: <network> at device 20.3 (no driver attached) ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 tap0: link state changed to UP wifibox0: link state changed to UP tap0: link state changed to DOWN wifibox0: link state changed to DOWN ppt0: detached pci0: <network> at device 20.3 (no driver attached) tap0: promiscuous mode disabled Regarding the devd rule for wifibox, notice that in the Suspend action, the "/etc/rc.suspend acpi $notify" call is happening at the end. I'm suspecting this is behaving like a stack (and makes sense since we are basically rever= sing stuff when we suspend/resume). When I made that call in the beginning, the = PCI passthrough issue arose, but when I placed it at the end it was fine. I'm guessing this is due to the suspend command being triggered and the wifibox section didn't have enough time to fully release the resources, thus it wou= ld yield the same issue that I mentioned before. Perhaps that rule can help further narrowing this issue down. At the moment I'm testing some slight improvements to the devd workaround t= hat=20 P=C3=81LI suggested which will help speed up the suspend/resume times and n= ot need me to explicitly call the dhclient command. Ultimately, we'll want to fix t= he bhyve issue. I don't have any experience with bhyve or its internals but I = can help with debugging across RELEASE/STABLE/CURRENT. - Jonathan --=20 You are receiving this mail because: You are the assignee for the bug.=
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-265487-27103>