From: Wanpeng Qian
Date: Wed, 20 Aug 2025 09:17:45 +0900
Subject: Re: bhyve passthru problem
To: Corvin Köhne
Cc: Peter Grehan, virtualization@freebsd.org, Oleksandr Kryvulia
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

Hi all,

I am working with a PCIe x16 -> x4x4x4x4 quad-bifurcation carrier
hosting 4 SSDs. One NVMe controller exposes a second BAR of 256 bytes
(0x100).

nvme0@pci0:129:0:0: class=0x010802 rev=0x01 hdr=0x00 vendor=0x144d device=0xa802 subvendor=0x144d subdevice=0xa801
    vendor     = 'Samsung Electronics Co Ltd'
    device     = 'NVMe SSD Controller SM951/PM951'
    class      = mass storage
    subclass   = NVM
    bar   [10] = type Memory, range 64, base 0xfbe00000, size 16384, enabled
    bar   [18] = type Memory, range 32, base 0xfbe04000, size 256, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 8 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 128(128) FLR NS
                 max read 512
                 link x4(x4) speed 8.0(8.0) ClockPM disabled
    cap 11[b0] = MSI-X supports 9 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = L1 PM Substates 1

On FreeBSD, bhyve refuses to start, reporting that the BAR base or size
is not page aligned.

After some digging, I believe it is safe to pass through the entire
4 KiB page when, and only when, no other device's BAR (different BDF)
falls within the same host physical page. This mirrors what QEMU/VFIO
does for sub-page BARs.

I have implemented the same "exclusive page" check in bhyve and posted
a patch:

Review: https://reviews.freebsd.org/D52013
Background (QEMU/VFIO discussion):
https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg02908.html

Patch summary:

- Build a table of occupied MMIO pages using PCIOCGETCONF / PCIOCGETBAR.
- For memory BARs with size < PAGE_SIZE:
  - Require the BAR base to be page-aligned (unchanged).
  - If the 4 KiB page is exclusive to that BDF, allow passthrough and
    set the host mapped_size = PAGE_SIZE, while keeping the
    guest-visible BAR size unchanged.
  - Otherwise, keep rejecting.
- No change for I/O BARs, or for memory BARs with size >= PAGE_SIZE
  (the size must still be a multiple of PAGE_SIZE).

Testing:

- Host: FreeBSD 14.3R with IOMMU enabled.
- Devices: NVMe controller on a quad-bifurcation card; also tested
  alongside an NVIDIA GTX 1080 Ti as another ppt device.
- Guest: Windows 10 installs and boots from the passed-through NVMe
  device.
- No regressions observed with devices that have >= 4 KiB BARs.

Feedback welcome on:

- The placement and lifetime of the occupied-page cache.
- Whether to gate this behind a runtime knob
  (e.g., -W sub4k=auto|off|force).
- Any additional scenarios you would like me to test.

Best regards,
Qian

On Mon, Jun 17, 2024 at 3:21 PM Corvin Köhne wrote:
>
> On Fri, 2024-06-14 at 17:50 +1000, Peter Grehan wrote:
> > > I don't know why bhyve validates the BAR size. The commit adding this
> > > check is old [1] and doesn't explain it. What bhyve could do is
> > > rounding up the BAR size to a full page size when allocating memory for
> > > the BAR.
> > >
> > > [1] https://github.com/freebsd/freebsd-src/commit/7a902ec0eccc752c9c38533ed123121eaaea1225
> >
> > At the time, BIOSs would often place device BARs of less than a page
> > size in the same physical page. Since EPT only gives page granularity,
> > this would result in all those devices being available to the guest even
> > if they hadn't been passed through.
> >
> > later,
> >
> > Peter.
>
> Thanks for the explanation!
>
> What can we do about it? Does FreeBSD remaps BARs if they aren't page
> aligned? If not, can we verify that the page is only used by a specific
> device?
>
>
> --
> Kind regards,
> Corvin
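
On the question of verifying that a page is used by only one device:
the decision described in the patch summary can be sketched roughly as
below. This is only an illustration and not the code in D52013. The
names struct bar_rec, bar_touches_page() and sub_page_mapped_size() are
hypothetical, the table is assumed to have been filled beforehand (for
example from PCIOCGETCONF / PCIOCGETBAR), and a 4 KiB host page is
assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB host pages */

/* Hypothetical record describing one host memory BAR. */
struct bar_rec {
	uint32_t bdf;	/* bus/device/function packed into one id */
	uint64_t base;	/* host physical base address */
	uint64_t size;	/* BAR size in bytes */
};

/* True if the BAR's address range overlaps the page starting at 'page'. */
static bool
bar_touches_page(const struct bar_rec *r, uint64_t page)
{
	uint64_t first, last;

	if (r->size == 0)
		return (false);
	first = r->base & ~(SKETCH_PAGE_SIZE - 1);
	last = (r->base + r->size - 1) & ~(SKETCH_PAGE_SIZE - 1);
	return (page >= first && page <= last);
}

/*
 * Return the size to map on the host for a memory BAR, or 0 to reject.
 * The guest-visible BAR size is left to the caller and is not changed.
 */
static uint64_t
sub_page_mapped_size(const struct bar_rec *tbl, size_t n,
    const struct bar_rec *bar)
{
	size_t i;

	/* The base must be page aligned in every case. */
	if (bar->size == 0 || bar->base % SKETCH_PAGE_SIZE != 0)
		return (0);

	/* BARs of a page or more: size must be a multiple of a page. */
	if (bar->size >= SKETCH_PAGE_SIZE)
		return (bar->size % SKETCH_PAGE_SIZE == 0 ? bar->size : 0);

	/* Sub-page BAR: reject if another BDF has a BAR in the same page. */
	for (i = 0; i < n; i++) {
		if (tbl[i].bdf != bar->bdf &&
		    bar_touches_page(&tbl[i], bar->base))
			return (0);
	}
	return (SKETCH_PAGE_SIZE);
}
```

With the BARs from the pciconf output above (16384 bytes at 0xfbe00000
and 256 bytes at 0xfbe04000, both on 129:0:0), the 256-byte BAR would
be mapped as one full page as long as no other BDF has a BAR inside
0xfbe04000-0xfbe04fff. A linear scan is used here for brevity; whether
a cached page table is worth the extra state is the open question about
the occupied-page cache raised above.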