From: Alex Erley
Date: Sat, 14 Mar 2020 13:50:39 +0100
To: Peter Grehan
Cc: freebsd-virtualization@freebsd.org
Subject: Re: [GPU pass-through] no compatible bridge window for claimed BAR

Hello,

Some new findings to share.

1) Changing PCI_EMUL_MEMBASE64 from 0xD000000000 to any value
*below 0x0440000000* makes bhyve fail when starting the VM, with the message:
  bhyve: failed to initialize BARs for PCI 1/0/0
  device emulation initialization error: Cannot allocate memory

2) With PCI_EMUL_MEMBASE64 set to 0x0440000000 (or above), the guest VM
cannot configure the BARs of the pass-through device properly.

== (a) ==
On the bhyve host, the ppt device is:
> devinfo -rv
...
pci0
  hostb0 at slot=0 function=0 dbsf=pci0:0:0:0
  pcib1 at slot=1 function=0 dbsf=pci0:0:1:0 handle=\_SB_.PCI0.P0P2
      I/O ports: 0xe000-0xefff
      I/O memory addresses:
          0x00c0000000-0x00d30fffff <-- covers all child mem windows
    pci1
      ppt0 at slot=0 function=0 dbsf=pci0:1:0:0
          pcib1 I/O port window:
              0xe000-0xe07f
          pcib1 memory window:
              0x00c0000000-0x00cfffffff <-- 256M
              0x00d0000000-0x00d1ffffff <-- 32M
              0x00d2000000-0x00d2ffffff <-- 16M
      ppt1 at slot=0 function=1 dbsf=pci0:1:0:1
          pcib1 memory window:
              0xd3080000-0xd3083fff <-- 16K
...
and there is no other device attached to pci1.

== (b) ==
On the guest VM, dmesg shows (timestamps removed):
...
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x00000000bea95fff] usable
BIOS-e820: [mem 0x00000000bea96000-0x00000000bea97fff] reserved
BIOS-e820: [mem 0x00000000bea98000-0x00000000bea99fff] ACPI data
BIOS-e820: [mem 0x00000000bea9a000-0x00000000beaa8fff] reserved
BIOS-e820: [mem 0x00000000beaa9000-0x00000000bfb28fff] usable
BIOS-e820: [mem 0x00000000bfb29000-0x00000000bfb58fff] type 20
BIOS-e820: [mem 0x00000000bfb59000-0x00000000bfb7cfff] reserved
BIOS-e820: [mem 0x00000000bfb7d000-0x00000000bfb81fff] usable
BIOS-e820: [mem 0x00000000bfb82000-0x00000000bfb88fff] ACPI data
BIOS-e820: [mem 0x00000000bfb89000-0x00000000bfb8cfff] ACPI NVS
BIOS-e820: [mem 0x00000000bfb8d000-0x00000000bffcffff] usable
BIOS-e820: [mem 0x00000000bffd0000-0x00000000bffeffff] reserved
BIOS-e820: [mem 0x00000000bfff0000-0x00000000bfffffff] usable
BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
                                            ^^^- upper limit for addressable memory
...
PM: Registered nosave memory: [mem 0xc0000000-0xffffffff]
[mem 0xc0000000-0xffffffff] available for PCI devices
...
pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
                                        ^-- 128K
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff window]
                                        ^-- 512M
pci_bus 0000:00: root bus resource [mem 0xf0000000-0xf07fffff window]
                                        ^-- 8M
pci_bus 0000:00: root bus resource [bus 00-ff]

== (c) ==
Up to this point everything runs OK.
The guest Linux then allocates memory regions for the devices.
Allocation goes from the lowest reg (0x10) to the highest (0x30) for each
device (i.e. from 00.0 to 1f.0) on the PCI bus.
Here I have regrouped the dmesg output to show contiguous address regions
(the pass-through device is marked with *):

pci 0000:00:01.0: reg 0x24: [io  0x2000-0x207f]
pci 0000:00:02.0: reg 0x10: [io  0x2080-0x209f]
pci 0000:00:03.0: reg 0x10: [io  0x20c0-0x20ff]
...
pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:02.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1d.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1e.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
pci 0000:00:1f.0: reg 0x30: [mem 0x00000000-0x000007ff pref]
...
pci 0000:00:01.0: reg 0x10:*[mem 0xc0000000-0xc0ffffff]              16M
...
                                 0xc1000000-0xc1ffffff               16M gap
pci 0000:00:01.0: reg 0x1c:*[mem 0xc2000000-0xc3ffffff 64bit pref]   32M
pci 0000:00:01.1: reg 0x10:*[mem 0xc4000000-0xc4003fff]
pci 0000:00:02.0: reg 0x14: [mem 0xc4004000-0xc4005fff]
pci 0000:00:03.0: reg 0x14: [mem 0xc4006000-0xc4007fff]
pci 0000:00:1d.0: reg 0x10: [mem 0xc4008000-0xc400807f]
...
                                 0xc4008080-0xc4ffffff               <16M gap
pci 0000:00:1d.0: reg 0x14: [mem 0xc5000000-0xc5ffffff]              16M
pci 0000:00:1e.0: reg 0x10: [mem 0xc6000000-0xc6000fff]
...
                                 0xc6001000-0xd2ffffff               <208M gap
pci 0000:00:01.0: reg 0x30:*[mem 0xd3000000-0xd307ffff pref]         512K
                                 0xd3080000-0xdfffffff               <208M gap
pci 0000:00:01.0: reg 0x14:*[mem 0x440000000-0x44fffffff 64bit pref] 256M
                                 ^^^- this value is outside allowed range

== (d) ==
So there is no window for the 256M BAR, although there are 2 big gaps
of almost 208M each in the 512M space provided for BAR allocation by the
PCI bus.

So BAR reg 0x14 (256M) of device 01.0 must be placed inside the provisioned
512M region 0xc0000000-0xdfffffff.
But as noted in (1) above, setting the base address to any value below
0x440000000 breaks bhyve on start.
According to (b), this value corresponds to the upper limit of addressable
memory in the guest VM.

So I'm blocked here at the moment:
- The guest VM requires a value which bhyve doesn't like.
- The guest VM allocates BARs with huge gaps.

I have little knowledge of PCI bus internals, although I have already read
some articles on the internet.
Could there be some ACPI trick to do this?
I'd be happy to hear any ideas...

PS
I suspect that if I used another OS as the guest, or another pass-through
GPU model, it would probably allocate the BARs properly.
But that is not what I want for this config.
There should be a way to allocate a 256M BAR in guest Linux.

Have a nice day,
Alex
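
The arithmetic behind (c) and (d) can be checked with a small Python sketch. It is only an illustration, not part of bhyve or the Linux kernel: the window and BAR tables are copied by hand from the dmesg excerpts above (lines elided with "..." are not included), and the helper names are made up for this example. It assumes the usual PCI rule that a BAR of size N has to sit on an N-aligned address, which would explain why two gaps of almost 208M still leave no place for a 256M BAR.

```python
# Root bus memory windows reported by the guest in (b); ends are inclusive.
WINDOWS = [
    (0x000A0000, 0x000BFFFF),   # 128K
    (0xC0000000, 0xDFFFFFFF),   # 512M
    (0xF0000000, 0xF07FFFFF),   # 8M
]

# Memory BARs from (c): (device, reg, start, end). Hand-copied, may be incomplete.
BARS = [
    ("01.0", 0x10, 0xC0000000, 0xC0FFFFFF),
    ("01.0", 0x1C, 0xC2000000, 0xC3FFFFFF),
    ("01.1", 0x10, 0xC4000000, 0xC4003FFF),
    ("02.0", 0x14, 0xC4004000, 0xC4005FFF),
    ("03.0", 0x14, 0xC4006000, 0xC4007FFF),
    ("1d.0", 0x10, 0xC4008000, 0xC400807F),
    ("1d.0", 0x14, 0xC5000000, 0xC5FFFFFF),
    ("1e.0", 0x10, 0xC6000000, 0xC6000FFF),
    ("01.0", 0x30, 0xD3000000, 0xD307FFFF),
    ("01.0", 0x14, 0x440000000, 0x44FFFFFFF),
]


def outside_windows():
    """Return (device, reg) of every BAR not fully inside a root bus window."""
    return [
        (dev, reg)
        for dev, reg, start, end in BARS
        if not any(lo <= start and end <= hi for lo, hi in WINDOWS)
    ]


def blockers(base, size):
    """Return (device, reg) of every listed BAR overlapping [base, base+size)."""
    top = base + size - 1
    return [(dev, reg) for dev, reg, s, e in BARS if not (e < base or s > top)]


def free_aligned_slots(size, lo, hi):
    """Return size-aligned bases within [lo, hi] that overlap no listed BAR."""
    slots = []
    base = (lo + size - 1) // size * size   # round lo up to a multiple of size
    while base + size - 1 <= hi:
        if not blockers(base, size):
            slots.append(base)
        base += size
    return slots


if __name__ == "__main__":
    size = 256 << 20  # the 256M BAR (reg 0x14 of device 01.0)
    print("outside windows:", outside_windows())
    print("free 256M slots:",
          [hex(b) for b in free_aligned_slots(size, 0xC0000000, 0xDFFFFFFF)])
    print("blocking 0xd0000000:", blockers(0xD0000000, size))
```

With the data as listed, the only BAR outside the root bus windows is reg 0x14 of device 01.0, and neither of the two 256M-aligned slots in the 512M window (0xc0000000 and 0xd0000000) is free. In this table the second slot is blocked only by the 512K ROM BAR at 0xd3000000.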