From: Andriy Gapon
Subject: Re: problem with pass-through on amd
To: Anish, "freebsd-virtualization@freebsd.org"
Date: Thu, 16 Nov 2017 22:45:35 +0200

On 14/11/2017 06:22, Anish wrote:
> Also ivhd has fault interrupt enabled which is very helpful in debugging:
>
> [root@ryzen /home/anish/FreeBSD/head]# vmstat -ia |grep ivh
> irq256: ivhd0:fault                    0          0
> irq257: ivhd1:fault                    0          0
>

Anish,

I have made several interesting discoveries regarding my problem.
One of them is that there actually were some IOMMU log events:

dev.ivhd.0.event_tail: 240
dev.ivhd.0.event_head: 0
dev.ivhd.0.event_intr_count: 0

But there were no interrupts and the events are unconsumed and unreported.
I examined the MSI configuration of the IOMMU PCI device and the address and data registers were zeroed out.
I looked at dmesg and at the code and I realized why that happened.

So, first of all, I pre-load vmm via loader.conf.
Probably as a result of that, the ivhd device attaches before any bridges and buses on my system.
And amdvi_alloc_intr_resources() does a rather atypical thing: it configures an MSI for a PCI device by directly writing to its configuration registers.
The PCI bus code is completely unaware of those changes and it wipes them out in pci_add_child() -> pci_cfg_restore().

Also, I think that even if ivhd attached after the root PCI bus, what it does would still be unsafe.
I think that, for example, a suspend-resume cycle would wipe out the MSI configuration too.
I think that we had better use the pci methods to configure the MSI.

Now, why does ivhd attach before the root Host-PCI bridge and what can we do to fix the order?

ivrs_drv.c has this code:
/*
 * Load this module at the end after PCI re-probing to configure interrupt.
 */
DRIVER_MODULE_ORDERED(ivhd, acpi, ivhd_driver, ivhd_devclass, 0, 0,
    SI_ORDER_ANY);

But apparently this SI_ORDER_ANY does not help much.
It affects only the driver registration order, but not the device probe and attachment order.
This code is far more significant:
	ivhd_devs[i] = BUS_ADD_CHILD(parent, 1, "ivhd", i);
ivhd passes 1 as the order.
This is a very high priority, that is, a very low order value, for the acpi bus.
As a comment in acpi_probe_child() says:
	/*
	 * Create a placeholder device for this node.  Sort the
	 * placeholder so that the probe/attach passes will run
	 * breadth-first.  Orders less than ACPI_DEV_BASE_ORDER
	 * are reserved for special objects (i.e., system
	 * resources).
	 */
where ACPI_DEV_BASE_ORDER is 100.
For example, the order of the Host-PCI bridge on my system is 120.

I must note that this is important only if vmm is preloaded (which is probably not an extremely rare case).
If vmm is loaded after the system is booted then, of course, ivhd will be probed after the PCI buses / bridges.

-- 
Andriy Gapon
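
To make the ordering argument concrete, here is a minimal userland sketch, not kernel code. It assumes only what is stated above: children of a bus are attached in ascending order value, ACPI_DEV_BASE_ORDER is 100, and the Host-PCI bridge has order 120. The toy_* names, the struct, and the alternative order of 200 are hypothetical and exist only for this illustration; they are not part of the acpi or amdvi code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value quoted above; lower orders are reserved for special objects. */
#define ACPI_DEV_BASE_ORDER 100

/* Toy stand-in for a child device on a bus; this is not device_t. */
struct toy_child {
	const char *name;
	unsigned order;
};

/* Compare two children by their order value, ascending. */
static int
toy_order_cmp(const void *a, const void *b)
{
	const struct toy_child *ca = a, *cb = b;

	return ((ca->order > cb->order) - (ca->order < cb->order));
}

/* Return the position of the named child in the array, or -1 if absent. */
static int
toy_attach_index(const struct toy_child *kids, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(kids[i].name, name) == 0)
			return ((int)i);
	return (-1);
}

/*
 * Model the attach pass for a bus with two children and report whether
 * ivhd0 would be attached before the Host-PCI bridge.
 */
int
toy_ivhd_before_pcib(unsigned ivhd_order)
{
	struct toy_child kids[] = {
		{ "pcib0", 120 },		/* order reported above */
		{ "ivhd0", ivhd_order },
	};
	size_t n = sizeof(kids) / sizeof(kids[0]);
	int ivhd, pcib;

	/* Lower order values come first, as in the assumption above. */
	qsort(kids, n, sizeof(kids[0]), toy_order_cmp);

	ivhd = toy_attach_index(kids, n, "ivhd0");
	pcib = toy_attach_index(kids, n, "pcib0");
	assert(ivhd >= 0 && pcib >= 0);
	return (ivhd < pcib);
}
```

In this model toy_ivhd_before_pcib(1) is true, which matches what I see with vmm preloaded, while any order above 120, for example the hypothetical ACPI_DEV_BASE_ORDER + 100, puts ivhd0 after the bridge. Whether that is the right fix for the real driver is a separate question.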