From owner-freebsd-hackers@FreeBSD.ORG Wed Apr 1 21:23:14 2015 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EDF6BEF5; Wed, 1 Apr 2015 21:23:14 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8F30FC85; Wed, 1 Apr 2015 21:23:14 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t31LN4DP075967 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 2 Apr 2015 00:23:04 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t31LN4DP075967 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t31LN3r3075966; Thu, 2 Apr 2015 00:23:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 2 Apr 2015 00:23:03 +0300 From: Konstantin Belousov To: Tobias Oberstein Subject: Re: NVMe performance 4x slower than expected Message-ID: <20150401212303.GB2379@kib.kiev.ua> References: <551BC57D.5070101@gmail.com> <551C5A82.2090306@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <551C5A82.2090306@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: "freebsd-hackers@freebsd.org" , Michael Fuckner , Jim Harris , Alan Somers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Apr 2015 21:23:15 -0000 On Wed, Apr 01, 2015 at 10:52:18PM +0200, Tobias Oberstein wrote: > > > FreeBSD 11 Current with patches (DMAR and ZFS patches, otherwise the box > > > doesn't boot at all .. because of 3TB RAM and the amount of periphery). > > > > Do you still have WITNESS and INVARIANTS turned on in your kernel > > config? They're turned on by default for Current, but they do have > > some performance impact. To turn them off, just build a > > GENERIC-NODEBUG kernel . > > WITNESS is off, INVARIANTS is still on. INVARIANTS are costly. > > Here is complete config: > > https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_kernel_conf.md > > This is the aggregated patch (work was done by Konstantin - thanks again > btw!) > > https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_patch.md > > > Could you also post full dmesg output as well as vmstat -i? > > dmesg: > > https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_dmesg.md > > vmstat: > > https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd_vmstat.md > > === > > Here are results from FIO under FreeBSD: > > https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/freebsd.md > > Here are results using _same_ FIO control file under Linux: > > https://github.com/oberstet/scratchbox/blob/master/freebsd/cruncher/results/linux.md Is this vmstat after the test ? Somewhat funny is that nvme does not use MSI(X). I have the following patch for a long time, it allowed to increase pps in iperf and similar tests when DMAR is enabled. In your case it could reduce the rate of the DMAR interrupts. diff --git a/sys/x86/iommu/intel_ctx.c b/sys/x86/iommu/intel_ctx.c index a18adcf..b23a4c1 100644 --- a/sys/x86/iommu/intel_ctx.c +++ b/sys/x86/iommu/intel_ctx.c @@ -586,6 +586,18 @@ dmar_ctx_unload_entry(struct dmar_map_entry *entry, bool free) } } +static struct dmar_qi_genseq * +dmar_ctx_unload_gseq(struct dmar_ctx *ctx, struct dmar_map_entry *entry, + struct dmar_qi_genseq *gseq) +{ + + if (TAILQ_NEXT(entry, dmamap_link) != NULL) + return (NULL); + if (ctx->batch_no++ % dmar_batch_coalesce != 0) + return (NULL); + return (gseq); +} + void dmar_ctx_unload(struct dmar_ctx *ctx, struct dmar_map_entries_tailq *entries, bool cansleep) @@ -619,8 +631,7 @@ dmar_ctx_unload(struct dmar_ctx *ctx, struct dmar_map_entries_tailq *entries, entry->gseq.gen = 0; entry->gseq.seq = 0; dmar_qi_invalidate_locked(ctx, entry->start, entry->end - - entry->start, TAILQ_NEXT(entry, dmamap_link) == NULL ? - &gseq : NULL); + entry->start, dmar_ctx_unload_gseq(ctx, entry, &gseq)); } TAILQ_FOREACH_SAFE(entry, entries, dmamap_link, entry1) { entry->gseq = gseq; diff --git a/sys/x86/iommu/intel_dmar.h b/sys/x86/iommu/intel_dmar.h index 2865ab5..6e0ab7f 100644 --- a/sys/x86/iommu/intel_dmar.h +++ b/sys/x86/iommu/intel_dmar.h @@ -93,6 +93,7 @@ struct dmar_ctx { u_int entries_cnt; u_long loads; u_long unloads; + u_int batch_no; struct dmar_gas_entries_tree rb_root; struct dmar_map_entries_tailq unload_entries; /* Entries to unload */ struct dmar_map_entry *first_place, *last_place; @@ -339,6 +340,7 @@ extern dmar_haddr_t dmar_high; extern int haw; extern int dmar_tbl_pagecnt; extern int dmar_match_verbose; +extern int dmar_batch_coalesce; extern int dmar_check_free; static inline uint32_t diff --git a/sys/x86/iommu/intel_drv.c b/sys/x86/iommu/intel_drv.c index c239579..e7dc3f9 100644 --- a/sys/x86/iommu/intel_drv.c +++ b/sys/x86/iommu/intel_drv.c @@ -153,7 +153,7 @@ dmar_count_iter(ACPI_DMAR_HEADER *dmarh, void *arg) return (1); } -static int dmar_enable = 0; +static int dmar_enable = 1; static void dmar_identify(driver_t *driver, device_t parent) { diff --git a/sys/x86/iommu/intel_utils.c b/sys/x86/iommu/intel_utils.c index f696f9d..d3c3267 100644 --- a/sys/x86/iommu/intel_utils.c +++ b/sys/x86/iommu/intel_utils.c @@ -624,6 +624,7 @@ dmar_barrier_exit(struct dmar_unit *dmar, u_int barrier_id) } int dmar_match_verbose; +int dmar_batch_coalesce = 100; static SYSCTL_NODE(_hw, OID_AUTO, dmar, CTLFLAG_RD, NULL, ""); SYSCTL_INT(_hw_dmar, OID_AUTO, tbl_pagecnt, CTLFLAG_RD, @@ -632,6 +633,9 @@ SYSCTL_INT(_hw_dmar, OID_AUTO, tbl_pagecnt, CTLFLAG_RD, SYSCTL_INT(_hw_dmar, OID_AUTO, match_verbose, CTLFLAG_RWTUN, &dmar_match_verbose, 0, "Verbose matching of the PCI devices to DMAR paths"); +SYSCTL_INT(_hw_dmar, OID_AUTO, batch_coalesce, CTLFLAG_RW | CTLFLAG_TUN, + &dmar_batch_coalesce, 0, + "Number of qi batches between interrupt"); #ifdef INVARIANTS int dmar_check_free; SYSCTL_INT(_hw_dmar, OID_AUTO, check_free, CTLFLAG_RWTUN,