Date: Fri, 25 Nov 2016 11:25:03 +0200
From: Konstantin Belousov
To: Jason Harmening
Cc: freebsd-stable@freebsd.org
Subject: Re: huge nanosleep variance on 11-stable
Message-ID: <20161125092503.GZ54029@kib.kiev.ua>
In-Reply-To: <20161102162808.GI54029@kib.kiev.ua>

On Wed, Nov 02, 2016 at 06:28:08PM +0200, Konstantin Belousov wrote:
> On Wed, Nov 02, 2016 at 09:18:15AM -0700, Jason Harmening wrote:
> > I think you are probably right. Hacking out the Intel-specific
> > additions to C-state parsing in acpi_cpu_cx_cst() from r282678 (thus
> > going back to sti;hlt instead of monitor+mwait at C1) fixed the problem
> > for me. But r282678 also had the effect of enabling C2 and C3 on my
> > system, because ACPI only presents MWAIT entries for those states and
> > not p_lvlx.
> You can do the same with "debug.acpi.disabled=mwait" loader tunable
> without hacking the code. And set sysctl hw.acpi.cpu.cx_lowest to C1 to
> enforce use of hlt instruction even when mwait states were requested.

I believe I now understand the problem.

First, I got definitive confirmation that the LAPIC timer on Nehalems is
stopped in any C-state deeper than C1/C1E, i.e. even if only C2 is
enabled, the LAPIC eventtimer cannot be used. This is consistent with
the ARAT CPUID bit CPUID[0x6].eax[2] being reported as zero. On
SandyBridge and IvyBridge CPUs, it seems that ARAT might be either 0 or 1
according to the same source, but all CPUs I saw have ARAT = 1. And for
Haswell and later generations, ARAT is claimed to be always implemented.

The actual issue is a somewhat silly bug, I must admit: if ncpus >= 8 and
the HPET uses non-FSB interrupt routing, the default HPET eventtimer
quality of 450 is reduced by 100, i.e. it is 350. OTOH, the LAPIC default
quality is 600, and it is reduced by 200 if ARAT is not reported. We end
up with HPET quality 350 < LAPIC quality 400, so the LAPIC eventtimer is
still preferred even though ARAT is not set.

The patch below sets the LAPIC eventtimer quality to 100 if ARAT is not
reported.
Also, I realized that there is no reason to disable TSC deadline mode
when ARAT is absent, so the patch enables it regardless of ARAT.

diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c
index d9a3453..1b1547d 100644
--- a/sys/x86/x86/local_apic.c
+++ b/sys/x86/x86/local_apic.c
@@ -478,8 +478,9 @@ native_lapic_init(vm_paddr_t addr)
 		lapic_et.et_quality = 600;
 		if (!arat) {
 			lapic_et.et_flags |= ET_FLAGS_C3STOP;
-			lapic_et.et_quality -= 200;
-		} else if ((cpu_feature & CPUID_TSC) != 0 &&
+			lapic_et.et_quality = 100;
+		}
+		if ((cpu_feature & CPUID_TSC) != 0 &&
 		    (cpu_feature2 & CPUID2_TSCDLT) != 0 &&
 		    tsc_is_invariant && tsc_freq != 0) {
 			lapic_timer_tsc_deadline = 1;
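
To make the quality arithmetic above concrete, here is a small
standalone sketch of it. This is not kernel code: the function names,
the "patched" flag, and the "highest quality wins" selection rule are
assumptions made for illustration, and only the numbers (450, 100, 600,
200, 100) come from the message and the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Quality values quoted in the message; everything else is illustrative. */
enum {
	HPET_QUALITY_DEFAULT  = 450,
	HPET_QUALITY_PENALTY  = 100,	/* ncpus >= 8 and non-FSB routing */
	LAPIC_QUALITY_DEFAULT = 600,
	LAPIC_OLD_PENALTY     = 200,	/* pre-patch reduction without ARAT */
	LAPIC_NEW_NO_ARAT     = 100	/* post-patch quality without ARAT */
};

/* Hypothetical helper: HPET eventtimer quality as described in the message. */
static int
hpet_quality(int ncpus, bool fsb_routing)
{
	int q;

	assert(ncpus > 0);
	q = HPET_QUALITY_DEFAULT;
	if (ncpus >= 8 && !fsb_routing)
		q -= HPET_QUALITY_PENALTY;
	return (q);
}

/* Hypothetical helper: LAPIC eventtimer quality before and after the patch. */
static int
lapic_quality(bool arat, bool patched)
{
	if (arat)
		return (LAPIC_QUALITY_DEFAULT);
	if (patched)
		return (LAPIC_NEW_NO_ARAT);
	return (LAPIC_QUALITY_DEFAULT - LAPIC_OLD_PENALTY);
}

/* Assumed selection rule: the eventtimer with the higher quality wins. */
static const char *
pick_timer(int ncpus, bool fsb_routing, bool arat, bool patched)
{
	if (lapic_quality(arat, patched) > hpet_quality(ncpus, fsb_routing))
		return ("LAPIC");
	return ("HPET");
}
```

With ncpus = 8, non-FSB routing and no ARAT, the unpatched model picks
LAPIC (400 > 350), while the patched model picks HPET (100 < 350).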