From: Jason Harmening
Date: Sat, 26 Nov 2016 14:37:45 -0800
Subject: Re: huge nanosleep variance on 11-stable
To: Konstantin Belousov
Cc: FreeBSD-STABLE Mailing List

I can confirm this patch works. HPET is now chosen over LAPIC as the
eventtimer source, and the system works smoothly without disabling C2 or
mwait.

On Fri, Nov 25, 2016 at 4:12 AM, Jason Harmening wrote:

> On Fri, Nov 25, 2016 at 1:25 AM, Konstantin Belousov wrote:
>
>> On Wed, Nov 02, 2016 at 06:28:08PM +0200, Konstantin Belousov wrote:
>> > On Wed, Nov 02, 2016 at 09:18:15AM -0700, Jason Harmening wrote:
>> > > I think you are probably right. Hacking out the Intel-specific
>> > > additions to C-state parsing in acpi_cpu_cx_cst() from r282678 (thus
>> > > going back to sti;hlt instead of monitor+mwait at C1) fixed the problem
>> > > for me. But r282678 also had the effect of enabling C2 and C3 on my
>> > > system, because ACPI only presents MWAIT entries for those states and
>> > > not p_lvlx.
>> > You can do the same with "debug.acpi.disabled=mwait" loader tunable
>> > without hacking the code.
>> > And set sysctl hw.acpi.cpu.cx_lowest to C1 to
>> > enforce use of hlt instruction even when mwait states were requested.
>>
>> I believe I now understood the problem. First, I got the definitive
>> confirmation that LAPIC timer on Nehalems is stopped in any C mode
>> higher than C1/C1E, i.e. even if C2 is enabled LAPIC eventtimer cannot
>> be used. This is consistent with the ARAT CPUID bit CPUID[0x6].eax[2]
>> reported zero.
>>
>> On SandyBridge and IvyBridge CPUs, it seems that ARAT might be both 0
>> and 1 according to the same source, but all CPUs I saw have ARAT = 1.
>> And for Haswell and later generations, ARAT is claimed to be always
>> implemented.
>>
>> The actual issue is somewhat silly bug, I must admit: if ncpus >= 8, and
>> non-FSB interrupt routing from HPET, default HPET eventtimer quality 450
>> is reduced by 100, i.e. it is 350. OTOH, LAPIC default quality is 600
>> and it is reduced by 200 if ARAT is not reported. We end up with HPET
>> quality 350 < LAPIC quality 400, despite ARAT is not set.
>>
>> The patch below sets LAPIC eventtimer quality to 100 if not ARAT. Also
>> I realized that there is no reason to disable deadline mode regardless
>> of ARAT.
>>
>> diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c
>> index d9a3453..1b1547d 100644
>> --- a/sys/x86/x86/local_apic.c
>> +++ b/sys/x86/x86/local_apic.c
>> @@ -478,8 +478,9 @@ native_lapic_init(vm_paddr_t addr)
>>         lapic_et.et_quality = 600;
>>         if (!arat) {
>>                 lapic_et.et_flags |= ET_FLAGS_C3STOP;
>> -               lapic_et.et_quality -= 200;
>> -       } else if ((cpu_feature & CPUID_TSC) != 0 &&
>> +               lapic_et.et_quality = 100;
>> +       }
>> +       if ((cpu_feature & CPUID_TSC) != 0 &&
>>             (cpu_feature2 & CPUID2_TSCDLT) != 0 &&
>>             tsc_is_invariant && tsc_freq != 0) {
>>                 lapic_timer_tsc_deadline = 1;
>>
>> Ah, that makes sense. Thanks!
>
> I'll try the patch as soon as I get back from vacation. I've been able to
> verify that setting cx_lowest and disabling mwait fixes the problem without
> hacking the code.
> But I've been too busy at $(WORK) to check anything
> else, namely whether forcing HPET would also fix the problem.
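
For anyone following the quality arithmetic in the quoted explanation, here is a minimal standalone sketch of it in C. It is not kernel code: the function and constant names (hpet_quality, lapic_quality, pick_timer, and the enum values) are hypothetical, the numbers are only those quoted in the thread, and the real eventtimer selection considers more timers and flags than this two-way comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants; the values are the ones quoted in the thread. */
enum {
    HPET_QUALITY_DEFAULT      = 450,
    HPET_QUALITY_PENALTY      = 100, /* ncpus >= 8 and non-FSB interrupt routing */
    LAPIC_QUALITY_DEFAULT     = 600,
    LAPIC_QUALITY_OLD_PENALTY = 200, /* before the patch, ARAT not reported */
    LAPIC_QUALITY_NO_ARAT     = 100  /* after the patch, ARAT not reported */
};

/* Hypothetical result type for this sketch only. */
enum timer_choice { CHOICE_HPET, CHOICE_LAPIC };

/* HPET quality as described in the thread. */
int hpet_quality(int ncpus, bool fsb_routing)
{
    assert(ncpus > 0);
    int quality = HPET_QUALITY_DEFAULT;
    if (ncpus >= 8 && !fsb_routing)
        quality -= HPET_QUALITY_PENALTY;
    return quality;
}

/* LAPIC quality before (patched == false) and after (patched == true) the patch. */
int lapic_quality(bool arat, bool patched)
{
    if (arat)
        return LAPIC_QUALITY_DEFAULT;
    return patched ? LAPIC_QUALITY_NO_ARAT
                   : LAPIC_QUALITY_DEFAULT - LAPIC_QUALITY_OLD_PENALTY;
}

/* Simplified assumption: the timer with the strictly higher quality wins. */
enum timer_choice pick_timer(int ncpus, bool fsb_routing, bool arat, bool patched)
{
    return lapic_quality(arat, patched) > hpet_quality(ncpus, fsb_routing)
        ? CHOICE_LAPIC : CHOICE_HPET;
}
```

With 8 CPUs, non-FSB HPET routing and no ARAT, pick_timer(8, false, false, false) gives CHOICE_LAPIC (400 > 350), while pick_timer(8, false, false, true) gives CHOICE_HPET (100 < 350), which matches the behaviour change reported at the top of this message.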