From: John Baldwin
To: "K. Macy"
Cc: Stanislav Sedov, Adrian Chadd, Konstantin Belousov, Ryan Stone,
    "freebsd-hackers@freebsd.org"
Subject: Re: Peformance issues with r278325
Date: Fri, 18 Mar 2016 14:05:39 -0700
Message-ID: <1938968.vRTG1GJntD@ralph.baldwin.cx>
References: <3213721.SRyKE0LtiG@ralph.baldwin.cx>

On Friday, March 18, 2016 01:11:15 PM K. Macy wrote:
> On Friday, March 18, 2016, John Baldwin wrote:
>
> > On Friday, March 18, 2016 11:58:19 AM Stanislav Sedov wrote:
> > >
> > > > On Mar 18, 2016, at 11:49 AM, Ryan Stone wrote:
> > > >
> > > > On Fri, Mar 18, 2016 at 1:37 PM, John Baldwin wrote:
> > > > I think I'll likely just convert it to use a direct
> > > > TSC delay loop always in HEAD (assuming that verifies ok in
> > > > testing as well).
> > > >
> > > > Couldn't that work incorrectly on VM guests?  The tsc is not
> > > > guaranteed to be monotonic in that environment.
> > > >
> > >
> > > Another concern is SMP systems where the clock is not synchronized.
> > > SMP TSC requires a complicated setup procedure on the system boot
> > > which is not followed properly by all vendors, and I have seen some
> > > recent systems with SMP TSC skew.
> > >
> > > I'm afraid that using TSC in this code will make FreeBSD unusable on
> > > such (arguably buggy) systems.
> >
> > Eh, SMP does not matter here.  DELAY() already uses TSC on FreeBSD.  The
> > current thread is pinned to a single CPU in lapic_ipi_wait().  The idea
> > would be to do this:
> >
> >     deadline = rdtsc() + freq * delay / 1000000;
> >
> >     while (rdtsc() < deadline) {
> >         if (APIC_DELSTAT_IDLE)
> >             return (1);
> >         ia32_pause();
> >     }
> >
>
> In a VM the thread is pinned to the _vcpu_.  You have no control over what
> physical core the underlying thread is running on.  If the vcpu is migrated
> to a different package there is a possibility of the next TSC reading being
> a lower value.  The rdtsc instruction reads the value of the physical core.
> Although I doubt this is a problem in practice, it is a possibility if
> vcpus are not pinned for the lifetime of the VM.

Again, DELAY() already makes this assumption on FreeBSD/x86.  If this is
broken, it's broken in far, far more places than this.  In addition, a
hypervisor is capable of handling this if it uses the TSC offset field in
the VMCS.

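For anyone who wants to poke at the deadline loop quoted above outside the
kernel, here is a minimal user-space sketch.  It is not the code in
lapic_ipi_wait(): the names wait_idle_deadline, counter_fn, idle_fn and
struct fake_hw are all made up for illustration, and the counter and the
"APIC idle" check are passed in as callbacks so the loop can be driven by a
fake counter instead of rdtsc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for rdtsc() and the APIC status check. */
typedef uint64_t (*counter_fn)(void *ctx);
typedef bool (*idle_fn)(void *ctx);

/*
 * Spin until is_idle() reports true or delay_us microseconds' worth of
 * counter ticks have elapsed.  Returns 1 on idle, 0 on timeout.
 */
int
wait_idle_deadline(counter_fn read_counter, idle_fn is_idle, void *ctx,
    uint64_t freq_hz, uint64_t delay_us)
{
	uint64_t deadline;

	assert(read_counter != NULL && is_idle != NULL);
	deadline = read_counter(ctx) + freq_hz * delay_us / 1000000;
	while (read_counter(ctx) < deadline) {
		if (is_idle(ctx))
			return (1);
		/* Kernel code would call ia32_pause() here. */
	}
	return (0);
}

/* Fake hardware for the sketch: the counter advances 100 ticks per read. */
struct fake_hw {
	uint64_t now;		/* current counter value */
	uint64_t idle_at;	/* counter value at which the "APIC" goes idle */
};

uint64_t
fake_counter(void *ctx)
{
	struct fake_hw *hw = ctx;

	hw->now += 100;
	return (hw->now);
}

bool
fake_idle(void *ctx)
{
	struct fake_hw *hw = ctx;

	return (hw->now >= hw->idle_at);
}
```

With a 1 MHz fake counter and a 1000 us delay, a fake_hw that goes idle at
tick 500 returns 1, and one that never goes idle returns 0 once the counter
reaches the deadline.  Note that with this comparison a counter that steps
backwards only lengthens the wait by the size of the step, while a forward
step can end the wait early.
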
OTOH, TSCs on modern systems generally are in sync.  Invariant TSCs are
generated from a timer in the uncore on all the i[357] CPUs AFAIK, so they
are identical across cores and threads within a package.  They are also in
sync across packages on all of the i[357] systems I have worked with (and
generations prior to those) in the last 7-8 years (barring one machine I'm
aware of that was a quad-socket i7 box where the BIOS overwrote the TSC of
the first core in each package on each reboot to try to reset it to zero).

-- 
John Baldwin
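
As a footnote to the invariant TSC remark, here is a small user-space sketch
of how one might check whether a CPU advertises an invariant TSC.  It
assumes GCC or Clang on x86 (for <cpuid.h> and __get_cpuid()); the function
names are made up for illustration.  The bit tested is CPUID leaf
0x80000007, EDX bit 8.  A hypervisor may mask this bit, so a false result in
a guest says little about the host.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Decode the invariant TSC flag from the EDX value of leaf 0x80000007. */
bool
invariant_tsc_from_edx(uint32_t edx)
{
	return ((edx & (UINT32_C(1) << 8)) != 0);
}

/* Returns true if this CPU reports an invariant TSC, false otherwise. */
bool
has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the leaf is not supported. */
	if (__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx) == 0)
		return (false);
	return (invariant_tsc_from_edx(edx));
#else
	/* Not x86: there is no TSC to ask about. */
	return (false);
#endif
}
```

This only reports what the CPU claims about a single TSC; it does not
measure whether TSCs are actually in sync across cores or packages.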