From: Alex Chistyakov
To: "freebsd-emulation@freebsd.org"
Cc: Alexander Motin, Andriy Gapon
Date: Tue, 27 Nov 2012 03:45:51 +0300
Subject: Re: VirtualBox 4.2.4 on FreeBSD 9.1-PRERELEASE problem: VMs behave very different when pinned to different cores

On Mon, Nov 26, 2012 at 11:10 AM, Alex Chistyakov wrote:
> On Sun, Nov 25, 2012 at 9:57 PM, Andriy Gapon wrote:
>> on 24/11/2012 00:17 Alex Chistyakov said the following:
>>> On Fri, Nov 23, 2012 at 9:06 PM, Andriy Gapon wrote:
>>>>
>>>> I've cc-ed Alexander who is deeply familiar with both the scheduler and the timer
>>>> code.
>>>> I think that it would be nice to get ktr(4) information suitable for use with
>>>> schedgraph (please google for these keywords).
>>>
>>> I collected two samples and put them here: http://1888.spb.ru/samples.zip
>>> sched-cpu0.ktr is for a VM running on CPU #0 and sched-cpu1.ktr is for
>>> a VM running on CPU #1
>>> They seem to be very different.
>>
>> It looks like you didn't stop ktr tracing before running ktrdump or something
>> like that.  schedgraph can not grok the files because it believes that the
>> timestamps are incorrect.
>>
>> # - While the workload is continuing (i.e. before it finishes), disable
>> #   KTR tracing by setting 'sysctl debug.ktr.mask=0'.  This is necessary
>> #   to avoid a race condition while running ktrdump, i.e. the KTR ring buffer
>> #   will cycle a bit while ktrdump runs, and this confuses schedgraph because
>> #   the timestamps appear to go backwards at some point.
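Noted about stopping the tracing before the dump. For my own reference, the
full collection sequence from the schedgraph.py header is roughly the
following (this assumes KTR_SCHED is included in KTR_COMPILE and KTR_MASK in
the kernel config; the exact ktrdump and schedgraph invocations here are from
memory, so better double-check them against the script header):

# run the VM workload, and while it is still running stop the tracing,
# so the ring buffer does not cycle under ktrdump:
sysctl debug.ktr.mask=0
# dump the buffer with CPU numbers and timestamps:
ktrdump -ct > ktr.out
# feed the dump to the analyzer:
python schedgraph.py ktr.out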
>>>> Also, version of your kernel,
>>>
>>> kern.version: FreeBSD 9.1-PRERELEASE #4: Fri Nov 23 22:38:47 MSK 2012
>>> Sources were grabbed on Nov, 16.
>>>
>>>> output of sysctls kern.eventtimer and kern.sched.
>>>
>>> kern.eventtimer.choice: LAPIC(600) HPET(550) HPET1(440) HPET2(440)
>>> i8254(100) RTC(0)
>>> kern.eventtimer.et.LAPIC.flags: 7
>>> kern.eventtimer.et.LAPIC.frequency: 50002806
>>> kern.eventtimer.et.LAPIC.quality: 600
>>> kern.eventtimer.et.RTC.flags: 17
>>> kern.eventtimer.et.RTC.frequency: 32768
>>> kern.eventtimer.et.RTC.quality: 0
>>> kern.eventtimer.et.i8254.flags: 1
>>> kern.eventtimer.et.i8254.frequency: 1193182
>>> kern.eventtimer.et.i8254.quality: 100
>>> kern.eventtimer.et.HPET.flags: 7
>>> kern.eventtimer.et.HPET.frequency: 14318180
>>> kern.eventtimer.et.HPET.quality: 550
>>> kern.eventtimer.et.HPET1.flags: 3
>>> kern.eventtimer.et.HPET1.frequency: 14318180
>>> kern.eventtimer.et.HPET1.quality: 440
>>> kern.eventtimer.et.HPET2.flags: 3
>>> kern.eventtimer.et.HPET2.frequency: 14318180
>>> kern.eventtimer.et.HPET2.quality: 440
>>> kern.eventtimer.periodic: 0
>>> kern.eventtimer.timer: LAPIC
>>> kern.eventtimer.activetick: 1
>>> kern.eventtimer.idletick: 0
>>> kern.eventtimer.singlemul: 2
>>> kern.sched.cpusetsize: 8
>>> kern.sched.preemption: 1
>>> kern.sched.topology_spec:
>>> kern.sched.steal_thresh: 2
>>> kern.sched.steal_idle: 1
>>> kern.sched.balance_interval: 127
>>> kern.sched.balance: 1
>>> kern.sched.affinity: 1
>>> kern.sched.idlespinthresh: 16
>>> kern.sched.idlespins: 10000
>>> kern.sched.static_boost: 152
>>> kern.sched.preempt_thresh: 80
>>> kern.sched.interact: 30
>>> kern.sched.slice: 12
>>> kern.sched.quantum: 94488
>>> kern.sched.name: ULE
>>>
>>> I tried kern.eventtimer.periodic=1 and
>>> kern.timecounter.hardware=ACPI-fast but that did not help.
>>
>> Could you please also provide the CPU identification block from dmesg?
>
> CPU: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (3200.18-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0x206d7  Family = 0x6  Model = 0x2d
>   Stepping = 7
>   Features=0xbfebfbff
>   Features2=0x1fbee3bf
>   AMD Features=0x2c100800
>   AMD Features2=0x1
>   TSC: P-state invariant, performance statistics
>
>>
>>>> BTW, do you use the default ULE scheduler?
>>>
>>> Yep.
>>> I tried SCHED_4BSD and the situation became much better but not ideal.
>>> %si was around 3-7% on the guest and I had to boot with noacpi and
>>> disable the tickless kernel on the guest to lower it.
>>> At least I was able to run a VM on CPU #0 and all cores became equal.
>>
>> Interesting results.
>>
>>>> Also, is your kernel DTrace enabled?
>>
>> Could you please run the following script for some seconds and report its output
>> for both of the scenarios:
>>
>> profile:::profile-4001
>> {
>>         @stacks[pid, tid, execname, stack()] = count();
>> }
>>
>> END
>> {
>>         trunc(@stacks, 40);
>>         printa(@stacks);
>> }
>
> A DTrace result for "bad" core is at https://gist.github.com/4145782
> and a result for "good" core is at https://gist.github.com/4146946

Okay, I have finally found a way to use PMP and got some stack traces for
the "good" and "bad" VM runs.
The "good" results are here: https://gist.github.com/4151328
and the "bad" ones are here: https://gist.github.com/4151343
I sampled for 160 seconds both times, one sample per second.
I used the script titled "Full technology demonstration" from
http://poormansprofiler.org/, so the numbers on the left are the total
counts of identical stack traces.
As you can see, the second result contains "select,select,listenerRun,??,??"
and "select,select,clientInput,??,??" samples, so I guessed they could be
related to VNC/VRDE.
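For reference, the PMP script I used is essentially the "Full technology
demonstration" one published on that page, adapted only in the obvious
places; roughly this (nsamples/sleeptime match what I described above, and
the pgrep pattern for the VM process is from memory, so adjust it to your
setup):

#!/bin/sh
# poor man's profiler: sample gdb backtraces and count identical stacks
nsamples=160
sleeptime=1
# assumption: the VM runs as a VBoxHeadless process; adjust the pattern if not
pid=$(pgrep -n VBoxHeadless)

for x in $(seq 1 $nsamples)
  do
    gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $pid
    sleep $sleeptime
  done | \
awk '
  BEGIN { s = ""; }
  /^Thread/ { print s; s = ""; }
  /^\#/ { if (s != "") { s = s "," $4 } else { s = $4 } }
  END { print s }' | \
sort | uniq -c | sort -r -n -k 1,1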
I disabled VRDE and was then able to boot the VM on the "bad" core without
major problems.
But CPU usage remained quite high on the "bad" core and ping was very
unstable too, so this was not the end of the story.
I collected PMP stats for the "bad" core with VRDE disabled:
https://gist.github.com/4151530
A lot of time seems to be spent in the VMMR3HwAccRunGC call.
Well, this function is just a bridge to the guest code, so I am not sure
how to interpret this fact.

Thank you,

--
SY,
Alex
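P.S. For completeness: the DTrace results linked above were obtained by
saving Andriy's script verbatim (as, say, profile.d - the file name does not
matter) and running something along the lines of

dtrace -s profile.d

for a few seconds and then pressing Ctrl-C, so that the END clause prints the
40 most frequent stacks. If DTrace is not loaded yet, 'kldload dtraceall'
should take care of that.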