Date:      Tue, 27 Nov 2012 03:45:51 +0300
From:      Alex Chistyakov <alexclear@gmail.com>
To:        "freebsd-emulation@freebsd.org" <freebsd-emulation@freebsd.org>
Cc:        Alexander Motin <mav@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: VirtualBox 4.2.4 on FreeBSD 9.1-PRERELEASE problem: VMs behave very different when pinned to different cores
Message-ID:  <CA+kq2xtxwC+U15C6pO=tVQcDqFk07Zqivrt_9=fwzn5hMaHi+A@mail.gmail.com>
In-Reply-To: <CA+kq2xvjDa1BeuzPUuH99bgriEA-GJH36AZGiqScKSo4QZmHDg@mail.gmail.com>
References:  <CA+kq2xvh3j5CM7UzRVfXCeLhHwpTY+_M7dCJx0c27NtV8EVJwg@mail.gmail.com> <CAE-m3X1UPsy+wbqm_02JpXMr-UO3m7N6z_ZwY2HNo4GL0YUi1w@mail.gmail.com> <CA+kq2xva61m_bHdzBZM2TYL5z7XiohvkxsYWtOyoBwQkpyvp0A@mail.gmail.com> <50AFAD05.1050604@FreeBSD.org> <CA+kq2xv+U4ZnfK=1js4PRaNpTNdW-y-G50GV4+MVP0LugBf1pQ@mail.gmail.com> <50B25C17.20208@FreeBSD.org> <CA+kq2xvjDa1BeuzPUuH99bgriEA-GJH36AZGiqScKSo4QZmHDg@mail.gmail.com>

On Mon, Nov 26, 2012 at 11:10 AM, Alex Chistyakov <alexclear@gmail.com> wrote:
> On Sun, Nov 25, 2012 at 9:57 PM, Andriy Gapon <avg@freebsd.org> wrote:
>> on 24/11/2012 00:17 Alex Chistyakov said the following:
>>> On Fri, Nov 23, 2012 at 9:06 PM, Andriy Gapon <avg@freebsd.org> wrote:
>>>>
>>>> I've cc-ed Alexander who is deeply familiar with both the scheduler and the timer
>>>> code.
>>>> I think that it would be nice to get ktr(4) information suitable for use with
>>>> schedgraph (please google for these keywords).
>>>
>>> I collected two samples and put them here: http://1888.spb.ru/samples.zip
>>> sched-cpu0.ktr is for a VM running on CPU #0 and sched-cpu1.ktr is for
>>> a VM running on CPU #1.
>>> They seem to be very different.
>>
>> It looks like you didn't stop ktr tracing before running ktrdump or something
>> like that.  schedgraph can not grok the files because it believes that the
>> timestamps are incorrect.
>>
>> # - While the workload is continuing (i.e. before it finishes), disable
>> #   KTR tracing by setting 'sysctl debug.ktr.mask=0'.  This is necessary
>> #   to avoid a race condition while running ktrdump, i.e. the KTR ring buffer
>> #   will cycle a bit while ktrdump runs, and this confuses schedgraph because
>> #   the timestamps appear to go backwards at some point.
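
For the record, as I understand it the whole capture then boils down to
roughly the following (the 0x20000000 mask is my assumption for KTR_SCHED,
and the ktrdump flags are taken from its man page, so please correct me if
I got either of those wrong):

    sysctl debug.ktr.mask=0x20000000   # enable scheduler tracing under load
    sleep 30                           # let the ring buffer fill while the VM misbehaves
    sysctl debug.ktr.mask=0            # freeze the buffer *before* dumping
    ktrdump -ct > sched-cpu0.ktr       # -c adds the CPU number, -t the timestamp
    # then feed sched-cpu0.ktr to schedgraph.py on a desktop machine
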
>>>>  Also, version of your kernel,
>>>
>>> kern.version: FreeBSD 9.1-PRERELEASE #4: Fri Nov 23 22:38:47 MSK 2012
>>> Sources were grabbed on Nov 16.
>>>
>>>> output of sysctls kern.eventtimer and kern.sched.
>>>
>>> kern.eventtimer.choice: LAPIC(600) HPET(550) HPET1(440) HPET2(440)
>>> i8254(100) RTC(0)
>>> kern.eventtimer.et.LAPIC.flags: 7
>>> kern.eventtimer.et.LAPIC.frequency: 50002806
>>> kern.eventtimer.et.LAPIC.quality: 600
>>> kern.eventtimer.et.RTC.flags: 17
>>> kern.eventtimer.et.RTC.frequency: 32768
>>> kern.eventtimer.et.RTC.quality: 0
>>> kern.eventtimer.et.i8254.flags: 1
>>> kern.eventtimer.et.i8254.frequency: 1193182
>>> kern.eventtimer.et.i8254.quality: 100
>>> kern.eventtimer.et.HPET.flags: 7
>>> kern.eventtimer.et.HPET.frequency: 14318180
>>> kern.eventtimer.et.HPET.quality: 550
>>> kern.eventtimer.et.HPET1.flags: 3
>>> kern.eventtimer.et.HPET1.frequency: 14318180
>>> kern.eventtimer.et.HPET1.quality: 440
>>> kern.eventtimer.et.HPET2.flags: 3
>>> kern.eventtimer.et.HPET2.frequency: 14318180
>>> kern.eventtimer.et.HPET2.quality: 440
>>> kern.eventtimer.periodic: 0
>>> kern.eventtimer.timer: LAPIC
>>> kern.eventtimer.activetick: 1
>>> kern.eventtimer.idletick: 0
>>> kern.eventtimer.singlemul: 2
>>> kern.sched.cpusetsize: 8
>>> kern.sched.preemption: 1
>>> kern.sched.topology_spec: <groups>
>>> kern.sched.steal_thresh: 2
>>> kern.sched.steal_idle: 1
>>> kern.sched.balance_interval: 127
>>> kern.sched.balance: 1
>>> kern.sched.affinity: 1
>>> kern.sched.idlespinthresh: 16
>>> kern.sched.idlespins: 10000
>>> kern.sched.static_boost: 152
>>> kern.sched.preempt_thresh: 80
>>> kern.sched.interact: 30
>>> kern.sched.slice: 12
>>> kern.sched.quantum: 94488
>>> kern.sched.name: ULE
>>>
>>> I tried kern.eventtimer.periodic=1 and
>>> kern.timecounter.hardware=ACPI-fast but that did not help.
>>
>> Could you please also provide the CPU identification block from dmesg?
>
> CPU: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (3200.18-MHz K8-class CPU)
>   Origin = "GenuineIntel"  Id = 0x206d7  Family = 0x6  Model = 0x2d
> Stepping = 7
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x1fbee3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x1<LAHF>
>   TSC: P-state invariant, performance statistics
>
>>
>>>> BTW, do you use the default ULE scheduler?
>>>
>>> Yep.
>>> I tried SCHED_4BSD and the situation became much better but not ideal.
>>> %si was around 3-7% on the guest and I had to boot with noacpi and
>>> disable the tickless kernel on the guest to lower it.
>>> At least I was able to run a VM on CPU #0 and all cores became equal.
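
(Side note for anyone reproducing this: switching schedulers only means
swapping the scheduler option in the kernel config and rebuilding, i.e.

    options SCHED_4BSD      # instead of: options SCHED_ULE

in the KERNCONF.)
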
>>
>> Interesting results.
>>
>>>> Also, is your kernel DTrace enabled?
>>
>> Could you please run the following script for some seconds and report its output
>> for both of the scenarios:
>>
>> profile:::profile-4001
>> {
>>         @stacks[pid, tid, execname, stack()] = count();
>> }
>>
>> END
>> {
>>         trunc(@stacks, 40);
>>         printa(@stacks);
>> }
>
> A DTrace result for "bad" core is at https://gist.github.com/4145782
> and a result for "good" core is at https://gist.github.com/4146946
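
(In case it helps someone reproducing this: I saved the script above as
profile.d -- the filename is just my choice -- and ran it roughly like

    dtrace -s profile.d -o dtrace-bad-core.txt

letting it sample for some seconds and then hitting Ctrl-C so the END
clause trims and prints the aggregation. The output filename is a
placeholder as well.)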

Okay, I have finally found a way to use PMP and got some stack traces
for the "good" and "bad" VM runs.
The "good" results are here: https://gist.github.com/4151328 and the
"bad" ones are here: https://gist.github.com/4151343
I sampled for 160 seconds both times, one sample per second. I used
the script titled "Full technology demonstration" on
http://poormansprofiler.org/, so the numbers on the left are the total
counts of repeated stack traces.
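For reference, that script boils down to roughly the following
(paraphrased from that page; I substituted pgrep for pidof since this
is FreeBSD, and VBoxHeadless stands for whatever process actually runs
the VM):

    #!/bin/sh
    pid=$(pgrep VBoxHeadless)          # the VM process to sample
    nsamples=160                       # one sample per second for 160 s
    for x in $(seq 1 $nsamples); do
        gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $pid
        sleep 1
    done | \
    awk '
      BEGIN { s = "" }
      /^Thread/ { print s; s = "" }
      /^#/  { if (s != "") { s = s "," $4 } else { s = $4 } }
      END   { print s }' | \
    sort | uniq -c | sort -r -n -k 1,1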
As you can see, the second result has "select,select,listenerRun,??,??"
and "select,select,clientInput,??,??" samples, so I guessed they could
be related to VNC/VRDE. I disabled VRDE and was then able to boot the
VM on the "bad" core without major problems.
But CPU usage remained quite high on the "bad" core and ping was still
very unstable, so this was not the end of the story.
I collected PMP stats for the "bad" core with VRDE disabled:
https://gist.github.com/4151530
A lot of time seems to be spent in the VMMR3HwAccRunGC call. Well, this
function is just a bridge into the guest code, so I'm not sure how to
interpret that.

Thank you,

--
SY,
Alex


