Date: Thu, 26 Jun 2014 15:43:14 -0700
From: Neel Natu <neelnatu@gmail.com>
To: Sean Bruno <sbruno@freebsd.org>
Cc: "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>
Subject: Re: jenkins bhyve vms crashing and burning after several days of use
Message-ID: <CAFgRE9HpA_LQStzPYpDUU0erqNp%2BKOrjwK%2B7A7RGfD7XTCi1Hg@mail.gmail.com>
In-Reply-To: <1403821402.2417.12.camel@bruno>
References: <1403818926.2417.6.camel@bruno>
 <1403819194.2417.8.camel@bruno>
 <CAFgRE9GYHzenX7px6-Sp6BfeTVA0-jcwg=JgcGXKuBeFJXUoog@mail.gmail.com>
 <1403821402.2417.12.camel@bruno>
Hi Sean,

On Thu, Jun 26, 2014 at 3:23 PM, Sean Bruno <sbruno@ignoranthack.me> wrote:
> On Thu, 2014-06-26 at 15:00 -0700, Neel Natu wrote:
>> Hi Sean,
>>
>> On Thu, Jun 26, 2014 at 2:46 PM, Sean Bruno <sbruno@ignoranthack.me> wrote:
>> > On Thu, 2014-06-26 at 14:42 -0700, Sean Bruno wrote:
>> >> so, we're seeing the bhyve vms running in the freebsd cluster for
>> >> jenkins crashing and burning after a couple of days of use.
>> >>
>> >> vm exit[9]
>> >> reason VMX
>> >> rip 0x0000000029286336
>> >> inst_length 3
>> >> status 0
>> >> exit_reason 49
>> >> qualification 0x0000000000000000
>> >> inst_type 0
>> >> inst_error 0
>> >>
>> >>
>> >> It looks like we have an active core file on havoc.ysv if you have a
>> >> moment to look at it:
>> >>
>> >> http://people.freebsd.org/~sbruno/bhyve.core
>> >>
>> >> FreeBSD havoc.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #2
>> >> r267362: Wed Jun 11 14:56:34 UTC 2014
>> >> sbruno@havoc.freebsd.org:/usr/obj/usr/src/sys/HAVOC amd64
>> >>
>> >
>> > Also, from chaos.ysv
>> >
>> > http://people.freebsd.org/~sbruno/bhyve.core.chaos
>> >
>> > FreeBSD chaos.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1
>> > r267362: Wed Jun 11 15:50:24 UTC 2014
>> > sbruno@chaos.ysv.freebsd.org:/usr/obj/usr/src/sys/CHAOS amd64
>> >
>>
>> Can you tell us the processor and memory configuration on havoc and chaos?
>>
>> Also, could you execute the following commands on havoc:
>>
>> # bhyvectl --vm=vmname --cpu=9 --get-vmcs-guest-physical-address
>> -- this will output the offending guest physical address that
>> triggered the EPT misconfiguration
>>
>> # bhyvectl --vm=vmname --get-gpa-pmap=<gpa_from_above>
>> -- this will output the page table entries in the EPT that map to the
>> offending GPA
>>
>> Hopefully that provides us with something to work with.
>>
>> best
>> Neel
>>
>>
>
>
> chaos:
> CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (2200.05-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x206d6 Family=0x6 Model=0x2d Stepping=6
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x1<LAHF>
> TSC: P-state invariant, performance statistics
> avail memory = 66298322944 (63227 MB)
>
> havoc:
> FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2400.14-MHz
> K8-class CPU)
> Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x1<LAHF>
> TSC: P-state invariant, performance statistics
> avail memory = 16571621376 (15803 MB)
>

Thanks, we'll see if there are relevant errata for these processors.
>
> There appear to be three vms running on havoc:
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm1 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9] 0x0000000000000000
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm2 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9] 0x0000000000000000
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm3 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9] 0x0000000000000000
>
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm1 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x300002c936e007 0x300002c9353007 0x300002c9352007 0
>
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm2 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x30000286cb0007 0x300003ad105007 0x3000019b1fd007 0
>
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm3 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x300002c9348007 0x300002c9339007 0
>
>
> But there's no information available on chaos at the moment as there are
> no active vms running.
>

Sorry, I should have explained a bit more.

After a bhyve(8) process exits because of the EPT misconfiguration error,
breadcrumbs are left over in the VMCS as well as in the nested page tables.
We can use them to diagnose what happened.

The bhyvectl commands above should be executed after the VM exits but
before it is restarted. Once it restarts, the breadcrumbs are overwritten
and are of no use.

The "--vm=<vmname>" passed to the bhyvectl command should be the name of
the virtual machine that crashed.

The "--cpu=<vcpuid>" passed to the bhyvectl command should be the vcpuid
that detected the EPT misconfiguration. The reason I used '9' as an example
above was that you saw this on the console:

vm exit[9]
reason VMX
rip 0x0000000029286336

Hope that helps.

best
Neel

> sean
>
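The steps described in this message (take the vcpuid from the "vm exit[N]" console line, then query the crashed VM before it is restarted) might be sketched as a small shell helper. This is only an illustration and not part of the original exchange: the function names vcpu_from_exit and bhyvectl_cmds are hypothetical, and the script merely prints the two bhyvectl invocations quoted in the thread rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (names are illustrative, not from the thread).
# Given a VM name and the console line "vm exit[N]", print the two
# bhyvectl commands quoted in this thread for vcpu N. Nothing is executed;
# run the printed commands after the VM exits and before it is restarted.

vcpu_from_exit() {
    # Extract N from a line of the form "vm exit[N]".
    printf '%s\n' "$1" | sed -n 's/.*vm exit\[\([0-9][0-9]*\)].*/\1/p'
}

bhyvectl_cmds() {
    vm=$1
    vcpu=$(vcpu_from_exit "$2")
    if [ -z "$vcpu" ]; then
        echo "no vcpu id found in: $2" >&2
        return 1
    fi
    # First command reports the offending guest physical address (GPA).
    echo "bhyvectl --vm=$vm --cpu=$vcpu --get-vmcs-guest-physical-address"
    # Second command takes that GPA; the placeholder is left for the user.
    echo "bhyvectl --vm=$vm --get-gpa-pmap=<gpa_from_above>"
}

# Example: the VM name "vm1" is an assumption; "vm exit[9]" is from the console.
bhyvectl_cmds vm1 "vm exit[9]"
```

The GPA placeholder is deliberately left unfilled, since the address is only known once the first command has been run against the VM that actually crashed.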