From: Neel Natu
To: Sean Bruno
Cc: "freebsd-virtualization@freebsd.org"
Date: Thu, 26 Jun 2014 15:43:14 -0700
Subject: Re: jenkins bhyve vms crashing and burning after several days of use
In-Reply-To: <1403821402.2417.12.camel@bruno>
References: <1403818926.2417.6.camel@bruno> <1403819194.2417.8.camel@bruno> <1403821402.2417.12.camel@bruno>
List-Id: "Discussion of various virtualization techniques FreeBSD supports."

Hi Sean,

On Thu, Jun 26, 2014 at 3:23 PM, Sean Bruno wrote:
> On Thu, 2014-06-26 at 15:00 -0700, Neel Natu wrote:
>> Hi Sean,
>>
>> On Thu, Jun 26, 2014 at 2:46 PM, Sean Bruno wrote:
>> > On Thu, 2014-06-26 at 14:42 -0700, Sean Bruno wrote:
>> >> so, we're seeing the bhyve vms running in the freebsd cluster for
>> >> jenkins crashing and burning after a couple of days of use.
>> >>
>> >> vm exit[9]
>> >> reason VMX
>> >> rip 0x0000000029286336
>> >> inst_length 3
>> >> status 0
>> >> exit_reason 49
>> >> qualification 0x0000000000000000
>> >> inst_type 0
>> >> inst_error 0
>> >>
>> >>
>> >> It looks like we have an active core file on havoc.ysv if you have a
>> >> moment to look at it:
>> >>
>> >> http://people.freebsd.org/~sbruno/bhyve.core
>> >>
>> >> FreeBSD havoc.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #2
>> >> r267362: Wed Jun 11 14:56:34 UTC 2014
>> >> sbruno@havoc.freebsd.org:/usr/obj/usr/src/sys/HAVOC amd64
>> >>
>> >
>> > Also, from chaos.ysv
>> >
>> > http://people.freebsd.org/~sbruno/bhyve.core.chaos
>> >
>> > FreeBSD chaos.ysv.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1
>> > r267362: Wed Jun 11 15:50:24 UTC 2014
>> > sbruno@chaos.ysv.freebsd.org:/usr/obj/usr/src/sys/CHAOS amd64
>> >
>>
>> Can you tell us the processor and memory configuration on havoc and chaos?
>>
>> Also, could you execute the following commands on havoc:
>>
>> # bhyvectl --vm=vmname --cpu=9 --get-vmcs-guest-physical-address
>> -- this will output the offending guest physical address that
>> triggered the EPT misconfiguration
>>
>> # bhyvectl --vm=vmname --get-gpa-pmap=
>> -- this will output the page table entries in the EPT that map to the
>> offending GPA
>>
>> Hopefully that provides us with something to work with.
>>
>> best
>> Neel
>>
>>
>
>
> chaos:
> CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (2200.05-MHz K8-class CPU)
> Origin="GenuineIntel" Id=0x206d6 Family=0x6 Model=0x2d Stepping=6
> Features=0xbfebfbff
> Features2=0x1fbee3ff
> AMD Features=0x2c100800
> AMD Features2=0x1
> TSC: P-state invariant, performance statistics
> avail memory = 66298322944 (63227 MB)
>
> havoc:
> FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
> CPU: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2400.14-MHz
> K8-class CPU)
> Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2
> Features=0xbfebfbff
> Features2=0x29ee3ff
> AMD Features=0x2c100800
> AMD Features2=0x1
> TSC: P-state invariant, performance statistics
> avail memory = 16571621376 (15803 MB)
>

Thanks, we'll see if there are relevant errata for these processors.

>
> There appear to be three vms running on havoc:
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm1 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9] 0x0000000000000000
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm2 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9] 0x0000000000000000
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm3 --cpu=9
> --get-vmcs-guest-physical-address
> gpa[9] 0x0000000000000000
>
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm1 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x300002c936e007 0x300002c9353007 0x300002c9352007 0
>
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm2 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x30000286cb0007 0x300003ad105007 0x3000019b1fd007 0
>
> root@havoc.ysv:/home/sbruno # bhyvectl --vm=vm3 --cpu=9
> --get-gpa-pmap=0x0000000000000000
> gpa 0: 0x300002c9348007 0x300002c9339007 0
>
>
> But there's no information available on chaos at the moment as there are
> no active vms running.
>

Sorry, I should have explained a bit more.

After bhyve(8) exits because of the EPT misconfiguration error, there are
breadcrumbs left over in the VMCS as well as in the nested page tables. We
can use them to diagnose what happened.

The bhyvectl commands above should be executed after the VM exits but
before it is restarted. Once it restarts, the breadcrumbs are overwritten
and are of no use.

The "--vm=" passed to the bhyvectl command should be the name of the
virtual machine that crashed.

The "--cpu=" passed to the bhyvectl command should be the vcpuid that
detected the EPT misconfiguration. The reason I used '9' as an example
above is that you saw this on the console:

vm exit[9]
reason VMX
rip 0x0000000029286336

Hope that helps.

best
Neel

> sean
>
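
The procedure described above (take the vcpuid from the "vm exit[N]" console
line, read the offending GPA from the VMCS, then dump the EPT entries for that
GPA) could be scripted roughly as follows. This is only a sketch under
assumptions: the helper names parse_vcpu, parse_gpa and collect_breadcrumbs are
hypothetical, the bhyvectl flags are the ones quoted in this thread, and the
output is assumed to look like the "gpa[9] 0x..." lines pasted above.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of bhyve.

# Extract the vcpuid N from a console line of the form "vm exit[N]".
parse_vcpu() {
    printf '%s\n' "$1" | sed -n 's/.*vm exit\[\([0-9][0-9]*\)].*/\1/p'
}

# Extract the address from bhyvectl output of the form "gpa[N] 0x...".
parse_gpa() {
    printf '%s\n' "$1" |
        sed -n 's/^gpa\[[0-9][0-9]*] *\(0x[0-9a-fA-F][0-9a-fA-F]*\).*/\1/p'
}

# Run after the VM has exited and BEFORE it is restarted.
# $1 = name of the VM that crashed, $2 = vcpuid from the "vm exit[N]" line.
collect_breadcrumbs() {
    cb_vm=$1
    cb_vcpu=$2

    # Offending guest physical address recorded in the VMCS.
    cb_out=$(bhyvectl --vm="$cb_vm" --cpu="$cb_vcpu" \
        --get-vmcs-guest-physical-address) || return 1
    printf '%s\n' "$cb_out"

    cb_gpa=$(parse_gpa "$cb_out")
    [ -n "$cb_gpa" ] || return 1

    # EPT page table entries that map the offending GPA.
    bhyvectl --vm="$cb_vm" --get-gpa-pmap="$cb_gpa"
}
```

For the exit shown in this thread, that would be something like
collect_breadcrumbs vm1 "$(parse_vcpu 'vm exit[9]')", run as root on the host
and with vm1 replaced by whichever VM actually crashed.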