From: Alan Somers <asomers@gmail.com>
Date: Tue, 24 Jul 2018 18:02:45 -0600
Subject: Re: Overcommitting CPUs with BHyve?
To: Alan Somers
Cc: freebsd-virtualization@freebsd.org

An anonymous BHyve expert has explained things to me off-list.  Details below.

On Tue, Jul 24, 2018 at 3:30 PM, Alan Somers wrote:

> What are people's experiences with overcommitting CPUs in BHyve?  I have
> an 8-core machine that often runs VMs totalling up to 5 allocated CPUs
> without problems.  But today I got greedy.  I assigned 8 cores to one VM
> for a big build job.  Obviously, some of those were shared with the host.
> I also assigned it 8GB of RAM (out of 16 total).  Build performance fell
> through the floor, even though the host was idle.  Eventually I killed the
> build and restarted it with a more modest 2 make jobs (but the VM still had
> 8 cores).  Performance improved.  But eventually the system seemed to be
> mostly hung, while I had a build job running on the host as well as in the
> VM.  I killed both build jobs, which resolved the hung processes.  Then I
> restarted the host's build alone, and my system completely hung, with
> top(1) indicating that many processes were in the pfault state.
>
> So my questions are:
> 1) Is it a known problem to overcommit CPUs with BHyve?

Yes, it's a problem, and it's not unique to BHyve.  The trouble comes from
constructs like spinlocks.  Unlike normal userland locks, a contended
spinlock keeps both contending CPUs running at the same time, and when two
vCPUs are contending on one, the host has no idea how to prioritize them.
Normally that's not a problem, because physical CPUs are always supposed
to be able to run.  But when you overcommit vCPUs, some of them must be
"swapped out" (descheduled) at any given time.  If a spinlock is contended
by both a running vCPU and a swapped-out vCPU, it can stay contended for a
long time, and the host's scheduler simply isn't able to fix that.  The
problem is even worse with hyperthreading (which I'm using), because those
eight logical cores are really only four physical cores, and spinning on a
spinlock doesn't generate enough pipeline stalls to trigger a hyperthread
switch.

So it's probably best to stick with the n - 1 rule: on an n-CPU host,
don't give the guests more than n - 1 vCPUs in total.  Overcommitting is
ok if all the guests are single-vCPU, because then they won't use
spinlocks.  But my guests aren't all single-vCPU.

> 2) Could this be related to the pfault hang, even though the guest was
> idle at the time?

The expert suspects the ZFS ARC was competing with the guest for RAM.
IIUC, ZFS will sometimes greedily grow its ARC, which pushes idle parts of
the guest's RAM out to swap.  The guest isn't aware of this, and will
happily touch memory in the swapped-out portion, so the result is a battle
between the ARC and the guest for physical RAM.  The best solution is to
limit the maximum amount of RAM used by the ARC with the vfs.zfs.arc_max
sysctl.  More info:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222916

Thanks to everyone who commented, especially the Anonymous Coward.

-Alan
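
P.S.  For the archives, here is roughly the pattern the expert was
describing.  It's only an illustrative sketch of a test-and-set spinlock
in C11 (my own toy code, not anything from bhyve or the FreeBSD kernel),
but it shows why a descheduled lock holder hurts so much: the waiter has
nothing useful to do except burn its entire timeslice.

    /*
     * Toy test-and-set spinlock, roughly what a guest kernel does while
     * waiting on a spin mutex.
     */
    #include <stdatomic.h>

    typedef struct {
            atomic_flag locked;
    } toy_spinlock_t;

    #define TOY_SPINLOCK_INITIALIZER { ATOMIC_FLAG_INIT }

    static void
    toy_spin_lock(toy_spinlock_t *sl)
    {
            /* Busy-wait until the flag is clear, then claim it. */
            while (atomic_flag_test_and_set_explicit(&sl->locked,
                memory_order_acquire)) {
                    /*
                     * On bare metal the lock holder is guaranteed to be
                     * running, so this loop normally exits almost
                     * immediately.  If the holder is a vCPU that the host
                     * has descheduled, this vCPU spins uselessly for its
                     * whole timeslice.  The "pause" hint saves a little
                     * power and is polite to the sibling hyperthread, but
                     * it doesn't make the host run the holder any sooner.
                     */
    #if defined(__x86_64__) || defined(__i386__)
                    __asm__ __volatile__("pause");
    #endif
            }
    }

    static void
    toy_spin_unlock(toy_spinlock_t *sl)
    {
            atomic_flag_clear_explicit(&sl->locked, memory_order_release);
    }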
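
P.P.S.  In case it helps anyone else hitting the ARC-vs-guest fight: on my
system I'd cap the ARC with something like the line below.  The 4G figure
is only an example -- pick whatever leaves enough free RAM for your guests.

    # /boot/loader.conf
    vfs.zfs.arc_max="4G"

As I understand it, on recent releases the limit can also be lowered at
runtime (the value is given in bytes there), e.g.

    # sysctl vfs.zfs.arc_max=4294967296

although the ARC only shrinks toward the new limit gradually, as memory
gets reclaimed, rather than freeing everything at once.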