From: Ze Dupsys <zedupsys@gmail.com>
Date: Wed, 2 Mar 2022 20:12:28 +0200
Subject: Re: ZFS + FreeBSD XEN dom0 panic
To: Brian Buhrow
Cc: freebsd-xen@freebsd.org

I agree with you that the firewall and networking are most probably not at fault. When Dom0 ran jails with netgraph, it panicked a lot. I have since concluded that mixing jails with VMs is not a good idea anyway; it is better to run jails in a DomU.

I could beef up the RAM a bit, but it seems that would just postpone the inevitable. While the DomUs only have CPU load, use the network and put little load on the HDDs, all is fine, but once HDD load increases, the system crashes at some point. It really could be, as you say, due to ZFS and its monopolistic/unfriendly RAM usage. I guess I will go on a quest to find out how to tune/limit ZFS a bit.
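
Limiting ZFS usually means capping the ARC. On FreeBSD 12 this is done with the `vfs.zfs.arc_max` tunable in `/boot/loader.conf`, with the value in bytes. The Python sketch below only does the arithmetic and prints that line; the helper `arc_max_line` and the choice to give the ARC half of Dom0's memory are illustrative assumptions, not tested tuning advice for this machine.

```python
# Sketch: format a /boot/loader.conf line that caps the ZFS ARC.
# Assumption (illustrative only): the ARC may use a fixed fraction of
# Dom0's memory; the remainder is left for the kernel, Xen tools, etc.

GIB = 1024 ** 3  # bytes per GiB


def arc_max_line(dom0_mem_gib: float, arc_fraction: float = 0.5) -> str:
    """Return a loader.conf line for vfs.zfs.arc_max, in bytes.

    arc_max_line is a hypothetical helper, not part of FreeBSD.
    """
    if dom0_mem_gib <= 0:
        raise ValueError("dom0_mem_gib must be positive")
    if not 0 < arc_fraction < 1:
        raise ValueError("arc_fraction must be between 0 and 1")
    arc_bytes = int(dom0_mem_gib * GIB * arc_fraction)
    # Plain bytes avoid any doubt about size-suffix parsing.
    return f'vfs.zfs.arc_max="{arc_bytes}"'


if __name__ == "__main__":
    # Example: an 8 GiB Dom0 with the ARC limited to half of it.
    print(arc_max_line(8))  # vfs.zfs.arc_max="4294967296"
```

The resulting line goes into `/boot/loader.conf` and takes effect at the next boot.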

About snapshotting ZFS volumes: I'm not doing it while the DomUs are running. At the moment, the crashing lab machine has no snapshots at all.

The lab machine reboots with panic messages written to the serial output and logged by an old laptop, so it does not feel like a hardware error; I will run memtest to be sure. At first I got only partial panic messages, because Xen rebooted too soon, but then I added sync_console=1, so now I get the full panic messages. It seems that the reboot=no value is not taken into account, though.
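
On a FreeBSD Dom0, Xen's own boot options such as `sync_console=1` and `reboot=no` are passed through the `xen_cmdline` variable in `/boot/loader.conf`, next to `xen_kernel="/boot/xen"`. The sketch below merely assembles that line. `build_xen_cmdline` is a hypothetical helper, and the other options (`dom0_mem=8192M`, `dom0=pvh`, `console=com1`, `com1=115200,8n1`) are assumed example values for an 8G Dom0 with a serial console, not this machine's actual settings. Xen's documentation describes `sync_console` as a debugging aid that can hurt performance, so it is probably worth dropping once the panics are understood.

```python
from typing import Dict, Optional


def build_xen_cmdline(options: Dict[str, Optional[str]]) -> str:
    """Format Xen boot options as a /boot/loader.conf line (hypothetical helper).

    A value of None emits a bare boolean flag such as "sync_console";
    any other value is emitted as "key=value". Dict order is preserved.
    """
    parts = []
    for key, value in options.items():
        parts.append(key if value is None else f"{key}={value}")
    joined = " ".join(parts)
    return f'xen_cmdline="{joined}"'


if __name__ == "__main__":
    # Example values only; adjust to the real hardware and serial port.
    print('xen_kernel="/boot/xen"')
    print(build_xen_cmdline({
        "dom0_mem": "8192M",
        "dom0": "pvh",
        "console": "com1",
        "com1": "115200,8n1",
        "sync_console": "1",
        "reboot": "no",
    }))
```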

About memory ballooning, I feel somewhat hesitant; I am trying not to use too many different techniques that could introduce more problems. Is ballooning on Xen + FreeBSD Dom0 considered stable?

Have you used Xen's driver domains with FreeBSD to "export/provide" disks? It seemed an interesting approach as well, but while following the documentation I could not work out how to configure FreeBSD as a driver domain at all (if that is even possible) to provide block devices to Dom0, so that it can provide them to other DomUs. This might solve the RAM issues as well, since the driver domain would have its own reserved RAM and could not put pressure on Dom0's RAM, whatever the cause.

In a way, I am thinking about various strategies to shave services off Dom0 to ensure its stability. Maybe I should configure the firewall inside a DomU, as in your pfSense example, since for me CPU resources are usually not exhausted, but NICs and HDDs are.

Thanks for the ideas on what else could be done to solve this!

Best wishes,
Ze Dupsys


On Wed, Mar 2, 2022 at 7:05 PM Brian Buhrow <buhrow@nfbcal.org> wrote:
> Hello. One difference between my systems and yours, though I don't think that's the
> problem, is that I'm not running a firewall on the dom0 itself. The dom0 runs on a protected
> VLAN with respect to the external network, and the domUs are connected to bridges that are
> directly connected to the external network. I have one system where the customer wants the
> pfSense system running, so pfSense runs as a domU on this system, connected to an internal
> "private" bridge and the public bridge, doing all the firewalling between them. In this way,
> the FreeBSD dom0 is only doing ZFS, simple IP routing and Xen management.
>
> If I had to wager a guess as to your trouble, it's that you don't have enough memory on
> your dom0. ZFS is a memory hog and I can't imagine getting away with anything less than 8G on
> the dom0 with FreeBSD-12 and ZFS. I'm using 8G for the dom0 on the system I'm writing from and
> it is quite stable, but, then again, I'm not doing as much with the dom0 as you are.
>
> I too am using zvols as disks for the domUs, but I've not been trying to make ZFS
> snapshots from them. Obvious question, but I'll ask it anyway: you're not trying to make
> snapshots of the zvols while the domUs on top of them are running, are you? I would imagine
> that would not give you good images, but I wouldn't expect it to panic the dom0 either.
> However, it will stretch your meager memory resources even further.
>
> Have you been able to get a panic message, or does the system just spontaneously reboot? If it
> just reboots, then, again, I think you are having a memory shortage.
>
> My suggestion is to try giving the dom0 8G of RAM and then use the balloon driver to
> oversubscribe the remaining memory for the domUs. Of course, the best course of
> action is to see if you can put more memory in this system; 16GB just isn't that much when
> you're trying to run Xen plus a few domUs, especially on top of ZFS.
> If you can get a panic message or a crash dump, that would be helpful in figuring out more
> accurately what's going on.
>
> Another thought, since you were getting some crashes when running jails with Xen, is to get
> memtest86 running on the raw machine and let it run for 3 or 4 days. If you don't get any
> memory errors, then I think you can be pretty sure it's not a hardware problem. If, however,
> you get any errors at all with that test, then I think it's a good bet you have a hardware issue.
>
>
> -thanks
> -Brian
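
Brian's suggestion (8G for Dom0 on a 16GB machine, with ballooning to oversubscribe the rest for the DomUs) can be sanity-checked with simple arithmetic. In xl guest configs, `memory` is what a DomU boots with and `maxmem` is the ceiling it may balloon up to. The sketch below uses hypothetical DomU names and sizes, and the helper `memory_budget` is illustrative; it ignores the memory Xen itself and per-domain overhead consume, so the real headroom is somewhat smaller.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DomU:
    """A guest's memory settings, mirroring xl.cfg "memory" and "maxmem" (MiB)."""
    name: str
    memory_mib: int  # allocated when the guest boots
    maxmem_mib: int  # ceiling the balloon driver may grow the guest to

    def __post_init__(self) -> None:
        if not 0 < self.memory_mib <= self.maxmem_mib:
            raise ValueError(f"{self.name}: need 0 < memory <= maxmem")


def memory_budget(host_mib: int, dom0_mib: int,
                  guests: List[DomU]) -> Dict[str, Union[int, bool, float]]:
    """Rough budget; ignores Xen's own and per-domain overhead."""
    available = host_mib - dom0_mib
    if available <= 0:
        raise ValueError("Dom0 leaves no memory for guests")
    boot_total = sum(g.memory_mib for g in guests)
    max_total = sum(g.maxmem_mib for g in guests)
    return {
        "available_mib": available,
        "boot_total_mib": boot_total,
        "fits_at_boot": boot_total <= available,
        # > 1.0 means the guests could balloon to more than is free.
        "oversubscription": max_total / available,
    }


if __name__ == "__main__":
    # Hypothetical guests on a 16 GiB host with an 8 GiB Dom0.
    guests = [
        DomU("web", 2048, 4096),
        DomU("db", 3072, 6144),
        DomU("fw", 1024, 1024),
    ]
    print(memory_budget(16384, 8192, guests))
```

With these example numbers the DomUs boot with 6144 MiB of the 8192 MiB left after Dom0, and together could balloon to 1.375 times what is free, which only works as long as they do not all peak at the same time.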
