Date:      Wed, 2 Mar 2022 20:12:28 +0200
From:      Ze Dupsys <zedupsys@gmail.com>
To:        Brian Buhrow <buhrow@nfbcal.org>
Cc:        freebsd-xen@freebsd.org
Subject:   Re: ZFS + FreeBSD XEN dom0 panic
Message-ID:  <CAOEWpze2KNf08%2BimZ5R3A=ZLF4Eqc1GGnV%2Bi_y-f8bYLikGmmg@mail.gmail.com>
In-Reply-To: <202203021705.222H5MVm026787@nfbcal.org>
References:  <CAOEWpzdC41ithfd7R_qa66%2Bsh_UXeku7OcVC_b%2BXUaLr_9SSTA@mail.gmail.com> <202203021705.222H5MVm026787@nfbcal.org>


I agree with you that the firewall and networking are most probably not at
fault. When Dom0 had jails with netgraph, it panicked a lot. I have since
concluded that mixing jails with VMs is not a good idea anyway; it is
better to run jails inside a DomU.

Well, I could beef up the RAM a bit, but it seems that this would just
postpone the inevitable. While the DomUs only generate CPU load, use the
network and put little load on the HDDs, all is fine, but once HDD load
increases, at some point the system crashes. It really could be, as you
say, due to ZFS and its monopolistic/unfriendly RAM usage. I guess I will
go on a quest to find out how to tune/limit ZFS a bit.
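
As a starting point for that tuning, here is a minimal sketch (not
something tested on the crashing machine) of how an ARC cap for
/boot/loader.conf could be derived. It assumes the FreeBSD loader tunable
vfs.zfs.arc_max; the 8G dom0 size is taken from Brian's suggestion, while
the 50% fraction and the function names are purely illustrative
assumptions.

```python
GIB = 1024 ** 3  # bytes per GiB


def arc_max_bytes(dom0_mem_gib: float, arc_fraction: float = 0.5) -> int:
    """Return an ARC size cap in bytes as a fraction of dom0 memory.

    arc_fraction is an arbitrary illustrative default, not a recommendation.
    """
    if not 0 < arc_fraction < 1:
        raise ValueError("arc_fraction must be between 0 and 1")
    return int(dom0_mem_gib * GIB * arc_fraction)


def loader_conf_line(dom0_mem_gib: float, arc_fraction: float = 0.5) -> str:
    """Format the cap as a line for /boot/loader.conf (assumed tunable name)."""
    return f'vfs.zfs.arc_max="{arc_max_bytes(dom0_mem_gib, arc_fraction)}"'


if __name__ == "__main__":
    # Hypothetical 8 GiB dom0, capping the ARC at half of it.
    print(loader_conf_line(8))
```

The right fraction would have to be found by experiment; the point is only
that the ARC gets a hard ceiling well below the dom0 memory size.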

About snapshotting ZFS volumes: I'm not doing it while DomUs are running.
At the moment the crashing lab machine has no snapshots at all.

The lab machine reboots with panic messages written to the serial output
and logged by an old laptop. So it feels like it is not a hardware error; I
will run memtest to be sure. At first I only got partial panic messages,
since Xen rebooted too soon, but then I added sync_console=1, so now I get
full panic messages. It seems that the reboot=no value is not taken into
account, though.

About memory ballooning I feel somewhat hesitant; I am trying not to use
too many different techniques that could introduce more problems. Is
ballooning on Xen + FreeBSD Dom0 considered stable?

Have you used Xen's driver domains with FreeBSD to "export/provide" disks?
It seemed an interesting approach as well, but when following the
documentation I could not understand how to even configure FreeBSD as a
driver domain, if that is possible at all, to provide block devices to Dom0
so it can provide them to other DomUs. This might solve the RAM issues as
well, since the driver domain would have its own reserved RAM and could not
put pressure on Dom0's RAM for whatever reason.

In a way, I am thinking about various strategies to shave services off
Dom0 to ensure its stability. Maybe I should configure the firewall inside
a DomU, as in your pfSense example, since for me it is usually the NICs and
HDDs that are exhausted, not the CPU.

Thanks for the ideas on what else could be done to solve this!

Best wishes,
Ze Dupsys


On Wed, Mar 2, 2022 at 7:05 PM Brian Buhrow <buhrow@nfbcal.org> wrote:

> Hello.  One difference between my systems and yours, though I don't
> think that's the problem, is that I'm not running a firewall on the
> dom0 itself.  The dom0 runs on a protected vlan with respect to the
> external network, and the domUs are connected to bridges that are
> directly connected to the external network.  I have one system where
> the customer wants pfsense running, so pfsense runs as a domU on this
> system, connected to an internal "private" bridge and the public
> bridge, doing all the firewalling between them.  In this way, the
> FreeBSD dom0 is only doing ZFS, simple IP routing and Xen management.
>
> If I had to wager a guess as to your trouble, it's that you don't have
> enough memory on your dom0.  ZFS is a memory hog and I can't imagine
> getting away with anything less than 8G on the dom0 with FreeBSD-12
> and ZFS.  I'm using 8G for the dom0 on the system I'm writing from and
> it is quite stable, but, then again, I'm not doing as much with the
> dom0 as you are.
>
> I too am using zvols as disks for the domUs, but I've not been trying
> to make zfs snapshots from them.  Obvious question, but I'll ask it
> anyway: you're not trying to make snapshots of the zvols while the
> domUs on top of them are running, are you?  I would imagine that would
> not give you good images, but I wouldn't expect it to panic the dom0
> either.  However, it will stretch your meager memory resources even
> further.
>
> Have you been able to get a panic message or does the system just
> spontaneously reboot?  If it just reboots, then, again, I think you
> are having a memory shortage.
>
> My suggestion is to try giving the dom0 8G of RAM and then, for the
> domUs, use the balloon driver to oversubscribe the remaining memory.
> Of course, the best course of action is to see if you can put more
> memory in this system; 16GB just isn't that much when you're trying to
> run Xen plus a few domUs, especially on top of ZFS.  If you can get a
> panic message or a crash dump, that would be helpful in figuring out
> more accurately what's going on.
>
> Another thought, since you were getting some crashes when running
> jails with xen, is to get memtest86 running on the raw machine and let
> it run for 3 or 4 days.  If you don't get any memory errors, then I
> think you can be pretty sure it's not a hardware problem.  If,
> however, you get any errors at all with that test, then I think it's a
> good bet you have a hardware issue.
>
>
> -thanks
> -Brian
>
>



