Date:      Wed, 2 Mar 2022 10:57:37 +0200
From:      Ze Dupsys <zedupsys@gmail.com>
To:        freebsd-xen@freebsd.org, buhrow@nfbcal.org
Subject:   Re: ZFS + FreeBSD XEN dom0 panic
Message-ID:  <CAOEWpzdC41ithfd7R_qa66+sh_UXeku7OcVC_b+XUaLr_9SSTA@mail.gmail.com>
In-Reply-To: <202203011540.221FeR4f028103@nfbcal.org>
References:  <CAOEWpzc2WVViMJHrrtuU-G_7yck4eehm6b=JQPSZU1MH-bzmiw@mail.gmail.com> <202203011540.221FeR4f028103@nfbcal.org>


Hello,

I started using Xen on one pre-production machine (with the aim of using it in production later) on 12.2, but since it experienced random crashes I updated to 13.0 in the hope that the errors might disappear.

I am not sure how much detail to give, so that this email is not too long but still gives enough info.

The FreeBSD Dom0 is installed on ZFS, a fairly basic install; IPFW with rules for NATting is used. The zpool is composed of 2 mirrored disks. There is a ZVOL with volmode=dev for each VM and each VM's jail, attached as raw devices to the DomU. At the moment the DomUs contain FreeBSD, somewhere from 12.0 to 13.0, on UFS, with VNET jails whose epairs are all bridged to the DomU's xn0 interface. On the Dom0 I have bridge interfaces to which the DomUs are connected depending on their "zone/network"; those that are allowed outgoing connections are NATted by IPFW on a specific physical NIC and IP.
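For reference, creating one of those backing volumes is a one-liner; a minimal sketch (the dataset name matches the disk targets in the config below, the size and sparse flag are my assumptions):

# Sparse 20G ZVOL exposed only as a raw /dev/zvol/... device (volmode=dev),
# so Dom0 does not taste/attach the partition tables the DomU writes to it.
zfs create -s -V 20G -o volmode=dev sys/vmdk/root/sys-01-root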

xen_cmdline="dom0_mem=6144M cpufreq=dom0-kernel dom0_max_vcpus=4 dom0=pvh
console=vga,com1 com1=115200,8n1 guest_loglvl=all loglvl=all"
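(For context: on FreeBSD this setting goes into /boot/loader.conf next to the hypervisor kernel; a minimal sketch, assuming the stock /boot/xen path:)

# /boot/loader.conf -- load the Xen hypervisor and hand it the options above
xen_kernel="/boot/xen"
xen_cmdline="dom0_mem=6144M cpufreq=dom0-kernel dom0_max_vcpus=4 dom0=pvh console=vga,com1 com1=115200,8n1 guest_loglvl=all loglvl=all"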

Physical hardware is XEON CPU, ECC RAM 16G, 2x8TB HDD.

DomU config, something like this:
memory = 1024
vcpus=2
name = "sys-01"

type = "hvm"
boot = "dc"

vif = [ 'vifname=xbr0p5,type=vif,mac=00:16:3E:01:63:05,bridge=xbr0' ]
disk = [ 'backendtype=phy, format=raw, vdev=xvda, target=/dev/zvol/sys/vmdk/root/sys-01-root',
         'backendtype=phy, format=raw, vdev=xvdb, target=/dev/zvol/sys/vmdk/root/sys-01-jail1',
         'backendtype=phy, format=raw, vdev=xvdc, target=/dev/zvol/sys/vmdk/root/sys-01-jail2'
         .. more defs, if any ..
       ]

vnc=1
vnclisten="0.0.0.0:X"
usbdevice = "tablet"
serial = "pty"
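A guest defined like that is then driven with the usual xl commands; a minimal sketch (the config file path is my assumption):

# Start the DomU, watch it on its serial console, stop it again
xl create /usr/local/etc/xen/sys-01.cfg
xl console sys-01
xl shutdown sys-01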


When freshly started, the overall system works and speeds are acceptable; the load is not high, so the system is not under stress. The thing is that at unexpected times I noticed the system rebooting, e.g. when I created a new ZFS volume in Dom0, when I rebooted a DomU, or when I did something in Dom0 which seems unrelated; sometimes "init 0" would reboot the system, sometimes it would shut it down. It somehow felt that the panics happen when there is HDD load. So I got a somewhat similar machine for a testing/lab environment (16G ECC, a slower XEON, 2x2TB HDD, and a serial port) and started trying to push that system to its limits with various combinations, restricting RAM, CPUs, etc. The bug info contains the combination that seemed to me the fastest way to panic the system.

For Xen startup, "vifname=" did not work as described in the Xen user manual pages for the default startup script, so I added "ifconfig name $vifname" to that script. This was necessary because ipfw rules that use "via $ifname in" have to name a specific NIC, but by default Xen created a new NIC name each time, depending on which name was free. This change is not active on the lab system, which still crashes, so I do not think it is the cause of the problem.
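The change amounts to something like this; a minimal sketch, assuming the vif hotplug script under /usr/local/etc/xen/scripts/, with $iface holding the auto-generated backend name and $vifname coming from the vif= spec:

# Rename the freshly created backend interface (e.g. xnb0.0) to the
# stable name from vifname=, so "via xbr0p5 in" ipfw rules keep matching.
ifconfig "$iface" name "$vifname"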


About the history:
I believe the hardware is okay, since before Xen I was using FreeBSD 12.2 (upgraded incrementally from 12.0) with ZFS + jails a lot; the VNETs used were netgraph (VNET bridge and ethernet interfaces). What I loved about that setup was the clean ifconfig output, since the host had only the bridge interface, and the virtual ethernet interfaces for the jails came directly from that bridge. Creating a new jail was just a "zfs clone"; it did not take much space, snapshots could be made for backups, and the HDD space of each jail could easily be expanded/limited thanks to ZFS capabilities. The system was stable. The problem with that setup was that if some jail started to misbehave badly, it was hard to control the overall system's performance and behaviour. I tried rctl, but jails could misbehave in new, unexpected, bad ways (exhausting RAM, process count, CPU load, HDD load, opening too many network sockets, etc.; if the OOM killer started to kill processes, it was impossible to control which process/jail should get killed first and which should be kept). So it seemed to me that virtualization is a better way to solve that, i.e. to have a system VM that provides DNS, the Web gateway, etc., and lower-priority VMs that may crash if they misbehave. I like the Xen architecture in general, and I would like to use FreeBSD as Dom0 if possible, due to ZFS, existing knowledge, and the OS's good history of stability.

Since a ZFS dataset cannot be passed through to a DomU, my idea was to use ZVOLs with UFS inside the VM; then I could snapshot those ZVOLs for backups, and the DomU could growfs when necessary. Somewhat less convenient than the jail architecture, but still good enough.
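The intended workflow, as a minimal sketch (dataset name and sizes are examples; that the guest disk is xbd1 with UFS in GPT partition 1 is an assumption):

# Dom0: snapshot the backing volume for backups
zfs snapshot sys/vmdk/root/sys-01-jail1@backup-2022-03-02
# Dom0: grow the volume when the guest runs out of space
zfs set volsize=40G sys/vmdk/root/sys-01-jail1
# DomU: pick up the new size and grow the filesystem (UFS can grow mounted)
gpart recover xbd1
gpart resize -i 1 xbd1
growfs /dev/xbd1p1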

My first attempt was to keep the netgraph jails in Dom0, but that turned out badly. Almost every time, a system panic happened when a jail was started/stopped; not the first jail, but the 5th or later would panic the system with high probability. So I started to use epairs instead. That was less unstable, but still crashed from time to time. Now there are no jails at all, and it still crashes.

I tried different ideas: passing the whole HDD through raw over iSCSI, using ctld on Dom0 to provide the disks for the other DomUs (the HDD speed was bad, and the system still crashed), and raw files on ZFS datasets (speeds actually seemed close to ZVOLs, but the system still crashed). So now I am starting to wonder: what configurations do people use successfully? What have I missed?
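For reference, the ctld side of that experiment looks conceptually like this; a minimal sketch of /etc/ctl.conf (target name, listen address, and backing path are made-up examples, not my actual config):

# /etc/ctl.conf -- export a Dom0 block device as an iSCSI LUN for a DomU
portal-group pg0 {
    discovery-auth-group no-authentication
    listen 0.0.0.0
}

target iqn.2022-03.lab.example:sys-01-root {
    auth-group no-authentication
    portal-group pg0
    lun 0 {
        path /dev/zvol/sys/vmdk/root/sys-01-root
    }
}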


On Tue, Mar 1, 2022 at 5:40 PM Brian Buhrow <buhrow@nfbcal.org> wrote:

>         hello.  I've been running FreeBSD-12.1 and Freebsd-12.2 plus ZFS plus
> Xen with FreeBSD as dom0 without any stability issues for about 2 years now.
> I'm doing this on a number of systems, with a variety of NetBSD, FreeBSD and
> Linux as domU guests.  I haven't looked at your bug details, but are you
> running FreeBSD-13?
> -thanks
> -Brian
>



