Date:      Sun, 09 Jan 2022 13:21:17 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 261059] Kernel panic XEN + ZFS volume.
Message-ID:  <bug-261059-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261059

            Bug ID: 261059
           Summary: Kernel panic XEN + ZFS volume.
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: zedupsys@gmail.com

Created attachment 230842
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=230842&action=edit
all config and test script files

Broadly speaking, the problem is simple: the whole system reboots
uncontrollably at unexpected/unwanted times.

The Xen virtualization toolstack is used: FreeBSD runs as Dom0 (PVH) and
hosts FreeBSD VMs as HVM DomUs. Dom0 uses the ZFS file system, and the disks
for the DomUs are exposed as block devices (ZFS volumes).

I haven't been able to narrow down which area is causing the crash. At first
I thought this was a Xen-related problem, but the more I tested, the more it
started to feel ZFS-related as well; some sort of concurrency issue. While
investigating I created some scripts (added as attachments) which, at least
on my test hardware, crash the system most of the time when run.

Based on my observations, the most effective way to crash the system is to
run three scripts in parallel as root:
1) one that creates 2GB ZFS volumes and copies data from an IMG file onto
the ZVOL by executing dd,
2) a script that turns VM1 on/off,
3) a script that turns VM2 on/off, where VM2 has at least 5 disks.

But it is not the only way; it is just the one that crashes the system
faster than the others.


System hardware:
CPU: Intel(R) Xeon(R) CPU X3440  @ 2.53GHz
RAM: 16GB ECC
HDD: 2x WDC WD2003FYYS 2TB


System installed from FreeBSD-13.0-RELEASE-amd64-dvd1.iso, all defaults
except IP and some basic configuration. The ZFS pool was created
automatically with the name sys. The Xen toolstack was installed with
pkg install. freebsd-update has been run.


root@lab-01 > uname -a
FreeBSD lab-01.b7.abj.lv 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64


root@lab-01 > freebsd-version
13.0-RELEASE-p5


root@lab-01 > zpool status
  pool: sys
 state: ONLINE
  scan: resilvered 3.70M in 00:00:03 with 0 errors on Fri Jan  7 11:06:07 2022
config:

        NAME          STATE     READ WRITE CKSUM
        sys           ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/sys0  ONLINE       0     0     0
            gpt/sys1  ONLINE       0     0     0

errors: No known data errors


root@lab-01 > pkg info
argp-standalone-1.3_4          Standalone version of arguments parsing functions from GLIBC
ca_root_nss-3.69_1             Root certificate bundle from the Mozilla Project
curl-7.79.1                    Command line tool and library for transferring data with URLs
edk2-xen-x64-g202102           EDK2 Firmware for xen_x64
gettext-runtime-0.21           GNU gettext runtime libraries and programs
glib-2.70.1,2                  Some useful routines of C programming (current stable version)
indexinfo-0.3.1                Utility to regenerate the GNU info page index
libevent-2.1.12                API for executing callback functions on events or timeouts
libffi-3.3_1                   Foreign Function Interface
libiconv-1.16                  Character set conversion library
libnghttp2-1.44.0              HTTP/2.0 C Library
libssh2-1.9.0_3,3              Library implementing the SSH2 protocol
libxml2-2.9.12                 XML parser library for GNOME
lzo2-2.10_1                    Portable speedy, lossless data compression library
mpdecimal-2.5.1                C/C++ arbitrary precision decimal floating point libraries
pcre-8.45                      Perl Compatible Regular Expressions library
perl5-5.32.1_1                 Practical Extraction and Report Language
pixman-0.40.0_1                Low-level pixel manipulation library
pkg-1.17.5                     Package manager
python38-3.8.12                Interpreted object-oriented programming language
readline-8.1.1                 Library for editing command lines as they are typed
seabios-1.14.0                 Open source implementation of a 16bit X86 BIOS
tmux23-2.3_1                   Terminal Multiplexer (old stable version 2.3)
vim-8.2.3458                   Improved version of the vi editor (console flavor)
xen-kernel-4.15.0_1            Hypervisor using a microkernel design
xen-tools-4.15.0_2             Xen management tools
yajl-2.1.0                     Portable JSON parsing and serialization library in ANSI C
zsh-5.8                        The Z shell


root@lab-01 > cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:sys"

beastie_disable="YES"
autoboot_delay="5"

boot_multicons="YES"
boot_serial="YES"
comconsole_speed="9600"
console="comconsole,vidconsole"

xen_kernel="/boot/xen"
xen_cmdline="dom0_mem=2048M cpufreq=dom0-kernel dom0_max_vcpus=2 dom0=pvh console=vga,com1 com1=9600,8n1 guest_loglvl=all loglvl=all"

hw.usb.no_boot_wait=1


root@lab-01 > cat /etc/rc.conf
hostname="lab-01.b7.abj.lv"

cloned_interfaces="bridge10"

create_args_bridge10="name xbr0"
cloned_interfaces_sticky="YES"

ifconfig_xbr0="inet 10.63.0.1/16"

zfs_enable="YES"
sshd_enable="YES"

xencommons_enable="YES"


Besides the default ZFS dataset mounted at /, I have created a parent
dataset for the VM ZVOLs and one for working files in the /service directory.
root@lab-01 > zfs list
NAME           USED  AVAIL     REFER  MOUNTPOINT
sys           98.6G  1.66T     1.99G  /
sys/service   96.6G  1.66T     96.6G  /service
sys/vmdk        48K  1.66T       24K  none
sys/vmdk/dev    24K  1.66T       24K  none

# zfs create -o mountpoint=none sys/vmdk
# etc.
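
For the layout shown in zfs list above, that amounts to roughly (my
reconstruction; the exact commands are not in the attachments):

# zfs create -o mountpoint=none sys/vmdk
# zfs create -o mountpoint=none sys/vmdk/dev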

I am running the scripts from the folder /service/crash, so on a fresh
system the attachments can just be placed there. The scripts need an SSH
key, so create one with ssh-keygen.
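
For example (the key type and path are my assumptions; the scripts copy
whatever is under /root/.ssh):

# ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa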


Attached file descriptions:
lib.sh - contains reusable functions for tests and VM preparation; used by
the test scripts and manually.
libexec.sh - a wrapper that takes its first argument as the name of a
function from lib.sh to call; used for manual function calls.
test_vm1_zvol_on_off.sh - in a loop: boot VM1, sleep, power VM1 off (see the
sketch after this list).
test_vm2_zvol_on_off.sh - in a loop: boot VM2, sleep, power VM2 off.
test_vm2_zvol_5_on_off.sh - turns VM2, which has 5 HDDs, on/off.
test_vm1_zvol_3gb.sh - turns VM1 on/off and writes/removes a 3GB file in the
VM1:/tmp folder.
xen-vm1-zvol.conf - Xen config file for VM1.
xen-vm2-zvol.conf - Xen config file for VM2.
xen-vm2-zvol-5.conf - Xen config file for VM2 with 5 HDDs.
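
The on/off scripts are essentially loops of the following shape (a
simplified sketch; the real scripts are attached, and the sleep time and
the exact power-off method here are my assumptions):

while true; do
    xl create xen-vm1-zvol.conf
    sleep 60                     # give the guest time to boot
    xl shutdown -w xen-vm1-zvol  # power VM1 off, waiting for completion
done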


To create the VMs, with all the attached files in /service/crash, run as
root:
./libexec.sh vm1_img_create
./libexec.sh vm2_img_create

These commands will create the VM1 and VM2 disk images, set the internal IPs
as defined in lib.sh, and copy the SSH key from the host's /root/.ssh into
the VM disks. The VM image is downloaded from
https://download.freebsd.org/ftp/releases/VM-IMAGES/13.0-RELEASE/amd64/Latest/FreeBSD-13.0-RELEASE-amd64.raw.xz,
so a network connection is necessary, or the file
FreeBSD-13.0-RELEASE-amd64.raw.xz must be placed in the folder
/service/crash/cache.

Then, to convert the IMG files to ZVOLs:
./libexec.sh vm1_img_to_zvol
./libexec.sh vm2_img_to_zvol

Sometimes at this point there is a dd error saying that
/dev/zvol/sys/vmdk/dev/vm1-root is not accessible. There seems to be some
ZFS bug here, but I could not reproduce it reliably enough to file a
separate report. If it happens, just reboot the system (the device will show
up) and rerun the command.
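
To at least detect the condition before dd runs, something like this could
be added (my sketch, not part of the attached scripts; it does not fix the
missing node, it only avoids the confusing dd error):

n=0
while [ ! -c /dev/zvol/sys/vmdk/dev/vm1-root ] && [ $n -lt 30 ]; do
    sleep 1; n=$((n+1))          # wait up to 30s for the device node
done
[ -c /dev/zvol/sys/vmdk/dev/vm1-root ] || echo "ZVOL node missing, reboot needed"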

Create the dummy disks for VM2's data:
./libexec.sh vmdk_empty_create vm2-data1.img 2G
./libexec.sh vmdk_empty_create vm2-data2.img 2G
./libexec.sh vmdk_empty_create vm2-data3.img 2G
./libexec.sh vmdk_empty_create vm2-data4.img 2G

./libexec.sh vm2_data_to_zvol

Now that everything is prepared, just test the VMs with:
xl create xen-vm1-zvol.conf

To see that the VM boots, run:
xl console xen-vm1-zvol

It is necessary to connect with SSH manually once, to ensure that the
connection works and that SSH updates /root/.ssh/known_hosts.
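
For example (the guest IP is set in lib.sh, so this is just a placeholder):

# ssh root@<vm1-ip> exit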

Before the tests start, the expected ZFS layout is:
root@lab-01 #1> zfs list
NAME                     USED  AVAIL     REFER  MOUNTPOINT
sys                      142G  1.62T     1.99G  /
sys/service              111G  1.62T      111G  /service
sys/vmdk                28.9G  1.62T       24K  none
sys/vmdk/dev            28.9G  1.62T       24K  none
sys/vmdk/dev/vm1-root   10.3G  1.62T     5.07G  -
sys/vmdk/dev/vm2-data1  2.06G  1.62T       12K  -
sys/vmdk/dev/vm2-data2  2.06G  1.62T     2.00G  -
sys/vmdk/dev/vm2-data3  2.06G  1.62T     2.00G  -
sys/vmdk/dev/vm2-data4  2.06G  1.62T       12K  -
sys/vmdk/dev/vm2-root   10.3G  1.62T     5.07G  -

And the device directory:
# ls -la /dev/zvol/sys/vmdk/dev/
total 1
dr-xr-xr-x  2 root  wheel      512 Jan  9 14:27 .
dr-xr-xr-x  3 root  wheel      512 Jan  9 14:27 ..
crw-r-----  1 root  operator  0x72 Jan  9 14:27 vm1-root
crw-r-----  1 root  operator  0x70 Jan  9 14:27 vm2-data1
crw-r-----  1 root  operator  0x71 Jan  9 14:27 vm2-data2
crw-r-----  1 root  operator  0x75 Jan  9 14:27 vm2-data3
crw-r-----  1 root  operator  0x73 Jan  9 14:27 vm2-data4
crw-r-----  1 root  operator  0x74 Jan  9 14:27 vm2-root

For me there are sometimes ZVOLs missing from the /dev/zvol directory
(vm2-data1 or vm2-data3), even though zfs list shows them, so an init 6 is
needed before the tests can be started.
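
A quick pre-flight check for this (device names taken from the ls output
above; this loop is my addition, not in the attachments):

for d in vm1-root vm2-root vm2-data1 vm2-data2 vm2-data3 vm2-data4; do
    [ -c /dev/zvol/sys/vmdk/dev/$d ] || echo "missing: $d"
done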


Once the environment is ready, just run the following commands from three
different SSH sessions:
1) cd /service/crash; ./libexec.sh zfs_volstress
2) cd /service/crash; ./test_vm1_zvol_on_off.sh
3) cd /service/crash; ./test_vm2_zvol_5_on_off.sh

Sometimes it crashes fast (within 2 minutes); sometimes it takes a while,
like 30 minutes.
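
Instead of three interactive SSH sessions, the same load can be started
detached with tmux (already installed, see pkg info above); a sketch:

tmux new-session -d -s crash1 'cd /service/crash && ./libexec.sh zfs_volstress'
tmux new-session -d -s crash2 'cd /service/crash && ./test_vm1_zvol_on_off.sh'
tmux new-session -d -s crash3 'cd /service/crash && ./test_vm2_zvol_5_on_off.sh'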

My observations so far:

1. ZVOLs are acting weird; for example, at some point I see output like this:

./libexec.sh: creating sys/stress/data1 2G
dd: /dev/zvol/sys/stress/data1: No such file or directory
./libexec.sh: creating sys/stress/data2 2G
4194304+0 records in
4194304+0 records out
2147483648 bytes transferred in 70.178650 secs (30600241 bytes/sec)
./libexec.sh: creating sys/stress/data3 2G
4194304+0 records in
4194304+0 records out
2147483648 bytes transferred in 73.259213 secs (29313496 bytes/sec)
./libexec.sh: creating sys/stress/data4 2G
dd: /dev/zvol/sys/stress/data4: Operation not supported
./libexec.sh: creating sys/stress/data5 2G
dd: /dev/zvol/sys/stress/data5: Operation not supported
./libexec.sh: creating sys/stress/data6 2G

To me this is unexpected behaviour, since zfs create has returned each time
before dd is run; from the user's perspective nothing is done in parallel.
See the function zfs_volstress in the attached lib.sh.
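
For reference, the general shape of that function (simplified by hand here;
the real code is in the attached lib.sh, and the image path and volume count
are placeholders):

i=1
while [ $i -le 10 ]; do
    echo "creating sys/stress/data$i 2G"
    zfs create -V 2G sys/stress/data$i
    # zfs create has already returned at this point, yet dd sometimes fails
    # with "No such file or directory" or "Operation not supported":
    dd if=/service/crash/cache/source.img of=/dev/zvol/sys/stress/data$i
    i=$((i+1))
done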


2. Often, but not always, there are problems starting VM2 before the system
crashes; output:

libxl: error: libxl_device.c:1111:device_backend_callback: Domain 53:unable to add device with path /local/domain/0/backend/vbd/53/51712
libxl: error: libxl_create.c:1613:domcreate_launch_dm: Domain 53:unable to add disk devices
libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 53:Non-existant domain
libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 53:Unable to destroy guest
libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 53:Destruction of domain failed
./test_vm2_zvol_single_hdd_on_off.sh: waiting VM to be ready

Sometimes the script ./test_vm2_zvol_single_hdd_on_off.sh must be restarted,
because it is not smart about waiting for VM2 to start.


3. It is not necessary for VM2 to have 5 disks to crash the system; even running:
1) cd /service/crash; ./libexec.sh zfs_volstress
2) cd /service/crash; ./test_vm1_zvol_on_off.sh
3) cd /service/crash; ./test_vm2_zvol_on_off.sh

will crash the system eventually, but it takes much longer; for me sometimes
2-3 hours.


4. If just test_vm1_zvol_on_off and test_vm2_zvol_on_off are running, the
system seems not to crash, or maybe I did not wait long enough (it was a
whole day). Thus the ZFS load seems essential to provoke the panic.


5. It is possible to crash the system with only 2 scripts:
1) cd /service/crash; ./test_vm1_zvol_3gb.sh (this writes 3GB of data inside
VM1:/tmp)
2) cd /service/crash; ./test_vm2_zvol_5_on_off.sh

Writing larger files inside VM1 tends to provoke the panic sooner; with 1GB
I could not reproduce the case often enough.

The problem is that there is little info when the system crashes. I am open
to advice on how I could capture more useful data; below are some incomplete
fragments from the serial output that seemed interesting to me:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x30028
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c45832
stack pointer           = 0x28:0xfffffe00967ec930
frame pointer           = 0x28:0xfffffe00967ec930
cod


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff80c45832
stack pointer           = 0x28:0xfffffe009666b930
frame pointer           = 0x28:0xfffffe009666b930
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0,


Fatal trap 12: page fault w


(d2) Booting from Hard Disk...
(d2) Booting from 0000:7c00
(XEN) d1v0: upcall vector 93
(XEN) d2v0: upcall vector 93
xnb(xnb_frontend_changed:1391): frontend_state=Connected, xnb_state=InitWait
xnb(xnb_connect_comms:787): rings connected!
xbbd4: Error 12 Unable to allocate request bounce buffers
xbbd4: Fatal error. Transitioning to Closing State
panic: pmap_growkernel: no memory to grow kernel
cpuid = 0
time = 1641731595
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff81073eed at pmap_growkernel+0x27d
#4 0xffffffff80f2da88 at vm_map_insert+0x248
#5 0xffffffff80f301e9 at vm_map_find+0x549
#6 0xffffffff80f2bf16 at kmem_init+0x226
Loading /boot/loader.conf.local



I am interested in solving this. This is a testing machine, so I can run
modified tests at any time. But I am somewhat out of ideas about what could
be done to get more verbose output, so that at least complete messages are
written to the serial output before the automatic reboot happens.
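
One thing I have not verified yet is whether kernel crash dumps work under a
Xen Dom0; if they do, the stock FreeBSD mechanism should preserve a full
backtrace across the reboot (my assumption, not tested here):

# /etc/rc.conf: dump kernel memory to the swap device on panic;
# savecore then writes it to /var/crash on the next boot
dumpdev="AUTO"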

As for "panic: pmap_growkernel: no memory to grow kernel", for me it seemed
that it should be enough that Dom0 has 8GB RAM, and each VM 1GB. But i do n=
ot
claim that i am XEN expert and maybe this could be clasified as
misconfiguration of system. If so, i am open to pointers what could be done=
 to
make system more stable.

The same scripts can crash 12.1-RELEASE as well; tested.

-- 
You are receiving this mail because:
You are the assignee for the bug.


