Date: Sun, 09 Jan 2022 13:21:17 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 261059] Kernel panic XEN + ZFS volume.
Message-ID: <bug-261059-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261059

            Bug ID: 261059
           Summary: Kernel panic XEN + ZFS volume.
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: zedupsys@gmail.com

Created attachment 230842
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=230842&action=edit
all config and test script files

Broadly described, the problem is simple: the whole system reboots uncontrollably at unexpected/unwanted times. The XEN virtualization toolstack is used; FreeBSD runs as Dom0 PVH and the hosted FreeBSD VMs are DomU HVM. ZFS is used for Dom0, and the disks for the DomUs are exposed as block devices, ZFS volumes.

I have not been able to narrow down which area is causing the crash. At first I thought this was a XEN-related problem, but the more I tested, the more it started to feel ZFS-related as well; sort of concurrency-related. While investigating I have created some scripts (attached), which, when run at least on my test hardware, can crash the system most of the time.

Based on my observations, the most effective way to crash the system is to run three scripts in parallel as root:

1) one that creates 2GB ZFS volumes and copies data from an IMG file onto the ZVOL by executing dd (see the sketch below),
2) a script that turns VM1 on/off,
3) a script that turns VM2 on/off, where VM2 has at least 5 disks.

This is not the only way, but it is the one that crashes the system faster than the others.
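In outline, script 1 (function zfs_volstress in the attached lib.sh) is a loop like the sketch below. This is a sketch only, to make the reproduction easier to follow: the source image path, the iteration count, and the dd options are placeholders here; the real code is in the attachment.

#!/bin/sh
# Sketch of the ZVOL stress loop: create a 2 GB volume, fill it
# from an image file with dd, then move on to the next volume.
# The volumes are created one after another, not in parallel.
src=/service/crash/cache/source.img   # placeholder path
i=1
while [ $i -le 6 ]; do
    echo "creating sys/stress/data$i 2G"
    zfs create -V 2G sys/stress/data$i
    dd if="$src" of=/dev/zvol/sys/stress/data$i
    i=$((i + 1))
done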
System hardware:
CPU: Intel(R) Xeon(R) CPU X3440 @ 2.53GHz
RAM: 16GB ECC
HDD: 2x WDC WD2003FYYS 2TB

System installed from FreeBSD-13.0-RELEASE-amd64-dvd1.iso, all defaults except IP and some basic configuration. The ZFS pool was created automatically with the name sys. The XEN toolstack was installed with pkg install. freebsd-update has been run.

root@lab-01 > uname -a
FreeBSD lab-01.b7.abj.lv 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

root@lab-01 > freebsd-version
13.0-RELEASE-p5

root@lab-01 > zpool status
  pool: sys
 state: ONLINE
  scan: resilvered 3.70M in 00:00:03 with 0 errors on Fri Jan  7 11:06:07 2022
config:

        NAME          STATE     READ WRITE CKSUM
        sys           ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/sys0  ONLINE       0     0     0
            gpt/sys1  ONLINE       0     0     0

errors: No known data errors

root@lab-01 > pkg info
argp-standalone-1.3_4    Standalone version of arguments parsing functions from GLIBC
ca_root_nss-3.69_1       Root certificate bundle from the Mozilla Project
curl-7.79.1              Command line tool and library for transferring data with URLs
edk2-xen-x64-g202102     EDK2 Firmware for xen_x64
gettext-runtime-0.21     GNU gettext runtime libraries and programs
glib-2.70.1,2            Some useful routines of C programming (current stable version)
indexinfo-0.3.1          Utility to regenerate the GNU info page index
libevent-2.1.12          API for executing callback functions on events or timeouts
libffi-3.3_1             Foreign Function Interface
libiconv-1.16            Character set conversion library
libnghttp2-1.44.0        HTTP/2.0 C Library
libssh2-1.9.0_3,3        Library implementing the SSH2 protocol
libxml2-2.9.12           XML parser library for GNOME
lzo2-2.10_1              Portable speedy, lossless data compression library
mpdecimal-2.5.1          C/C++ arbitrary precision decimal floating point libraries
pcre-8.45                Perl Compatible Regular Expressions library
perl5-5.32.1_1           Practical Extraction and Report Language
pixman-0.40.0_1          Low-level pixel manipulation library
pkg-1.17.5               Package manager
python38-3.8.12          Interpreted object-oriented programming language
readline-8.1.1           Library for editing command lines as they are typed
seabios-1.14.0           Open source implementation of a 16bit X86 BIOS
tmux23-2.3_1             Terminal Multiplexer (old stable version 2.3)
vim-8.2.3458             Improved version of the vi editor (console flavor)
xen-kernel-4.15.0_1      Hypervisor using a microkernel design
xen-tools-4.15.0_2       Xen management tools
yajl-2.1.0               Portable JSON parsing and serialization library in ANSI C
zsh-5.8                  The Z shell

root@lab-01 > cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:sys"
beastie_disable="YES"
autoboot_delay="5"
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="9600"
console="comconsole,vidconsole"
xen_kernel="/boot/xen"
xen_cmdline="dom0_mem=2048M cpufreq=dom0-kernel dom0_max_vcpus=2 dom0=pvh console=vga,com1 com1=9600,8n1 guest_loglvl=all loglvl=all"
hw.usb.no_boot_wait=1

root@lab-01 > cat /etc/rc.conf
hostname="lab-01.b7.abj.lv"
cloned_interfaces="bridge10"
create_args_bridge10="name xbr0"
cloned_interfaces_sticky="YES"
ifconfig_xbr0="inet 10.63.0.1/16"
zfs_enable="YES"
sshd_enable="YES"
xencommons_enable="YES"

Besides the default ZFS dataset that is mounted at /, I have created a parent dataset for the VM ZVOLs and one for working in the /service directory.

root@lab-01 > zfs list
NAME           USED  AVAIL     REFER  MOUNTPOINT
sys           98.6G  1.66T     1.99G  /
sys/service   96.6G  1.66T     96.6G  /service
sys/vmdk        48K  1.66T       24K  none
sys/vmdk/dev    24K  1.66T       24K  none

# zfs create -o mountpoint=none sys/vmdk
# etc.
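Spelled out, the "# etc." amounts to roughly the following; only the sys/vmdk line is verbatim, the other two are the completion implied by the zfs list output above:

# zfs create -o mountpoint=none sys/vmdk
# zfs create -o mountpoint=none sys/vmdk/dev
# zfs create -o mountpoint=/service sys/service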
I am running the scripts from the folder /service/crash, so the attachments can just be placed there on a fresh system. The scripts need an SSH key, so create one with ssh-keygen.

Attached file descriptions:

lib.sh - contains reusable functions for the tests and for VM preparation; used by the test scripts and manually.
libexec.sh - a wrapper which takes its first argument as the name of a function from lib.sh to call; used for manual function calls.
test_vm1_zvol_on_off.sh - in a loop: boot VM1, sleep, power VM1 off.
test_vm2_zvol_on_off.sh - in a loop: boot VM2, sleep, power VM2 off.
test_vm2_zvol_5_on_off.sh - turns VM2, which has 5 HDDs, on/off.
test_vm1_zvol_3gb.sh - turns VM1 on/off, and writes/removes a 3GB file in VM1:/tmp.
xen-vm1-zvol.conf - XEN config file for VM1.
xen-vm2-zvol.conf - XEN config file for VM2.
xen-vm2-zvol-5.conf - XEN config file for VM2 with 5 HDDs.

To create the VMs, with all the attached files in /service/crash, run as root:

./libexec.sh vm1_img_create
./libexec.sh vm2_img_create

These commands create the VM1 and VM2 disk images, set the internal IPs as defined in lib.sh, and copy the SSH key from the host's /root/.ssh into the VM disks. The VM image is downloaded from https://download.freebsd.org/ftp/releases/VM-IMAGES/13.0-RELEASE/amd64/Latest/FreeBSD-13.0-RELEASE-amd64.raw.xz, so a network connection is necessary, or the file FreeBSD-13.0-RELEASE-amd64.raw.xz must be placed in the folder /service/crash/cache.

Then, to convert the IMGs to ZVOLs:

./libexec.sh vm1_img_to_zvol
./libexec.sh vm2_img_to_zvol

Sometimes at this point there is a dd error that /dev/zvol/sys/vmdk/dev/vm1-root is not accessible. There is some ZFS bug here, but I could not repeat it reliably enough to file a bug report. Just reboot the system, the device will show up; then rerun the command.

Create the dummy disks for VM2 data:

./libexec.sh vmdk_empty_create vm2-data1.img 2G
./libexec.sh vmdk_empty_create vm2-data2.img 2G
./libexec.sh vmdk_empty_create vm2-data3.img 2G
./libexec.sh vmdk_empty_create vm2-data4.img 2G
./libexec.sh vm2_data_to_zvol

Now that everything is prepared, test the VMs with:

xl create xen-vm1-zvol.conf

To see that the VM boots, run:

xl console xen-vm1-zvol

It is necessary to connect with SSH manually once, to verify that the connection works and so that SSH updates /root/.ssh/known_hosts.

Before the test starts, the expected ZFS layout is:

root@lab-01 #1> zfs list
NAME                     USED  AVAIL     REFER  MOUNTPOINT
sys                      142G  1.62T     1.99G  /
sys/service              111G  1.62T      111G  /service
sys/vmdk                28.9G  1.62T       24K  none
sys/vmdk/dev            28.9G  1.62T       24K  none
sys/vmdk/dev/vm1-root   10.3G  1.62T     5.07G  -
sys/vmdk/dev/vm2-data1  2.06G  1.62T       12K  -
sys/vmdk/dev/vm2-data2  2.06G  1.62T     2.00G  -
sys/vmdk/dev/vm2-data3  2.06G  1.62T     2.00G  -
sys/vmdk/dev/vm2-data4  2.06G  1.62T       12K  -
sys/vmdk/dev/vm2-root   10.3G  1.62T     5.07G  -

And the directory:

# ls -la /dev/zvol/sys/vmdk/dev/
total 1
dr-xr-xr-x  2 root  wheel      512 Jan  9 14:27 .
dr-xr-xr-x  3 root  wheel      512 Jan  9 14:27 ..
crw-r-----  1 root  operator  0x72 Jan  9 14:27 vm1-root
crw-r-----  1 root  operator  0x70 Jan  9 14:27 vm2-data1
crw-r-----  1 root  operator  0x71 Jan  9 14:27 vm2-data2
crw-r-----  1 root  operator  0x75 Jan  9 14:27 vm2-data3
crw-r-----  1 root  operator  0x73 Jan  9 14:27 vm2-data4
crw-r-----  1 root  operator  0x74 Jan  9 14:27 vm2-root

For me, sometimes ZVOLs are missing from the /dev/zvol directory (vm2-data1 or vm2-data3), even though zfs list shows them; in that case run init 6 before the tests are started.

Once the environment is ready, run these commands from three different SSH sessions:

1) cd /service/crash; ./libexec.sh zfs_volstress
2) cd /service/crash; ./test_vm1_zvol_on_off.sh
3) cd /service/crash; ./test_vm2_zvol_5_on_off.sh
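The on/off scripts are essentially create/shutdown loops, roughly like the sketch below; the fixed sleeps and the shutdown variant are simplifications here, the real waiting logic is in the attached scripts:

#!/bin/sh
# Sketch of test_vm1_zvol_on_off.sh: boot VM1, wait, power off, repeat.
while true; do
    xl create xen-vm1-zvol.conf
    sleep 60                      # crude stand-in for a boot check
    xl shutdown -w xen-vm1-zvol   # -w waits until the domain is gone
    sleep 5
done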
Sometimes it crashes fast (in 2 minutes), sometimes it takes a while, like 30 minutes.

My observations so far.

1. ZVOLs are acting weird; for example, at some point I see output like this:

./libexec.sh: creating sys/stress/data1 2G
dd: /dev/zvol/sys/stress/data1: No such file or directory
./libexec.sh: creating sys/stress/data2 2G
4194304+0 records in
4194304+0 records out
2147483648 bytes transferred in 70.178650 secs (30600241 bytes/sec)
./libexec.sh: creating sys/stress/data3 2G
4194304+0 records in
4194304+0 records out
2147483648 bytes transferred in 73.259213 secs (29313496 bytes/sec)
./libexec.sh: creating sys/stress/data4 2G
dd: /dev/zvol/sys/stress/data4: Operation not supported
./libexec.sh: creating sys/stress/data5 2G
dd: /dev/zvol/sys/stress/data5: Operation not supported
./libexec.sh: creating sys/stress/data6 2G

This seems like unexpected behaviour to me, since each time before dd is run, zfs create has already returned; from the user's perspective nothing is done in parallel. See the function zfs_volstress in the lib.sh file.

2. Often, but not always, there are problems with starting VM2 before the system crash; output:

libxl: error: libxl_device.c:1111:device_backend_callback: Domain 53:unable to add device with path /local/domain/0/backend/vbd/53/51712
libxl: error: libxl_create.c:1613:domcreate_launch_dm: Domain 53:unable to add disk devices
libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 53:Non-existant domain
libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 53:Unable to destroy guest
libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 53:Destruction of domain failed
./test_vm2_zvol_single_hdd_on_off.sh: waiting VM to be ready

Sometimes the script ./test_vm2_zvol_single_hdd_on_off.sh must be restarted, because it is not smart about waiting for VM2 to start.

3. It is not necessary for VM2 to have 5 disks to crash the system; even running

1) cd /service/crash; ./libexec.sh zfs_volstress
2) cd /service/crash; ./test_vm1_zvol_on_off.sh
3) cd /service/crash; ./test_vm2_zvol_on_off.sh

will crash the system eventually, but it takes much longer; sometimes for me it takes 2-3 hours.

4. If only test_vm1_zvol_on_off and test_vm2_zvol_on_off are running, the system seems not to crash, or maybe I did not wait long enough; it was a whole day. Thus ZFS load seems essential to provoke the panic.

5. It is possible to crash the system with only 2 scripts:

1) cd /service/crash; ./test_vm1_zvol_3gb.sh (this writes 3GB of data inside VM1:/tmp; see the sketch below)
2) cd /service/crash; ./test_vm2_zvol_5_on_off.sh

Writing larger files inside VM1 tends to provoke the panic sooner; with 1GB I could not repeat the case often enough.
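The in-guest write of test_vm1_zvol_3gb.sh presumably amounts to a line like the following inside the same kind of on/off loop; the guest address is an assumption here (the real IP is set in lib.sh):

# assumed guest IP on the xbr0 bridge; 3 GiB written, then removed
ssh root@10.63.0.2 'dd if=/dev/zero of=/tmp/big.dat bs=1m count=3072 && rm /tmp/big.dat'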
The problem is that there is little information when the system crashes. I am open to advice on how I could capture more useful data, but below are some incomplete fragments from the serial output that seemed interesting to me:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x30028
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c45832
stack pointer           = 0x28:0xfffffe00967ec930
frame pointer           = 0x28:0xfffffe00967ec930
cod

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff80c45832
stack pointer           = 0x28:0xfffffe009666b930
frame pointer           = 0x28:0xfffffe009666b930
code segment            = base 0x0, limit 0xfffff, type 0x1b = DPL 0,

Fatal trap 12: page fault w

(d2) Booting from Hard Disk...
(d2) Booting from 0000:7c00
(XEN) d1v0: upcall vector 93
(XEN) d2v0: upcall vector 93
xnb(xnb_frontend_changed:1391): frontend_state=Connected, xnb_state=InitWait
xnb(xnb_connect_comms:787): rings connected!
xbbd4: Error 12 Unable to allocate request bounce buffers
xbbd4: Fatal error. Transitioning to Closing State
panic: pmap_growkernel: no memory to grow kernel
cpuid = 0
time = 1641731595
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff81073eed at pmap_growkernel+0x27d
#4 0xffffffff80f2da88 at vm_map_insert+0x248
#5 0xffffffff80f301e9 at vm_map_find+0x549
#6 0xffffffff80f2bf16 at kmem_init+0x226
Loading /boot/loader.conf.local

I am interested in solving this. This is a testing machine, so I can run modified tests any time. But I am somewhat out of ideas about what could be done to get more verbose output, so that at least complete messages are written to the serial output before the automatic reboot happens.

As for "panic: pmap_growkernel: no memory to grow kernel": it seemed to me that it should be enough that Dom0 has 8GB RAM and each VM 1GB. But I do not claim to be a XEN expert, and maybe this could be classified as a misconfiguration of the system. If so, I am open to pointers on what could be done to make the system more stable.

The same scripts can crash RELEASE-12.1 as well. Tested.

--
You are receiving this mail because:
You are the assignee for the bug.