Date: Mon, 09 Mar 2020 08:24:40 +0000
From: bugzilla-noreply@freebsd.org
To: virtualization@FreeBSD.org
Subject: [Bug 235856] FreeBSD freezes on AWS EC2 t3 machines
Message-ID: <bug-235856-27103-aDbQZ0VGDs@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-235856-27103@https.bugs.freebsd.org/bugzilla/>
References: <bug-235856-27103@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235856

--- Comment #36 from mail@rubenvos.com ---
Hi,

This weekend the issue manifested itself again on one of our 12.1 instances
(with an EBS volume attached):

Mar 7 03:05:47 zfs01 kernel: nvme1: cpl does not map to outstanding cmd
Mar 7 03:05:47 zfs01 kernel: cdw0:00000000 sqhd:001b sqid:0001 cid:001b p:0 sc:00 sct:0 m:0 dnr:0
Mar 7 03:05:47 zfs01 kernel: nvme1: Resetting controller due to a timeout.
Mar 7 03:05:47 zfs01 kernel: nvme1: resetting controller
Mar 7 03:05:47 zfs01 kernel: nvme1: temperature threshold not supported
Mar 7 03:05:47 zfs01 kernel: nvme1: aborting outstanding i/o
Mar 7 03:06:18 zfs01 kernel: nvme1: Missing interrupt
Mar 7 03:06:48 zfs01 kernel: nvme1: Resetting controller due to a timeout.
Mar 7 03:06:48 zfs01 kernel: nvme1: resetting controller
Mar 7 03:06:48 zfs01 kernel: nvme1: temperature threshold not supported
Mar 7 03:06:48 zfs01 kernel: nvme1: aborting outstanding i/o
Mar 7 03:06:48 zfs01 syslogd: last message repeated 5 times
Mar 7 03:07:20 zfs01 kernel: nvme1: VERIFY sqid:1 cid:27 nsid:0 lba:0 len:1
Mar 7 03:07:20 zfs01 kernel: nvme1: INVALID OPCODE (00/01) sqid:1 cid:27 cdw0:0
Mar 7 03:07:20 zfs01 kernel: nvme1: Missing interrupt
Mar 7 03:07:20 zfs01 kernel: nvme1: VERIFY sqid:1 cid:27 nsid:0 lba:0 len:1
Mar 7 03:07:20 zfs01 kernel: nvme1: INVALID OPCODE (00/01) sqid:1 cid:27 cdw0:0
Mar 7 03:08:23 zfs01 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
Mar 7 03:08:23 zfs01 kernel: ena0: Trigger reset is on
Mar 7 03:08:23 zfs01 kernel: ena0: device is going DOWN
Mar 7 03:08:23 zfs01 kernel: nvme1: Resetting controller due to a timeout.
Mar 7 03:08:23 zfs01 kernel: nvme1: resetting controller
Mar 7 03:08:23 zfs01 dhclient[40936]: send_packet6: Network is down
Mar 7 03:08:23 zfs01 dhclient[40936]: dhc6: send_packet6() sent -1 of 52 bytes
Mar 7 03:08:23 zfs01 kernel: nvme1: aborting outstanding admin command
Mar 7 03:08:23 zfs01 kernel: nvme1: CREATE IO SQ (01) sqid:0 cid:24 nsid:1 cdw10:1cb72e58 cdw11:00000000
Mar 7 03:08:23 zfs01 kernel: nvme1: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0
Mar 7 03:08:23 zfs01 kernel: nvme1: temperature threshold not supported
Mar 7 03:08:23 zfs01 kernel: nvme1: aborting outstanding i/o
Mar 7 03:08:23 zfs01 syslogd: last message repeated 2 times
Mar 7 03:08:53 zfs01 kernel: nvme1: WRITE sqid:1 cid:18 nsid:1 lba:3856318 len:64
Mar 7 03:08:53 zfs01 kernel: nvme1: INVALID OPCODE (00/01) sqid:1 cid:27 cdw0:0
Mar 7 03:08:53 zfs01 kernel: nvme1: Missing interrupt
Mar 7 03:09:15 zfs01 kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x58
Mar 7 03:09:16 zfs01 kernel: ena0: attempting to allocate 3 MSI-X vectors (9 supported)
Mar 7 03:09:16 zfs01 kernel: msi: routing MSI-X IRQ 259 to local APIC 0 vector 52
Mar 7 03:09:16 zfs01 kernel: msi: routing MSI-X IRQ 260 to local APIC 0 vector 53
Mar 7 03:09:16 zfs01 kernel: msi: routing MSI-X IRQ 261 to local APIC 0 vector 54
Mar 7 03:09:16 zfs01 kernel: ena0: using IRQs 259-261 for MSI-X
Mar 7 03:09:16 zfs01 kernel: ena0: device is going UP
Mar 7 03:09:16 zfs01 kernel: ena0: link is UP
Mar 7 03:10:30 zfs01 dhclient[40936]: send_packet6: Network is down
Mar 7 03:10:30 zfs01 dhclient[40936]: dhc6: send_packet6() sent -1 of 52 bytes
Mar 7 03:10:32 zfs01 dhclient[69248]: send_packet: Network is down
Mar 7 03:11:16 zfs01 syslogd: last message repeated 4 times
Mar 7 03:11:33 zfs01 syslogd: last message repeated 1 times
Mar 7 03:13:31 zfs01 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
Mar 7 03:13:31 zfs01 kernel: ena0: Trigger reset is on
Mar 7 03:13:31 zfs01 kernel: ena0: device is going DOWN
Mar 7 03:14:25 zfs01 kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x134
Mar 7 03:14:26 zfs01 kernel: ena0: attempting to allocate 3 MSI-X vectors (9 supported)
Mar 7 03:14:26 zfs01 kernel: msi: routing MSI-X IRQ 259 to local APIC 0 vector 52

root@zfs01:/usr/home/ruben # ls -lahtuT /etc/periodic/daily/
total 128
-rwxr-xr-x 1 root wheel 1.0K Mar 7 03:01:00 2020 450.status-security
-rwxr-xr-x 1 root wheel 1.4K Mar 7 03:01:00 2020 440.status-mailq
-rwxr-xr-x 1 root wheel 705B Mar 7 03:01:00 2020 430.status-uptime
-rwxr-xr-x 1 root wheel 611B Mar 7 03:01:00 2020 420.status-network
-rwxr-xr-x 1 root wheel 684B Mar 7 03:01:00 2020 410.status-mfi
-rwxr-xr-x 1 root wheel 590B Mar 7 03:01:00 2020 409.status-gconcat
-rwxr-xr-x 1 root wheel 590B Mar 7 03:01:00 2020 408.status-gstripe
-rwxr-xr-x 1 root wheel 591B Mar 7 03:01:00 2020 407.status-graid3
-rwxr-xr-x 1 root wheel 596B Mar 7 03:01:00 2020 406.status-gmirror
-rwxr-xr-x 1 root wheel 807B Mar 7 03:01:00 2020 404.status-zfs
-rwxr-xr-x 1 root wheel 583B Mar 7 03:01:00 2020 401.status-graid
-rwxr-xr-x 1 root wheel 773B Mar 7 03:01:00 2020 400.status-disks
-rwxr-xr-x 1 root wheel 724B Mar 7 03:01:00 2020 330.news
-r-xr-xr-x 1 root wheel 1.4K Mar 7 03:01:00 2020 310.accounting
-rwxr-xr-x 1 root wheel 693B Mar 7 03:01:00 2020 300.calendar
-rwxr-xr-x 1 root wheel 1.0K Mar 7 03:01:00 2020 210.backup-aliases
-rwxr-xr-x 1 root wheel 1.7K Mar 7 03:01:00 2020 200.backup-passwd
-rwxr-xr-x 1 root wheel 603B Mar 7 03:01:00 2020 150.clean-hoststat
-rwxr-xr-x 1 root wheel 1.0K Mar 7 03:01:00 2020 140.clean-rwho
-rwxr-xr-x 1 root wheel 709B Mar 7 03:01:00 2020 130.clean-msgs
-rwxr-xr-x 1 root wheel 1.1K Mar 7 03:01:00 2020 120.clean-preserve
-rwxr-xr-x 1 root wheel 1.5K Mar 7 03:01:00 2020 110.clean-tmps
-rwxr-xr-x 1 root wheel 1.3K Mar 7 03:01:00 2020 100.clean-disks
-rwxr-xr-x 1 root wheel 811B Mar 5 03:21:29 2020 999.local
-rwxr-xr-x 1 root wheel 2.8K Mar 5 03:21:29 2020 800.scrub-zfs
-rwxr-xr-x 1 root wheel 845B Mar 5 03:21:29 2020 510.status-world-kernel
-rwxr-xr-x 1 root wheel 737B Mar 5 03:21:29 2020 500.queuerun
-rwxr-xr-x 1 root wheel 498B Mar 5 03:21:29 2020 480.status-ntpd
-rwxr-xr-x 1 root wheel 451B Mar 5 03:03:36 2020 480.leapfile-ntpd
-rwxr-xr-x 1 root wheel 2.0K Mar 5 03:03:18 2020 460.status-mail-rejects
drwxr-xr-x 2 root wheel 1.0K Dec 7 06:23:36 2018 .
drwxr-xr-x 6 root wheel 512B Dec 7 06:23:36 2018 ..
root@zfs01:/usr/home/ruben #
root@zfs01:/usr/home/ruben # ls -lahtuT /etc/periodic/security/
total 68
-rwxr-xr-x 1 root wheel 2.3K Mar 7 03:01:48 2020 900.tcpwrap
-rwxr-xr-x 1 root wheel 2.3K Mar 7 03:01:48 2020 800.loginfail
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 700.kernelmsg
-r--r--r-- 1 root wheel 2.8K Mar 7 03:01:48 2020 security.functions
-rwxr-xr-x 1 root wheel 2.0K Mar 7 03:01:48 2020 610.ipf6denied
-rwxr-xr-x 1 root wheel 2.2K Mar 7 03:01:48 2020 550.ipfwlimit
-rwxr-xr-x 1 root wheel 2.1K Mar 7 03:01:48 2020 520.pfdenied
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 510.ipfdenied
-rwxr-xr-x 1 root wheel 2.0K Mar 7 03:01:48 2020 500.ipfwdenied
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 410.logincheck
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 400.passwdless
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 300.chkuid0
-rwxr-xr-x 1 root wheel 2.3K Mar 7 03:01:48 2020 200.chkmounts
-rwxr-xr-x 1 root wheel 2.2K Mar 7 03:01:25 2020 110.neggrpperm
-rwxr-xr-x 1 root wheel 2.2K Mar 7 03:01:00 2020 100.chksetuid
drwxr-xr-x 2 root wheel 512B Dec 7 06:23:36 2018 .
drwxr-xr-x 6 root wheel 512B Dec 7 06:23:36 2018 ..
root@zfs01:/usr/home/ruben #

The NIC had been going up/down ever since (for 2 days) until a coworker
rebooted it this morning.

There does seem to be a relationship with the periodic framework, with the
issues occurring at 03:05 while the last timestamp updates around 03:01 ...

Will attach the verbose boot log. Feel free to request any additional details!

Kind regards,

Ruben

-- 
You are receiving this mail because:
You are the assignee for the bug.