From: bugzilla-noreply@freebsd.org
To: virtualization@FreeBSD.org
Subject: [Bug 235856] FreeBSD freezes on AWS EC2 t3 machines
Date: Mon, 09 Mar 2020 08:24:40 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.0-RELEASE
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: virtualization@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235856

--- Comment #36 from mail@rubenvos.com ---
Hi,

This weekend the issue manifested itself again on one of our 12.1 instances (with an EBS volume attached):

Mar 7 03:05:47 zfs01 kernel: nvme1: cpl does not map to outstanding cmd
Mar 7 03:05:47 zfs01 kernel: cdw0:00000000 sqhd:001b sqid:0001 cid:001b p:0 sc:00 sct:0 m:0 dnr:0
Mar 7 03:05:47 zfs01 kernel: nvme1: Resetting controller due to a timeout.
Mar 7 03:05:47 zfs01 kernel: nvme1: resetting controller
Mar 7 03:05:47 zfs01 kernel: nvme1: temperature threshold not supported
Mar 7 03:05:47 zfs01 kernel: nvme1: aborting outstanding i/o
Mar 7 03:06:18 zfs01 kernel: nvme1: Missing interrupt
Mar 7 03:06:48 zfs01 kernel: nvme1: Resetting controller due to a timeout.
Mar 7 03:06:48 zfs01 kernel: nvme1: resetting controller
Mar 7 03:06:48 zfs01 kernel: nvme1: temperature threshold not supported
Mar 7 03:06:48 zfs01 kernel: nvme1: aborting outstanding i/o
Mar 7 03:06:48 zfs01 syslogd: last message repeated 5 times
Mar 7 03:07:20 zfs01 kernel: nvme1: VERIFY sqid:1 cid:27 nsid:0 lba:0 len:1
Mar 7 03:07:20 zfs01 kernel: nvme1: INVALID OPCODE (00/01) sqid:1 cid:27 cdw0:0
Mar 7 03:07:20 zfs01 kernel: nvme1: Missing interrupt
Mar 7 03:07:20 zfs01 kernel: nvme1: VERIFY sqid:1 cid:27 nsid:0 lba:0 len:1
Mar 7 03:07:20 zfs01 kernel: nvme1: INVALID OPCODE (00/01) sqid:1 cid:27 cdw0:0
Mar 7 03:08:23 zfs01 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
Mar 7 03:08:23 zfs01 kernel: ena0: Trigger reset is on
Mar 7 03:08:23 zfs01 kernel: ena0: device is going DOWN
Mar 7 03:08:23 zfs01 kernel: nvme1: Resetting controller due to a timeout.
Mar 7 03:08:23 zfs01 kernel: nvme1: resetting controller
Mar 7 03:08:23 zfs01 dhclient[40936]: send_packet6: Network is down
Mar 7 03:08:23 zfs01 dhclient[40936]: dhc6: send_packet6() sent -1 of 52 bytes
Mar 7 03:08:23 zfs01 kernel: nvme1: aborting outstanding admin command
Mar 7 03:08:23 zfs01 kernel: nvme1: CREATE IO SQ (01) sqid:0 cid:24 nsid:1 cdw10:1cb72e58 cdw11:00000000
Mar 7 03:08:23 zfs01 kernel: nvme1: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0
Mar 7 03:08:23 zfs01 kernel: nvme1: temperature threshold not supported
Mar 7 03:08:23 zfs01 kernel: nvme1: aborting outstanding i/o
Mar 7 03:08:23 zfs01 syslogd: last message repeated 2 times
Mar 7 03:08:53 zfs01 kernel: nvme1: WRITE sqid:1 cid:18 nsid:1 lba:3856318 len:64
Mar 7 03:08:53 zfs01 kernel: nvme1: INVALID OPCODE (00/01) sqid:1 cid:27 cdw0:0
Mar 7 03:08:53 zfs01 kernel: nvme1: Missing interrupt
Mar 7 03:09:15 zfs01 kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x58
Mar 7 03:09:16 zfs01 kernel: ena0: attempting to allocate 3 MSI-X vectors (9 supported)
Mar 7 03:09:16 zfs01 kernel: msi: routing MSI-X IRQ 259 to local APIC 0 vector 52
Mar 7 03:09:16 zfs01 kernel: msi: routing MSI-X IRQ 260 to local APIC 0 vector 53
Mar 7 03:09:16 zfs01 kernel: msi: routing MSI-X IRQ 261 to local APIC 0 vector 54
Mar 7 03:09:16 zfs01 kernel: ena0: using IRQs 259-261 for MSI-X
Mar 7 03:09:16 zfs01 kernel: ena0: device is going UP
Mar 7 03:09:16 zfs01 kernel: ena0: link is UP
Mar 7 03:10:30 zfs01 dhclient[40936]: send_packet6: Network is down
Mar 7 03:10:30 zfs01 dhclient[40936]: dhc6: send_packet6() sent -1 of 52 bytes
Mar 7 03:10:32 zfs01 dhclient[69248]: send_packet: Network is down
Mar 7 03:11:16 zfs01 syslogd: last message repeated 4 times
Mar 7 03:11:33 zfs01 syslogd: last message repeated 1 times
Mar 7 03:13:31 zfs01 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
Mar 7 03:13:31 zfs01 kernel: ena0: Trigger reset is on
Mar 7 03:13:31 zfs01 kernel: ena0: device is going DOWN
Mar 7 03:14:25 zfs01 kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x134
Mar 7 03:14:26 zfs01 kernel: ena0: attempting to allocate 3 MSI-X vectors (9 supported)
Mar 7 03:14:26 zfs01 kernel: msi: routing MSI-X IRQ 259 to local APIC 0 vector 52

root@zfs01:/usr/home/ruben # ls -lahtuT /etc/periodic/daily/
total 128
-rwxr-xr-x 1 root wheel 1.0K Mar 7 03:01:00 2020 450.status-security
-rwxr-xr-x 1 root wheel 1.4K Mar 7 03:01:00 2020 440.status-mailq
-rwxr-xr-x 1 root wheel 705B Mar 7 03:01:00 2020 430.status-uptime
-rwxr-xr-x 1 root wheel 611B Mar 7 03:01:00 2020 420.status-network
-rwxr-xr-x 1 root wheel 684B Mar 7 03:01:00 2020 410.status-mfi
-rwxr-xr-x 1 root wheel 590B Mar 7 03:01:00 2020 409.status-gconcat
-rwxr-xr-x 1 root wheel 590B Mar 7 03:01:00 2020 408.status-gstripe
-rwxr-xr-x 1 root wheel 591B Mar 7 03:01:00 2020 407.status-graid3
-rwxr-xr-x 1 root wheel 596B Mar 7 03:01:00 2020 406.status-gmirror
-rwxr-xr-x 1 root wheel 807B Mar 7 03:01:00 2020 404.status-zfs
-rwxr-xr-x 1 root wheel 583B Mar 7 03:01:00 2020 401.status-graid
-rwxr-xr-x 1 root wheel 773B Mar 7 03:01:00 2020 400.status-disks
-rwxr-xr-x 1 root wheel 724B Mar 7 03:01:00 2020 330.news
-r-xr-xr-x 1 root wheel 1.4K Mar 7 03:01:00 2020 310.accounting
-rwxr-xr-x 1 root wheel 693B Mar 7 03:01:00 2020 300.calendar
-rwxr-xr-x 1 root wheel 1.0K Mar 7 03:01:00 2020 210.backup-aliases
-rwxr-xr-x 1 root wheel 1.7K Mar 7 03:01:00 2020 200.backup-passwd
-rwxr-xr-x 1 root wheel 603B Mar 7 03:01:00 2020 150.clean-hoststat
-rwxr-xr-x 1 root wheel 1.0K Mar 7 03:01:00 2020 140.clean-rwho
-rwxr-xr-x 1 root wheel 709B Mar 7 03:01:00 2020 130.clean-msgs
-rwxr-xr-x 1 root wheel 1.1K Mar 7 03:01:00 2020 120.clean-preserve
-rwxr-xr-x 1 root wheel 1.5K Mar 7 03:01:00 2020 110.clean-tmps
-rwxr-xr-x 1 root wheel 1.3K Mar 7 03:01:00 2020 100.clean-disks
-rwxr-xr-x 1 root wheel 811B Mar 5 03:21:29 2020 999.local
-rwxr-xr-x 1 root wheel 2.8K Mar 5 03:21:29 2020 800.scrub-zfs
-rwxr-xr-x 1 root wheel 845B Mar 5 03:21:29 2020 510.status-world-kernel
-rwxr-xr-x 1 root wheel 737B Mar 5 03:21:29 2020 500.queuerun
-rwxr-xr-x 1 root wheel 498B Mar 5 03:21:29 2020 480.status-ntpd
-rwxr-xr-x 1 root wheel 451B Mar 5 03:03:36 2020 480.leapfile-ntpd
-rwxr-xr-x 1 root wheel 2.0K Mar 5 03:03:18 2020 460.status-mail-rejects
drwxr-xr-x 2 root wheel 1.0K Dec 7 06:23:36 2018 .
drwxr-xr-x 6 root wheel 512B Dec 7 06:23:36 2018 ..
root@zfs01:/usr/home/ruben #
root@zfs01:/usr/home/ruben # ls -lahtuT /etc/periodic/security/
total 68
-rwxr-xr-x 1 root wheel 2.3K Mar 7 03:01:48 2020 900.tcpwrap
-rwxr-xr-x 1 root wheel 2.3K Mar 7 03:01:48 2020 800.loginfail
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 700.kernelmsg
-r--r--r-- 1 root wheel 2.8K Mar 7 03:01:48 2020 security.functions
-rwxr-xr-x 1 root wheel 2.0K Mar 7 03:01:48 2020 610.ipf6denied
-rwxr-xr-x 1 root wheel 2.2K Mar 7 03:01:48 2020 550.ipfwlimit
-rwxr-xr-x 1 root wheel 2.1K Mar 7 03:01:48 2020 520.pfdenied
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 510.ipfdenied
-rwxr-xr-x 1 root wheel 2.0K Mar 7 03:01:48 2020 500.ipfwdenied
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 410.logincheck
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 400.passwdless
-rwxr-xr-x 1 root wheel 1.9K Mar 7 03:01:48 2020 300.chkuid0
-rwxr-xr-x 1 root wheel 2.3K Mar 7 03:01:48 2020 200.chkmounts
-rwxr-xr-x 1 root wheel 2.2K Mar 7 03:01:25 2020 110.neggrpperm
-rwxr-xr-x 1 root wheel 2.2K Mar 7 03:01:00 2020 100.chksetuid
drwxr-xr-x 2 root wheel 512B Dec 7 06:23:36 2018 .
drwxr-xr-x 6 root wheel 512B Dec 7 06:23:36 2018 ..
root@zfs01:/usr/home/ruben #

From then on the NIC kept going down and up (for two days) until a coworker rebooted the instance this morning. There does seem to be a relationship with the periodic framework. The issues started at 03:05, shortly after the periodic scripts' last access times were updated at around 03:01 (the latest at 03:01:48).
Will attach the verbose boot log. Feel free to request any additional details!

Kind regards,
Ruben

--
You are receiving this mail because:
You are the assignee for the bug.