From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 243531] Unstable ena and nvme on AWS
Date: Wed, 22 Jan 2020 23:28:22 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243531

            Bug ID: 243531
           Summary: Unstable ena and nvme on AWS
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: leif@ofWilsonCreek.com

We just recently upgraded our systems on AWS to 12.1, and we're seeing errors
with nvme0 and ena0. Typically, these errors manifest together. My sample size
is 34 instances, all t3.medium or r4.large.
I have others that are t2.small, and those are fine, since they don't use the
ena or nvme drivers.

Of the 34, there's a group of 15 that are almost idle and never throw these
errors. Of the remaining 19 that have load, only 5 ran without errors, and 14
threw these errors at various times over the last 4.5 days. 3 machines have
crashed during this period, so the errors often seem to be nonfatal, but not
always. Based on that, it seems load related and does not seem isolated to an
occasional hardware problem.

Here's a sample of the log from one of the machines that crashed:

Jan 22 00:17:05 jdas-dev kernel: nvme0: cpl does not map to outstanding cmd
Jan 22 00:17:05 jdas-dev kernel: cdw0:00000000 sqhd:000d sqid:0001 cid:0012 p:1 sc:00 sct:0 m:0 dnr:0
Jan 22 00:17:05 jdas-dev kernel: nvme0: Resetting controller due to a timeout.
Jan 22 00:17:05 jdas-dev kernel: nvme0: resetting controller
Jan 22 00:17:05 jdas-dev kernel: nvme0: temperature threshold not supported
Jan 22 00:17:05 jdas-dev kernel: nvme0: aborting outstanding i/o
Jan 22 00:17:05 jdas-dev kernel: nvme0: resubmitting queued i/o
Jan 22 00:17:05 jdas-dev kernel: nvme0: WRITE sqid:2 cid:0 nsid:1 lba:691756520 len:8
Jan 22 00:17:21 jdas-dev kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
Jan 22 00:17:21 jdas-dev kernel: ena0: Trigger reset is on
Jan 22 00:17:21 jdas-dev kernel: ena0: device is going DOWN
Jan 22 00:17:21 jdas-dev kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x154
Jan 22 00:17:22 jdas-dev kernel: ena0: device is going UP
Jan 22 00:17:22 jdas-dev kernel: ena0: link is UP
Jan 22 00:17:52 jdas-dev kernel: nvme0: Missing interrupt
Jan 22 00:18:46 jdas-dev kernel: nvme0: Missing interrupt
Jan 22 00:19:34 jdas-dev kernel: nvme0: cpl does not map to outstanding cmd
Jan 22 00:19:34 jdas-dev kernel: cdw0:00000000 sqhd:001a sqid:0002 cid:001b p:0 sc:00 sct:0 m:0 dnr:0
Jan 22 00:19:34 jdas-dev kernel: nvme0: Resetting controller due to a timeout.
Jan 22 00:19:34 jdas-dev kernel: nvme0: resetting controller
Jan 22 00:19:35 jdas-dev kernel: nvme0: temperature threshold not supported
Jan 22 00:19:35 jdas-dev kernel: nvme0: resubmitting queued i/o
Jan 22 00:19:35 jdas-dev kernel: nvme0: WRITE sqid:1 cid:0 nsid:1 lba:405055248 len:8
Jan 22 00:19:35 jdas-dev kernel: nvme0: aborting outstanding i/o

At this point, we rebooted the machine.

Jan 22 09:02:30 jdas-dev kernel: nvme0: Resetting controller due to a timeout.
Jan 22 09:02:30 jdas-dev kernel: nvme0: resetting controller
Jan 22 09:02:30 jdas-dev kernel: nvme0: temperature threshold not supported
Jan 22 09:02:30 jdas-dev kernel: nvme0: aborting outstanding i/o
Jan 22 09:02:30 jdas-dev kernel: nvme0: DATASET MANAGEMENT sqid:2 cid:27 nsid:0
Jan 22 09:02:30 jdas-dev kernel: nvme0: INVALID OPCODE (00/01) sqid:2 cid:27 cdw0:0
Jan 22 09:02:30 jdas-dev kernel: ena0: Keep alive watchdog timeout.
Jan 22 09:02:30 jdas-dev kernel: ena0: Trigger reset is on
Jan 22 09:02:30 jdas-dev kernel: ena0: device is going DOWN
Jan 22 09:02:30 jdas-dev kernel: ena0: ena0: device is going UP
Jan 22 09:02:30 jdas-dev kernel: link is UP
Jan 22 09:02:30 jdas-dev kernel: ena0: Keep alive watchdog timeout.
Jan 22 09:02:30 jdas-dev kernel: ena0: Trigger reset is on
Jan 22 09:02:30 jdas-dev kernel: ena0: device is going DOWN
Jan 22 09:02:30 jdas-dev kernel: ena0: ena0: device is going UP
Jan 22 09:02:30 jdas-dev kernel: link is UP
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: 90 second watchdog timeout expired. Shutdown terminated.
Jan 22 09:02:30 jdas-dev kernel: Wed Jan 22 08:58:58 CST 2020
Jan 22 09:02:30 jdas-dev kernel: 2020-01-22T08:59:01.658827-06:00 jdas-dev.aws0.pla-net.cc init 1 - - /etc/rc.shutdown terminated abnormally, going to single user mode
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: 2020-01-22T08:59:23.108170-06:00 jdas-dev.aws0.pla-net.cc init 1 - - some processes would not die; ps axl advised
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21854, size: 57344
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 372873, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 118775, size: 8192
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 315370, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 312763, size: 4096
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 445, size: 12288
Jan 22 09:02:30 jdas-dev kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21337, size: 4096
Jan 22 09:02:30 jdas-dev kernel: ---<>---
Jan 22 09:02:30 jdas-dev kernel: Copyright (c) 1992-2019 The FreeBSD Project.
Jan 22 09:02:30 jdas-dev kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

-- 
You are receiving this mail because:
You are the assignee for the bug.
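For anyone wanting to compare instances the way the report does (idle vs. loaded, erroring vs. clean), here is a rough sketch of tallying the driver events per host from syslog lines. It is only an illustration: the pattern names, the `tally` helper, and the choice of which messages count as an "event" are assumptions made here, and the regexes are based only on the message text quoted in the log above.

```python
import re
from collections import Counter

# Hypothetical event patterns, derived only from the messages quoted above.
PATTERNS = {
    "nvme_reset": re.compile(r"nvme\d+: Resetting controller due to a timeout"),
    "nvme_missing_intr": re.compile(r"nvme\d+: Missing interrupt"),
    "ena_reset": re.compile(r"ena\d+: Trigger reset is on"),
}

# Matches lines shaped like: "Jan 22 00:17:05 jdas-dev kernel: <message>"
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+kernel:\s+(.*)$")


def tally(lines):
    """Count matching driver events per (host, event kind)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel syslog line in the expected shape
        host, msg = m.groups()
        for kind, pat in PATTERNS.items():
            if pat.search(msg):
                counts[(host, kind)] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = [
    "Jan 22 00:17:05 jdas-dev kernel: nvme0: Resetting controller due to a timeout.",
    "Jan 22 00:17:21 jdas-dev kernel: ena0: Trigger reset is on",
    "Jan 22 00:17:52 jdas-dev kernel: nvme0: Missing interrupt",
    "Jan 22 00:18:46 jdas-dev kernel: nvme0: Missing interrupt",
]

if __name__ == "__main__":
    for (host, kind), n in sorted(tally(SAMPLE).items()):
        print(host, kind, n)
```

Feeding it the kernel log from each instance (for example, the lines of /var/log/messages) would give per-host counts that can be set against each machine's load.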