Date: Mon, 12 Feb 2024 16:28:10 -0800 (PST)
From: Don Lewis
Subject: nvme controller reset failures on recent -CURRENT
To: FreeBSD current, John Baldwin
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

I just upgraded my package build machine to:
  FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
from:
  FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
and I've had two nvme-triggered panics in the last day.

nvme is being used for swap and L2ARC.  I'm not able to get a crash
dump, probably because the nvme device has gone away and I get an error
about not having a dump device.
It looks like a low-memory panic because free memory is low and zfs is
calling malloc().

This shows up in the log leading up to the panic:
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.

The device looks healthy to me:
SMART/Health Information Log
============================
Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    312 K, 38.85 C, 101.93 F
Available spare:                100
Available spare threshold:      10
Percentage used:                3
Data units (512,000 byte) read: 5761183
Data units written:             29911502
Host read commands:             471921188
Host write commands:            605394753
Controller busy time (minutes): 32359
Power cycles:                   110
Power on hours:                 19297
Unsafe shutdowns:               14
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature 1 Transition Count: 5231
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   41213
Total Time For Temperature 2:   0