Date: Thu, 30 Apr 2020 12:06:50 -0600
From: Warner Losh <imp@bsdimp.com>
To: Stefan Bethke <stb@lassitu.de>
Cc: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: nvme0 error
Message-ID: <CANCZdfqKPH-xxd2AWnAFmq9opNs=_-7T2d=txCVPVHDsvFxQ_g@mail.gmail.com>
In-Reply-To: <636DB3B3-E4C7-4A17-AB79-8AFDC6352712@lassitu.de>
References: <636DB3B3-E4C7-4A17-AB79-8AFDC6352712@lassitu.de>

On Thu, Apr 30, 2020 at 11:48 AM Stefan Bethke <stb@lassitu.de> wrote:

> nvme0: async event occurred (type 0x1, info 0x00, page 0x02)
> nvme0: device reliability degraded
>

type 1: SMART event
info 0: reliability error
page 2: look at what's up here

1.4 standard says:

NVM subsystem Reliability: NVM subsystem reliability has been
compromised. This may be due to significant media errors, an internal
error, the media being placed in read only mode, or a volatile memory
backup device failing. This status value shall not be used if the
read-only condition on the media is due to a change in the write
protection state of a namespace (refer to section 8.19.1).

> Should I be concerned? I'm using this Samsung SSD as cache and log for ZFS
> on a 12-stable machine.
>
> nvd0: <SAMSUNG MZVPW128HEGM-00000> NVMe namespace
> nvd0: 122104MB (250069680 512 byte sectors)
>
> # nvmecontrol logpage -p 2 nvme0
> SMART/Health Information Log
> ============================
> Critical Warning State: 0x04
>  Available spare: 0
>  Temperature: 0
>  Device reliability: 1
>  Read only: 0
>  Volatile memory backup: 0
> Temperature: 311 K, 37.85 C, 100.13 F
> Available spare: 100
> Available spare threshold: 10
> Percentage used: 110
> Data units (512,000 byte) read: 18417596
> Data units written: 164091845
> Host read commands: 499986873
> Host write commands: 1491808067
> Controller busy time (minutes): 48315
> Power cycles: 59
> Power on hours: 20432
> Unsafe shutdowns: 26
> Media errors: 0
> No. error info log entries: 22
> Warning Temp Composite Time: 0
> Error Temp Composite Time: 0
> Temperature Sensor 1: 311 K, 37.85 C, 100.13 F
> Temperature Sensor 2: 330 K, 56.85 C, 134.33 F
> Temperature 1 Transition Count: 0
> Temperature 2 Transition Count: 0
> Total Time For Temperature 1: 0
> Total Time For Temperature 2: 0
>

I'm thinking percent used 110 may be the thing it's alerting on, the
standard says:

Percentage Used: Contains a vendor specific estimate of the percentage
of NVM subsystem life used based on the actual usage and the
manufacturer’s prediction of NVM life. A value of 100 indicates that
the estimated endurance of the NVM in the NVM subsystem has been
consumed, but may not indicate an NVM subsystem failure. The value is
allowed to exceed 100. Percentages greater than 254 shall be
represented as 255. This value shall be updated once per power-on hour
(when the controller is not in a sleep state). Refer to the JEDEC
JESD218A standard for SSD device life and endurance measurement
techniques.

Warner

>
> Stefan
>
> --
> Stefan Bethke <stb@lassitu.de> Fon +49 151 14070811
>
>
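
For anyone who wants to check the same fields on their own drive, here is a minimal Python sketch of the reading described above. It assumes the Critical Warning bits are ordered as nvmecontrol lists them (bit 0 is available spare, bit 2 is device reliability), that a Percentage Used of 100 or more means the estimated endurance is consumed, and that one data unit is 512,000 bytes as the log labels it. The function names are made up for illustration and are not part of nvmecontrol or any library.

```python
# Names of the Critical Warning bits, in the order nvmecontrol prints them
# (assumed here to correspond to bit 0 through bit 4).
CRITICAL_WARNING_BITS = [
    "Available spare",
    "Temperature",
    "Device reliability",
    "Read only",
    "Volatile memory backup",
]


def decode_critical_warning(state):
    """Hypothetical helper: map each warning name to its bit value (0 or 1)."""
    return {
        name: (state >> bit) & 1
        for bit, name in enumerate(CRITICAL_WARNING_BITS)
    }


def endurance_consumed(percentage_used):
    """Hypothetical helper: True once the vendor estimate reaches 100."""
    return percentage_used >= 100


def data_units_to_bytes(units):
    """Hypothetical helper: the log reports data units of 512,000 bytes."""
    return units * 512_000


if __name__ == "__main__":
    # Values taken from the log page quoted in this message.
    for name, value in decode_critical_warning(0x04).items():
        print(f"{name}: {value}")
    print("Endurance consumed:", endurance_consumed(110))
    print("Bytes written:", data_units_to_bytes(164091845))
```

With the values from this log, only the device reliability bit is set, the endurance check is true, and the written total works out to roughly 84 TB. That is consistent with the drive warning about wear rather than about media errors, though the vendor's exact trigger is not confirmed here.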