Date: Thu, 3 May 2018 22:33:53 -0600
From: Warner Losh <imp@bsdimp.com>
To: Craig Leres <leres@freebsd.org>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: nvme0: async event occurred (log page id=0x2)
Message-ID: <CANCZdfqJW377rx%2Bxz2OzTdXFVNgi7kcqRbDg7h1iK1DE7n%2B3_g@mail.gmail.com>
In-Reply-To: <8b1eadc2-8c9d-3f11-b877-b9a0a57512ec@freebsd.org>
References: <960be682-9991-f8c6-0253-7d6f782d4cbe@freebsd.org> <CANCZdfrTD9Jw%2BrnnvoCdSTLaKnh8a=gsz6wMF40HgaU0E1i8=A@mail.gmail.com> <8b1eadc2-8c9d-3f11-b877-b9a0a57512ec@freebsd.org>

On Thu, May 3, 2018 at 10:28 PM, Craig Leres <leres@freebsd.org> wrote:

> On 5/3/2018 9:07 PM, Warner Losh wrote:
> > Async events are 'something went wrong' messages. Log page 2 is the
> > smart log page.
> >
> > what does 'nvmecontrol logpage -p 2 nvme0' tell you right after this
> > happens. My guess is that it's overheating.
>
> Interesting. I try to run smartd anywhere it's supported and have
> appended the last few entries before things went sideways; 60° C/140° F
> is a bit toasty!
>
> This system is a couple of years old, might be time to blow the dust out
> with compressed air and see if the bios has more aggressive fan settings.
>
> Is the Raw_Read_Error_Rate changed a problem?
>
> (Thanks!)
>
> Craig
>
> May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
> May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 41 to 40
> May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
> May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 40 to 42
> May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure
> Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
>

Things are getting hot, and there was a recoverable error (recoverable,
since you didn't report a read error, though you could also check page 1
for any errors). Chances are the controller shut down completely (though
the few data points you've given aren't enough for me to be sure).

Warner
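
For anyone who wants to pull the attribute changes out of syslog rather than read them by eye, here is a minimal Python sketch that parses smartd lines of the shape quoted above. The regular expression, the function names, and the unwrapped sample lines are assumptions made for illustration, not part of smartd. The sketch does not try to decide whether the logged numbers are raw or normalized attribute values; it only extracts them. The small conversion helper is there to check the 60° C/140° F arithmetic from the quoted message.

```python
import re

# Assumed pattern for smartd "changed from X to Y" syslog lines, based only
# on the five lines quoted in this message (unwrapped to one line each).
LINE_RE = re.compile(
    r"Device: (?P<dev>\S+), SMART (?P<kind>Usage|Prefailure) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_smartd_line(line):
    """Return a dict describing one attribute change, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("dev"),
        "kind": m.group("kind"),
        "id": int(m.group("id")),
        "name": m.group("name"),
        "old": int(m.group("old")),
        "new": int(m.group("new")),
    }


def celsius_to_fahrenheit(celsius):
    """Plain unit conversion; says nothing about what smartd's numbers mean."""
    return celsius * 9 / 5 + 32


# Sample input: the log lines from the quoted message, joined back together.
LOG = [
    "May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage "
    "Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60",
    "May 3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage "
    "Attribute: 194 Temperature_Celsius changed from 41 to 40",
    "May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage "
    "Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58",
    "May 3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage "
    "Attribute: 194 Temperature_Celsius changed from 40 to 42",
    "May 3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure "
    "Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76",
]

if __name__ == "__main__":
    for entry in filter(None, map(parse_smartd_line, LOG)):
        # Prefailure attributes are the ones smartd treats as failure
        # predictors, so mark them in the output.
        flag = " <-- prefailure" if entry["kind"] == "Prefailure" else ""
        print(f'{entry["device"]} {entry["id"]:>3} {entry["name"]}: '
              f'{entry["old"]} -> {entry["new"]}{flag}')
```

The same loop could be fed from /var/log/messages; filtering on the "Prefailure" kind is one simple way to separate lines like the Raw_Read_Error_Rate change from routine temperature chatter.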