Date:      Thu, 3 May 2018 22:33:53 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Craig Leres <leres@freebsd.org>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: nvme0: async event occurred (log page id=0x2)
Message-ID:  <CANCZdfqJW377rx%2Bxz2OzTdXFVNgi7kcqRbDg7h1iK1DE7n%2B3_g@mail.gmail.com>
In-Reply-To: <8b1eadc2-8c9d-3f11-b877-b9a0a57512ec@freebsd.org>
References:  <960be682-9991-f8c6-0253-7d6f782d4cbe@freebsd.org> <CANCZdfrTD9Jw%2BrnnvoCdSTLaKnh8a=gsz6wMF40HgaU0E1i8=A@mail.gmail.com> <8b1eadc2-8c9d-3f11-b877-b9a0a57512ec@freebsd.org>

On Thu, May 3, 2018 at 10:28 PM, Craig Leres <leres@freebsd.org> wrote:

> On 5/3/2018 9:07 PM, Warner Losh wrote:
> > Async events are 'something went wrong' messages. Log page 2 is the
> > smart log page.
> >
> > What does 'nvmecontrol logpage -p 2 nvme0' tell you right after this
> > happens?  My guess is that it's overheating.
>
> Interesting. I try to run smartd anywhere it's supported and have
> appended the last few entries before things went sideways; 60° C/140° F
> is a bit toasty!
>
> This system is a couple of years old, might be time to blow the dust out
> with compressed air and see if the bios has more aggressive fan settings.
>
> Is the Raw_Read_Error_Rate change a problem?
>
> (Thanks!)
>
>                 Craig
>
> May  3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60
> May  3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 41 to 40
> May  3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 60 to 58
> May  3 14:59:23 tiny smartd[770]: Device: /dev/ada0, SMART Usage
> Attribute: 194 Temperature_Celsius changed from 40 to 42
> May  3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure
> Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76
>

Things are getting hot, and there was a recoverable error (recoverable since
you didn't report a read error, though you could also check page 1 for any
errors). Chances are the controller shut down completely (though the few
data points you've given aren't enough for me to be sure).

Warner
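
For anyone who wants to pull the attribute changes out of a smartd log
like the one quoted above, here is a minimal sketch. It assumes the
wrapped lines have been rejoined into single syslog lines and that the
message format matches the excerpt exactly; the names CHANGE_RE and
parse_changes are made up for illustration and are not part of
smartmontools.

```python
import re

# Pattern for smartd attribute-change messages, inferred only from the
# log excerpt quoted in this thread (an assumption, not a documented format).
CHANGE_RE = re.compile(
    r"Device: (?P<dev>[^,\s]+), SMART (?P<kind>Usage|Prefailure) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_changes(lines):
    """Yield (device, kind, attribute name, old, new) for each matching line."""
    for line in lines:
        m = CHANGE_RE.search(line)
        if m:
            yield (m["dev"], m["kind"], m["name"], int(m["old"]), int(m["new"]))


# Two entries from the quoted log, rejoined into single lines.
log = [
    "May  3 13:59:22 tiny smartd[770]: Device: /dev/ada0, SMART Usage "
    "Attribute: 190 Airflow_Temperature_Cel changed from 59 to 60",
    "May  3 17:29:23 tiny smartd[770]: Device: /dev/ada0, SMART Prefailure "
    "Attribute: 1 Raw_Read_Error_Rate changed from 75 to 76",
]

changes = list(parse_changes(log))
for dev, kind, name, old, new in changes:
    # Prefailure attributes are the ones worth a closer look.
    flag = " <-- prefailure" if kind == "Prefailure" else ""
    print(f"{dev}: {name} {old} -> {new}{flag}")
```

Lines that do not match the pattern are silently skipped, so the same
function can be fed a whole syslog file.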


