Date:      Fri, 19 Jul 2019 12:37:57 -0700
From:      Ravi Pokala <rpokala@freebsd.org>
To:        <wojtek@puchar.net>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: please help translate smartctl output to human language 
Message-ID:  <3082DC9C-9D05-499F-A4FE-712338A32D14@freebsd.org>

Hi Wojciech,

> i am interested in how much write wear my Samsung SSD has experienced relative to the maximum allowed.
> 
> on my 500GB samsung SSD smartctl says
> 
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       31126
>  12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       59
> 177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       88
> 179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
> 181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
> 182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
> 183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
> 187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0032   073   051   000    Old_age   Always       -       27
> 195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
> 199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
> 235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       28
> 241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       115175140988
> 
> All seems fine, but I'm not sure if I correctly understand the VALUE, WORST, and THRESH data for Total_LBAs_Written.

For (S)ATA SMART in general, the way it works is that "VALUE" is a normalized representation, with higher values being better than lower values. Depending on the vendor, the starting value might be 253 (0xFD -- the maximum byte value minus a couple of reserved values), 200, or 100 (i.e. a percentage). Or, in the case of temperatures, the value of "VALUE" is usually (100 - current temperature in Celsius); in the example above, that's (100 - 27) => 73.

"WORST" is the lowest value of "VALUE" that the device has recorded. Some attributes are related to performance or short-term metrics, so the value of "VALUE" might increase and decrease over time; in that case, "WORST" is somewhat useful. Other attributes are related to usage and wear, so the value of "VALUE" will only ever decrease; in those cases, "WORST" is not very useful because it will always be the same as "VALUE".

"THRESH" is the failure threshold for the attribute; *if* the attribute is marked "Pre-fail", and *if* the value of "VALUE" is lower than the value of "THRESH", *then* the overall SMART status will be reported as failed.

In the data above, everything looks quite good; even the lowest values for "WORST" are above 90. (Except the temperature, which as described above is a little different; in this case, it looks like the highest temperature the device has seen is 49C, which isn't great, but isn't terrible.)
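The pass/fail rule described above -- an attribute only trips if it is marked "Pre-fail" *and* its normalized VALUE drops below THRESH -- can be sketched in a few lines. This is just an illustration using rows from the smartctl output quoted above, not anything smartmontools itself does:

```python
# Minimal sketch of the SMART pass/fail rule: an attribute fails
# only if it is Pre-fail AND its normalized VALUE is below THRESH.

def smart_failed(attributes):
    """attributes: list of (name, value, thresh, prefail) tuples."""
    return [name for (name, value, thresh, prefail) in attributes
            if prefail and value < thresh]

# A few rows from the smartctl output quoted above:
attrs = [
    ("Reallocated_Sector_Ct",   100, 10, True),
    ("Wear_Leveling_Count",      95,  0, True),
    ("Airflow_Temperature_Cel",  73,  0, False),
]

print(smart_failed(attrs))  # -> [] : nothing is below its threshold
```

Note that Old_age attributes (prefail=False here) never trigger a failure report no matter how low VALUE goes; they are informational only.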

> 50TB was written, so that's 100 times the capacity. Taking some write amplification into account (I use geli, so in-drive compression has no effect), it would probably be more like 150-200.

Nowadays, SSDs are usually rated in terms of "Drive Writes Per Day" (DWPD); for a device rated at 3 DWPD with a 3-year warranty, the vendor is saying that it can handle writes equivalent to (3 * 3 * 365) = 3285 complete overwrites of the device. In the case of this 500GB device, that would be roughly 1.6PB of writes.

Assuming this device uses 512B logical sectors, 115175140988 LBAs written works out to ~59TB (~54TiB), which is roughly 3.6% of that total write budget.
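The arithmetic above is easy to check. Note that the 3 DWPD / 3-year rating is the hypothetical example from the previous paragraph, not this drive's actual spec sheet:

```python
# Worked version of the endurance arithmetic above. The 3 DWPD /
# 3-year rating is an illustrative assumption, not this drive's
# actual spec.

CAPACITY_BYTES = 500 * 10**9      # 500 GB drive
SECTOR_BYTES   = 512              # assumed logical sector size
LBAS_WRITTEN   = 115175140988     # Total_LBAs_Written raw value

drive_writes  = 3 * 3 * 365                   # 3 DWPD * 3 years
budget_bytes  = drive_writes * CAPACITY_BYTES
written_bytes = LBAS_WRITTEN * SECTOR_BYTES

print(drive_writes)                        # 3285 complete overwrites
print(budget_bytes / 10**15)               # ~1.64 PB write budget
print(written_bytes / 10**12)              # ~59 TB written so far
print(100 * written_bytes / budget_bytes)  # ~3.6% of budget consumed
```

Whether the raw value counts 512B LBAs is itself an assumption; some drives report in other units, so check the vendor's SMART documentation.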

> Value is 99. It was 100 when i bought it.
> 
> Does it mean that it is 1% worn and can take 100 times more writes until it fails? Or am I too optimistic?

There are a few reasons why the calculated wear (~3.6%) and the reported wear (100% - 99%) might differ. For starters, it's not clear if that value is the number of LBAs written by the host, or the number of LBAs written to the NAND; it's possible for a request to write a single block to trigger remapping and garbage collection, resulting in write amplification. Conversely, some drives might detect when a block is being zeroed out, and simply flag the LBA and mark the underlying NAND as obsolete and ready for erasure, resulting in write suppression.
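The host-writes vs. NAND-writes distinction is usually expressed as a write amplification factor (WAF). The figures below are made up for illustration; this drive's SMART output does not report both quantities:

```python
# Sketch: write amplification factor (WAF) is NAND bytes written
# divided by host bytes written. WAF > 1 means the drive wrote more
# to flash than the host requested (remapping, garbage collection);
# WAF < 1 means writes were suppressed (e.g. zero-fill detection).
# The numbers here are illustrative, not from this drive.

def waf(host_bytes, nand_bytes):
    return nand_bytes / host_bytes

print(waf(50 * 10**12, 75 * 10**12))  # -> 1.5: 50% amplification
```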

In any case, the bottom line I see here is that this device seems nowhere near wear-out.

-Ravi (rpokala@)




