Date:      Mon, 05 Jul 2021 15:37:03 +0000
From:      Daniel Lysfjord via stable <stable@freebsd.org>
To:        stable@freebsd.org
Subject:   Re: ZFS + mysql appears to be killing my SSD's
Message-ID:  <5ff412bc5b0d1f83284895911456ee97@smokepit.net>
In-Reply-To: <f15cfc5a-e3c0-f1a7-c123-d369db9bc199@denninger.net>
References:  <f15cfc5a-e3c0-f1a7-c123-d369db9bc199@denninger.net> <89c37c3e-22e8-006e-5826-33bd7db7739e@ingresso.co.uk> <2fd9b7e4-dc75-fedc-28d7-b98191167e6b@freebsd.org> <9c71d627-55b8-2464-6cc9-489e4ce98049@ingresso.co.uk>

"Karl Denninger" <karl@denninger.net> skrev 5. juli 2021 kl. 17:10:

> On 7/5/2021 10:30, Pete French wrote:
>
>> On 05/07/2021 14:37, Stefan Esser wrote:
>>> Hi Pete,
>>>
>>> have you checked the drive state and statistics with smartctl?
>>
>> Hi, thanks for the reply - yes, I did check the statistics, and they
>> don't make a lot of sense. I was just looking at them again in fact.
>>
>> So, here is one of the machines where we changed a drive when this
>> first started, four weeks ago:
>>
>> root@telehouse04:/home/webadmin # smartctl -a /dev/ada0 | grep Perc
>> 169 Remaining_Lifetime_Perc 0x0000   082   082   000    Old_age   Offline      -       82
>> root@telehouse04:/home/webadmin # smartctl -a /dev/ada1 | grep Perc
>> 202 Percent_Lifetime_Remain 0x0030   100   100   001    Old_age   Offline      -       0
>>
>> Now, from that you might think the 2nd drive was the one changed, but
>> no. It's the first one, which is now at 82% lifetime remaining! The
>> other drive, still at 100%, has been in there a year. The drives are
>> from different manufacturers, which unfortunately makes comparing
>> most of the numbers tricky.
>>
>> I am now even more worried than when I sent the first email - if that
>> 18% is accurate then I am going to be doing this again in another 4
>> months, and that's not sustainable. It also looks as if this problem
>> has got a lot worse recently, though I wasn't looking at the numbers
>> before, only noticing the failures. If I look at the 'Percentage Used
>> Endurance Indicator' instead of the 'Percent_Lifetime_Remain' value
>> then I see some of those well over 200%. That value is, on the newer
>> drives, 100 minus the 'Percent_Lifetime_Remain' value, so I guess
>> they have the same underlying metric.
>>
>> I didn't mention it in my original email, but I am encrypting these
>> with geli. Does geli do any write amplification at all? That might
>> explain the high write volumes...
>>
>> -pete.
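
(Aside: since the two vendors expose wear under different attribute
names, something like the sketch below pulls whichever one each drive
reports. The ada0/ada1 device names are simply taken from Pete's
output above.)

  # sketch: print whichever wear attribute each drive exposes
  for d in /dev/ada0 /dev/ada1; do
    echo "== $d =="
    smartctl -A $d | egrep -i 'Lifetime|Wearout|Percentage Used'
  done
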
>
> As noted elsewhere, assuming ashift=12, the answer on write
> amplification is no.
>
> Geli should be initialized with -s 4096; I'm assuming you did that?
>
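
(Both halves of that are easy to verify, by the way. Something like
the two one-liners below should do it; the provider and pool names
are just taken from Karl's zpool output further down.)

  geli list ada0p4.eli | grep -i sectorsize  # provider should show 4096
  zdb -C zsr | grep ashift                   # should show ashift: 12
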
> I have a 5-unit geli-encrypted root pool, all Intel 240 GB SSDs. They
> do not report remaining lifetime via SMART but do report indications
> of trouble. Here's one example snippet from one of the drives in that
> pool:
>
> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   5 Reallocated_Sector_Ct   -O--CK   098   098   000    -    0
>   9 Power_On_Hours          -O--CK   100   100   000    -    53264
>  12 Power_Cycle_Count       -O--CK   100   100   000    -    100
> 170 Available_Reservd_Space PO--CK   100   100   010    -    0
> 171 Program_Fail_Count      -O--CK   100   100   000    -    0
> 172 Erase_Fail_Count        -O--CK   100   100   000    -    0
> 174 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    41
> 175 Power_Loss_Cap_Test     PO--CK   100   100   010    -    631 (295 5442)
> 183 SATA_Downshift_Count    -O--CK   100   100   000    -    0
> 184 End-to-End_Error        PO--CK   100   100   090    -    0
> 187 Reported_Uncorrect      -O--CK   100   100   000    -    0
> 190 Temperature_Case        -O---K   068   063   000    -    32 (Min/Max 29/37)
> 192 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    41
> 194 Temperature_Internal    -O---K   100   100   000    -    32
> 197 Current_Pending_Sector  -O--CK   100   100   000    -    0
> 199 CRC_Error_Count         -OSRCK   100   100   000    -    0
> 225 Host_Writes_32MiB       -O--CK   100   100   000    - 1811548
> 226 Workld_Media_Wear_Indic -O--CK   100   100   000    -    205
> 227 Workld_Host_Reads_Perc  -O--CK   100   100   000    -    49
> 228 Workload_Minutes        -O--CK   100   100   000    - 55841
> 232 Available_Reservd_Space PO--CK   100   100   010    -    0
> 233 Media_Wearout_Indicator -O--CK   089   089   000    -    0
> 234 Thermal_Throttle        -O--CK   100   100   000    -    0/0
> 241 Host_Writes_32MiB       -O--CK   100   100   000    - 1811548
> 242 Host_Reads_32MiB        -O--CK   100   100   000    - 1423217
>                             ||||||_ K auto-keep
>                             |||||__ C event count
>                             ||||___ R error rate
>                             |||____ S speed/performance
>                             ||_____ O updated online
>                             |______ P prefailure warning
>
> Device Statistics (GP Log 0x04)
> Page  Offset Size        Value Flags Description
> 0x01  =====  =               =  ===  == General Statistics (rev 2) ==
> 0x01  0x008  4             100  ---  Lifetime Power-On Resets
> 0x01  0x018  6    118722148102  ---  Logical Sectors Written
> 0x01  0x020  6        89033895  ---  Number of Write Commands
> 0x01  0x028  6     93271951909  ---  Logical Sectors Read
> 0x01  0x030  6         6797990  ---  Number of Read Commands
>
> 6 years in-use, roughly, and no indication of anything going on in
> terms of warnings about utilization or wear-out. There is a MySQL
> database on this box used by Cacti that is running all the time, and
> while the traffic isn't real high, it's there (there is also a
> Postgres server running on there that sees some traffic too). These
> specific drives were selected due to having power-fail protection for
> data in-flight -- they were one of only a few that I've tested which
> passed a "pull the cord" test, even though they're actually the 730s,
> NOT the "DC" series.
>
> Raidz2 configuration:
>
> root@NewFS:/home/karl # zpool status zsr
>   pool: zsr
>  state: ONLINE
>   scan: scrub repaired 0 in 0 days 00:07:05 with 0 errors on Mon Jun 28 03:43:58 2021
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         zsr             ONLINE       0     0     0
>           raidz2-0      ONLINE       0     0     0
>             ada0p4.eli  ONLINE       0     0     0
>             ada1p4.eli  ONLINE       0     0     0
>             ada2p4.eli  ONLINE       0     0     0
>             ada3p4.eli  ONLINE       0     0     0
>             ada4p4.eli  ONLINE       0     0     0
>
> errors: No known data errors
>
> Micron appears to be the only people making suitable replacements if
> and when these do start to fail on me, but from what I see here it
> will be a good while yet.
>
> --
> Karl Denninger
> karl@denninger.net
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/

Running MariaDB and PostgreSQL with FreeBSD 12.2 on a couple of Samsung
250GB 960 EVO drives in a mirror. Very low usage, and expected amount
of wear:

smartctl snippet:
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    5 294 592 [2,71 TB]
Data Units Written:                 25 471 775 [13,0 TB]
Host Read Commands:                 55 763 074
Host Write Commands:                1 245 546 898
Controller Busy Time:               3 290
Power Cycles:                       81
Power On Hours:                     29 491
Unsafe Shutdowns:                   46
Media and Data Integrity Errors:    0
Error Information Log Entries:      6
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               55 Celsius
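
(For scale: an NVMe data unit is 1000 x 512 B, so those 25 471 775
units are the 13,0 TB shown. Over 29 491 power-on hours, about 3.4
years, that is roughly 10 GB/day, and at ~13 TB per 1% of Percentage
Used these drives would project to something like 1.3 PB of endurance.
The 1% granularity makes that a very rough figure, of course.)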

zpool status:
  pool: znvme
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:14 with 0 errors on Fri Jun  4 03:03:46 2021
config:

	NAME        STATE     READ WRITE CKSUM
	znvme       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nvd0    ONLINE       0     0     0
	    nvd1    ONLINE       0     0     0

errors: No known data errors
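
If anyone wants to watch the trend rather than spot-check it, here is
a minimal sketch of a nightly script (the device names and log path
are assumptions, adjust to taste):

  #!/bin/sh
  # append today's wear figure for each NVMe drive to a log
  for d in nvme0 nvme1; do
    printf '%s %s %s\n' "$(date +%F)" "$d" \
      "$(smartctl -a /dev/$d | grep 'Percentage Used')"
  done >> /var/log/ssd-wear.log

Run it from cron (an @daily root crontab entry works) and the log
gives a per-day series you can diff to see the actual burn rate.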


