Date:        Mon, 05 Jul 2021 15:37:03 +0000
From:        Daniel Lysfjord via stable <stable@freebsd.org>
To:          stable@freebsd.org
Subject:     Re: ZFS + mysql appears to be killing my SSD's
Message-ID:  <5ff412bc5b0d1f83284895911456ee97@smokepit.net>
In-Reply-To: <f15cfc5a-e3c0-f1a7-c123-d369db9bc199@denninger.net>
References:  <f15cfc5a-e3c0-f1a7-c123-d369db9bc199@denninger.net>
             <89c37c3e-22e8-006e-5826-33bd7db7739e@ingresso.co.uk>
             <2fd9b7e4-dc75-fedc-28d7-b98191167e6b@freebsd.org>
             <9c71d627-55b8-2464-6cc9-489e4ce98049@ingresso.co.uk>
"Karl Denninger" <karl@denninger.net> skrev 5. juli 2021 kl. 17:10: > On 7/5/2021 10:30, Pete French wrote: >=20 >>=20On 05/07/2021 14:37, Stefan Esser wrote: >>> Hi Pete, >>>=20 >>>=20have you checked the drive state and statistics with smartctl? >>=20 >>=20Hi, thanks for the reply - yes, I did check the statistics, and they >> dont make a lot of sense. I was just looking at them again in fact. >>=20 >>=20So, one of the machines that we chnaged a drive on when this first >> started, which was 4 weeks ago. >>=20 >>=20root@telehouse04:/home/webadmin # smartctl -a /dev/ada0 | grep Perc >> 169 Remaining_Lifetime_Perc 0x0000 082 082 000 Old_age >> Offline - 82 >> root@telehouse04:/home/webadmin # smartctl -a /dev/ada1 | grep Perc >> 202 Percent_Lifetime_Remain 0x0030 100 100 001 Old_age >> Offline - 0 >>=20 >>=20Now, from that you might think the 2nd drive was the one changes, bu= t >> no. Its the first one, which is now at 82% lifetime remaining! The >> other druve, still at 100%, has been in there a year. The drives are >> different manufacturers, which makes comparing most of the numbers >> tricky unfortunately. >>=20 >>=20Am now even more worried than when I sent the first email - if that >> 18% is accurate then I am going to be doing this again in another 4 >> months, and thats not sustainable. It also looks as if this problem >> has got a lot worse recently. Though I wasnt looking at the numbers >> before, only noticing tyhe failurses. If I look at 'Percentage Used >> Endurance Indicator' isntead of the 'Percent_Lifetime_Remain' value >> then I see some of those well over 200%. That value is, on the newer >> drives, 100 minus the 'Percent_Lifetime_Remain' value, so I guess they >> ahve the same underlying metric. >>=20 >>=20I didnt mention in my original email, but I am encrypting these with >> geli. Does geli do any write amplification at all ? That might explain >> the high write volumes... >>=20 >>=20-pete. >=20 >=20As noted elsewhere assuming ashift=3D12 the answer on write amplifica= tion > is no. >=20 >=20Geli should be initialized with -s 4096; I'm assuming you did that? >=20 >=20I have a 5-unit geli-encrypted root pool, all Intel 240gb SSDs. They = do > not report remaining lifetime via smart but do report indications of > trouble. 
> Here's one example snippet from one of the drives in that pool:
>
> SMART Attributes Data Structure revision number: 1
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   5 Reallocated_Sector_Ct   -O--CK   098   098   000    -    0
>   9 Power_On_Hours          -O--CK   100   100   000    -    53264
>  12 Power_Cycle_Count       -O--CK   100   100   000    -    100
> 170 Available_Reservd_Space PO--CK   100   100   010    -    0
> 171 Program_Fail_Count      -O--CK   100   100   000    -    0
> 172 Erase_Fail_Count        -O--CK   100   100   000    -    0
> 174 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    41
> 175 Power_Loss_Cap_Test     PO--CK   100   100   010    -    631 (295 5442)
> 183 SATA_Downshift_Count    -O--CK   100   100   000    -    0
> 184 End-to-End_Error        PO--CK   100   100   090    -    0
> 187 Reported_Uncorrect      -O--CK   100   100   000    -    0
> 190 Temperature_Case        -O---K   068   063   000    -    32 (Min/Max 29/37)
> 192 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    41
> 194 Temperature_Internal    -O---K   100   100   000    -    32
> 197 Current_Pending_Sector  -O--CK   100   100   000    -    0
> 199 CRC_Error_Count         -OSRCK   100   100   000    -    0
> 225 Host_Writes_32MiB       -O--CK   100   100   000    -    1811548
> 226 Workld_Media_Wear_Indic -O--CK   100   100   000    -    205
> 227 Workld_Host_Reads_Perc  -O--CK   100   100   000    -    49
> 228 Workload_Minutes        -O--CK   100   100   000    -    55841
> 232 Available_Reservd_Space PO--CK   100   100   010    -    0
> 233 Media_Wearout_Indicator -O--CK   089   089   000    -    0
> 234 Thermal_Throttle        -O--CK   100   100   000    -    0/0
> 241 Host_Writes_32MiB       -O--CK   100   100   000    -    1811548
> 242 Host_Reads_32MiB        -O--CK   100   100   000    -    1423217
>                             ||||||_ K auto-keep
>                             |||||__ C event count
>                             ||||___ R error rate
>                             |||____ S speed/performance
>                             ||_____ O updated online
>                             |______ P prefailure warning
>
> Device Statistics (GP Log 0x04)
> Page  Offset Size        Value Flags Description
> 0x01  =====  =               =  ===  == General Statistics (rev 2) ==
> 0x01  0x008  4             100  ---  Lifetime Power-On Resets
> 0x01  0x018  6    118722148102  ---  Logical Sectors Written
> 0x01  0x020  6        89033895  ---  Number of Write Commands
> 0x01  0x028  6     93271951909  ---  Logical Sectors Read
> 0x01  0x030  6         6797990  ---  Number of Read Commands
>
> Roughly 6 years in use, and no indication of anything going on in
> terms of warnings about utilization or wear-out. There is a MySQL
> database on this box used by Cacti that is running all the time, and
> while the traffic isn't real high, it's there (there is also a
> Postgres server running on there that sees some traffic too). These
> specific drives were selected due to having power-fail protection for
> data in-flight -- they were one of only a few that I've tested which
> passed a "pull the cord" test even though they're actually the 730s,
> NOT the "DC" series.
>
> Raidz2 configuration:
>
> root@NewFS:/home/karl # zpool status zsr
>   pool: zsr
>  state: ONLINE
>   scan: scrub repaired 0 in 0 days 00:07:05 with 0 errors on Mon Jun 28
>         03:43:58 2021
> config:
>
>         NAME            STATE     READ WRITE CKSUM
>         zsr             ONLINE       0     0     0
>           raidz2-0      ONLINE       0     0     0
>             ada0p4.eli  ONLINE       0     0     0
>             ada1p4.eli  ONLINE       0     0     0
>             ada2p4.eli  ONLINE       0     0     0
>             ada3p4.eli  ONLINE       0     0     0
>             ada4p4.eli  ONLINE       0     0     0
>
> errors: No known data errors
>
> Micron appears to be the only people making suitable replacements if
> and when these do start to fail on me, but from what I see here it
> will be a good while yet.
>
> --
> Karl Denninger
> karl@denninger.net <karl@denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/
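As a back-of-the-envelope check on those counters (assuming the 32 MiB
units of attribute 225/241 and 512-byte logical sectors, as the names
suggest), the two write figures agree with each other:

  Host_Writes_32MiB:       1 811 548 x 32 MiB      ~= 60.8 TB (~55 TiB)
  Logical Sectors Written: 118 722 148 102 x 512 B ~= 60.8 TB

Spread over 53 264 power-on hours (a bit over six years) that works out
to roughly 10 TB/year, or under 30 GB/day of host writes.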
Running MariaDB and PostgreSQL with FreeBSD 12.2 on a couple of Samsung
250GB 960 EVO drives in a mirror. Very low usage, and the expected
amount of wear.

smartctl snippet:

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    5 294 592 [2,71 TB]
Data Units Written:                 25 471 775 [13,0 TB]
Host Read Commands:                 55 763 074
Host Write Commands:                1 245 546 898
Controller Busy Time:               3 290
Power Cycles:                       81
Power On Hours:                     29 491
Unsafe Shutdowns:                   46
Media and Data Integrity Errors:    0
Error Information Log Entries:      6
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               55 Celsius

zpool status:

  pool: znvme
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:14 with 0 errors on Fri Jun  4
        03:03:46 2021
config:

        NAME        STATE     READ WRITE CKSUM
        znvme       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            nvd0    ONLINE       0     0     0
            nvd1    ONLINE       0     0     0

errors: No known data errors
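For scale, an NVMe "data unit" is 1000 x 512 bytes, so the 25 471 775
units written above are the 13,0 TB smartctl reports; over 29 491
power-on hours that is only around 10 GB/day.

On the geli question upthread, a quick sanity check that the encrypted
providers really present 4K sectors and that the pool was created with
ashift=12 (provider and pool names below are taken from the quoted
output; substitute your own):

  # sector size the .eli provider exposes to ZFS - want 4096
  geli list ada0p4.eli | grep -i sectorsize
  diskinfo -v /dev/ada0p4.eli | grep sectorsize

  # ashift actually recorded in the pool - want 12
  zdb -C zsr | grep ashift

  # when setting up a new provider, force 4K sectors up front
  # (geli init wipes existing metadata, so only on a fresh device)
  geli init -s 4096 /dev/ada0p4

With ashift=12 on top of -s 4096 providers, every ZFS write maps onto
whole geli sectors, so geli itself shouldn't add any write
amplification, which matches Karl's point above.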