Date: Wed, 18 Dec 2024 15:26:38 -0500
From: mike tancsa <mike@sentex.net>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: TRIM question and zfs
Message-ID: <25c5d434-2ea1-4315-8722-342a469abb83@sentex.net>
TL;DR: does "zpool trim <poolname>" actually work as well as one expects/needs?

I had a very old server that had been running RELENG_12 for many years on some SSDs that were now nearing EOL after six years of work -- the wear-level indicator showed it getting low for sure. I had migrated everything live off the box, but for some reason, trying to do a zfs send of one volume was REALLY slow. I am talking KB/s slow. It took a long time, but it eventually got done.

As there was nothing on this server in production, I thought it a good exercise to try to upgrade it in the field, so buildworld to 13 and then 14. I deleted some of the old unneeded files and got down to just the zfs volume left on the pool, so just under 200G. I then did a "zpool trim tank1", but didn't see any improved performance at all. Still crazy slow. So I then did, for all three <disk>s in the pool, one by one:

  gpart backup <disk> > /tmp/disk-part.txt
  zpool offline tank1 <disk>p1
  trim -f /dev/<disk>
  cat /tmp/disk-part.txt | gpart restore <disk>
  zpool online tank1 <disk>p1
  zpool replace tank1 <disk>p1 <disk>p1

The first resilver took 13 hrs, the second 8 or so, and the last 13 min. After the final resilver was done, I could do a zfs send of the volume pretty well at full speed, with "zpool iostat 1" showing close to a GB/s of reads.

I know that zfs autotrim and trim just kind of keep track of what can and can't be deleted, but I would have thought the zpool trim would have had some impact?

Questions:

Does this mean that prior to deploying SSDs for use in a zfs pool, you should do a full "trim -f" of the disk?

Apart from offlining the disk and doing a trim, resilver, etc., is there a better way to get back performance? Or with a once-a-week trim prior to a scrub, will it be "good enough"?

Is there a way to tell whether a disk REALLY needs to be fully trimmed, other than inferring it from slowing performance?

I know these disks were super old, so maybe current SSDs don't have this issue?
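For reference, the non-destructive checks worth trying before the full offline trim can be sketched as below. This is just a sketch: the pool name tank1 comes from the post, but the "maybe" dry-run wrapper is an addition for safety (commands are only printed unless RUN=1); "zpool status -t" and "zpool wait -t trim" are standard OpenZFS commands available on FreeBSD 13+.

```shell
#!/bin/sh
# Non-destructive TRIM inspection sketch. POOL is a placeholder;
# set RUN=1 to actually execute the commands instead of printing them.
POOL=${POOL:-tank1}

maybe() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

maybe zpool get autotrim "$POOL"   # is continuous TRIM enabled on the pool?
maybe zpool status -t "$POOL"      # per-vdev TRIM state and progress
maybe zpool trim "$POOL"           # kick off a manual TRIM of free space
maybe zpool wait -t trim "$POOL"   # block until the TRIM completes
```

Note that "zpool trim" only trims space ZFS knows is free; it cannot touch blocks the drive's FTL still considers in use from pre-ZFS history, which is why the whole-device "trim -f" behaved so differently here.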
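The per-disk cycle above can be wrapped in a small script. A non-authoritative sketch: the commands are the ones from the post, but the disk names ada0..ada2 are placeholder assumptions, and the DRYRUN wrapper (on by default, since "trim -f" destroys all data on the device) is an addition for safety.

```shell
#!/bin/sh
# Sketch of the offline + whole-device-trim + resilver cycle described
# above. POOL and DISKS are placeholders; DRYRUN=1 (the default) only
# prints the commands instead of running them.
POOL=tank1
DISKS="ada0 ada1 ada2"
: "${DRYRUN:=1}"

run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

for d in $DISKS; do
    run sh -c "gpart backup $d > /tmp/$d-part.txt"   # save the partition table
    run zpool offline "$POOL" "${d}p1"               # take the disk out of the pool
    run trim -f "/dev/$d"                            # whole-device TRIM -- DESTROYS DATA
    run sh -c "gpart restore $d < /tmp/$d-part.txt"  # recreate the partition table
    run zpool online "$POOL" "${d}p1"
    run zpool replace "$POOL" "${d}p1" "${d}p1"      # resilver onto the freshly trimmed flash
    # wait for the resilver to finish before moving to the next disk
    [ "$DRYRUN" = 1 ] || while zpool status "$POOL" | grep -q 'resilver in progress'; do
        sleep 60
    done
done
```

Doing the disks strictly one at a time, waiting out each resilver, is what keeps the pool's redundancy intact throughout.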
Over the last few years I have switched to Samsung EVOs and they don't seem to have these problems, at least not yet in any obvious way. I am not sure why this showed up particularly in the zfs volume, while other normal datasets performed OK.

    ---Mike

smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.2-STABLE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC WDS100T2B0A-00SM50
Serial Number:    191011A00A72
LU WWN Device Id: 5 001b44 8b89825ed
Firmware Version: 401000WD
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Dec 18 15:23:10 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   -O--CK   100   100   ---    -    0
  9 Power_On_Hours          -O--CK   100   100   ---    -    47271
 12 Power_Cycle_Count       -O--CK   100   100   ---    -    33
165 Block_Erase_Count       -O--CK   100   100   ---    -    906509291245
166 Minimum_PE_Cycles_TLC   -O--CK   100   100   ---    -    1
167 Max_Bad_Blocks_per_Die  -O--CK   100   100   ---    -    34
168 Maximum_PE_Cycles_TLC   -O--CK   100   100   ---    -    33
169 Total_Bad_Blocks        -O--CK   100   100   ---    -    534
170 Grown_Bad_Blocks        -O--CK   100   100   ---    -    0
171 Program_Fail_Count      -O--CK   100   100   ---    -    0
172 Erase_Fail_Count        -O--CK   100   100   ---    -    0
173 Average_PE_Cycles_TLC   -O--CK   100   100   ---    -    12
174 Unexpected_Power_Loss   -O--CK   100   100   ---    -    19
184 End-to-End_Error        -O--CK   100   100   ---    -    0
187 Reported_Uncorrect      -O--CK   100   100   ---    -    0
188 Command_Timeout         -O--CK   100   100   ---    -    0
194 Temperature_Celsius     -O---K   075   044   ---    -    25 (Min/Max 22/44)
199 UDMA_CRC_Error_Count    -O--CK   100   100   ---    -    0
230 Media_Wearout_Indicator -O--CK   007   007   ---    -    0x074001140740
232 Available_Reservd_Space PO--CK   100   100   004    -    100
233 NAND_GB_Written_TLC     -O--CK   100   100   ---    -    12346
234 NAND_GB_Written_SLC     -O--CK   100   100   ---    -    90919
241 Host_Writes_GiB         ----CK   253   253   ---    -    80762
242 Host_Reads_GiB          ----CK   253   253   ---    -    19908
244 Temp_Throttle_Status    -O--CK   000   100   ---    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O   2261  Current Device Internal Status Data log
0x25       GPL     R/O   2261  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xde       GPL     VS       8  Device vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size         Value Flags Description
0x01  =====  =                =  ===  == General Statistics (rev 1) ==
0x01  0x008  4               33  ---  Lifetime Power-On Resets
0x01  0x010  4            47271  ---  Power-on Hours
0x01  0x018  6     169371253578  ---  Logical Sectors Written
0x01  0x020  6       2639812949  ---  Number of Write Commands
0x01  0x028  6      41752136282  ---  Logical Sectors Read
0x01  0x030  6         89429189  ---  Number of Read Commands
0x07  =====  =                =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1                1  N--  Percentage Used Endurance Indicator
                                 |||_ C monitored condition met
                                 ||__ D supports DSN
                                 |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            7  Device-to-host register FISes sent due to a COMRESET
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?25c5d434-2ea1-4315-8722-342a469abb83>