Date: Sat, 27 Apr 2024 14:47:08 -0400
From: mike tancsa <mike@sentex.net>
To: freebsd-hardware@freebsd.org
Subject: Re: WD Blue 510 SSD and strange write performance (update II)
Message-ID: <44d0a3fb-199a-48b5-9ab1-78f444e81520@sentex.net>
In-Reply-To: <f81c8c83-8c7e-444f-a7f0-7b18cf51ec1d@sentex.net>
References: <e5c2a99d-931e-48b4-9445-fc4ad05ccc70@sentex.net> <ZfgCaxEn3w2lyq3m@lorvorc.mips.inka.de> <6199e46b-241d-4052-ad2b-fa0a0b1e7169@sentex.net> <f81c8c83-8c7e-444f-a7f0-7b18cf51ec1d@sentex.net>

On 3/21/2024 8:46 AM, mike tancsa wrote:
>
> summary: WD Blue 510 SSDs when attached to the mpr controller seem to
> start throwing errors on random disks in the pools (see
> https://lists.freebsd.org/archives/freebsd-hardware/2024-March/000100.html
> for examples) after copying and destroying a zfs 200G dataset with
> many small files 3 or 4 times on a set of 4 disks in raidz1. Doing a
> hard trim -f da on the disks and recreating the pool allows me to do
> the tests 3 or 4 more times before hitting the errors again. The same
> tests with the same disks attached to a sata controller doesnt show
> the errors. I also ran into the same problem with a similar LSI
> controller but using the mrsas controller/driver (<AVAGO Invader SAS
> Controller>). It seems to be trim related? Using samsung SSDs on the
> mpr controller does not seem to show the issue.
>

I decided to try the same tests on the exact same hardware but booting
TrueNAS SCALE (the Linux variant) to see if the problem persists. If I
do a manual trim between zfs send | zfs recv, zfs destroy, the
performance seems fairly consistent and there are no crashes/resets of
the drives in the pool on Linux (6.6.20-production+truenas). I am not a
Linux person, so it is hard to say if there are some quirks for these
disks on Linux.

root@truenas[/var/log]# hdparm -I /dev/sda | grep -i tri
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read data after TRIM
root@truenas[/var/log]#

If I don't do the manual TRIM between send|recv (i.e. zpool trim -w
pool), I get the same pattern as when I do a manual trim -f /dev/da[x]
on each disk one by one on FreeBSD. I get 3 full speed loops and after
that, super slow until a proper trim is done. On FreeBSD I do this to
the raidz1 pool by doing a trim -f /dev/da[1-4] one by one and
resilver. So it does seem to point to TRIM via zfs (be that manual or
autotrim) being somehow broken with this drive on FreeBSD via the mpr
driver and via the ATA driver.
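
As a rough cross-check of the "limit 8 blocks" figure from hdparm, the sketch below converts it to a byte count. It assumes the standard DATA SET MANAGEMENT payload layout (8-byte range entries, 64 per 512-byte block, at most 65535 sectors per range) and 512-byte logical sectors. The sector size is an assumption, not something stated in this thread, and the function names are made up for illustration.

```python
import re

# Sizes from the ATA DATA SET MANAGEMENT (TRIM) payload format.
RANGE_ENTRY_BYTES = 8            # 48-bit LBA + 16-bit sector count
DSM_BLOCK_BYTES = 512            # one payload block
MAX_SECTORS_PER_RANGE = 0xFFFF   # largest value of the 16-bit count field
LOGICAL_SECTOR_BYTES = 512       # ASSUMPTION: 512-byte logical sectors


def dsm_block_limit(hdparm_text):
    """Return the 'limit N blocks' value from hdparm -I output, or None."""
    m = re.search(r"TRIM supported \(limit (\d+) blocks?\)", hdparm_text)
    return int(m.group(1)) if m else None


def max_trim_bytes(blocks):
    """Upper bound on the bytes one TRIM command can cover."""
    ranges = blocks * (DSM_BLOCK_BYTES // RANGE_ENTRY_BYTES)
    return ranges * MAX_SECTORS_PER_RANGE * LOGICAL_SECTOR_BYTES


# The two lines reported by hdparm in this message.
sample = """\
   *    Data Set Management TRIM supported (limit 8 blocks)
   *    Deterministic read data after TRIM
"""

blocks = dsm_block_limit(sample)
print(blocks, max_trim_bytes(blocks))   # prints: 8 17179607040
```

Under those assumptions the result, 17179607040 bytes, equals the kern.cam.da.6.delete_max value in the sysctl output quoted further down. That would suggest da(4) is already working from the same 8-block limit, but it is an inference from the numbers, not a confirmed finding.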

Given the output of hdparm on Linux and TRIM being limited to 8 blocks,
does anyone know if there is a quirk I can try on FreeBSD to maybe get
TRIM working for these SSDs?

Details are captured in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

The attachment in the PR,
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=250268
has a PNG showing the performance when the TRIM is not done.

    ---Mike

>
> OK, some updates. I took the same 4 disks off the mpr controller and
> put them off the motherboard and the problem seems to disappear. If
> it is still related to trim, I notice that on the mpr controller the
> trim method is ATA_TRIM and when attached to the motherboard SATA its
> DSM_TRIM. Not sure if there is any difference there ? Or its some
> other problem. PR time for the mpr driver ?
>
> kern.cam.ada.1.trim_ticks: 0
> kern.cam.ada.1.trim_goal: 0
> kern.cam.ada.1.flags:
> 0x1be3bde<CAN_48BIT,CAN_FLUSHCACHE,CAN_NCQ,CAN_DMA,WAS_OTAG,CAN_TRIM,OPEN,SCTX_INIT,CAN_POWERMGT,CAN_DMA48,CAN_LOG,CAN_WCACHE,CAN_RAHEAD,PROBED,ANNOUNCED,DIRTY,PIM_ATA_EXT,UNMAPPEDIO>
> kern.cam.ada.1.trim_lbas: 6356918872
> kern.cam.ada.1.trim_ranges: 171552
> kern.cam.ada.1.trim_count: 84205
> kern.cam.ada.1.delete_method: DSM_TRIM
>
> kern.cam.da.6.trim_ticks: 0
> kern.cam.da.6.trim_goal: 0
> kern.cam.da.6.sort_io_queue: 0
> kern.cam.da.6.unmapped_io: 1
> kern.cam.da.6.rotating: 0
> kern.cam.da.6.flags:
> 0x10ef40<WAS_OTAG,OPEN,SCTX_INIT,CAN_RC16,PROBED,ANNOUCNED,CAN_ATA_DMA,CAN_ATA_LOG,UNMAPPEDIO>
> kern.cam.da.6.p_type: 0
> kern.cam.da.6.error_inject: 0
> kern.cam.da.6.max_seq_zones: 0
> kern.cam.da.6.optimal_nonseq_zones: 0
> kern.cam.da.6.optimal_seq_zones: 0
> kern.cam.da.6.zone_support: None
> kern.cam.da.6.zone_mode: Not Zoned
> kern.cam.da.6.trim_lbas: 0
> kern.cam.da.6.trim_ranges: 0
> kern.cam.da.6.trim_count: 0
> kern.cam.da.6.minimum_cmd_size: 6
> kern.cam.da.6.delete_max: 17179607040
> kern.cam.da.6.delete_method: ATA_TRIM
>
> camcontrol iden doesnt show much difference really
>
> diff -bu wd.mpr wd.ata
> --- wd.mpr 2024-03-21 08:27:02.995734000 -0400
> +++ wd.ata 2024-03-21 08:21:42.310055000 -0400
> @@ -1,5 +1,6 @@
> +# camcontrol ide ada1
>  pass6: <WD Blue SA510 2.5 1000GB 52046100> ACS-4 ATA SATA 3.x device
> -pass6: 600.000MB/s transfers, Command Queueing Enabled
> +pass6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
>
>  protocol              ACS-4 ATA SATA 3.x
>  device model          WD Blue SA510 2.5 1000GB
>
>
> Controller is
>
> mprutil show adapter
> mpr0 Adapter:
>        Board Name: INSPUR 3008IT
>    Board Assembly: INSPUR
>         Chip Name: LSISAS3008
>     Chip Revision: ALL
>     BIOS Revision: 18.00.00.00
> Firmware Revision: 16.00.12.00
>   Integrated RAID: no
>          SATA NCQ: ENABLED
>  PCIe Width/Speed: x8 (8.0 GB/sec)
>         IOC Speed: Full
>       Temperature: 51 C
>
> PhyNum  CtlrHandle  DevHandle  Disabled  Speed  Min  Max  Device
> 0       0001        0009       N         6.0    3.0  12   SAS Initiator
> 1       0001        0009       N         6.0    3.0  12   SAS Initiator
> 2       0001        0009       N         6.0    3.0  12   SAS Initiator
> 3       0001        0009       N         6.0    3.0  12   SAS Initiator
> 4                              N                3.0  12   SAS Initiator
> 5                              N                3.0  12   SAS Initiator
> 6                              N                3.0  12   SAS Initiator
> 7                              N                3.0  12   SAS Initiator
>
>
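
For comparing the two attachments side by side, a small parser for the kern.cam sysctl lines quoted above may help. This is only a sketch: the function name is hypothetical, and the sample text is a subset of the values from this thread. Live input would come from something like `sysctl kern.cam.da kern.cam.ada`.

```python
import re
from collections import defaultdict

# Matches lines such as "kern.cam.da.6.delete_method: ATA_TRIM".
LINE = re.compile(r"^kern\.cam\.(ada|da)\.(\d+)\.(\w+):\s*(.*)$")


def parse_cam_sysctl(text):
    """Group kern.cam.{ada,da}.N.* sysctl lines by device name."""
    devices = defaultdict(dict)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            driver, unit, key, value = m.groups()
            devices[f"{driver}{unit}"][key] = value
    return dict(devices)


# Subset of the sysctl output quoted in this message.
sample = """\
kern.cam.ada.1.trim_lbas: 6356918872
kern.cam.ada.1.trim_ranges: 171552
kern.cam.ada.1.trim_count: 84205
kern.cam.ada.1.delete_method: DSM_TRIM
kern.cam.da.6.trim_lbas: 0
kern.cam.da.6.trim_ranges: 0
kern.cam.da.6.trim_count: 0
kern.cam.da.6.delete_max: 17179607040
kern.cam.da.6.delete_method: ATA_TRIM
"""

devs = parse_cam_sysctl(sample)
for name, kv in sorted(devs.items()):
    # Show the delete method next to the TRIM counters for each device.
    print(name, kv.get("delete_method"), kv.get("trim_count"), kv.get("trim_lbas"))
```

The counters depend on when the snapshot was taken, so a zero trim_count on da6 does not by itself prove that no TRIM was ever issued through the mpr controller.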