From: mike tancsa
To: freebsd-hardware@freebsd.org
Date: Sat, 27 Apr 2024 14:47:08 -0400
Subject: Re: WD Blue 510 SSD and strange write performance (update II)
Message-ID: <44d0a3fb-199a-48b5-9ab1-78f444e81520@sentex.net>
References: <6199e46b-241d-4052-ad2b-fa0a0b1e7169@sentex.net>
List-Id: General discussion of FreeBSD hardware
List-Archive: https://lists.freebsd.org/archives/freebsd-hardware

On 3/21/2024 8:46 AM, mike tancsa wrote:
>
> summary: WD Blue 510 SSDs when attached to the mpr controller seem to
> start throwing errors on random disks in the pools (see
> https://lists.freebsd.org/archives/freebsd-hardware/2024-March/000100.html
> for examples) after copying and destroying a zfs 200G dataset with
> many small files 3 or 4 times on a set of 4 disks in raidz1. Doing a
> hard trim -f da on the disks and recreating the pool allows me to do
> the tests 3 or 4 more times before hitting the errors again.  The same
> tests with the same disks attached to a sata controller doesnt show
> the errors. I also ran into the same problem with a similar LSI
> controller but using the mrsas controller/driver ( Controller>).  It
> seems to be trim related?  Using samsung SSDs on the
> mpr controller does not seem to show the issue.
>

I decided to try the same tests on the exact same hardware but booting
TrueNAS SCALE (the Linux variant) to see if the problem persists.
If I do a manual trim between each zfs send | zfs recv, zfs destroy
cycle, the performance seems fairly consistent and there are no
crashes/resets of the drives in the pool on Linux
(6.6.20-production+truenas). I am not a Linux person, so it is hard to
say if there are some quirks for these disks on Linux.

root@truenas[/var/log]# hdparm -I /dev/sda | grep -i tri
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read data after TRIM
root@truenas[/var/log]#

If I don't do the manual TRIM between send|recv (i.e. zpool trim -w
pool), I get the same pattern as when I do a manual trim -f /dev/da[x]
on each disk one by one on FreeBSD.  I get 3 full speed loops and after
that, super slow until a proper trim is done. On FreeBSD I do this to
the raidz1 pool by doing a trim -f /dev/da[1-4] one by one and
resilver.

So it does seem to point to TRIM via zfs (be that manual or autotrim)
being somehow broken with this drive on FreeBSD via the mpr driver and
via the ATA driver.

Given the output of hdparm on Linux and trim being limited to 8 blocks,
does anyone know if there is a quirk I can try on FreeBSD to maybe get
TRIM working for these SSDs?

Details are captured in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992
The attachment in the PR,
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=250268
has a PNG showing the performance when the TRIM is not done.

    ---Mike

>
> OK, some updates.  I took the same 4 disks off the mpr controller and
> put them off the motherboard and the problem seems to disappear.  If
> it is still related to trim, I notice that on the mpr controller the
> trim method is ATA_TRIM and when attached to the motherboard SATA its
> DSM_TRIM.  Not sure if there is any difference there ? Or its some
> other problem.  PR time for the mpr driver ?
>
> kern.cam.ada.1.trim_ticks: 0
> kern.cam.ada.1.trim_goal: 0
> kern.cam.ada.1.flags: 0x1be3bde
> kern.cam.ada.1.trim_lbas: 6356918872
> kern.cam.ada.1.trim_ranges: 171552
> kern.cam.ada.1.trim_count: 84205
> kern.cam.ada.1.delete_method: DSM_TRIM
>
> kern.cam.da.6.trim_ticks: 0
> kern.cam.da.6.trim_goal: 0
> kern.cam.da.6.sort_io_queue: 0
> kern.cam.da.6.unmapped_io: 1
> kern.cam.da.6.rotating: 0
> kern.cam.da.6.flags: 0x10ef40
> kern.cam.da.6.p_type: 0
> kern.cam.da.6.error_inject: 0
> kern.cam.da.6.max_seq_zones: 0
> kern.cam.da.6.optimal_nonseq_zones: 0
> kern.cam.da.6.optimal_seq_zones: 0
> kern.cam.da.6.zone_support: None
> kern.cam.da.6.zone_mode: Not Zoned
> kern.cam.da.6.trim_lbas: 0
> kern.cam.da.6.trim_ranges: 0
> kern.cam.da.6.trim_count: 0
> kern.cam.da.6.minimum_cmd_size: 6
> kern.cam.da.6.delete_max: 17179607040
> kern.cam.da.6.delete_method: ATA_TRIM
>
> camcontrol iden doesnt show much difference really
>
>  diff -bu wd.mpr wd.ata
> --- wd.mpr      2024-03-21 08:27:02.995734000 -0400
> +++ wd.ata      2024-03-21 08:21:42.310055000 -0400
> @@ -1,5 +1,6 @@
> +# camcontrol ide ada1
>  pass6: ACS-4 ATA SATA 3.x device
> -pass6: 600.000MB/s transfers, Command Queueing Enabled
> +pass6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
>
>  protocol              ACS-4 ATA SATA 3.x
>  device model          WD Blue SA510 2.5 1000GB
>
>
> Controller is
>
>  mprutil show adapter
> mpr0 Adapter:
>        Board Name: INSPUR 3008IT
>    Board Assembly: INSPUR
>         Chip Name: LSISAS3008
>     Chip Revision: ALL
>     BIOS Revision: 18.00.00.00
> Firmware Revision: 16.00.12.00
>   Integrated RAID: no
>          SATA NCQ: ENABLED
>  PCIe Width/Speed: x8 (8.0 GB/sec)
>         IOC Speed: Full
>       Temperature: 51 C
>
> PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
> 0       0001        0009       N         6.0     3.0    12     SAS Initiator
> 1       0001        0009       N         6.0     3.0    12     SAS Initiator
> 2       0001        0009       N         6.0     3.0    12     SAS Initiator
> 3       0001        0009       N         6.0     3.0    12     SAS Initiator
> 4                              N                 3.0    12     SAS Initiator
> 5                              N                 3.0    12     SAS Initiator
> 6                              N                 3.0    12     SAS Initiator
> 7                              N                 3.0    12     SAS Initiator
>
>
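
A side note on the "limit 8 blocks" that hdparm reports and the
delete_max value in the sysctl output quoted above. The following is
only a rough sketch, assuming 512-byte logical sectors and the usual
ATA DATA SET MANAGEMENT layout (8-byte range entries, 64 per 512-byte
block, at most 65535 sectors per entry). The function and variable
names are made up for illustration and are not taken from any FreeBSD
or Linux source.

```python
# Sketch: relate hdparm's "limit 8 blocks" to kern.cam.da.6.delete_max.
# Assumptions (not from the original message): 512-byte logical sectors,
# 8-byte DSM range entries, 16-bit sector count per entry.
SECTOR_BYTES = 512
DSM_BLOCK_BYTES = 512
ENTRY_BYTES = 8
MAX_SECTORS_PER_ENTRY = 0xFFFF


def max_trim_bytes(dsm_blocks: int) -> int:
    """Largest span one TRIM command could cover with dsm_blocks blocks."""
    entries = dsm_blocks * (DSM_BLOCK_BYTES // ENTRY_BYTES)
    return entries * MAX_SECTORS_PER_ENTRY * SECTOR_BYTES


def parse_sysctl(text: str) -> dict:
    """Turn 'name: value' lines (as printed by sysctl) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


# Values copied from the sysctl output quoted in this thread.
SAMPLE = """\
kern.cam.ada.1.delete_method: DSM_TRIM
kern.cam.da.6.delete_max: 17179607040
kern.cam.da.6.delete_method: ATA_TRIM
"""

values = parse_sysctl(SAMPLE)
limit = max_trim_bytes(8)
print(limit, limit == int(values["kern.cam.da.6.delete_max"]))
```

Under those assumptions, 8 blocks works out to 17179607040 bytes,
which is the same number da6 shows for delete_max behind the mpr
controller. That suggests, but does not prove, that the 8-block limit
is already being taken into account there.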