Date:      Sun, 07 Apr 2024 20:42:01 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 277992] mpr and possible trim issues
Message-ID:  <bug-277992-227-QNH5dKnRhz@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-277992-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-277992-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

--- Comment #4 from mike@sentex.net ---
With max trim set to 1/4 of the default value, I still get random errors. In
this case, I set delete_max on all 4 drives, e.g.:
 sysctl -w kern.cam.da.7.delete_max=4294901760
kern.cam.da.7.delete_max: 17179607040 -> 4294901760
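
As a quick sanity check on the numbers above, the new limit should be exactly
one quarter of the default that sysctl reported. A minimal Python sketch; the
variable names are mine and the values are simply copied from the sysctl
output:

```python
# Values copied from the sysctl output above (bytes).
default_delete_max = 17179607040   # value before the change
reduced_delete_max = 4294901760    # value after the change

# If the new limit is exactly 1/4 of the default, the quotient is 4
# and the remainder is 0.
quotient, remainder = divmod(default_delete_max, reduced_delete_max)
print(quotient, remainder)

# The reduced limit expressed in GiB, for readability.
print(reduced_delete_max / 2**30)
```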



Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 3f ab 99 08 00 00 40 00 length 32768 SMID 844 Command timeout on target
49(0x000d), 60000 set, 60.788719903 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel: mpr0: Sending abort to target 49 for SMID
844
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 3f ab 99 08 00 00 40 00 length 32768 SMID 844 Aborting command
0xfffffe02068b87a0
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): READ(10). CDB: 28 00
0b 07 eb d0 00 00 c8 00 length 102400 SMID 1453 Command timeout on target
49(0x000d), 60000 set, 60.933970425 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 40 a5 23 c8 00 00 10 00 length 8192 SMID 443 Command timeout on target
49(0x000d), 60000 set, 60.1019647685 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 3f ab 98 e8 00 00 20 00 length 16384 SMID 113 Command timeout on target
49(0x000d), 60000 set, 60.1101031222 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 3f ab 97 e8 00 01 00 00 length 131072 SMID 1395 Command timeout on target
49(0x000d), 60000 set, 60.1184073991 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 3f ab 96 e8 00 01 00 00 length 131072 SMID 1364 Command timeout on target
49(0x000d), 60000 set, 60.1266419429 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 4c 35 79 68 00 00 08 00 length 4096 SMID 1323 Command timeout on target
49(0x000d), 60000 set, 60.1337927805 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 4c 35 79 20 00 00 48 00 length 36864 SMID 1506 Command timeout on target
49(0x000d), 60000 set, 60.1422928220 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 4c 35 79 10 00 00 08 00 length 4096 SMID 343 Command timeout on target
49(0x000d), 60000 set, 60.1504915373 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 40 a5 23 d8 00 00 10 00 length 8192 SMID 830 Command timeout on target
49(0x000d), 60000 set, 60.1586143647 elapsed
Apr  7 13:51:57 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:52:00 r-14mfitest kernel: mpr0: mprsas_prepare_remove: Sending reset
for target ID 49
Apr  7 13:52:00 r-14mfitest kernel:     (da3:mpr0:0:49:0): READ(10). CDB: 28 00
0c 05 97 18 00 00 30 00 length 24576 SMID 1035 Command timeout on target
49(0x000d), 60000 set, 60.52537889 elapsed
Apr  7 13:52:00 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:52:00 r-14mfitest kernel:     (da3:mpr0:0:49:0): READ(10). CDB: 28 00
0b 07 ee 38 00 00 08 00 length 4096 SMID 615 Command timeout on target
49(0x000d), 60000 set, 60.133221431 elapsed
Apr  7 13:52:00 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:52:00 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 4c 35 79 70 00 01 00 00 length 131072 SMID 1049 Command timeout on target
49(0x000d), 60000 set, 60.214511381 elapsed
Apr  7 13:52:00 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:52:00 r-14mfitest kernel:     (pass3:mpr0:0:49:0): INQUIRY. CDB: 12
00 00 00 24 00 length 36 SMID 1789 Command timeout on target 49(0x000d), 60000
set, 60.158235152 elapsed
Apr  7 13:52:00 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1323 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 830 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: (da3:mpr0:0:49:0): Invalidating pack
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1453 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 443 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1364 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 343 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 113 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1395 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1506 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1789 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1035 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 615 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1049 loginfo 31140000 departing
Apr  7 13:52:01 r-14mfitest kernel: mpr0: No pending commands: starting
remove_device for target 49 handle 0x000d
Apr  7 13:52:01 r-14mfitest kernel: mpr0: clearing target 49 handle 0x000d
Apr  7 13:52:01 r-14mfitest kernel: mpr0: At enclosure level 0, slot 3,
connector name (    )
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Finished abort recovery for target 49
Apr  7 13:52:01 r-14mfitest kernel: (da3:mpr0:0:49:0): WRITE(10). CDB: 2a 00 3f
ab 99 08 00 00 40 00
Apr  7 13:52:01 r-14mfitest kernel: (da3:mpr0:0:49:0): CAM status: Command
timeout
Apr  7 13:52:01 r-14mfitest kernel: (da3:mpr0:0:49:0): Retrying command, 3 more
tries remain
Apr  7 13:52:01 r-14mfitest kernel: da3 at mpr0 bus 0 scbus0 target 49 lun 0
Apr  7 13:52:01 r-14mfitest kernel: da3: <ATA WD Blue SA510 2. 6100>  s/n
240406800922 detached
Apr  7 13:52:03 r-14mfitest kernel: (da3:mpr0:0:49:0): Periph destroyed
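
To make sense of a log like the one above, it can help to compare which SMIDs
hit a command timeout with which ones the controller later reported as
terminated. The following is only a rough Python sketch: the function name and
the regular expressions are my own, written against the message wording seen
in this excerpt, and it assumes the quoted-printable line breaks in the mail
have already been decoded. The sample is a three-entry subset of the log.

```python
import re

def timed_out_and_terminated(log_text):
    """Return (timed_out, terminated) sets of SMID strings from mpr log text."""
    # Entries are wrapped across lines in the mail, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    timed_out = set(re.findall(r"SMID (\d+) Command timeout", flat))
    terminated = set(re.findall(r"terminated tgt \d+ SMID (\d+)", flat))
    return timed_out, terminated

# A small subset of the log above, used only as sample input.
sample = """\
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): WRITE(10). CDB: 2a
00 3f ab 99 08 00 00 40 00 length 32768 SMID 844 Command timeout on target
49(0x000d), 60000 set, 60.788719903 elapsed
Apr  7 13:51:57 r-14mfitest kernel:     (da3:mpr0:0:49:0): READ(10). CDB: 28 00
0b 07 eb d0 00 00 c8 00 length 102400 SMID 1453 Command timeout on target
49(0x000d), 60000 set, 60.933970425 elapsed
Apr  7 13:52:01 r-14mfitest kernel: mpr0: Controller reported scsi ioc
terminated tgt 49 SMID 1453 loginfo 31140000 departing
"""

timed_out, terminated = timed_out_and_terminated(sample)
print("timed out:", sorted(timed_out, key=int))
print("timed out but not reported terminated:",
      sorted(timed_out - terminated, key=int))
```

Feeding the function the full contents of /var/log/messages instead of the
sample should work the same way, as long as the message format matches.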

If I power cycle the box, the disk comes back and resilvers just fine.
No SMART errors. When I move these disks off the MFI controller and onto the
same server's onboard SATA controllers, I am not able to provoke these errors,
even running the test for a good 12-14 hours. Put them back on the MFI
controller, and if I do a trim -f of the disks and start clean, I seem to get
a good 4 cycles out before the errors. Hence I thought it had something to do
with trim.



I can't seem to disable trim by setting the delete_method sysctl to NONE:
 sysctl -w kern.cam.da.4.delete_method=NONE
kern.cam.da.4.delete_method: ATA_TRIM -> ATA_TRIM
I can, however, do
 sysctl -w kern.cam.da.4.delete_method=DISABLE
kern.cam.da.4.delete_method: ATA_TRIM -> DISABLE


It seems to work a little better with delete_method set to DISABLE. After the
disk dropped out and a reboot, I managed to get 4 iterations error free.
However, it really slows down on the last loop. A loop normally takes about
20 min to run, but the 4th one took close to an hour.

Apr  7 14:12:45 r-14mfitest LOOP[4704]: starting
Apr  7 14:40:14 r-14mfitest LOOP[16194]: ending
Apr  7 14:41:45 r-14mfitest LOOP[16835]: starting
Apr  7 15:05:24 r-14mfitest LOOP[26742]: ending
Apr  7 15:06:55 r-14mfitest LOOP[27383]: starting
Apr  7 15:31:42 r-14mfitest LOOP[37775]: ending
Apr  7 15:33:12 r-14mfitest LOOP[38413]: starting
Apr  7 16:25:19 r-14mfitest LOOP[60263]: ending
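
The per-loop run times can be worked out from the timestamps above. A minimal
Python sketch, assuming the starting/ending lines strictly alternate as they
do here; the function name is mine, and the year is an assumption taken from
the Date header of this mail, since syslog timestamps do not carry one:

```python
from datetime import datetime

# The LOOP lines from the log above.
loop_log = """\
Apr  7 14:12:45 r-14mfitest LOOP[4704]: starting
Apr  7 14:40:14 r-14mfitest LOOP[16194]: ending
Apr  7 14:41:45 r-14mfitest LOOP[16835]: starting
Apr  7 15:05:24 r-14mfitest LOOP[26742]: ending
Apr  7 15:06:55 r-14mfitest LOOP[27383]: starting
Apr  7 15:31:42 r-14mfitest LOOP[37775]: ending
Apr  7 15:33:12 r-14mfitest LOOP[38413]: starting
Apr  7 16:25:19 r-14mfitest LOOP[60263]: ending
"""

def loop_durations(text, year=2024):
    """Pair each 'starting' line with the next 'ending' line."""
    durations = []
    start = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete syslog line
        # fields[0:3] are month, day, and HH:MM:SS.
        stamp = datetime.strptime(
            f"{year} {fields[0]} {fields[1]} {fields[2]}",
            "%Y %b %d %H:%M:%S",
        )
        if fields[-1] == "starting":
            start = stamp
        elif fields[-1] == "ending" and start is not None:
            durations.append(stamp - start)
            start = None
    return durations

for number, duration in enumerate(loop_durations(loop_log), start=1):
    print(f"loop {number}: {duration}")
```

The first three loops come out between roughly 23 and 28 minutes, and the
fourth at about 52 minutes.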

I am going to let it keep running to see whether it just errors out at a later
time.
