Date: Tue, 13 Aug 2019 19:55:22 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
Message-ID: <bug-224496-227-fSp8itiFj7@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-224496-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-224496-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Paul Thornton <freebsd-bugzilla@prt.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |freebsd-bugzilla@prt.org

--- Comment #15 from Paul Thornton <freebsd-bugzilla@prt.org> ---
I too have run into this issue on a nas box, once it started taking on any
kind of load.

Running 12.0-RELEASE p3

The server contains 8x Seagate Ironwolf Pro 10Tb SATA drives on an Avago 3008
HBA - 8 of these basically:

da2 at mpr1 bus 0 scbus13 target 12 lun 0
da2: <ATA ST10000NE0004-1Z EN01> Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZA237AVY
da2: 1200.000MB/s transfers
da2: Command Queueing enabled
da2: 9537536MB (19532873728 512 byte sectors)

Driver versions:
dev.mpr.1.driver_version: 18.03.00.00-fbsd
dev.mpr.1.firmware_version: 15.00.03.00

These drives are configured in a ZFS RAID10 setup (in case that datapoint
matters):

        NAME          STATE     READ WRITE CKSUM
        data0         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da2.eli   ONLINE       0     0     0
            da3.eli   ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da4.eli   ONLINE       0     0     0
            da5.eli   ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            da6.eli   ONLINE       0     0     0
            da7.eli   ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            da8.eli   ONLINE       0     0     0
            da9.eli   ONLINE       0     0     0

I currently get about 25 days between reboots. The machine hangs and (I'm
guessing here) kernel panics and restarts - I don't have the panic information,
but log messages look very similar to what other people are seeing:

Jul 20 11:14:17 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00 length 24576 SMID 1484 Command timeout on target 12(0x000c), 60000 set, 60.703976195 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ( )
Jul 20 11:14:17 nas1a kernel: mpr1: Sending abort to target 12 for SMID 1484
Jul 20 11:14:17 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00 length 24576 SMID 1484 Aborting command 0xfffffe00bad0b540
Jul 20 11:14:17 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1792 Command timeout on target 12(0x000c), 60000 set, 60.707504796 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ( )
Jul 20 11:14:18 nas1a kernel: mpr1: Controller reported scsi ioc terminated tgt 12 SMID 1792 loginfo 31140000
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: mpr1: Abort failed for target 12, sending logical unit reset
Jul 20 11:14:18 nas1a kernel: mpr1: (da2:mpr1:0:12:0): CAM status: CCB request aborted by the host
Jul 20 11:14:18 nas1a kernel: Sending logical unit reset to target 12 lun 0
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 3 more tries remain
Jul 20 11:14:18 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ( )
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CCB request completed with an error
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 0 more tries remain
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Error 5, Retries exhausted
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 2 more tries remain
[reboot happens here]

And the most recent one, today:

Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Command timeout on target 16(0x0010), 60000 set, 60.109683189 elapsed
Aug 13 08:58:55 nas1a kernel: mpr1: At enclosure level 0, slot 6, connector name ( )
Aug 13 08:58:55 nas1a kernel: mpr1: Sending abort to target 16 for SMID 998
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Aborting command 0xfffffe00bacdfaa0
Aug 13 08:58:55 nas1a kernel: mpr1: Abort failed for target 16, sending logical unit reset
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Aug 13 08:58:55 nas1a kernel: mpr1: Sending logical unit reset to target 16 lun 0
Aug 13 08:58:55 nas1a kernel: mpr1: At enclosure level 0, slot 6, connector name ( )
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): CAM status: CCB request aborted by the host
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): Retrying command, 0 more tries remain
Aug 13 08:58:55 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for target ID 16
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): CAM status: CAM subsystem is busy
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): Error 5, Retries exhausted
Aug 13 08:58:55 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for target ID 16
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): WRITE(10). CDB: 2a 00 92 b0 7d 70 00 00 48 00
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): CAM status: CAM subsystem is busy
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): Retrying command, 3 more tries remain
[reboot happens here]

After the reboot, there's no problem and everything works fine. ZFS never
marks the pool as degraded or unavailable.

Looking at the FreeNAS threads this seems to have been going on for ages.
Can anyone confirm that a downgrade to 11.1 does work around this issue as
that seems to be the only thing that might help?

-- 
You are receiving this mail because:
You are the assignee for the bug.
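
For anyone comparing these traces against their own, the CDB bytes in the log can be decoded by hand to see which LBA and how much data each timed-out command covered. Below is a minimal sketch in Python, assuming the standard 10-byte CDB layout (opcode, flags, 32-bit big-endian LBA, group number, 16-bit block count, control) and the 512-byte sectors reported for da2. The function name `decode_cdb10` and the `OPCODES` table are made up for illustration and are not part of the mpr driver or any FreeBSD tool.

```python
import struct

# Opcode names for the 10-byte commands that appear in the log above.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x35: "SYNCHRONIZE CACHE(10)"}


def decode_cdb10(cdb_hex, sector_size=512):
    """Decode a 10-byte SCSI CDB given as space-separated hex bytes.

    sector_size defaults to 512 because the drives in the report
    advertise "512 byte sectors"; adjust it for other devices.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB, got %d bytes" % len(cdb))
    # Big-endian fields: opcode, flags, LBA (32 bit), group number,
    # block count (16 bit), control.
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {
        "name": OPCODES.get(opcode, "opcode 0x%02x" % opcode),
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * sector_size,
    }


if __name__ == "__main__":
    # The two WRITE(10) CDBs and the SYNCHRONIZE CACHE(10) CDB from the log.
    for cdb in (
        "2a 00 62 81 f9 d0 00 00 30 00",
        "2a 00 92 b0 7d 70 00 00 48 00",
        "35 00 00 00 00 00 00 00 00 00",
    ):
        print(cdb, "->", decode_cdb10(cdb))
```

For the Jul 20 WRITE(10), the block count field 0x0030 is 48 blocks, and 48 x 512 = 24576 bytes, which agrees with the "length 24576" the driver printed. The SYNCHRONIZE CACHE(10) CDB is all zeros after the opcode, which matches its "length 0".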