Date: Wed, 16 Jun 2010 12:17:34 -0400
From: Andrew Boyer <aboyer@averesystems.com>
To: freebsd-scsi@freebsd.org
Subject: Overlapped Commands error
Message-ID: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com>
Hello SCSI experts,

We recently saw this SCSI command error:

> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10). CDB: 28 0 2 c8 7f a0 0 0 20 0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16

No one here has ever seen this before. We're using a CAM and MPT stack from August 2009 with an LSI1068e HBA connected to Seagate SAS HDDs.

This is what the SCSI Architecture Model (SAM-5 draft) has to say about overlapped commands:

> 5.10 Overlapped commands
> An overlapped command occurs when a task manager or a task router detects the use of a duplicate I_T_L_Q nexus (see 4.6.6) in a command before that I_T_L_Q nexus completes its command lifetime (see 5.5). Each SCSI transport protocol standard shall specify whether or not a task manager or a task router is required to detect overlapped commands.
> A task manager or a task router that detects an overlapped command shall abort all commands received on the I_T nexus on which the overlapped command was received and the device server shall return a CHECK CONDITION status for the overlapped command. The sense key shall be set to ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED COMMANDS ATTEMPTED.
> NOTE 11 - An overlapped command may be indicative of a serious error and, if not detected, may result in corrupted data. This is considered a catastrophic failure on the part of the SCSI initiator device. Therefore, vendor specific error recovery procedures may be required to guarantee the data integrity on the medium. The SCSI target device logical unit may return additional sense data to aid in this error recovery procedure (e.g., sequential-access devices may terminate the overlapped command with the residue of blocks remaining to be written or read at the time the second command was received).

> 4.8.2 Command identifier
> A command identifier (i.e., the Q in an I_T_L_Q nexus) is assigned by a SCSI initiator device to uniquely identify one command in the context of a particular I_T_L nexus, allowing more than one command to be outstanding for that I_T_L nexus at the same time. Each SCSI transport protocol defines the size of the command identifier, up to a maximum of 64 bytes, to be used by SCSI ports that support that SCSI transport protocol.
> SCSI transport protocols may define additional restrictions on command identifier assignments (e.g., requiring command identifiers to be unique per I_T nexus or per I_T_L nexus, or sharing command identifier values with other uses such as task management functions).

Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but it's the initiator in this case. Any advice?

Also, is CAM doing the right thing by retrying? scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?

Thanks,
Andrew

--------------------------------------------------
Andrew Boyer aboyer@averesystems.com
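
To make the last question concrete, here is a minimal sketch of what "looking at the Additional Sense Code" could mean. It is not the scsi_error_action() code from cam/scsi/scsi_all.c: the function classify_sense(), the enum sense_action, and the macro names are all hypothetical, and the sketch assumes fixed-format sense data (response code 0x70 or 0x71), where the sense key sits in the low nibble of byte 2 and the ASC/ASCQ pair in bytes 12 and 13. The numeric values (sense key 0x0B for ABORTED COMMAND, ASC/ASCQ 0x4E/0x00 for OVERLAPPED COMMANDS ATTEMPTED) match the "asc:4e,0" in the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI values; the macro names themselves are made up for this sketch. */
#define KEY_ABORTED_COMMAND     0x0B
#define ASC_OVERLAPPED_CMDS     0x4E
#define ASCQ_OVERLAPPED_CMDS    0x00

/* Hypothetical action codes; these are NOT the CAM SS_* flags. */
enum sense_action {
    ACTION_DEFAULT, /* leave the decision to whatever policy already exists */
    ACTION_RETRY,   /* ordinary aborted command: retrying is reasonable */
    ACTION_FAIL     /* overlapped commands: treat as fatal, do not retry */
};

/*
 * Hypothetical helper: classify fixed-format sense data by sense key
 * and ASC/ASCQ rather than by sense key alone.
 */
static enum sense_action
classify_sense(const uint8_t *sense, size_t len)
{
    uint8_t response, key, asc, ascq;

    assert(sense != NULL);

    /* Bytes 12 and 13 must be present to read ASC and ASCQ. */
    if (len < 14)
        return ACTION_DEFAULT;

    /* Only fixed-format sense (0x70 current, 0x71 deferred) is handled. */
    response = sense[0] & 0x7F;
    if (response != 0x70 && response != 0x71)
        return ACTION_DEFAULT;

    key  = sense[2] & 0x0F;
    asc  = sense[12];
    ascq = sense[13];

    if (key != KEY_ABORTED_COMMAND)
        return ACTION_DEFAULT;

    /* ABORTED COMMAND with 4Eh/00h is the overlapped-commands case. */
    if (asc == ASC_OVERLAPPED_CMDS && ascq == ASCQ_OVERLAPPED_CMDS)
        return ACTION_FAIL;

    return ACTION_RETRY;
}
```

Fed an 18-byte fixed-format buffer with byte 2 = 0x0B, byte 12 = 0x4E and byte 13 = 0x00, the helper returns ACTION_FAIL; any other ABORTED COMMAND yields ACTION_RETRY. Whether failing outright is the right policy for CAM is exactly the open question above, so the mapping here is only one possible choice.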