From: Andrew Boyer <aboyer@averesystems.com>
To: freebsd-scsi@freebsd.org
Date: Wed, 16 Jun 2010 12:17:34 -0400
Subject: Overlapped Commands error
Message-Id: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com>

Hello SCSI experts,

We recently saw this SCSI command error:

> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10). CDB: 28 0 2 c8 7f a0 0 0 20 0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16

No one here has ever seen this before. We're using a CAM and MPT stack from August 2009 with an LSI1068e HBA connected to Seagate SAS HDDs.

This is what the SCSI Architecture Model (SAM-5 draft) has to say about overlapped commands:

> 5.10 Overlapped commands
>
> An overlapped command occurs when a task manager or a task router detects the use of a duplicate I_T_L_Q nexus (see 4.6.6) in a command before that I_T_L_Q nexus completes its command lifetime (see 5.5). Each SCSI transport protocol standard shall specify whether or not a task manager or a task router is required to detect overlapped commands.
> A task manager or a task router that detects an overlapped command shall abort all commands received on the I_T nexus on which the overlapped command was received and the device server shall return a CHECK CONDITION status for the overlapped command. The sense key shall be set to ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED COMMANDS ATTEMPTED.
>
> NOTE 11 - An overlapped command may be indicative of a serious error and, if not detected, may result in corrupted data. This is considered a catastrophic failure on the part of the SCSI initiator device. Therefore, vendor specific error recovery procedures may be required to guarantee the data integrity on the medium. The SCSI target device logical unit may return additional sense data to aid in this error recovery procedure (e.g., sequential-access devices may terminate the overlapped command with the residue of blocks remaining to be written or read at the time the second command was received).
>
> 4.8.2 Command identifier
>
> A command identifier (i.e., the Q in an I_T_L_Q nexus) is assigned by a SCSI initiator device to uniquely identify one command in the context of a particular I_T_L nexus, allowing more than one command to be outstanding for that I_T_L nexus at the same time. Each SCSI transport protocol defines the size of the command identifier, up to a maximum of 64 bytes, to be used by SCSI ports that support that SCSI transport protocol.
>
> SCSI transport protocols may define additional restrictions on command identifier assignments (e.g., requiring command identifiers to be unique per I_T nexus or per I_T_L nexus, or sharing command identifier values with other uses such as task management functions).

Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but in this case it is the initiator. Any advice?

Also, is CAM doing the right thing by retrying?
scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?

Thanks,
Andrew

--------------------------------------------------
Andrew Boyer    aboyer@averesystems.com