From owner-freebsd-scsi@FreeBSD.ORG Mon Jun 14 11:06:59 2010 Return-Path: Delivered-To: freebsd-scsi@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 879A41065703 for ; Mon, 14 Jun 2010 11:06:59 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7589E8FC2B for ; Mon, 14 Jun 2010 11:06:59 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o5EB6xcq078635 for ; Mon, 14 Jun 2010 11:06:59 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o5EB6wPm078633 for freebsd-scsi@FreeBSD.org; Mon, 14 Jun 2010 11:06:58 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 14 Jun 2010 11:06:58 GMT Message-Id: <201006141106.o5EB6wPm078633@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-scsi@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Jun 2010 11:06:59 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description
--------------------------------------------------------------------------------
o kern/147704     scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287     scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768     scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648     scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301     scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351     scsi       [mpt] LSILogic driver performance problems
o kern/141934     scsi       [cam] [patch] add support for SEAGATE DAT Scopion 130
o kern/134488     scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250     scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206     scsi       [mpt] system panics on boot when mirroring and 2nd dri
p kern/130735     scsi       [cam] [patch] pass M_NOWAIT to the malloc() call insid
o kern/130621     scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602     scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452     scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245     scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927     scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/124667     scsi       [amd] [panic] FreeBSD-7 kernel page faults at amd-scsi
o kern/123674     scsi       [ahc] ahc driver dumping
f kern/123666     scsi       [aac] attach fails with Adaptec SAS RAID 3805 controll
o sparc/121676    scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487     scsi       [sg] scsi_sg incompatible with scanners
o kern/120247     scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/119668     scsi       [cam] [patch] certain errors are too verbose comparing
o kern/114597     scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847     scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954      scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/94838      scsi       Kernel panic while mounting SD card with lock switch o
o kern/92798      scsi       [ahc] SCSI problem with timeouts
o kern/90282      scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178      scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627      scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165      scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641      scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598      scsi       wire down of scsi devices conflicts with config
s kern/57398      scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/52638      scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587      scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/40895      scsi       wierd kernel / device driver bug
o kern/39388      scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234      scsi       World access to /dev/pass? (for scanner) requires acce

40 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Tue Jun 15 01:34:05 2010 Return-Path: Delivered-To: freebsd-scsi@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 53376106566B; Tue, 15 Jun 2010 01:34:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 2AF9A8FC13; Tue, 15 Jun 2010 01:34:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o5F1Y5cx031897; Tue, 15 Jun 2010 01:34:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o5F1Y5QB031893; Tue, 15 Jun 2010 01:34:05 GMT (envelope-from linimon) Date: Tue, 15 Jun 2010 01:34:05 GMT Message-Id: <201006150134.o5F1Y5QB031893@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-scsi@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: bin/147572: [2tb] 
mptutil(8) doesn't support configs over 2TB X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jun 2010 01:34:05 -0000 Old Synopsis: mptutil doesn't support configs over 2TB New Synopsis: [2tb] mptutil(8) doesn't support configs over 2TB Responsible-Changed-From-To: freebsd-bugs->freebsd-scsi Responsible-Changed-By: linimon Responsible-Changed-When: Tue Jun 15 01:32:51 UTC 2010 Responsible-Changed-Why: Over to maintainer(s). Note: there is a patch for the documentation. http://www.freebsd.org/cgi/query-pr.cgi?pr=147572 From owner-freebsd-scsi@FreeBSD.ORG Tue Jun 15 19:06:01 2010 Return-Path: Delivered-To: freebsd-scsi@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 407D41065672; Tue, 15 Jun 2010 19:06:01 +0000 (UTC) (envelope-from sbruno@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 1784F8FC0C; Tue, 15 Jun 2010 19:06:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o5FJ6093067826; Tue, 15 Jun 2010 19:06:00 GMT (envelope-from sbruno@freefall.freebsd.org) Received: (from sbruno@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o5FJ606e067822; Tue, 15 Jun 2010 19:06:00 GMT (envelope-from sbruno) Date: Tue, 15 Jun 2010 19:06:00 GMT Message-Id: <201006151906.o5FJ606e067822@freefall.freebsd.org> To: sbruno@FreeBSD.org, freebsd-scsi@FreeBSD.org, sbruno@FreeBSD.org From: sbruno@FreeBSD.org Cc: Subject: Re: bin/147572: [2tb] mptutil(8) doesn't support configs over 2TB X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 15 Jun 2010 19:06:01 -0000 Synopsis: [2tb] mptutil(8) doesn't support configs over 2TB Responsible-Changed-From-To: freebsd-scsi->sbruno Responsible-Changed-By: sbruno Responsible-Changed-When: Tue Jun 15 19:02:25 UTC 2010 Responsible-Changed-Why: Reassignment. http://www.freebsd.org/cgi/query-pr.cgi?pr=147572 From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 16:33:13 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16630106564A for ; Wed, 16 Jun 2010 16:33:13 +0000 (UTC) (envelope-from aboyer@averesystems.com) Received: from zimbra.averesystems.com (75-149-8-243-Pennsylvania.hfc.comcastbusiness.net [75.149.8.243]) by mx1.freebsd.org (Postfix) with ESMTP id CC4468FC12 for ; Wed, 16 Jun 2010 16:33:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zimbra.averesystems.com (Postfix) with ESMTP id 9F7F08BC919 for ; Wed, 16 Jun 2010 12:18:11 -0400 (EDT) X-Virus-Scanned: amavisd-new at averesystems.com Received: from zimbra.averesystems.com ([127.0.0.1]) by localhost (zimbra.averesystems.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id x9C9esN4dhxF for ; Wed, 16 Jun 2010 12:18:10 -0400 (EDT) Received: from riven.arriad.com (fw.arriad.com [10.0.0.16]) by zimbra.averesystems.com (Postfix) with ESMTPSA id 606688BC915 for ; Wed, 16 Jun 2010 12:18:10 -0400 (EDT) From: Andrew Boyer Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Wed, 16 Jun 2010 12:17:34 -0400 Message-Id: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> To: freebsd-scsi@freebsd.org Mime-Version: 1.0 (Apple Message framework v1078) X-Mailer: Apple Mail (2.1078) Subject: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 16 Jun 2010 16:33:13 -0000

Hello SCSI experts,

We recently saw this SCSI command error:

> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10). CDB: 28 0 2 c8 7f a0 0 0 20 0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16

No one here has ever seen this before. We're using a CAM and MPT stack from August 2009 with an LSI1068e HBA connected to Seagate SAS HDDs.

This is what the SCSI Architecture Model (SAM-5 draft) has to say about overlapped commands:

> 5.10 Overlapped commands
> An overlapped command occurs when a task manager or a task router detects the use of a duplicate I_T_L_Q nexus (see 4.6.6) in a command before that I_T_L_Q nexus completes its command lifetime (see 5.5). Each SCSI transport protocol standard shall specify whether or not a task manager or a task router is required to detect overlapped commands.
> A task manager or a task router that detects an overlapped command shall abort all commands received on the I_T nexus on which the overlapped command was received and the device server shall return a CHECK CONDITION status for the overlapped command. The sense key shall be set to ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED COMMANDS ATTEMPTED.
> NOTE 11 - An overlapped command may be indicative of a serious error and, if not detected, may result in corrupted data. This is considered a catastrophic failure on the part of the SCSI initiator device. Therefore, vendor specific error recovery procedures may be required to guarantee the data integrity on the medium. The SCSI target device logical unit may return additional sense data to aid in this error recovery procedure (e.g., sequential-access devices may terminate the overlapped command with the residue of blocks remaining to be written or read at the time the second command was received).

> 4.8.2 Command identifier
> A command identifier (i.e., the Q in an I_T_L_Q nexus) is assigned by a SCSI initiator device to uniquely identify one command in the context of a particular I_T_L nexus, allowing more than one command to be outstanding for that I_T_L nexus at the same time. Each SCSI transport protocol defines the size of the command identifier, up to a maximum of 64 bits, to be used by SCSI ports that support that SCSI transport protocol.
> SCSI transport protocols may define additional restrictions on command identifier assignments (e.g., requiring command identifiers to be unique per I_T nexus or per I_T_L nexus, or sharing command identifier values with other uses such as task management functions).

Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but it's the initiator in this case. Any advice?

Also, is CAM doing the right thing by retrying? scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?

Thanks,
Andrew

--------------------------------------------------
Andrew Boyer aboyer@averesystems.com

From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 17:56:41 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E4A811065670 for ; Wed, 16 Jun 2010 17:56:41 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id A7E328FC0A for ; Wed, 16 Jun 2010 17:56:41 +0000 (UTC) Received: from [192.168.221.2] (remotevpn [192.168.221.2]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5GHueFC069194 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 16 Jun 2010 10:56:41 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C191052.8050407@feral.com> Date: Wed, 16 Jun 2010 10:56:34 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> In-Reply-To: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-4.2.3 (ns1.feral.com [192.168.221.1]); Wed, 16 Jun 2010 10:56:41 -0700 (PDT) Subject: Re: 
Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 17:56:42 -0000 It would be helpful to know what the target was. From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 19:06:35 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A48A1065674 for ; Wed, 16 Jun 2010 19:06:35 +0000 (UTC) (envelope-from aboyer@averesystems.com) Received: from zimbra.averesystems.com (75-149-8-243-Pennsylvania.hfc.comcastbusiness.net [75.149.8.243]) by mx1.freebsd.org (Postfix) with ESMTP id 2D6FC8FC19 for ; Wed, 16 Jun 2010 19:06:34 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zimbra.averesystems.com (Postfix) with ESMTP id 85FBA8BC916; Wed, 16 Jun 2010 15:07:08 -0400 (EDT) X-Virus-Scanned: amavisd-new at averesystems.com Received: from zimbra.averesystems.com ([127.0.0.1]) by localhost (zimbra.averesystems.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id f0J6f4YLda6N; Wed, 16 Jun 2010 15:07:07 -0400 (EDT) Received: from riven.arriad.com (fw.arriad.com [10.0.0.16]) by zimbra.averesystems.com (Postfix) with ESMTPSA id 47AF08BC915; Wed, 16 Jun 2010 15:07:07 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1078) Content-Type: text/plain; charset=us-ascii From: Andrew Boyer In-Reply-To: <4C191052.8050407@feral.com> Date: Wed, 16 Jun 2010 15:06:31 -0400 Content-Transfer-Encoding: 7bit Message-Id: <9F7EACFE-6316-427A-A286-A3CEFD1C387D@averesystems.com> References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> <4C191052.8050407@feral.com> To: Matthew Jacob X-Mailer: Apple Mail (2.1078) Cc: freebsd-scsi@freebsd.org Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI 
subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 19:06:35 -0000 On Jun 16, 2010, at 1:56 PM, Matthew Jacob wrote: > It would be helpful to know what the target was. > > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" LSI SAS3801E-R connected to a 147GB Seagate Cheetah 15K.6 ST3146855SS 0002. Or: da1 at mpt0 bus 0 target 1 lun 0 If that's not what you were asking, please clarify. Thanks, Andrew -------------------------------------------------- Andrew Boyer aboyer@averesystems.com From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 20:02:55 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0FB911065672 for ; Wed, 16 Jun 2010 20:02:55 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id DE1448FC15 for ; Wed, 16 Jun 2010 20:02:54 +0000 (UTC) Received: from [192.168.221.2] (remotevpn [192.168.221.2]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5GK2rIA030550 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 16 Jun 2010 13:02:54 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C192DE7.7080807@feral.com> Date: Wed, 16 Jun 2010 13:02:47 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> <4C191052.8050407@feral.com> <9F7EACFE-6316-427A-A286-A3CEFD1C387D@averesystems.com> In-Reply-To: <9F7EACFE-6316-427A-A286-A3CEFD1C387D@averesystems.com> Content-Type: text/plain; 
charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-4.2.3 (ns1.feral.com [192.168.221.1]); Wed, 16 Jun 2010 13:02:54 -0700 (PDT) Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 20:02:55 -0000 Just checking to see if it was a JBOD or RAID box. Thsx. > LSI SAS3801E-R connected to a 147GB Seagate Cheetah 15K.6 ST3146855SS 0002. > > Or: > da1 at mpt0 bus 0 target 1 lun 0 > > If that's not what you were asking, please clarify. > > Thanks, > Andrew > > -------------------------------------------------- > Andrew Boyer aboyer@averesystems.com > > > > > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" > From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 20:15:22 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C941106566C for ; Wed, 16 Jun 2010 20:15:22 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id 074F38FC08 for ; Wed, 16 Jun 2010 20:15:21 +0000 (UTC) Received: from [192.168.221.2] (remotevpn [192.168.221.2]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5GKF1Se055639 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 16 Jun 2010 13:15:21 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C1930BF.3090408@feral.com> Date: Wed, 16 Jun 2010 13:14:55 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; 
rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> In-Reply-To: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-4.2.3 (ns1.feral.com [192.168.221.1]); Wed, 16 Jun 2010 13:15:21 -0700 (PDT) Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 20:15:22 -0000

> Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but it's the initiator in this case. Any advice?

The mpt f/w assigns tags. Don't really know what happened here.

> Also, is CAM doing the right thing by retrying? scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?

Not really, IMO. It's up to each periph driver to decide whether commands are stateful or can be retried with impunity.
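To make the question about the Additional Sense Code concrete, here is a minimal sketch, in plain userspace C, of what inspecting the ASC/ASCQ of an ABORTED COMMAND could look like. It is not CAM code and does not show how scsi_error_action() is implemented; the names classify_fixed_sense and enum sense_action are hypothetical. It assumes fixed-format sense data (response code 0x70 or 0x71), where the sense key is the low nibble of byte 2 and the ASC and ASCQ are bytes 12 and 13. Treating 4Eh/00h as fatal is only the policy the original question proposes; per the reply above, whether a retry is safe may be better decided by each peripheral driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of looking at one sense buffer. */
enum sense_action {
    SENSE_ACTION_RETRY, /* aborted command, nothing special in the ASC */
    SENSE_ACTION_FAIL,  /* overlapped commands attempted: do not retry */
    SENSE_ACTION_OTHER  /* not an ABORTED COMMAND, or not parseable    */
};

#define SENSE_KEY_ABORTED_COMMAND 0x0B
#define ASC_OVERLAPPED_COMMANDS   0x4E /* "asc:4e,0" in the log above */
#define ASCQ_OVERLAPPED_COMMANDS  0x00

/* Classify fixed-format sense data (assumed layout, see lead-in). */
enum sense_action
classify_fixed_sense(const uint8_t *sense, size_t len)
{
    assert(sense != NULL);

    /* Need at least bytes 0..13 to read key, ASC and ASCQ. */
    if (len < 14)
        return SENSE_ACTION_OTHER;

    /* Bit 7 of byte 0 is the VALID bit; mask it off. */
    uint8_t response_code = sense[0] & 0x7F;
    if (response_code != 0x70 && response_code != 0x71)
        return SENSE_ACTION_OTHER;

    uint8_t key = sense[2] & 0x0F;
    uint8_t asc = sense[12];
    uint8_t ascq = sense[13];

    if (key != SENSE_KEY_ABORTED_COMMAND)
        return SENSE_ACTION_OTHER;

    /* The policy under discussion: overlapped commands are not retried. */
    if (asc == ASC_OVERLAPPED_COMMANDS && ascq == ASCQ_OVERLAPPED_COMMANDS)
        return SENSE_ACTION_FAIL;

    return SENSE_ACTION_RETRY;
}
```

A caller would hand this the raw sense bytes returned with the CHECK CONDITION and only requeue the command on SENSE_ACTION_RETRY.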
From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 20:15:22 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 97F111065670 for ; Wed, 16 Jun 2010 20:15:22 +0000 (UTC) (envelope-from aboyer@averesystems.com) Received: from zimbra.averesystems.com (75-149-8-243-Pennsylvania.hfc.comcastbusiness.net [75.149.8.243]) by mx1.freebsd.org (Postfix) with ESMTP id 69BC18FC0A for ; Wed, 16 Jun 2010 20:15:22 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zimbra.averesystems.com (Postfix) with ESMTP id F3C108BC916; Wed, 16 Jun 2010 16:15:55 -0400 (EDT) X-Virus-Scanned: amavisd-new at averesystems.com Received: from zimbra.averesystems.com ([127.0.0.1]) by localhost (zimbra.averesystems.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qxu9ZqFmruD4; Wed, 16 Jun 2010 16:15:53 -0400 (EDT) Received: from riven.arriad.com (fw.arriad.com [10.0.0.16]) by zimbra.averesystems.com (Postfix) with ESMTPSA id 191E58BC915; Wed, 16 Jun 2010 16:15:53 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1078) Content-Type: text/plain; charset=us-ascii From: Andrew Boyer In-Reply-To: <4C192DE7.7080807@feral.com> Date: Wed, 16 Jun 2010 16:15:16 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> <4C191052.8050407@feral.com> <9F7EACFE-6316-427A-A286-A3CEFD1C387D@averesystems.com> <4C192DE7.7080807@feral.com> To: Matthew Jacob X-Mailer: Apple Mail (2.1078) Cc: freebsd-scsi@freebsd.org Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 20:15:22 -0000

Ah. It's direct-attach storage. There are 8 drives connected to the LSI, but no RAID configured.

Thanks,
Andrew

On Jun 16, 2010, at 4:02 PM, Matthew Jacob wrote:

> Just checking to see if it was a JBOD or RAID box.
> Thsx.
>
>> LSI SAS3801E-R connected to a 147GB Seagate Cheetah 15K.6 ST3146855SS 0002.
>>
>> Or:
>> da1 at mpt0 bus 0 target 1 lun 0
>>
>> If that's not what you were asking, please clarify.
>>
>> Thanks,
>> Andrew
>>
>> --------------------------------------------------
>> Andrew Boyer aboyer@averesystems.com
>>
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
>
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

--------------------------------------------------
Andrew Boyer aboyer@averesystems.com

From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 20:31:18 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7EB131065676 for ; Wed, 16 Jun 2010 20:31:18 +0000 (UTC) (envelope-from aboyer@averesystems.com) Received: from zimbra.averesystems.com (75-149-8-243-Pennsylvania.hfc.comcastbusiness.net [75.149.8.243]) by mx1.freebsd.org (Postfix) with ESMTP id 521C98FC18 for ; Wed, 16 Jun 2010 20:31:18 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by zimbra.averesystems.com (Postfix) with ESMTP id D09018BC916; Wed, 16 Jun 2010 16:31:51 -0400 (EDT) X-Virus-Scanned: amavisd-new at averesystems.com Received: from zimbra.averesystems.com ([127.0.0.1]) by localhost (zimbra.averesystems.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jkmy+WNuWp6e; Wed, 16 Jun 2010 16:31:50 -0400 (EDT) Received: from riven.arriad.com (fw.arriad.com 
[10.0.0.16]) by zimbra.averesystems.com (Postfix) with ESMTPSA id 997318BC915; Wed, 16 Jun 2010 16:31:50 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1078) Content-Type: text/plain; charset=us-ascii From: Andrew Boyer In-Reply-To: <4C1930BF.3090408@feral.com> Date: Wed, 16 Jun 2010 16:31:14 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> <4C1930BF.3090408@feral.com> To: Matthew Jacob X-Mailer: Apple Mail (2.1078) Cc: freebsd-scsi@freebsd.org Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 20:31:18 -0000

On Jun 16, 2010, at 4:14 PM, Matthew Jacob wrote:

>> Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but it's the initiator in this case. Any advice?
>
> The mpt f/w assigns tags. Don't really know what happened here.
>
>> Also, is CAM doing the right thing by retrying? scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?
>
> Not really, IMO. It's up to each periph driver to decide whether commands are stateful or can be retried with impunity.

OK. Thank you for looking into it.
-Andrew -------------------------------------------------- Andrew Boyer aboyer@averesystems.com From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 23:20:14 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 65DCF106564A for ; Wed, 16 Jun 2010 23:20:14 +0000 (UTC) (envelope-from djmitche@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 17B298FC14 for ; Wed, 16 Jun 2010 23:20:13 +0000 (UTC) Received: by gwj20 with SMTP id 20so5777321gwj.13 for ; Wed, 16 Jun 2010 16:20:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=9G4ogzdzoFdExIWKYs7y2jQWL25X8CFf2SsA/Z4LwNI=; b=mtoTD34474wrYGGhX8OF5uvMYnHzoHBW85lb+fte7e9q9mdp8u8kOdawicKqbNGFPc bSIuLWB7YYX+kKrWDj1MK68tMhdJy51hC74JDtA9/ol8FJI0NkYR+HrtlyhnTjD5Wse1 IqIKA3xfKE27Lk0lGWECgn4vBxMlT9oSxh8Q8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; b=nX74XU9Q8rGFqjitQoBI0z22T5ZN9I/wF1GxGkQ2JlBhQytTkIdqbbbAr9E0fWPGXv fi/Jljtj+cCIlJje8BM8ZBZ/+K/FEvLpgWD3JNWsAyG3ouTahOVnUZOxhxV77brH2aQY UxKAlSph4XISMIBXwDJBau1h5odk4CsqxZcjE= MIME-Version: 1.0 Received: by 10.151.72.5 with SMTP id z5mr10701116ybk.235.1276728766403; Wed, 16 Jun 2010 15:52:46 -0700 (PDT) Sender: djmitche@gmail.com Received: by 10.150.144.4 with HTTP; Wed, 16 Jun 2010 15:52:46 -0700 (PDT) Date: Wed, 16 Jun 2010 17:52:46 -0500 X-Google-Sender-Auth: -4m3N1CGQ0YvXDjedlxBeJyuIJ8 Message-ID: From: "Dustin J. Mitchell" To: freebsd-scsi@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: Jean-Louis Martineau Subject: sa: write returns 0 = LEOM? 
X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 23:20:14 -0000

I'm investigating a user bug report in Amanda:
http://forums.zmanda.com/showthread.php?t=2832

The problem boils down to a write(2) call for a SCSI tape device (/dev/nsa0) returning 0 after quite a bit of data and a number of filemarks have been written. Jean-Louis suspected that this was an early warning EOM indication, and that a subsequent write() would succeed, with Amanda having been duly warned that a physical EOM is coming up. But looking at scsi_sa.c, this doesn't seem to be the case. It looks like an early warning would result in a successful write instead, because resid is set to zero.

cam/scsi/scsi_sa.c:

    /*
     * Handle filemark, end of tape, mismatched record sizes....
     * From this point out, we're only handling read/write cases.
     * Handle writes && reads differently.
     */

    if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
        if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
            csio->resid = resid;
            error = ENOSPC;
        } else if (sense->flags & SSD_EOM) {
            softc->flags |= SA_FLAG_EOM_PENDING;
            /*
             * Grotesque as it seems, the few times
             * I've actually seen a non-zero resid,
             * the tape drive actually lied and had
             * written all the data!.
             */
            csio->resid = 0;
        }

That said, I don't know my way around the kernel source, so I'm probably missing something obvious. So:

1. What could cause a write syscall to return 0?

2. Since we will be using early warning in the next version of Amanda, hints as to the best way to handle early warning from userspace would be appreciated.

Thanks for any pointers!
Dustin -- Open Source Storage Engineer http://www.zmanda.com From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 23:31:48 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1466A1065674 for ; Wed, 16 Jun 2010 23:31:48 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id D59758FC0A for ; Wed, 16 Jun 2010 23:31:47 +0000 (UTC) Received: from [192.168.0.102] (m206-63.dsl.tsoft.com [198.144.206.63]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5GNVlDG060455 for ; Wed, 16 Jun 2010 16:31:47 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C195EE6.1050207@feral.com> Date: Wed, 16 Jun 2010 16:31:50 -0700 From: Matthew Jacob User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Default is to whitelist mail, not delayed by milter-greylist-4.2.3 (ns1.feral.com [192.67.166.1]); Wed, 16 Jun 2010 16:31:47 -0700 (PDT) Subject: Re: sa: write returns 0 = LEOM? X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 23:31:48 -0000 On 6/16/2010 3:52 PM, Dustin J. Mitchell wrote: > I'm investigating a user bug report in Amanda: > http://forums.zmanda.com/showthread.php?t=2832 > > The problem boils down to a write(2) call for a SCSI tape device > (/dev/nsa0) returning 0 after quite a bit of data and a number of > filemarks have been written. 
Jean-Louis suspected that this was an
> early warning EOM indication, and that a subsequent write() would
> succeed, with Amanda having been duly warned that a physical EOM is
> coming up.

That is, I believe, a specific feature of Solaris (EOM detection triggers a zero write, but allows for trailer records). I seem to recall helping architect this back in 1996.

> But looking at scsi_sa.c, this doesn't seem to be the
> case. It looks like an early warning would result in a successful
> write instead, because resid is set to zero.
>
> cam/scsi/scsi_sa.c:
> 2418    /*
> 2419     * Handle filemark, end of tape, mismatched record sizes....
> 2420     * From this point out, we're only handling read/write cases.
> 2421     * Handle writes && reads differently.
> 2422     */
> 2423
> 2424    if (csio->cdb_io.cdb_bytes[0] == SA_WRITE) {
> 2425        if (sense_key == SSD_KEY_VOLUME_OVERFLOW) {
> 2426            csio->resid = resid;
> 2427            error = ENOSPC;
> 2428        } else if (sense->flags & SSD_EOM) {
> 2429            softc->flags |= SA_FLAG_EOM_PENDING;
> 2430            /*
> 2431             * Grotesque as it seems, the few times
> 2432             * I've actually seen a non-zero resid,
> 2433             * the tape drive actually lied and had
> 2434             * written all the data!.
> 2435             */
> 2436            csio->resid = 0;
> 2437        }
>
>

Yes, I remember this code. I remember on doing test readbacks that the residual reported was in fact incorrect- the data had actually been written. But this was really a long while back (at least 8 years ago).

> That said, I don't know my way around the kernel source, so I'm
> probably missing something obvious. So:
>
> 1. What could cause a write syscall to return 0?
>

I'll try and look into this.

Do you happen to know whether the device you experienced this on was set in fixed block or variable block mode?

> 2. Since we will be using early warning in the next version of Amanda,
> hints as to the best way to handle early warning from userspace would
> be appreciated.
>
>

Urrr.... I used to have opinions about this. Now I'm not so sure.
Expecting consistent behaviour from platform to platform is tough.

Can't you write until you get a hard failure, back up one record (which, of course, you've hung onto), write a trailer label and then ask for a new tape?

From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 23:32:23 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCC451065674 for ; Wed, 16 Jun 2010 23:32:23 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.freebsd.org (Postfix) with ESMTP id A4C818FC12 for ; Wed, 16 Jun 2010 23:32:23 +0000 (UTC) Received: from [127.0.0.1] (pooker.samsco.org [168.103.85.57]) (authenticated bits=0) by pooker.samsco.org (8.14.4/8.14.4) with ESMTP id o5GNWIJS074858; Wed, 16 Jun 2010 17:32:18 -0600 (MDT) (envelope-from scottl@samsco.org) Mime-Version: 1.0 (Apple Message framework v1078) Content-Type: text/plain; charset=us-ascii From: Scott Long In-Reply-To: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> Date: Wed, 16 Jun 2010 17:32:18 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> To: Andrew Boyer X-Mailer: Apple Mail (2.1078) X-Spam-Status: No, score=-50.0 required=3.8 tests=ALL_TRUSTED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on pooker.samsco.org Cc: freebsd-scsi@freebsd.org Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 23:32:23 -0000

On Jun 16, 2010, at 10:17 AM, Andrew Boyer wrote:

> Hello SCSI experts,
> We recently saw this SCSI command error:
>
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10).
CDB: 28 0 2 c8 7f a0 0 0 20 0
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
>> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
>> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
>> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
>> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
>> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
>> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
>
> No one here has ever seen this before. We're using a CAM and MPT stack from August 2009 with an LSI1068e HBA connected to Seagate SAS HDDs.
>
> This is what the SCSI Architecture Manual (SAM-5 draft) has to say about overlapped commands:
>> [...]
>
> Can anyone point me to where in the stack the command identifier is assigned? I see where MPT assigns tags in target mode, but it's the initiator in this case. Any advice?

Don't want to step on Matt, but wanted to expand on what he's said so far.

CAM doesn't assign tag identifiers for initiator I/O, it leaves that up to the driver and hardware. The tag_id field that you see in CCB's is for target I/O only.
In the case of MPT, the firmware assigns tags, while on simpler controllers like ESP the driver does it. CAM does provide the tag action message, i.e. SIMPLE, ORDERED, HEAD_OF_Q, and it's up to the driver to relay that to hardware, which MPT does in mpt_start().

The MPT architecture abstracts a lot of the transport protocol away, so it's generally assumed that it's going to do the right thing in a case like this. I don't know if the firmware is wrong, or if FreeBSD is wrong. CAM almost always attaches a SIMPLE action flag with I/O commands, and the MPT driver looks like it will faithfully translate that into the corresponding MPT flag. By looking at the inquiry data, it's roughly possible to determine if the device supports tagged queuing, so maybe CAM needs to be smarter about this. Instead of the TQ flag just affecting command scheduling, maybe it also needs to suppress attaching the SIMPLE action flag, and likewise the MPT driver should set an UNTAGGED flag in correlation to that.

I would expect the MPT firmware to look at the inquiry data and behave appropriately despite what might be sent in the MPT i/o request, but again, maybe that's asking too much. If you're adventurous, try modifying the MPT driver to always set the MPI_SCSIIO_CONTROL_UNTAGGED flag in mpt_start(), and see if that makes your problem go away.

>
> Also, is CAM doing the right thing by retrying? scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device"). Should scsi_error_action() be looking at the Additional Sense Code?
>

The error recovery code in CAM already cross references the ASC/ASCQ to an action table, but that table is often incomplete for uncommon edge cases.
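[Editorial sketch] For readers who have not looked at scsi_all.c, the idea of cross-referencing ASC/ASCQ against an action table can be illustrated roughly as follows. This is a simplified illustration and not the CAM code: the struct, the enum and the function names are hypothetical, and the fallback to a caller-supplied default when no entry matches is an assumption about how such a table is usually consulted. The single entry treats ASC 0x4E / ASCQ 0x00 ("Overlapped commands attempted") as fatal with ENXIO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified actions; the real CAM flags carry more state. */
enum sense_action { ACT_RETRY, ACT_FAIL };

/* Hypothetical table entry keyed on additional sense code and qualifier. */
struct sense_entry {
	uint8_t asc;
	uint8_t ascq;
	enum sense_action action;
	int error;		/* errno reported when action is ACT_FAIL */
	const char *desc;
};

static const struct sense_entry sense_table[] = {
	{ 0x4E, 0x00, ACT_FAIL, ENXIO, "Overlapped commands attempted" },
};

/* Linear search; returns NULL when the table has no matching entry. */
const struct sense_entry *
sense_lookup(uint8_t asc, uint8_t ascq)
{
	size_t i;

	for (i = 0; i < sizeof(sense_table) / sizeof(sense_table[0]); i++) {
		if (sense_table[i].asc == asc && sense_table[i].ascq == ascq)
			return (&sense_table[i]);
	}
	return (NULL);
}

/*
 * Decide what to do with a failed command.  A matching table entry wins;
 * otherwise the caller's default applies (assumed behaviour, see above).
 */
enum sense_action
sense_decide(uint8_t asc, uint8_t ascq, enum sense_action fallback)
{
	const struct sense_entry *e = sense_lookup(asc, ascq);

	return (e != NULL ? e->action : fallback);
}
```

With a table like this, a missing or permissive entry means an "aborted command" falls through to whatever the default is (a retry in Andrew's log), which is why an incomplete table matters for rare codes.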
Try the following:

RCS file: /usr1/ncvs/src/sys/cam/scsi/scsi_all.c,v
retrieving revision 1.55.2.3
diff -u -r1.55.2.3 scsi_all.c
--- scsi_all.c 14 Feb 2010 19:38:27 -0000 1.55.2.3
+++ scsi_all.c 16 Jun 2010 23:31:47 -0000
@@ -1962,7 +1962,7 @@
     { SST(0x4D, 0xFF, SS_RDEF | SSQ_RANGE,
         NULL) }, /* Range 0x00->0xFF */
     /* DTLPWROMAEBKVF */
-    { SST(0x4E, 0x00, SS_RDEF,
+    { SST(0x4E, 0x00, SS_FATAL | ENXIO,
         "Overlapped commands attempted") },
     /* T */
     { SST(0x50, 0x00, SS_RDEF,

Scott

From owner-freebsd-scsi@FreeBSD.ORG Thu Jun 17 00:14:42 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F931106566B for ; Thu, 17 Jun 2010 00:14:42 +0000 (UTC) (envelope-from djmitche@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id E47738FC0A for ; Thu, 17 Jun 2010 00:14:41 +0000 (UTC) Received: by gwj20 with SMTP id 20so5811887gwj.13 for ; Wed, 16 Jun 2010 17:14:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=o8ItulErkUN8ALt3cYgvoY9F2iJ7AsaUi/mdak/yn/w=; b=SVg9c09t6o6sfjz6y84T+AjoPQru0h//qcaUN6Agm+oKYwyReerVHjr7jZk1nI+Usp h4MN4lpALcRHXjTfXfXzYdmQczvxCxWwc4/erJn/hC8Wb0btCbCgZLBiEdkSe+v6+NL6 aVHbXSS78EMrLkNSq4SaLIyD2tg4vcacyFAEg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=j6wtZn0CthzqDtfT2Je9ku4CRc8neviISX9nRlTrv2E1xPZVa2kGpNvhSVQwr0n+Ou DAlQhc8FZAAC+1fWKY7dy7rSD022RJVotHWsD5EiLSkl+nxYCbd2morbkAwu0emyHnfk qZJOcfgo3xvIDVnTM9Ap/qLONkeVsfNuRX2Yo= MIME-Version: 1.0 Received: by 10.150.118.13 with SMTP id
q13mr10909723ybc.255.1276733681039; Wed, 16 Jun 2010 17:14:41 -0700 (PDT) Sender: djmitche@gmail.com Received: by 10.150.144.4 with HTTP; Wed, 16 Jun 2010 17:14:41 -0700 (PDT) In-Reply-To: <4C195EE6.1050207@feral.com> References: <4C195EE6.1050207@feral.com> Date: Wed, 16 Jun 2010 19:14:41 -0500 X-Google-Sender-Auth: 2me_MxeK2iqWqypOfnpVopU4OGM Message-ID: From: "Dustin J. Mitchell" To: Matthew Jacob Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-scsi@freebsd.org Subject: Re: sa: write returns 0 = LEOM? X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jun 2010 00:14:42 -0000

On Wed, Jun 16, 2010 at 6:31 PM, Matthew Jacob wrote:
> That is, I believe, a specific feature of Solaris (EOM detection triggers a
> zero write, but allows for trailer records). I seem to recall helping
> architect this back in 1996.

Gotcha. My plan is to introduce support for EOM detection platform-by-platform, as I can research and verify the behavior of that platform's tape driver.

> Yes, I remember this code. I remember on doing test readbacks that the
> residual reported was in fact incorrect- the data had actually been written.
> But this was really a long while back (at least 8 years ago).

I don't dispute this at all. I'm often surprised when a tape drive *does* do what it claims to do!

>> 1. What could cause a write syscall to return 0?
>>
>
> I'll try and look into this.
>
> Do you happen to know whether the device you experienced this on was set in
> fixed block or variable block mode?

I will ask. I know that Amanda always operates *as if* it was in fixed block mode (which is to say, it always reads and writes with identically-sized buffers), so the underlying driver configuration usually doesn't matter. I'll let you know what I find out.
> Can't you write until you get a hard failure, back up one record (which, of
> course, you've hung onto), write a trailer label and then ask for a new
> tape?

Historically, Amanda has not trusted tape drives at physical EOM - or any time, really. This strategy has paid off, but as technologies have advanced it may be time to trust a bit more. Also, I was under the impression that most (or at least some) drives would not let you BSR in write mode.

Thanks for looking into this!

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com

From owner-freebsd-scsi@FreeBSD.ORG Wed Jun 16 23:50:23 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB7421065679 for ; Wed, 16 Jun 2010 23:50:23 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id A26618FC15 for ; Wed, 16 Jun 2010 23:50:23 +0000 (UTC) Received: from [192.168.0.102] (m206-63.dsl.tsoft.com [198.144.206.63]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5GNoNmQ060552 for ; Wed, 16 Jun 2010 16:50:23 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C196342.3090107@feral.com> Date: Wed, 16 Jun 2010 16:50:26 -0700 From: Matthew Jacob User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Default is to whitelist mail, not delayed by milter-greylist-4.2.3 (ns1.feral.com [192.67.166.1]); Wed, 16 Jun 2010 16:50:23 -0700 (PDT) X-Mailman-Approved-At: Thu, 17 Jun 2010 05:29:48 +0000 Subject: Re: Overlapped Commands error X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jun 2010 23:50:23 -0000 > Don't want to step on Matt, but wanted to expand on what he's said so far. > > not stepping on at all- glad you helped clarify it better From owner-freebsd-scsi@FreeBSD.ORG Sat Jun 19 18:05:26 2010 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 02134106566C for ; Sat, 19 Jun 2010 18:05:26 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id B58658FC16 for ; Sat, 19 Jun 2010 18:05:25 +0000 (UTC) Received: from [192.168.221.2] (remotevpn [192.168.221.2]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o5JI5OOW039018 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Sat, 19 Jun 2010 11:05:25 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C1D06DD.6090506@feral.com> Date: Sat, 19 Jun 2010 11:05:17 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.10) Gecko/20100512 Thunderbird/3.0.5 MIME-Version: 1.0 To: freebsd-scsi@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-4.2.3 (ns1.feral.com [192.168.221.1]); Sat, 19 Jun 2010 11:05:25 -0700 (PDT) Subject: Re: sa: write returns 0 = LEOM? X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jun 2010 18:05:26 -0000 On 6/16/2010 3:52 PM, Dustin J. 
Mitchell wrote:
> I'm investigating a user bug report in Amanda:
> http://forums.zmanda.com/showthread.php?t=2832
>
> The problem boils down to a write(2) call for a SCSI tape device
> (/dev/nsa0) returning 0 after quite a bit of data and a number of
> filemarks have been written. Jean-Louis suspected that this was an
> early warning EOM indication, and that a subsequent write() would
> succeed, with Amanda having been duly warned that a physical EOM is
> coming up. But looking at scsi_sa.c, this doesn't seem to be the
> case. It looks like an early warning would result in a successful
> write instead, because resid is set to zero.
>

I did some more thinking and remembering about this- sorry, but it's really been years since I had to think about tape drives.

The code in question notes that EOM (end of media) is pending so that the *next* write operation will get a full residual.

So, yes, when you got a full residual back (no data moves), but no error, that is an EOM indicator for the user app.
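
[Editorial sketch] From userspace, the behaviour described in this thread could be checked for along the following lines. This is a minimal sketch under stated assumptions, not Amanda's code and not an sa(4) API: the enum and both function names are hypothetical. It relies only on what the thread states, namely that a write(2) returning 0 with no error is the EOM indicator, and that the quoted scsi_sa.c excerpt maps a volume-overflow sense key to ENOSPC. What the application may still write after the indicator (trailer records, filemarks) is not settled in the thread, so the sketch only classifies the result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes; these names are not from sa(4) or Amanda. */
enum tape_write_status {
	TAPE_WRITE_OK,		/* the whole record was accepted */
	TAPE_WRITE_EOM,		/* 0 bytes moved, no error: EOM indicator */
	TAPE_WRITE_FULL,	/* ENOSPC: volume overflow reported */
	TAPE_WRITE_SHORT,	/* partial record; the caller must decide */
	TAPE_WRITE_ERROR	/* any other errno */
};

/*
 * Classify the outcome of one write(2) of `len` bytes.  `n` is the value
 * write(2) returned and `err` is errno captured immediately afterwards.
 */
enum tape_write_status
classify_tape_write(ssize_t n, size_t len, int err)
{
	if (n < 0)
		return (err == ENOSPC ? TAPE_WRITE_FULL : TAPE_WRITE_ERROR);
	if (n == 0 && len > 0)
		return (TAPE_WRITE_EOM);
	if ((size_t)n < len)
		return (TAPE_WRITE_SHORT);
	return (TAPE_WRITE_OK);
}

/* Write one record to an open tape descriptor and report what happened. */
enum tape_write_status
tape_write_record(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	n = write(fd, buf, len);
	return (classify_tape_write(n, len, n < 0 ? errno : 0));
}
```

A caller would keep the record it just tried to write when it sees TAPE_WRITE_EOM, since by Matthew's description no data moved for that call, and then apply whatever per-platform policy it has verified for finishing the tape.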