From owner-freebsd-scsi@FreeBSD.ORG  Tue Feb 22 11:17:40 2005
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4978316A4CE
	for <scsi@freebsd.org>; Tue, 22 Feb 2005 11:17:40 +0000 (GMT)
Received: from svm.csie.ntu.edu.tw (svm.csie.ntu.edu.tw [140.112.90.75])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9876F43D5C
	for <scsi@freebsd.org>; Tue, 22 Feb 2005 11:17:39 +0000 (GMT)
	(envelope-from rafan@svm.csie.ntu.edu.tw)
Received: from svm.csie.ntu.edu.tw (localhost [127.0.0.1])
	by svm.csie.ntu.edu.tw (8.13.1/8.13.1) with ESMTP id j1MBHaNS011031
	for <scsi@freebsd.org>; Tue, 22 Feb 2005 19:17:36 +0800 (CST)
	(envelope-from rafan@svm.csie.ntu.edu.tw)
Received: (from rafan@localhost)
	by svm.csie.ntu.edu.tw (8.13.1/8.13.1/Submit) id j1MBHV7D011030
	for scsi@freebsd.org; Tue, 22 Feb 2005 19:17:31 +0800 (CST)
	(envelope-from rafan)
Date: Tue, 22 Feb 2005 19:17:31 +0800
From: Rong-En Fan <rafan@csie.org>
To: scsi@freebsd.org
Message-ID: <20050222111731.GA10825@svm.csie.ntu.edu.tw>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.7i
Subject: strange SCSI problem on 5.3, 4.10 ok
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: SCSI subsystem <freebsd-scsi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-scsi>
List-Post: <mailto:freebsd-scsi@freebsd.org>
List-Help: <mailto:freebsd-scsi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-scsi>,
	<mailto:freebsd-scsi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Feb 2005 11:17:40 -0000

[Please CC me on replies, thanks]

Hi,

I recently upgraded a 4.10 box to 5.3 and have been encountering a
strange SCSI problem that causes a panic. My configuration is:

  Adaptec 29160, ahc(4)
  SliverStar GT1008 Hardware (manufactured by Infortrend), 160MB/s, da1
  Infortrend IFT7200 RAID, 160MB/s, da2, da3
  dmesg: http://rafan.infor.org/tmp/scsi/dmesg.boot

While rsyncing all of da2's data to da1 (at about 20MB/s), I get
the following after a few minutes:

(da1:ahc1:0:0:0): Unexpected busfree in Command phase
SEQADDR == 0x16c
(da1:ahc1:0:0:0): lost device
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): WRITE(10). CDB: 2a 0 0 33 6d 3f 0 0 80 0
(da1:ahc1:0:0:0): CAM Status: SCSI Status Error
(da1:ahc1:0:0:0): SCSI Status: Check Condition
(da1:ahc1:0:0:0): UNIT ATTENTION asc:29,0
(da1:ahc1:0:0:0): Power on, reset, or bus device reset occurred
(da1:ahc1:0:0:0): Retrying Command (per Sense Data)
panic: softdep_move_dependencies: need merge code
cpuid = 1
boot() called on cpu#1
Uptime: 23h1m3s
(da1:ahc1:0:0:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0
(da1:ahc1:0:0:0): Sense Error Code 0x80 at block no. -1051665751 (decimal)
Cannot dump. No dump device defined.
Shutting down ACPI

Then I tried again and again:

login: (da1:ahc1:0:0:0): Unexpected busfree in Command phase
SEQADDR == 0x16c                                            
(da1:ahc1:0:0:0): lost device
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): WRITE(10). CDB: 2a 0 d 12 a8 ff 0 0 80 0
(da1:ahc1:0:0:0): CAM Status: SCSI Status Error           
(da1:ahc1:0:0:0): SCSI Status: Check Condition 
(da1:ahc1:0:0:0): UNIT ATTENTION asc:29,0     
(da1:ahc1:0:0:0): Power on, reset, or bus device reset occurred
(da1:ahc1:0:0:0): Retries Exhausted                            
(da1:ahc1:0:0:0): Invalidating pack
panic: softdep_move_dependencies: need merge code
cpuid = 0                                        
boot() called on cpu#0
Uptime: 8h56m7s       
(da1:ahc1:0:0:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0
(da1:ahc1:0:0:0): Sense Error Code 0x50                       
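
For reference, the two failing WRITE(10) CDBs above can be decoded to
see which blocks were being written. This is only a minimal sketch in
Python: the helper name decode_write10 is made up for illustration,
and the 512-byte block size is an assumption, not something the logs
state.

```python
def decode_write10(cdb_text):
    """Decode a WRITE(10) CDB printed as space-separated hex bytes."""
    cdb = [int(tok, 16) for tok in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Bytes 2-5: logical block address, big-endian.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    # Bytes 7-8: transfer length in blocks, big-endian.
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return {"lba": lba, "blocks": blocks}


BLOCK_SIZE = 512  # assumed; not taken from the logs

for text in ("2a 0 0 33 6d 3f 0 0 80 0", "2a 0 d 12 a8 ff 0 0 80 0"):
    info = decode_write10(text)
    kib = info["blocks"] * BLOCK_SIZE // 1024
    print("LBA %d, %d blocks (%d KiB)" % (info["lba"], info["blocks"], kib))
```

Both commands are 128-block writes at different addresses, so under
that assumption the failures do not point at one particular block.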

Another one:

(da1:ahc1:0:0:0): Unexpected busfree in Command phase
SEQADDR == 0x16c                                             
(da1:ahc1:0:0:0): lost device
(da1:ahc1:0:0:0): Invalidating pack
initiate_write_filepage: already started
...
(da1:ahc1:0:0:0): Invalidating pack
initiate_write_filepage: already started
...
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 0
boot() called on cpu#0
Uptime: 22m28s

In the first two cases the tag depth was 32 (the same as on the RAID);
in the last case it was 8. Before upgrading to 5.3, this box worked
well under this kind of I/O. I saw some old posts from Xin Li, from
last October or November, reporting problems on 5.3 that did not
occur on RELENG_4.

I'm upgrading to RELENG_5 and will try current later to see whether
they are OK. Since the hardware (even the firmware) is identical to
before, I suspect that there is some fix in RELENG_4 that never made
it into current or RELENG_5. Or is this a new problem in RELENG_5?

Regards,
Rong-En Fan