Date: Tue, 22 Feb 2005 19:17:31 +0800
From: Rong-En Fan
To: scsi@freebsd.org
Subject: strange SCSI problem on 5.3, 4.10 ok
Message-ID: <20050222111731.GA10825@svm.csie.ntu.edu.tw>

[Please CC me, thanks.]

Hi,

I recently upgraded a 4.10 box to 5.3 and have run into a strange SCSI problem that causes a panic.
My configuration is:

  Adaptec 29160, ahc(4)
  SliverStar GT1008 Hardware (manufactured by Infortrend), 160MB/s, da1
  Infortrend IFT7200 RAID, 160MB/s, da2, da3

dmesg: http://rafan.infor.org/tmp/scsi/dmesg.boot

While rsyncing all of da2's data to da1 (about 20MB/s), after a few minutes I get:

(da1:ahc1:0:0:0): Unexpected busfree in Command phase
SEQADDR == 0x16c
(da1:ahc1:0:0:0): lost device
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): WRITE(10). CDB: 2a 0 0 33 6d 3f 0 0 80 0
(da1:ahc1:0:0:0): CAM Status: SCSI Status Error
(da1:ahc1:0:0:0): SCSI Status: Check Condition
(da1:ahc1:0:0:0): UNIT ATTENTION asc:29,0
(da1:ahc1:0:0:0): Power on, reset, or bus device reset occurred
(da1:ahc1:0:0:0): Retrying Command (per Sense Data)
panic: softdep_move_dependencies: need merge code
cpuid = 1
boot() called on cpu#1
Uptime: 23h1m3s
(da1:ahc1:0:0:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0
(da1:ahc1:0:0:0): Sense Error Code 0x80 at block no. -1051665751 (decimal)
Cannot dump. No dump device defined.
Shutting down ACPI

I then tried again, several times:

login: (da1:ahc1:0:0:0): Unexpected busfree in Command phase
SEQADDR == 0x16c
(da1:ahc1:0:0:0): lost device
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): Invalidating pack
(da1:ahc1:0:0:0): WRITE(10). CDB: 2a 0 d 12 a8 ff 0 0 80 0
(da1:ahc1:0:0:0): CAM Status: SCSI Status Error
(da1:ahc1:0:0:0): SCSI Status: Check Condition
(da1:ahc1:0:0:0): UNIT ATTENTION asc:29,0
(da1:ahc1:0:0:0): Power on, reset, or bus device reset occurred
(da1:ahc1:0:0:0): Retries Exhausted
(da1:ahc1:0:0:0): Invalidating pack
panic: softdep_move_dependencies: need merge code
cpuid = 0
boot() called on cpu#0
Uptime: 8h56m7s
(da1:ahc1:0:0:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0
(da1:ahc1:0:0:0): Sense Error Code 0x50

And another one:

(da1:ahc1:0:0:0): Unexpected busfree in Command phase
SEQADDR == 0x16c
(da1:ahc1:0:0:0): lost device
(da1:ahc1:0:0:0): Invalidating pack
initiate_write_filepage: already started
...
(da1:ahc1:0:0:0): Invalidating pack
initiate_write_filepage: already started
...
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 0
boot() called on cpu#0
Uptime: 22m28s

In the first two cases the tag depth was 32 (the same as the RAID's); in the last case it was 8.

Before the upgrade to 5.3, this box handled this kind of I/O load without problems. I saw some posts from Xin Li last October or November describing problems on 5.3 that did not occur on RELENG_4. I'm now upgrading to RELENG_5 and will try -CURRENT later to see whether either of them behaves correctly.

Since the hardware (even the firmware) is identical to before, could it be that some fix in RELENG_4 was never merged into -CURRENT or RELENG_5, or is this a new problem in RELENG_5?

Regards,
Rong-En Fan