Message-ID: <4EB26D8B.1090804@brockmann-consult.de>
Date: Thu, 03 Nov 2011 11:31:39 +0100
From: Peter Maloney
To: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags

Dear Jason,

On 11/02/2011 07:05 PM, Jason Wolfe wrote:
> Hello,
> Testing with the LSI supplied driver, it appears they have a code path for
> this condition that causes our driver to crash.
> Here are 2 sets of messages:
>
> mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm 0xffffff800040bdf8
> (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 SMID 97 command timeout cm 0xffffff800040bdf8 ccb 0xffffff00
> mpslsi0: mpssas_alloc_tm freezing simq
> mpslsi0: timedout cm 0xffffff800040bdf8 allocated tm 0xffffff8000409070
> (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 55 96 48 7f 0 0 80 0 length 65536 SMID 171 completed cm 0xffffff80004105a8 ccb 0xffffff03c3443y
> (da0:mpslsi0:0:8:0): READ(10). CDB: 28 0 54 f8 a4 3f 0 0 80 0 length 65536 SMID 762 completed cm 0xffffff8000434230 ccb 0xffffff001317ay
> (da0:mpslsi0:0:8:0): WRITE(10). CDB: 2a 0 55 bf 5a 3f 0 1 0 0 length 131072 SMID 97 completed timedout cm 0xffffff800040bdf8 ccb 0xffff1
> (noperiph:mpslsi0:0:8:0): SMID 50 finished recovery after aborting TaskMID 97
> mpslsi0: mpssas_free_tm releasing simq
>
>
> mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff80003fb000 cm 0xffffff8000441e18
> (da7:mpslsi0:0:15:0): WRITE(10). CDB: 2a 0 33 76 29 ef 0 1 0 0 length 131072 SMID 989 command timeout cm 0xffffff8000441e18 ccb 0xfffff0
> mpslsi0: mpssas_alloc_tm freezing simq
> mpslsi0: timedout cm 0xffffff8000441e18 allocated tm 0xffffff80004063e0
> (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 14 a1 4f 0 1 0 0 length 131072 SMID 857 completed cm 0xffffff8000439e38 ccb 0xffffff001316y
> (da7:mpslsi0:0:15:0): READ(10). CDB: 28 0 71 e4 98 57 0 0 80 0 length 65536 SMID 300 completed cm 0xffffff80004182a0 ccb 0xffffff0392f0y
> (da7:mpslsi0:0:15:0): WRITE(10).
> CDB: 2a 0 33 76 29 ef 0 1 0 0 length 131072 SMID 989 completed timedout cm 0xffffff8000441e18 ccb 0xff1
> (noperiph:mpslsi0:0:15:0): SMID 4 finished recovery after aborting TaskMID 989
> mpslsi0: mpssas_free_tm releasing simq
>
> The server ran for 10 minutes with these happening every 10-30 seconds,
> with our community driver the first instance of commands timing out during
> this smartctl storm would cause the server to hang and sometimes the
> controller to reset. Hopefully this is helpful to someone.
>

Does this mean it didn't hang? Or did it run your smartctl -a test for 10 minutes before a hang?

I am also trying the mpslsi driver now, but I couldn't reproduce the problem using "smartctl -a" (also tried -A, -h and -i) with the mps driver. Tags were set to 255 on all disks. I only tried it on the backup server, which didn't crash randomly on its own either. So I will just have to assume it works if it doesn't do the same thing in a month or two.

However, with the mpslsi driver, during a scrub on the backup server (probably during smartctl -a), I got these messages (including what looks like a controller reset). No disks were lost, and zpool status reported no read errors. But I can't get it to happen a second time. So I hope that means our problems are over.

Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f65f698
Nov 3 09:17:10 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 command timeout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800
Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq
Nov 3 09:17:10 bcnas1bak kernel: mpslsi0: timedout cm 0xffffff800f65f698 allocated tm 0xffffff800f6340f8
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10).
CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 completed cm 0xffffff800f643cd8 ccb 0xffffff0026bd1000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2c f3 be e2 0 0 2a 0 length 21504 SMID 261 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 completed cm 0xffffff800f654550 ccb 0xffffff0026b96000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 52 1e 2 e3 0 0 2b 0 length 22016 SMID 534 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 completed cm 0xffffff800f664510 ccb 0xffffff003d438000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 3a 5 14 a3 0 0 2b 0 length 22016 SMID 798 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 completed cm 0xffffff800f657b90 ccb 0xffffff00314ce800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 81 86 6f 0 0 2b 0 length 22016 SMID 590 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 completed cm 0xffffff800f65a630 ccb 0xffffff0026ba1800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 39 47 e8 2c 0 0 2a 0 length 21504 SMID 634 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10).
CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 completed cm 0xffffff800f65ece8 ccb 0xffffff0026bb1800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 2d 8b 96 af 0 0 2b 0 length 22016 SMID 707 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 717 completed timedout cm 0xffffff800f65f698 ccb 0xffffff0026bbb800 during recov(da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 completed cm 0xffffff800f65dc70 ccb 0xffffff0026bea800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 1c dc 68 73 0 0 2b 0 length 22016 SMID 690 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 completed cm 0xffffff800f66d568 ccb 0xffffff0026bf9000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 58 d da 33 0 0 2b 0 length 22016 SMID 947 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 completed cm 0xffffff800f65d5a8 ccb 0xffffff003d47f800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4b 30 d1 80 0 0 2a 0 length 21504 SMID 683 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 completed cm 0xffffff800f641428 ccb 0xffffff0031536000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10).
CDB: 28 0 4a d 10 d0 0 0 2b 0 length 22016 SMID 219 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 completed cm 0xffffff800f63e3b8 ccb 0xffffff00314ec800 during recovery ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10). CDB: 28 0 41 1e 9a 58 0 0 2a 0 length 21504 SMID 169 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 completed cm 0xffffff800f63c6a8 ccb 0xffffff0026a89000 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 139 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 completed cm 0xffffff800f6690a0 ccb 0xffffff00314c8800 during recovery ioc 8(pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0 length 0 SMID 876 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 completed cm 0xffffff800f65c058 ccb 0xffffff0026b7d000 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 661 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 completed cm 0xffffff800f650848 ccb 0xffffff0026be7800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16).
CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 471 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 completed cm 0xffffff800f641048 ccb 0xffffff0026bef800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 215 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 completed cm 0xffffff800f6404a8 ccb 0xffffff0026bb6000 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 203 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 completed cm 0xffffff800f6550f0 ccb 0xffffff003d447800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d0 0 1 0 0 0 4f 0 c2 0 b0 0 length 512 SMID 546 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 completed cm 0xffffff800f6530f8 ccb 0xffffff0026bcb800 during recovery ioc (pass0:mpslsi0:0:10:0): ATA COMMAND PASS THROUGH(16). CDB: 85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0 length 512 SMID 513 terminated ioc 804b scsi 0 state c xfer 0
Nov 3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 abort TaskMID 717 status 0x0 code 0x0 count 20
Nov 3 09:17:11 bcnas1bak kernel: (noperiph:mpslsi0:0:10:0): SMID 1 finished recovery after aborting TaskMID 717
Nov 3 09:17:11 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): READ(10).
CDB: 28 0 41 1e 9a 58 0 0 2a 0
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): CAM status: SCSI Status Error
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI status: Check Condition
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)

Peter

> Jason

-- 
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------
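
Reading aid for the log excerpts in this message: the CDB bytes printed by the driver can be decoded by hand, which helps to see that the timed-out pass-through command (SMID 717) is a SMART request of the kind smartctl sends, while the aborted READ(10)s are ordinary disk reads. The following is only a minimal Python sketch, not part of either driver. The function names are hypothetical, the field offsets follow the standard READ(10)/WRITE(10) and ATA PASS-THROUGH(16) layouts, and the 512-byte block size is an assumption (it happens to match the "length" values in the logs).

```python
def parse_cdb(text):
    """Turn a CDB as printed in the log ('28 0 2c f3 ...') into a list of ints."""
    return [int(tok, 16) for tok in text.split()]


def decode_rw10(cdb, block_size=512):
    """Decode a READ(10) (0x28) or WRITE(10) (0x2a) CDB.

    block_size=512 is an assumption; the log does not state the sector size.
    """
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length in blocks
    return {
        "op": "READ" if cdb[0] == 0x28 else "WRITE",
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


# SMART subcommands (ATA FEATURES register) that appear in the logs.
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD5: "SMART READ LOG",
    0xDA: "SMART RETURN STATUS",
}

# ATA PASS-THROUGH protocol field values that appear in the logs.
PROTOCOLS = {3: "Non-data", 4: "PIO Data-In"}


def decode_ata_pt16(cdb):
    """Decode the fields of an ATA PASS-THROUGH(16) (0x85) CDB used here."""
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    protocol = (cdb[1] >> 1) & 0x0F  # byte 1, bits 4:1
    command = cdb[14]                # ATA COMMAND register (0xb0 = SMART)
    features = cdb[4]                # ATA FEATURES register, low byte
    return {
        "protocol": PROTOCOLS.get(protocol, protocol),
        "command": command,
        "features": features,
        "sector_count": cdb[6],
        "lba_low": cdb[8],
        "smart": SMART_FEATURES.get(features) if command == 0xB0 else None,
    }


if __name__ == "__main__":
    print(decode_rw10(parse_cdb("28 0 2c f3 be e2 0 0 2a 0")))
    print(decode_ata_pt16(parse_cdb("85 6 2c 0 da 0 0 0 0 0 4f 0 c2 0 b0 0")))
    print(decode_ata_pt16(parse_cdb("85 8 e 0 d5 0 1 0 6 0 4f 0 c2 0 b0 0")))
```

Under these assumptions, the READ(10) with transfer length 0x2a decodes to 42 blocks, i.e. 21504 bytes, matching the "length 21504" the driver prints, and the command that timed out decodes to SMART RETURN STATUS with no data transfer.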