From: Adam Nowacki
Date: Sun, 25 Sep 2011 17:32:55 +0200
To: freebsd-fs@freebsd.org
Subject: ZFS and 3ware controller resets

I have a 20-disk storage system. Every now and then a disk dies, and its timeouts cause the 3ware controller to reset. The reset cuts ZFS off from all disks, even the healthy ones, and the system then requires a hard reset.

Two issues here:

1) Why does the controller have to reset? That's a completely insane way of dealing with a drive timeout.
2) ZFS does not reopen the disks after the controller reset.
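The failure chain described above (repeated drive timeouts on one port, then a controller reset, then every da device on the controller being lost) can be picked out of a saved kernel log. A minimal Python sketch, assuming the twa0/CAM message formats quoted in this report; summarize_twa_log is a hypothetical helper, and other driver or firmware versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns copied from the twa/CAM messages quoted in this report; other
# driver or firmware versions may word these messages differently.
TIMEOUT_RE = re.compile(
    r"twa(\d+): ERROR: \(0x04: 0x0009\): Drive timeout detected: port=(\d+)")
RESET_RE = re.compile(
    r"twa(\d+): INFO: \(0x04: 0x0001\): Controller reset occurred: resets=(\d+)")
LOST_RE = re.compile(r"\((da\d+):twa\d+:\d+:\d+:\d+\): lost device")


def summarize_twa_log(lines):
    """Hypothetical helper: count drive timeouts per (controller, port),
    record controller resets, and list the da devices lost afterwards."""
    timeouts = Counter()
    resets = []
    lost = []
    for line in lines:
        # search() rather than match(), so syslog prefixes such as
        # "Sep 25 17:00:00 host kernel: " are tolerated.
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[(int(m.group(1)), int(m.group(2)))] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            resets.append((int(m.group(1)), int(m.group(2))))
            continue
        m = LOST_RE.search(line)
        if m:
            lost.append(m.group(1))
    return timeouts, resets, lost


if __name__ == "__main__":
    sample = [
        "twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2",
        "twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2",
        "twa0: Request 72 timed out!",
        "twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1",
        "(da1:twa0:0:1:0): lost device",
        "(da3:twa0:0:3:0): lost device",
    ]
    timeouts, resets, lost = summarize_twa_log(sample)
    for (ctrl, port), n in sorted(timeouts.items()):
        print(f"twa{ctrl} port {port}: {n} drive timeouts")
    print(f"controller resets: {resets}")
    print(f"devices lost after reset: {lost}")
```

Run over the full log in this report, such a scan would attribute every timeout to port=2 yet list da1 through da13 as lost, which is exactly the complaint: one failing disk takes the healthy ones down with it.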
FreeBSD version: 8.1-RELEASE-p1

/c0 Driver Version = 3.80.06.003
/c0 Model = 9650SE-16ML
/c0 Available Memory = 224MB
/c0 Firmware Version = FE9X 4.10.00.007
/c0 Bios Version = BE9X 4.08.00.002
/c0 Boot Loader Version = BL9X 3.08.00.001

  pool: zp2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da7p1   ONLINE       0     0     0
            da9p1   ONLINE       0     0     0
            da8p1   ONLINE       0     0     0
            da10p1  ONLINE       0     0     0

Then, when a disk starts misbehaving:

twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 7c 43 b8 0 0 10 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 ce e5 ca 30 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a4 2d 2d f8 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 91 7c f8 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: Request 72 timed out!
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=3
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: [ITHREAD]
(da1:twa0:0:1:0): lost device
(da2:twa0:0:2:0): lost device
(da3:twa0:0:3:0): lost device
(da4:twa0:0:4:0): lost device
(da5:twa0:0:5:0): lost device
(da6:twa0:0:6:0): lost device
(da7:twa0:0:7:0): lost device
(da8:twa0:0:8:0): lost device
(da9:twa0:0:9:0): lost device
(da10:twa0:0:10:0): lost device
(da11:twa0:0:11:0): lost device
(da12:twa0:0:12:0): lost device
(da13:twa0:0:13:0): lost device
(da1:twa0:0:1:0): removing device entry
da1 at twa0 bus 0 scbus0 target 1 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da2:twa0:0:2:0): removing device entry
da2 at twa0 bus 0 scbus0 target 2 lun 0
da2: Fixed Direct Access SCSI-5 device
da2: 100.000MB/s transfers
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da3:twa0:0:3:0): removing device entry
da3 at twa0 bus 0 scbus0 target 3 lun 0
da3: Fixed Direct Access SCSI-5 device
da3: 100.000MB/s transfers
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da4:twa0:0:4:0): removing device entry
da4 at twa0 bus 0 scbus0 target 4 lun 0
da4: Fixed Direct Access SCSI-5 device
da4: 100.000MB/s transfers
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da5:twa0:0:5:0): removing device entry
da5 at twa0 bus 0 scbus0 target 5 lun 0
da5: Fixed Direct Access SCSI-5 device
da5: 100.000MB/s transfers
da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da6:twa0:0:6:0): removing device entry
da6 at twa0 bus 0 scbus0 target 6 lun 0
da6: Fixed Direct Access SCSI-5 device
da6: 100.000MB/s transfers
da6: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da7:twa0:0:7:0): removing device entry
da7 at twa0 bus 0 scbus0 target 7 lun 0
da7: Fixed Direct Access SCSI-5 device
da7: 100.000MB/s transfers
da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da8:twa0:0:8:0): removing device entry
da8 at twa0 bus 0 scbus0 target 8 lun 0
da8: Fixed Direct Access SCSI-5 device
da8: 100.000MB/s transfers
da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da9:twa0:0:9:0): removing device entry
da9 at twa0 bus 0 scbus0 target 9 lun 0
da9: Fixed Direct Access SCSI-5 device
da9: 100.000MB/s transfers
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da10:twa0:0:10:0): removing device entry
da10 at twa0 bus 0 scbus0 target 10 lun 0
da10: Fixed Direct Access SCSI-5 device
da10: 100.000MB/s transfers
da10: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da11:twa0:0:11:0): removing device entry
da11 at twa0 bus 0 scbus0 target 11 lun 0
da11: Fixed Direct Access SCSI-5 device
da11: 100.000MB/s transfers
da11: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da12:twa0:0:12:0): removing device entry
da12 at twa0 bus 0 scbus0 target 12 lun 0
da12: Fixed Direct Access SCSI-5 device
da12: 100.000MB/s transfers
da12: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da13:twa0:0:13:0): removing device entry
da13 at twa0 bus 0 scbus0 target 13 lun 0
da13: Fixed Direct Access SCSI-5 device
da13: 100.000MB/s transfers
da13: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)

  pool: zp2
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       7    11     0
          raidz2    ONLINE      16    32     0
            da1p1   ONLINE       4    10     0
            da2p1   ONLINE       4    10     0
            da3p1   ONLINE       5   642     1
            da4p1   ONLINE       3     8     0
            da5p1   ONLINE       3    12     0
            da6p1   ONLINE       3    12     0
            da7p1   ONLINE       3    12     0
            da9p1   ONLINE       3    12     0
            da8p1   ONLINE       3    14     0
            da10p1  ONLINE       3    10     0

errors: 10 data errors, use '-v' for a list
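In the second zpool status output every member of the raidz2 vdev shows non-zero READ and WRITE counters, not just da3p1, which fits the controller reset dropping all disks at once. A minimal Python sketch for pulling those counters out of saved zpool status text, assuming the NAME/STATE/READ/WRITE/CKSUM layout shown above and plain integer counts; parse_zpool_errors and devices_with_errors are hypothetical helpers, and rows whose counters are not plain integers (zpool may abbreviate large counts) are simply skipped.

```python
def parse_zpool_errors(status_text):
    """Hypothetical helper: map each row of the config table in saved
    `zpool status` text to (state, read, write, cksum)."""
    rows = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True  # table header found; data rows follow
            continue
        if not in_config:
            continue
        if line.strip().startswith("errors:"):
            break  # summary line that follows the table
        # Rows look like "da3p1 ONLINE 5 642 1". Rows with extra fields or
        # non-integer counters (e.g. abbreviated large counts) are skipped.
        if len(fields) == 5 and all(f.isdigit() for f in fields[2:]):
            rows[fields[0]] = (fields[1], int(fields[2]),
                               int(fields[3]), int(fields[4]))
    return rows


def devices_with_errors(rows):
    """Names of rows (pool, vdev, or disk) with any non-zero counter."""
    return sorted(name for name, (_, r, w, c) in rows.items() if r or w or c)


if __name__ == "__main__":
    # A few rows copied from the second zpool status output in this report.
    sample = """\
        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       7    11     0
          raidz2    ONLINE      16    32     0
            da1p1   ONLINE       4    10     0
            da3p1   ONLINE       5   642     1
            da4p1   ONLINE       3     8     0

errors: 10 data errors, use '-v' for a list
"""
    rows = parse_zpool_errors(sample)
    for name in devices_with_errors(rows):
        state, r, w, c = rows[name]
        print(f"{name:8} {state:8} read={r} write={w} cksum={c}")
```

Note that the zp2 and raidz2 rows are aggregates for the pool and the vdev, so only the daNp1 rows identify individual disks.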