Date: Sun, 25 Sep 2011 17:32:55 +0200
From: Adam Nowacki <nowakpl@platinum.linux.pl>
To: freebsd-fs@freebsd.org
Subject: ZFS and 3ware controller resets
Message-ID: <4E7F49A7.1020909@platinum.linux.pl>
I have a 20-disk storage system. Every now and then a disk dies and causes the 3ware controller to reset because of disk timeouts. This cuts ZFS off from all disks, even the healthy ones, and the system requires a hard reset.

Two issues here:
1) Why does the controller have to reset? That's a completely insane way of dealing with a drive timeout.
2) ZFS does not reopen the disks after the controller reset.

FreeBSD version: 8.1-RELEASE-p1

/c0 Driver Version = 3.80.06.003
/c0 Model = 9650SE-16ML
/c0 Available Memory = 224MB
/c0 Firmware Version = FE9X 4.10.00.007
/c0 Bios Version = BE9X 4.08.00.002
/c0 Boot Loader Version = BL9X 3.08.00.001

  pool: zp2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da7p1   ONLINE       0     0     0
            da9p1   ONLINE       0     0     0
            da8p1   ONLINE       0     0     0
            da10p1  ONLINE       0     0     0

Then, when a disk starts misbehaving:

twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a5 4 83 80 0 0 80 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 7c 43 b8 0 0 10 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 ce e5 ca 30 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da3:twa0:0:3:0): READ(10). CDB: 28 0 a4 2d 2d f8 0 0 8 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=2
(da3:twa0:0:3:0): READ(10). CDB: 28 0 cb 91 7c f8 0 0 20 0
(da3:twa0:0:3:0): CAM status: SCSI Status Error
(da3:twa0:0:3:0): SCSI status: Check Condition
(da3:twa0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
twa0: Request 72 timed out!
twa0: INFO: (0x16: 0x1108): Resetting controller...:
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=3
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
twa0: [ITHREAD]
(da1:twa0:0:1:0): lost device
(da2:twa0:0:2:0): lost device
(da3:twa0:0:3:0): lost device
(da4:twa0:0:4:0): lost device
(da5:twa0:0:5:0): lost device
(da6:twa0:0:6:0): lost device
(da7:twa0:0:7:0): lost device
(da8:twa0:0:8:0): lost device
(da9:twa0:0:9:0): lost device
(da10:twa0:0:10:0): lost device
(da11:twa0:0:11:0): lost device
(da12:twa0:0:12:0): lost device
(da13:twa0:0:13:0): lost device
(da1:twa0:0:1:0): removing device entry
da1 at twa0 bus 0 scbus0 target 1 lun 0
da1: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da2:twa0:0:2:0): removing device entry
da2 at twa0 bus 0 scbus0 target 2 lun 0
da2: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da2: 100.000MB/s transfers
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da3:twa0:0:3:0): removing device entry
da3 at twa0 bus 0 scbus0 target 3 lun 0
da3: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da3: 100.000MB/s transfers
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da4:twa0:0:4:0): removing device entry
da4 at twa0 bus 0 scbus0 target 4 lun 0
da4: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da4: 100.000MB/s transfers
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da5:twa0:0:5:0): removing device entry
da5 at twa0 bus 0 scbus0 target 5 lun 0
da5: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da5: 100.000MB/s transfers
da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da6:twa0:0:6:0): removing device entry
da6 at twa0 bus 0 scbus0 target 6 lun 0
da6: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da6: 100.000MB/s transfers
da6: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da7:twa0:0:7:0): removing device entry
da7 at twa0 bus 0 scbus0 target 7 lun 0
da7: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da7: 100.000MB/s transfers
da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da8:twa0:0:8:0): removing device entry
da8 at twa0 bus 0 scbus0 target 8 lun 0
da8: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da8: 100.000MB/s transfers
da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da9:twa0:0:9:0): removing device entry
da9 at twa0 bus 0 scbus0 target 9 lun 0
da9: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da9: 100.000MB/s transfers
da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da10:twa0:0:10:0): removing device entry
da10 at twa0 bus 0 scbus0 target 10 lun 0
da10: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da10: 100.000MB/s transfers
da10: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da11:twa0:0:11:0): removing device entry
da11 at twa0 bus 0 scbus0 target 11 lun 0
da11: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da11: 100.000MB/s transfers
da11: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da12:twa0:0:12:0): removing device entry
da12 at twa0 bus 0 scbus0 target 12 lun 0
da12: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da12: 100.000MB/s transfers
da12: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
(da13:twa0:0:13:0): removing device entry
da13 at twa0 bus 0 scbus0 target 13 lun 0
da13: <AMCC 9650SE-16M DISK 4.10> Fixed Direct Access SCSI-5 device
da13: 100.000MB/s transfers
da13: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)

  pool: zp2
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zp2         ONLINE       7    11     0
          raidz2    ONLINE      16    32     0
            da1p1   ONLINE       4    10     0
            da2p1   ONLINE       4    10     0
            da3p1   ONLINE       5   642     1
            da4p1   ONLINE       3     8     0
            da5p1   ONLINE       3    12     0
            da6p1   ONLINE       3    12     0
            da7p1   ONLINE       3    12     0
            da9p1   ONLINE       3    12     0
            da8p1   ONLINE       3    14     0
            da10p1  ONLINE       3    10     0

errors: 10 data errors, use '-v' for a list
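The READ(10) CDBs in the log identify which sectors of da3 the drive failed to return. The Python sketch below is an illustrative helper, not part of FreeBSD or the 3ware tools; the names decode_read10 and summarize are hypothetical. It decodes each CDB using the standard SCSI READ(10) layout (opcode 0x28, 4-byte big-endian LBA in bytes 2-5, 2-byte big-endian transfer length in bytes 7-8) and tallies the twa0 "Drive timeout detected" messages per port. It assumes the message formats exactly as they appear in the log.

```python
import re
from collections import Counter

# Matches CAM lines that print a READ(10) command descriptor block, e.g.
# "(da3:twa0:0:3:0): READ(10). CDB: 28 0 a3 f4 e7 60 0 0 8 0"
# Group 1 is the da device name, group 2 the space-separated hex bytes.
CDB_RE = re.compile(r"\((da\d+):[^)]*\): READ\(10\)\. CDB: ([0-9a-f ]+)")

# Matches the twa driver's timeout message and captures the port number.
TIMEOUT_RE = re.compile(r"Drive timeout detected: port=(\d+)")


def decode_read10(cdb):
    """Decode a 10-byte READ(10) CDB (list of ints) into (lba, block_count).

    SCSI READ(10) layout: byte 0 is opcode 0x28, bytes 2-5 hold the
    logical block address (big-endian), bytes 7-8 the transfer length
    in blocks (big-endian).
    """
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB: %r" % (cdb,))
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, blocks


def summarize(log_text):
    """Return ([(device, lba, blocks), ...], Counter of timeouts per port)."""
    reads = []
    for dev, hex_bytes in CDB_RE.findall(log_text):
        # split() also drops the trailing space left when the log is flattened.
        cdb = [int(h, 16) for h in hex_bytes.split()]
        lba, blocks = decode_read10(cdb)
        reads.append((dev, lba, blocks))
    timeouts = Counter(int(port) for port in TIMEOUT_RE.findall(log_text))
    return reads, timeouts
```

On the affected machine one might feed it the system log, e.g. summarize(open("/var/log/messages").read()). The first CDB above, 28 0 a3 f4 e7 60 0 0 8 0, decodes to LBA 0xa3f4e760 (2750736224) with a transfer length of 8 blocks, i.e. a 4 KiB read given the 512-byte sectors the unit reports. The LBA is relative to the start of the da3 unit, not the da3p1 partition that ZFS uses.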