Date: Sun, 27 Jan 2013 16:39:11 +0200
From: "Vladislav Prodan" <universite@ukr.net>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re[2]: AHCI timeout when using ZFS + AIO + NCQ
Message-ID: <93308.1359297551.14145052969567453184@ffe15.ukr.net>
In-Reply-To: <221B307551154F489452F89E304CA5F7@multiplay.co.uk>
References: <13391.1359029978.3957795939058384896@ffe16.ukr.net> <221B307551154F489452F89E304CA5F7@multiplay.co.uk>

> Is it always the same disk? If so, replace it. SMART helps identify issues
> but doesn't tell you 100% there's no problem.

Now a different HDD has dropped off: ada0.
I'm 99% sure that MHDD will not find any problems with ada0 or ada2.
I have three more servers with similar chipsets that show similar problems with drives dropping off on AHCI timeouts.

> ----- Original Message -----
> From: "Vladislav Prodan" <universite@ukr.net>
> To: <fs@freebsd.org>
> Cc: <current@freebsd.org>
> Sent: Thursday, January 24, 2013 12:19 PM
> Subject: AHCI timeout when using ZFS + AIO + NCQ
>
>
> >I have the server:
> >
> > FreeBSD 9.1-PRERELEASE #0: Wed Jul 25 01:40:56 EEST 2012
> >
> > Jan 24 12:53:01 vesuvius kernel: atapci0: <JMicron ATA controller> port
> > 0xc040-0xc047,0xc030-0xc033,0xc020-0xc027,0xc010-0xc013,0xc000-0xc00f mem 0xfe210000-0xfe2101ff irq 51 at device 0.0 on pci3
> > ...
> > Jan 24 12:53:01 vesuvius kernel: ahci0: <ATI IXP700 AHCI SATA controller> port
> > 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem 0xfe307000-0xfe3073ff irq 19 at device 17.0 on pci0
> > Jan 24 12:53:01 vesuvius kernel: ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
> > ...
> > Jan 24 12:53:01 vesuvius kernel: ada2 at ahcich2 bus 0 scbus4 target 0 lun 0
> > Jan 24 12:53:01 vesuvius kernel: ada2: <ST3000DM001-9YN166 CC4C> ATA-8 SATA 3.x device
> > Jan 24 12:53:01 vesuvius kernel: ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> > Jan 24 12:53:01 vesuvius kernel: ada2: Command Queueing enabled
> > Jan 24 12:53:01 vesuvius kernel: ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> > Jan 24 12:53:01 vesuvius kernel: ada2: Previously was known as ad12
> > ...
> > I use 4 HDDs in RAID10 via ZFS.
> >
> > At very irregular intervals, HDDs drop off. As a result, the server stops.
> >
> > Jan 24 06:48:06 vesuvius kernel: ahcich2: Timeout on slot 6 port 0
> > Jan 24 06:48:06 vesuvius kernel: ahcich2: is 00000000 cs 00000000 ss 000000c0 rs 000000c0 tfd 40 serr 00000000 cmd 0000e817
> > Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 4c 4e 1e 40 68 00 00 01 00 00
> > Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
> > Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): Retrying command
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: I/O error - pagein failed; blkno 4227133,size 8192, error 6
> > Jan 24 06:51:11 vesuvius kernel: (ada2:(pass2:vm_fault: pager read error, pid 1943 (named)
> > Jan 24 06:51:11 vesuvius kernel: ahcich2:0:ahcich2:0:0:0:0): lost device
> > Jan 24 06:51:11 vesuvius kernel: 0): passdevgonecb: devfs entry is gone
> > Jan 24 06:51:11 vesuvius kernel: pid 1943 (named), uid 53: exited on signal 11
> > ...
> >
> > Only a restart via the Power button helps.
> > Judging by the SMART data, the HDDs have no problems. The SATA data cable has been replaced.
> >
> >
> > I found a similar problem:
> >
> > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > PR: amd64/165547: NVIDIA MCP67 AHCI SATA controller timeout
> >
> > --
> > Vladislav V. Prodan
> > System & Network Administrator
> > http://support.od.ua
> > +380 67 4584408, +380 99 4060508
> > VVP88-RIPE

--
Vladislav V. Prodan
System & Network Administrator
http://support.od.ua
+380 67 4584408, +380 99 4060508
VVP88-RIPE
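
A note on reading the ahcich2 register dumps quoted above. This is a minimal sketch, not a FreeBSD tool: it assumes that the cs, ss and rs fields are per-slot bitmasks in which bit N set means command slot N is still outstanding. That reading is consistent with the log itself ("Timeout on slot 6" next to ss 000000c0, "Timeout on slot 8" next to cs 00000100), but it is an assumption here. The names REG_LINE, slots and decode_register_line are made up for illustration.

```python
import re

# Matches the register dump line that follows "Timeout on slot N port P",
# e.g. "ahcich2: is 00000000 cs 00000000 ss 000000c0 rs 000000c0 tfd 40 ..."
REG_LINE = re.compile(
    r"(?P<chan>ahcich\d+): is (?P<is>[0-9a-f]+) cs (?P<cs>[0-9a-f]+) "
    r"ss (?P<ss>[0-9a-f]+) rs (?P<rs>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) "
    r"serr (?P<serr>[0-9a-f]+) cmd (?P<cmd>[0-9a-f]+)"
)


def slots(mask_hex):
    """Return the bit positions set in a hex mask (assumed: one bit per slot)."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_register_line(line):
    """Decode one register dump line; return None if the line does not match."""
    m = REG_LINE.search(line)
    if m is None:
        return None
    return {
        "channel": m.group("chan"),
        "cs_slots": slots(m.group("cs")),
        "ss_slots": slots(m.group("ss")),
        "rs_slots": slots(m.group("rs")),
    }


if __name__ == "__main__":
    sample = (
        "Jan 24 06:48:06 vesuvius kernel: ahcich2: is 00000000 cs 00000000 "
        "ss 000000c0 rs 000000c0 tfd 40 serr 00000000 cmd 0000e817"
    )
    print(decode_register_line(sample))
```

Run over a saved copy of /var/log/messages line by line, this shows which slots were pending on each channel at the moment of a timeout, which makes it easier to compare the ada0 and ada2 incidents.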