Date:      Sun, 27 Jan 2013 16:39:11 +0200
From:      "Vladislav Prodan" <universite@ukr.net>
To:        "Steven Hartland" <killing@multiplay.co.uk>
Cc:        current@freebsd.org, fs@freebsd.org
Subject:   Re[2]: AHCI timeout when using ZFS + AIO + NCQ
Message-ID:  <93308.1359297551.14145052969567453184@ffe15.ukr.net>
In-Reply-To: <221B307551154F489452F89E304CA5F7@multiplay.co.uk>
References:  <13391.1359029978.3957795939058384896@ffe16.ukr.net> <221B307551154F489452F89E304CA5F7@multiplay.co.uk>



> Is it always the same disk? If so, replace it. SMART helps identify issues
> but doesn't tell you 100% that there's no problem.


This time a different HDD dropped off: ada0.
I'm 99% sure that MHDD will not find any problems on either ada0 or ada2.
I still have three servers with similar chipsets that have similar problems with AHCI timeouts.
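
To compare the timeouts across drives and servers, the register dump that follows each "Timeout on slot" line in the quoted log below can be decoded mechanically. This is only a rough sketch: the function and field names are hypothetical, and I am assuming that `cs` is the command-issue mask, `ss` the SATA active mask, `rs` the driver's running-slots mask, and that the low byte of `tfd` is the ATA status register. That is my reading of the abbreviations, not something stated in this thread.

```python
import re

# Matches the register dump that follows "Timeout on slot N port 0".
# The group names are my own labels for the abbreviations in the log
# (is, cs, ss, rs); they are not identifiers taken from the driver.
DUMP_RE = re.compile(
    r"(?P<chan>ahcich\d+): is (?P<istat>[0-9a-f]+) cs (?P<ci>[0-9a-f]+) "
    r"ss (?P<sact>[0-9a-f]+) rs (?P<rslots>[0-9a-f]+) tfd (?P<tfd>[0-9a-f]+) "
    r"serr (?P<serr>[0-9a-f]+) cmd (?P<cmd>[0-9a-f]+)"
)

# Standard ATA status register bits (assumed to be the low byte of tfd).
ATA_STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x08: "DRQ", 0x01: "ERR"}


def slots(mask):
    """Return the slot numbers whose bits are set in a 32-bit mask."""
    return [bit for bit in range(32) if (mask >> bit) & 1]


def decode_dump(line):
    """Decode one register-dump log line; return None if it does not match."""
    m = DUMP_RE.search(line)
    if m is None:
        return None
    regs = {k: int(v, 16) for k, v in m.groupdict().items() if k != "chan"}
    status = regs["tfd"] & 0xFF
    return {
        "channel": m.group("chan"),
        "cs_slots": slots(regs["ci"]),
        "ss_slots": slots(regs["sact"]),
        "rs_slots": slots(regs["rslots"]),
        "status_flags": [n for bit, n in ATA_STATUS_BITS.items() if status & bit],
        "serr": regs["serr"],
    }


if __name__ == "__main__":
    line = ("Jan 24 06:48:06 vesuvius kernel: ahcich2: is 00000000 cs 00000000 "
            "ss 000000c0 rs 000000c0 tfd 40 serr 00000000 cmd 0000e817")
    print(decode_dump(line))
```

For the 06:48:06 line, `ss` and `rs` are both 0xc0, which decodes to slots 6 and 7 and matches the "Timeout on slot 6" message just before it.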


> ----- Original Message ----- 
> From: "Vladislav Prodan" <universite@ukr.net>
> To: <fs@freebsd.org>
> Cc: <current@freebsd.org>
> Sent: Thursday, January 24, 2013 12:19 PM
> Subject: AHCI timeout when using ZFS + AIO + NCQ
> 
> 
> > I have this server:
> >
> > FreeBSD 9.1-PRERELEASE #0: Wed Jul 25 01:40:56 EEST 2012
> >
> > Jan 24 12:53:01 vesuvius kernel: atapci0: <JMicron ATA controller> port 0xc040-0xc047,0xc030-0xc033,0xc020-0xc027,0xc010-0xc013,0xc000-0xc00f mem 0xfe210000-0xfe2101ff irq 51 at device 0.0 on pci3
> > ...
> > Jan 24 12:53:01 vesuvius kernel: ahci0: <ATI IXP700 AHCI SATA controller> port 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem 0xfe307000-0xfe3073ff irq 19 at device 17.0 on pci0
> > Jan 24 12:53:01 vesuvius kernel: ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
> > ...
> > Jan 24 12:53:01 vesuvius kernel: ada2 at ahcich2 bus 0 scbus4 target 0 lun 0
> > Jan 24 12:53:01 vesuvius kernel: ada2: <ST3000DM001-9YN166 CC4C> ATA-8 SATA 3.x device
> > Jan 24 12:53:01 vesuvius kernel: ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> > Jan 24 12:53:01 vesuvius kernel: ada2: Command Queueing enabled
> > Jan 24 12:53:01 vesuvius kernel: ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> > Jan 24 12:53:01 vesuvius kernel: ada2: Previously was known as ad12
> > ...
> > I use 4 HDDs in a RAID10 layout via ZFS.
> >
> > At very irregular intervals, drives drop off. As a result, the server stops.
> >
> > Jan 24 06:48:06 vesuvius kernel: ahcich2: Timeout on slot 6 port 0
> > Jan 24 06:48:06 vesuvius kernel: ahcich2: is 00000000 cs 00000000 ss 000000c0 rs 000000c0 tfd 40 serr 00000000 cmd 0000e817
> > Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 4c 4e 1e 40 68 00 00 01 00 00
> > Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
> > Jan 24 06:48:06 vesuvius kernel: (ada2:ahcich2:0:0:0): Retrying command
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 4227133, size: 8192
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: Timeout on slot 8 port 0
> > Jan 24 06:51:11 vesuvius kernel: ahcich2: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 00 serr 00000000 cmd 0000e817
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
> > Jan 24 06:51:11 vesuvius kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
> > Jan 24 06:51:11 vesuvius kernel: swap_pager: I/O error - pagein failed; blkno 4227133,size 8192, error 6
> > Jan 24 06:51:11 vesuvius kernel: (ada2:(pass2:vm_fault: pager read error, pid 1943 (named)
> > Jan 24 06:51:11 vesuvius kernel: ahcich2:0:ahcich2:0:0:0:0): lost device
> > Jan 24 06:51:11 vesuvius kernel: 0): passdevgonecb: devfs entry is gone
> > Jan 24 06:51:11 vesuvius kernel: pid 1943 (named), uid 53: exited on signal 11
> > ...
> >
> > Only a restart via the Power button helps.
> > Judging by the SMART data, the HDDs have no problems. I have already replaced the SATA data cable.
> >
> >
> > I found a similar problem:
> >
> > http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055374.html
> > PR: amd64/165547: NVIDIA MCP67 AHCI SATA controller timeout
> >
> > -- 
> > Vladislav V. Prodan
> > System & Network Administrator
> > http://support.od.ua
> > +380 67 4584408, +380 99 4060508
> > VVP88-RIPE


-- 
Vladislav V. Prodan            
System & Network Administrator 
http://support.od.ua           
+380 67 4584408, +380 99 4060508
VVP88-RIPE



