Date: Thu, 12 Jul 2018 22:35:01 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 229745] ahcich: CAM status: Command timeout
Message-ID: <bug-229745-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745

            Bug ID: 229745
           Summary: ahcich: CAM status: Command timeout
           Product: Base System
           Version: 11.2-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: fbsd98816551@avksrv.org

Hello!

We have several Supermicro servers based on the X11SSH-F.
All servers were installed half a year ago and ran FreeBSD 11.1. Each server
has 4 HGST HUS722T1TALA604 HDDs.
All of them worked fine during that time, with half a year of uptime.

Recently the servers were upgraded to FreeBSD 11.2 (self-built 11.2-STABLE
r335679 with default make.conf, src.conf and GENERIC). After some time
(different each time, from 2 hours to 7 days) one or more disks start timing
out:

Jul 13 00:56:24 mrr32 kernel: ahcich2: Timeout on slot 17 port 0
Jul 13 00:56:24 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00060000 rs 00060000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 ca 22 23 40 06 00 00 00 00 00
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 00:58:16 srv32 kernel: ahcich2: Timeout on slot 26 port 0
Jul 13 00:58:16 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 04000000 rs 04000000 tfd 40 serr 00000000 cmd 0004da17
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e0 8a cc c6 40 18 00 00 00 00 00
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:01:46 srv32 kernel: ahcich2: Timeout on slot 18 port 0
Jul 13 01:01:46 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00040000 rs 00040000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 2a 2b 23 40 06 00 00 00 00 00
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:07:12 srv32 kernel: ahcich0: Timeout on slot 23 port 0
Jul 13 01:07:12 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00800000 rs 00800000 tfd 40 serr 00000000 cmd 0004d717
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 62 f5 c6 40 18 00 00 00 00 00
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command
Jul 13 01:07:43 srv32 kernel: ahcich0: Timeout on slot 2 port 0
Jul 13 01:07:43 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00000004 rs 00000004 tfd 40 serr 00000000 cmd 0004c217
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 62 12 7b 40 06 00 00 00 00 00
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command

A reboot (/sbin/shutdown -r or /sbin/reboot) does not solve the problem; the
disks still time out after boot.
Only a power off / power on solves the problem for some time, and after a
while the timeouts return.

The servers were updated to the latest BIOS available from Supermicro. No
change.
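
For anyone reading the register dumps above, here is a minimal sketch for turning the `ss` / `rs` hex values into slot numbers. It assumes these fields are per-port bitmasks with one bit per command slot (this is how I read the ahci(4) driver output; `ss` should correspond to the port's SActive register). The helper name `active_slots` is made up for this example.

```python
def active_slots(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit hex mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


# "ss" values copied from the log lines in this report
for ss in ("00060000", "04000000", "00040000", "00800000", "00000004"):
    print(ss, active_slots(ss))
```

Under that assumption, the first dump (`ss 00060000`) shows slots 17 and 18 outstanding when slot 17 timed out, and every other dump shows only the slot that timed out.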

ahci0: <Intel Sunrise Point AHCI SATA controller> port 0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xdf310000-0xdf311fff,0xdf31e000-0xdf31e0ff,0xdf31d000-0xdf31d7ff irq 16 at device 23.0 on pci0
ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahcich6: <AHCI channel> at channel 6 on ahci0
ahcich7: <AHCI channel> at channel 7 on ahci0

ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 1.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <HGST HUS722T1TALA604 RAGNWA07> ACS-3 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)

ahci0@pci0:0:23:0: class=0x010601 card=0x088415d9 chip=0xa1028086 rev=0x31 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-H SATA controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA

We use ZFS on all servers; some are raidz1, some RAID-10, with the same
results.

We run smartd on all servers. I tried disabling smartd; it looks like it
made no difference.

We have already upgraded the zpools to the new features, so downgrading back
to 11.1 would require removing those features first.

-- 
You are receiving this mail because:
You are the assignee for the bug.