From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 229745] ahcich: CAM status: Command timeout
Date: Thu, 12 Jul 2018 22:35:01 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745

            Bug ID: 229745
           Summary: ahcich: CAM status: Command timeout
           Product: Base System
           Version: 11.2-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: fbsd98816551@avksrv.org

Hello!

We have several Supermicro servers based on the X11SSH-F motherboard. All of them were installed about half a year ago and ran FreeBSD 11.1. Each server has four HGST HUS722T1TALA604 HDDs. All of them worked fine during that time, reaching half a year of uptime.
Recently the servers were upgraded to FreeBSD 11.2 (a self-built 11.2-STABLE r335679 with the default make.conf and src.conf and the GENERIC kernel). After some time (it differs every time, from 2 hours to 7 days), one or more disks start timing out:

Jul 13 00:56:24 mrr32 kernel: ahcich2: Timeout on slot 17 port 0
Jul 13 00:56:24 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00060000 rs 00060000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 ca 22 23 40 06 00 00 00 00 00
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:56:24 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 00:58:16 srv32 kernel: ahcich2: Timeout on slot 26 port 0
Jul 13 00:58:16 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 04000000 rs 04000000 tfd 40 serr 00000000 cmd 0004da17
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e0 8a cc c6 40 18 00 00 00 00 00
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 00:58:16 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:01:46 srv32 kernel: ahcich2: Timeout on slot 18 port 0
Jul 13 01:01:46 srv32 kernel: ahcich2: is 00000000 cs 00000000 ss 00040000 rs 00040000 tfd 40 serr 00000000 cmd 0004d217
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 2a 2b 23 40 06 00 00 00 00 00
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jul 13 01:01:46 srv32 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jul 13 01:07:12 srv32 kernel: ahcich0: Timeout on slot 23 port 0
Jul 13 01:07:12 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00800000 rs 00800000 tfd 40 serr 00000000 cmd 0004d717
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 62 f5 c6 40 18 00 00 00 00 00
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:12 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command
Jul 13 01:07:43 srv32 kernel: ahcich0: Timeout on slot 2 port 0
Jul 13 01:07:43 srv32 kernel: ahcich0: is 00000000 cs 00000000 ss 00000004 rs 00000004 tfd 40 serr 00000000 cmd 0004c217
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 62 12 7b 40 06 00 00 00 00 00
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): CAM status: Command timeout
Jul 13 01:07:43 srv32 kernel: (ada0:ahcich0:0:0:0): Retrying command

A reboot (/sbin/shutdown -r or /sbin/reboot) does not solve the problem; the disks still time out after boot. Only a power off / power on cycle clears the problem, and only for some time: after a while the timeouts come back.

The servers were updated to the latest BIOS available from Supermicro. No change.

ahci0: port 0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xdf310000-0xdf311fff,0xdf31e000-0xdf31e0ff,0xdf31d000-0xdf31d7ff irq 16 at device 23.0 on pci0
ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahcich5: at channel 5 on ahci0
ahcich6: at channel 6 on ahci0
ahcich7: at channel 7 on ahci0
ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
ses0: SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ACS-3 ATA SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors)

ahci0@pci0:0:23:0: class=0x010601 card=0x088415d9 chip=0xa1028086 rev=0x31 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Sunrise Point-H SATA controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA

We use ZFS on all servers; some pools are raidz1 and some are RAID-10, with the same results.

We normally run smartd on all servers. I tried disabling smartd; that does not appear to change anything.

We have already upgraded the zpools to the new feature flags, so going back to 11.1 would require removing those features first.
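When comparing timeouts across disks it can help to decode the hex fields in the log mechanically. The sketch below is a triage aid, not part of the original report, and rests on assumptions taken from my reading of the FreeBSD ahci(4)/CAM sources rather than from this bug: that `ss` is the AHCI PxSACT (SActive) register with one bit per NCQ slot, and that the 12 ACB bytes are printed in the order command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp. For WRITE FPDMA QUEUED (0x61) the transfer length is carried in the FEATURES fields. The helper names `active_slots` and `decode_fpdma_acb` are hypothetical.

```python
"""Decode fields from FreeBSD ahci(4)/CAM command-timeout messages.

Assumptions (not stated in the bug report): 'ss' is the AHCI PxSACT
register (one bit per NCQ slot), and the ACB bytes are printed in the
field order listed in ACB_FIELDS below.
"""

ACB_FIELDS = (
    "command", "features", "lba_low", "lba_mid", "lba_high", "device",
    "lba_low_exp", "lba_mid_exp", "lba_high_exp", "features_exp",
    "sector_count", "sector_count_exp",
)


def active_slots(ss_hex: str) -> list[int]:
    """Return the NCQ slot numbers whose bits are set in a 32-bit mask."""
    mask = int(ss_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_fpdma_acb(acb_hex: str) -> dict:
    """Decode a READ/WRITE FPDMA QUEUED ACB into command, LBA and length."""
    raw = [int(b, 16) for b in acb_hex.split()]
    if len(raw) != len(ACB_FIELDS):
        raise ValueError(f"expected 12 ACB bytes, got {len(raw)}")
    f = dict(zip(ACB_FIELDS, raw))
    # 48-bit LBA assembled from six bytes, least significant byte first.
    lba = (f["lba_low"] | f["lba_mid"] << 8 | f["lba_high"] << 16
           | f["lba_low_exp"] << 24 | f["lba_mid_exp"] << 32
           | f["lba_high_exp"] << 40)
    # For NCQ commands the sector count lives in the FEATURES fields.
    sectors = f["features"] | f["features_exp"] << 8
    return {"command": f["command"], "lba": lba, "sectors": sectors}


if __name__ == "__main__":
    # Values copied from the first timeout on ada2 in the log above.
    print(active_slots("00060000"))
    print(decode_fpdma_acb("61 20 ca 22 23 40 06 00 00 00 00 00"))
```

Under these assumptions, the first ada2 timeout shows slots 17 and 18 outstanding in SActive (consistent with "Timeout on slot 17") and a 32-sector write at LBA 102965962; each of the other timeouts shows a single outstanding slot matching the one named in its message.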