From: Harry Schmalzbauer
Organization: OmniLAN
Date: Fri, 30 Jun 2017 21:17:51 +0200
To: Alexander Motin, scsi@freebsd.org
Subject: bhyve ahcich0: Timeout on slot 0 port 0, regression with stable/11->releng/11.1
Message-ID: <5956A3DF.8060109@omnilan.de>
List-Id: SCSI subsystem

Hello,

on releng/11.1 I noticed a severe performance degradation during file unlinking in a FreeBSD guest. The host was previously running a fairly recent stable/11.

On the host, the VM is started with
ahci,hd:/dev/adaN

The guest attaches:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ACS-2 ATA SATA 3.x device

The guest has very high Sys-load during unlinking (50-75%@2 cores).
Also, the host logs these errors:

ahcich0: Timeout on slot 0 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb7ffe
ahcich0: Timeout on slot 10 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb7bfe
ahcich0: Timeout on slot 14 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17
ahcich0: ... waiting for slots fffb3bfe
ahcich0: Timeout on slot 17 port 0
…
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617
ahcich0: ... waiting for slots 00018000
ahcich0: Timeout on slot 15 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617
ahcich0: ... waiting for slots 00010000
ahcich0: Timeout on slot 16 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 e8 30 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 e8 30 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 e8 30 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
…
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 ff 44 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 00 45 40 04 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
…

And so on.

I have always seen a performance penalty with ahci compared to virtio-blk, most likely due to TRIM support, but I never noticed such a huge difference: deleting an obj tree takes <1min with virtio-blk and usually took about 8 minutes with ahci on stable/11.
Now (releng/11.1) it takes >20min (not yet finished) and I get a great many of these errors.

Can someone (mav?) interpret the command errors and tell whether this could be a new problem caused by recent MFCs?
I will bisect stable/11 revisions to see where it starts if nobody has a quick idea about the cause.

Thanks,

-harry
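
For anyone reading the log above, here is a rough Python sketch for decoding two of the fields. It rests on two assumptions that the log is consistent with but that were not checked against the driver source: the "waiting for slots" value is a 32-bit mask with one bit per command slot, and the 12 ACB bytes are printed in the order command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp. The function names are made up for this illustration.

```python
def slots_in_mask(mask_hex):
    """Return the slot numbers whose bits are set in a hex slot mask.

    Assumption: bit N of the mask corresponds to command slot N.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


def decode_fpdma_acb(acb):
    """Decode the LBA and sector count from an FPDMA ACB dump string.

    Assumption: the byte order is the one listed in the text above, and
    for FPDMA commands the sector count sits in the features fields.
    """
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))
    lba = (b[2] | b[3] << 8 | b[4] << 16 |
           b[6] << 24 | b[7] << 32 | b[8] << 40)
    sectors = b[1] | b[9] << 8
    return {"command": b[0], "lba": lba, "sectors": sectors}


if __name__ == "__main__":
    # Slots still outstanding near the end of the first log excerpt.
    print(slots_in_mask("00018000"))
    # First two timed-out writes from the CAM messages.
    first = decode_fpdma_acb("61 40 00 e8 30 40 04 00 00 00 00 00")
    second = decode_fpdma_acb("61 40 40 e8 30 40 04 00 00 00 00 00")
    print(hex(first["lba"]), first["sectors"], hex(second["lba"]))
```

Read this way, the timed-out writes are consecutive 64-sector requests, and each "Timeout on slot N" line is followed by a mask with bit N cleared.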