Date: Fri, 30 Jun 2017 21:17:51 +0200 From: Harry Schmalzbauer <freebsd@omnilan.de> To: Alexander Motin <mav@freebsd.org>, scsi@freebsd.org Subject: bhyve ahcich0: Timeout on slot 0 port 0, , regression with stable/11->releng/11.1 Message-ID: <5956A3DF.8060109@omnilan.de>
next in thread | raw e-mail | index | archive | help
Hello, on releng/11.1 I noticed a severe performace degradation during file unlinking in a FreeBSD guest. Host was running quiet recent stable/11 before. On the host, the vm is started with ahci,hd:/dev/adaN The guest attaches: ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: <BHYVE SATA DISK 001> ACS-2 ATA SATA 3.x device The geust has very high Sys-load during unlinking (50-75%@2 cores). Also, the host logs these errors: ahcich0: Timeout on slot 0 port 0 ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17 ahcich0: ... waiting for slots fffb7ffe ahcich0: Timeout on slot 10 port 0 ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17 ahcich0: ... waiting for slots fffb7bfe ahcich0: Timeout on slot 14 port 0 ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffb7fff tfd 50 serr 00000000 cmd 0001cd17 ahcich0: ... waiting for slots fffb3bfe ahcich0: Timeout on slot 17 port 0 … ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617 ahcich0: ... waiting for slots 00018000 ahcich0: Timeout on slot 15 port 0 ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617 ahcich0: ... waiting for slots 00010000 ahcich0: Timeout on slot 16 port 0 ahcich0: is 00000008 cs 00000000 ss 00000000 rs fffbffff tfd 50 serr 00000000 cmd 0001c617 (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 e8 30 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 e8 30 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 e8 30 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command … (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 ff 44 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 00 00 45 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 40 00 45 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 80 00 45 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 c0 00 45 40 04 00 00 00 00 00 (ada0:ahcich0:0:0:0): CAM status: Command timeout (ada0:ahcich0:0:0:0): Retrying command … And so on. I always saw performance penalty using ahci instead of virtio-blk, most likely due to TRIM support, but never noticed such a huge difference: obj-tree deleting takes <1min with virtio-blk and usually took about 8 minutes with ahci on stable/11. Now (releng/11.1) it takes >20min (not yet finished) and I get really lots of these errors. Can someone (mav?) interpret the command errors and tell if it could be a new problem due to recent MFCs? Will bisect stable/11 revisions to see where it starts if nobody has a quick idea about the cause. Thanks, -harry
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5956A3DF.8060109>