From: Zaphod Beeblebrox
Date: Thu, 2 Mar 2017 00:16:58 -0500
Subject: Disk controller heizenbug.
To: FreeBSD Hackers

I have a disk controller. It works in a modern AMD motherboard at home (9590 processor), but when connected to a Sun Fire 4140 (an Opteron 2345 based machine, vintage 2008-ish) the disks spontaneously detach just from doing a "zfs import".

The board has its own mounting for the flash disks (two of them) and probes as:

ahci0: port 0x8c00-0x8c07,0x8880-0x8883,0x8800-0x8807,0x8480-0x8483,0x8400-0x841f mem 0xdfbff800-0xdfbfffff irq 16 at device 0.0 numa-domain 0 on pci3

The disks show up as:

ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
ada0: ACS-2 ATA SATA 3.x device
ada0: Serial Number S248NXAH112465B
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 238475MB (488397168 512 byte sectors)
ada0: quirks=0x3<4K,NCQ_TRIM_BROKEN>

Under heavy bonnie++, they work in the AMD 9590 system. On the Opteron machine, the following occurs:

ahcich1: Timeout on slot 11 port 0
ahcich1: is ffffffff cs ffffffff ss ffffffff rs 00000800 tfd ffffffff serr ffffffff cmd ffffffff
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED.
ACB: 61 90 e0 20 a0 40 17 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Command timeout
(ada1:ahcich1:0:0:0): Retrying command
ahcich1: stopping AHCI engine failed
ahcich0: ada1 at ahcich1 bus 0 scbus7 target 0 lun 0
Timeout on slot 31 port 0
ada1: ahcich0: is ffffffff cs ffffffff ss ffffffff rs 80000000 tfd ffffffff serr ffffffff cmd ffffffff
s/n S248NXAH112471L detached
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: stopping AHCI engine failed
ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
ada0: s/n S248NXAH112465B detached

I'm posting here to hackers because this seems to violate layering: on the AMD machine the card runs fine, even under load. The SATA bus is local to the card (and so travels with it to the server), yet the error looks like a SATA bus or drive error. What gives?
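
For anyone reading the register dumps above, here is a small Python sketch that pulls the values out of an "is ... cs ... ss ... rs ..." line. It rests on two assumptions that are mine rather than anything stated in this message: that "rs" is a bitmask of outstanding command slots (the log is consistent with this, since 00000800 is bit 11 and 80000000 is bit 31, matching the reported slots), and that an all-ones value deserves a flag because that is what a PCI read conventionally returns when nothing answers. The function name decode_ahci_dump is hypothetical, and the sketch does not establish what actually went wrong here.

```python
import re

# Field names in the order they appear in the dump lines quoted above.
FIELDS = ("is", "cs", "ss", "rs", "tfd", "serr", "cmd")

# search() is used rather than match() because one of the dumps is
# interleaved with another message ("ada1: ahcich0: is ...").
DUMP = re.compile(
    r"(ahcich\d+): " + " ".join(name + r" ([0-9a-f]+)" for name in FIELDS)
)


def decode_ahci_dump(line):
    """Return channel, flagged slots and all-ones registers, or None."""
    m = DUMP.search(line)
    if m is None:
        return None
    regs = {name: int(value, 16) for name, value in zip(FIELDS, m.groups()[1:])}
    # Assumption: "rs" is a bitmask with one bit per command slot.
    slots = [bit for bit in range(32) if (regs["rs"] >> bit) & 1]
    # Every other field that reads back as 0xffffffff, in dump order.
    all_ones = [n for n in FIELDS if n != "rs" and regs[n] == 0xFFFFFFFF]
    return {"channel": m.group(1), "slots": slots, "all_ones": all_ones}


if __name__ == "__main__":
    samples = [
        "ahcich1: is ffffffff cs ffffffff ss ffffffff rs 00000800 "
        "tfd ffffffff serr ffffffff cmd ffffffff",
        "ada1: ahcich0: is ffffffff cs ffffffff ss ffffffff rs 80000000 "
        "tfd ffffffff serr ffffffff cmd ffffffff",
    ]
    for sample in samples:
        print(decode_ahci_dump(sample))
```

Run against the two dumps in this message, it reports slot 11 on ahcich1 and slot 31 on ahcich0, with every field other than "rs" flagged as all-ones.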