From: Matthias Andree
To: freebsd-bugs@FreeBSD.org
Subject: Re: kern/157397: [ada] ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability
Date: Wed, 3 Apr 2013 22:10:00 GMT
Message-Id: <201304032210.r33MA0qW070796@freefall.freebsd.org>

The following reply was made to PR kern/157397; it has been noted by GNATS.

From: Matthias Andree
To: bug-followup@FreeBSD.org, Alexander Motin
Subject: Re: kern/157397: [ada] ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability
Date: Thu, 04 Apr 2013 00:08:12 +0200

 Further information:

 - I have /usr (and only /usr) on the drive in question.

   # tunefs -p /dev/label/usr
   tunefs: POSIX.1e ACLs: (-a)                                disabled
   tunefs: NFSv4 ACLs: (-N)                                   enabled
   tunefs: MAC multilabel: (-l)                               disabled
   tunefs: soft updates: (-n)                                 enabled
   tunefs: soft update journaling: (-j)                       enabled
   tunefs: gjournal: (-J)                                     disabled
   tunefs: trim: (-t)                                         disabled
   tunefs: maximum blocks per file in a cylinder group: (-e)  2048
   tunefs: average file size: (-f)                            16384
   tunefs: average number of files in a directory: (-s)       64
   tunefs: minimum percentage of free space: (-m)             8%
   tunefs: optimization preference: (-o)                      time
   tunefs: volume label: (-L)                                 usr

 - I am running with kern.cam.ada.default_timeout=5, which makes the
   computer recover faster.

 - Whether the stalls hit reads or writes is unclear to me, but the
   kernel only ever logs WRITE_FPDMA_QUEUED, so I guess the answer is
   "write". Either "rm -rf /usr/obj" or "log in to GNOME and try
   starting gnome-terminal" is sufficient to trigger it.

 - Reducing the number of tags to 31 does not appear to help. Linux's
   libata does that only so that it can distinguish the bit mask
   0xffffffff it might get with 32 tags from "fatal errors".

 - Disabling NCQ through "camcontrol negotiate ada1 -T disable" would
   appear to help, at the cost of a massive slowdown (which is expected,
   as I run with ATA caches disabled), but it requires further
   long-winded testing before I'd really confirm that it helps.

   # camcontrol identify ada1
   pass1: ATA-7 SATA 2.x device
   pass1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

   protocol              ATA/ATAPI-7 SATA 2.x
   device model          SAMSUNG HD103SI
   firmware revision     1AG01118
   serial number         (elided)
   WWN                   (elided)
   cylinders             16383
   heads                 16
   sectors/track         63
   sector size           logical 512, physical 512, offset 0
   LBA supported         268435455 sectors
   LBA48 supported       1953525168 sectors
   PIO supported         PIO4
   DMA supported         WDMA2 UDMA6

   Feature                      Support  Enabled   Value           Vendor
   read ahead                     yes      yes
   write cache                    yes      no
   flush cache                    yes      yes
   overlap                        no
   Tagged Command Queuing (TCQ)   no       no
   Native Command Queuing (NCQ)   yes                32 tags
   SMART                          yes      yes
   microcode download             yes      yes
   security                       yes      no
   power management               yes      yes
   advanced power management      yes      yes       254/0xFE
   automatic acoustic management  yes      no        0/0x00          254/0xFE
   media status notification      no       no
   power-up in Standby            yes      no
   write-read-verify              no       no
   unload                         no       no
   free-fall                      no       no
   data set management (TRIM)     no

   # camcontrol tags ada1 -N31
   (pass1:ahcich1:0:0:0): tagged openings now 31
   (pass1:ahcich1:0:0:0): device openings: 31

 Logs filtered through "egrep ahcich1\|ada1\|pass1\|ahci0" are available
 from , with serial numbers removed.

 OBSERVE that the problem only ever affects odd-numbered slots, never
 even-numbered slots.

 Linux findings:

 - Linux uses 31 out of 32 slots so it can distinguish a fatal error
   from "all bits set in 32-bit bitmask", see:

 - Linux sources at for browsing; check ata_device_blacklist in
   libata-core.c -> no Samsung stuff.

 Regarding the ATI/AMD SB7x0 that I am using, it might be worthwhile
 to investigate the AHCI_HFLAG_IGN_SERR_INTERNAL flag. Linux sets it
 on the SB700 that my computer is using; see ahci_error_intr() in
 libahci.c. I am not going to interpret that for lack of expertise.
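
 For readers who want to play with the two slot-mask points above (the
 31-tag convention and the odd-slot observation), here is a minimal
 Python sketch of the bit arithmetic. It is only an illustration: the
 helper names and the sample masks are made up, and nothing here is
 taken from the FreeBSD logs or from Linux's libata code.

```python
# Illustrative only: helper names and sample masks are hypothetical,
# not taken from the FreeBSD logs or from Linux's libata.

ALL_ONES = 0xFFFFFFFF    # all 32 bits set; the value that should stay
                         # distinguishable from a completely full queue
EVEN_SLOTS = 0x55555555  # bits 0, 2, 4, ..., 30


def full_queue_mask(ntags):
    """Bitmask with one bit set per usable tag (slots 0 .. ntags-1)."""
    if not 1 <= ntags <= 32:
        raise ValueError("ntags must be between 1 and 32")
    return (1 << ntags) - 1


def is_ambiguous(ntags):
    """True if a completely full queue would read as all bits set."""
    return full_queue_mask(ntags) == ALL_ONES


def slots_in(mask):
    """List the slot numbers whose bits are set in a 32-bit mask."""
    return [slot for slot in range(32) if mask & (1 << slot)]


def only_odd_slots(mask):
    """True if the mask is non-empty and touches no even-numbered slot."""
    return mask != 0 and (mask & EVEN_SLOTS) == 0


if __name__ == "__main__":
    print(is_ambiguous(32))            # True: 32 busy tags give 0xffffffff
    print(is_ambiguous(31))            # False: bit 31 is never set
    print(slots_in(0x0000000A))        # [1, 3]
    print(only_odd_slots(0x0000000A))  # True
```

 With 31 tags the highest bit of the mask is never set by a busy queue,
 so an all-ones value cannot be mistaken for "every slot in use". The
 only_odd_slots() helper is the kind of check one could apply to slot
 masks pulled from the logs to confirm the odd-slot pattern.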