From owner-freebsd-bugs@FreeBSD.ORG Sun May 29 13:50:09 2011 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8E4F1065672 for ; Sun, 29 May 2011 13:50:09 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CA2C98FC16 for ; Sun, 29 May 2011 13:50:09 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4TDo9YB044515 for ; Sun, 29 May 2011 13:50:09 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4TDo9v4044514; Sun, 29 May 2011 13:50:09 GMT (envelope-from gnats) Resent-Date: Sun, 29 May 2011 13:50:09 GMT Resent-Message-Id: <201105291350.p4TDo9v4044514@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Matthias Andree Received: from apollo.emma.line.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by hub.freebsd.org (Postfix) with ESMTP id 0EDB11065676; Sun, 29 May 2011 13:43:06 +0000 (UTC) (envelope-from mandree@FreeBSD.org) Received: from mandree by apollo.emma.line.org with local (Exim 4.76 (FreeBSD)) (envelope-from ) id 1QQgGn-0006V6-55; Sun, 29 May 2011 15:43:05 +0200 Message-Id: Date: Sun, 29 May 2011 15:43:05 +0200 From: Matthias Andree To: FreeBSD-gnats-submit@FreeBSD.org X-Send-Pr-Version: 3.113 Cc: mav@FreeBSD.org Subject: kern/157397: ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Matthias Andree List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 May 2011 13:50:10 -0000 >Number: 157397 >Category: kern >Synopsis: ahci/ada/cam NCQ timeouts on Samsung and non-disable-ability >Confidential: no >Severity: serious >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sun May 29 13:50:09 UTC 2011 >Closed-Date: >Last-Modified: >Originator: Matthias Andree >Release: FreeBSD 8.2-STABLE amd64 >Organization: >Environment: System: FreeBSD apollo.emma.line.org 8.2-STABLE FreeBSD 8.2-STABLE #67: Fri May 27 21:36:00 CEST 2011 toor@apollo.emma.line.org:/usr/obj/usr/src/sys/GENERIC amd64 >Description: (please disregard "class" field above) Greetings, I seem to have difficulties with the interaction between ahci(4), ada(4) and the CAM subsystem on RELENG_8. Perhaps my HDD is quirky (Samsung HD103SI rev. 1AG01118), perhaps ahci or ada. (dmesg below) I seem to have two problems: 1. the problem is that once in a while queued ATA commands time out, and let the computer freeze for like half a minute or two. It then recovers. This only ever affects the Samsung drive, I've never seen it on the Western Digital. Either drive doesn't report any suspicious data to S.M.A.R.T. - no pending blocks, no past reallocations, no parameters even close to Thresh, no self-test failures. Possibly the Samsung also has firmware quirks, silent limits and so on, I know there were firmware updates to avoid data loss for larger siblings in the same drive series, but apparently not for the HD103SI. 2. Trying to reduce the number of tagged openings with camcontrol tags ada1 -N 16 doesn't work because min == max; trying to disable tagged negotiation through "camcontrol negotiate ada1 -T disable" does not have any visible effect. This is my /boot/loader.conf: ahci_load="YES" atapicam_load="YES" cuse4bsd_load="YES" geom_journal_load="YES" sem_load="YES" snd_hda_load="YES" vboxdrv_load="YES" hw.ata.wc=0 kern.cam.ada.write_cache=0 kern.ipc.shmall=32768 kern.ipc.shmmax=67108864 kern.maxfiles="25000" vfs.root.mountfrom="ufs:/dev/ada0s4a" This is dmesg with sensitive and useless information omitted. At the end, it shows two episodes with freezes, of a handful of timeouts each time. ... FreeBSD 8.2-STABLE #67: Fri May 27 21:36:00 CEST 2011 toor@apollo.emma.line.org:/usr/obj/usr/src/sys/GENERIC amd64 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Phenom(tm) II X4 905e Processor (2508.56-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x100f42 Family = 10 Model = 4 Stepping = 2 Features=0x178bfbff Features2=0x802009 AMD Features=0xee500800 AMD Features2=0x37ff TSC: P-state invariant real memory = 4294967296 (4096 MB) avail memory = 3838828544 (3660 MB) ACPI APIC Table: <052010 APIC1732> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 ACPI Warning: Optional field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20101013/tbfadt-655) ioapic0 irqs 0-23 on motherboard Cuse4BSD v0.1.14 @ /dev/cuse kbd1 at kbdmux0 acpi0: <052010 XSDT1732> on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of fee00000, 1000 (3) failed acpi0: reservation of ffb80000, 80000 (3) failed acpi0: reservation of fec10000, 20 (3) failed acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, cfe00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: at device 1.0 on pci0 pci1: on pcib1 vgapci0: port 0xc000-0xc0ff mem 0xd0000000-0xdfffffff,0xfbde0000-0xfbdeffff,0xfbc00000-0xfbcfffff irq 18 at device 5.0 on pci1 hdac0: mem 0xfbdfc000-0xfbdfffff irq 19 at device 5.1 on pci1 hdac0: HDA Driver Revision: 20100226_0142 hdac0: [ITHREAD] pcib2: irq 18 at device 6.0 on pci0 pci2: on pcib2 ale0: port 0xdc00-0xdc7f mem 0xfbec0000-0xfbefffff irq 18 at device 0.0 on pci2 ale0: 960 Tx FIFO, 1024 Rx FIFO ale0: Using 1 MSI messages. ale0: 4GB boundary crossed, switching to 32bit DMA addressing mode. miibus0: on ale0 atphy0: PHY 0 on miibus0 atphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto ale0: Ethernet address: ... ale0: [FILTER] pcib3: irq 19 at device 7.0 on pci0 pci3: on pcib3 fwohci0: <1394 Open Host Controller Interface> port 0xe800-0xe8ff mem 0xfbfff800-0xfbffffff irq 19 at device 0.0 on pci3 fwohci0: [ITHREAD] fwohci0: OHCI version 1.10 (ROM=1) fwohci0: No. of Isochronous channels is 4. fwohci0: EUI64 ... fwohci0: Phy 1394a available S400, 2 ports. fwohci0: Link S400, max_rec 2048 bytes. firewire0: on fwohci0 fwe0: on firewire0 if_fwe0: Fake Ethernet address: ... fwe0: Ethernet address: ... fwip0: on firewire0 fwip0: Firewire address: ... dcons_crom0: on firewire0 dcons_crom0: bus_addr 0x1028000 fwohci0: Initiate bus reset fwohci0: fwohci_intr_core: BUS reset fwohci0: fwohci_intr_core: node_id=0x00000000, SelfID Count=1, CYCLEMASTER mode ahci0: port 0xb000-0xb007,0xa000-0xa003,0x9000-0x9007,0x8000-0x8003,0x7000-0x700f mem 0xfbbffc00-0xfbbfffff irq 22 at device 17.0 on pci0 ahci0: [ITHREAD] ahci0: AHCI v1.10 with 6 3Gbps ports, Port Multiplier supported ahcich0: at channel 0 on ahci0 ahcich0: [ITHREAD] ahcich1: at channel 1 on ahci0 ahcich1: [ITHREAD] ahcich2: at channel 2 on ahci0 ahcich2: [ITHREAD] ahcich3: at channel 3 on ahci0 ahcich3: [ITHREAD] ahcich4: at channel 4 on ahci0 ahcich4: [ITHREAD] ahcich5: at channel 5 on ahci0 ahcich5: [ITHREAD] ohci0: mem 0xfbbfd000-0xfbbfdfff irq 16 at device 18.0 on pci0 ohci0: [ITHREAD] usbus0: on ohci0 ohci1: mem 0xfbbfe000-0xfbbfefff irq 16 at device 18.1 on pci0 ohci1: [ITHREAD] usbus1: on ohci1 ehci0: mem 0xfbbff800-0xfbbff8ff irq 17 at device 18.2 on pci0 ehci0: [ITHREAD] usbus2: EHCI version 1.0 usbus2: on ehci0 ohci2: mem 0xfbbfb000-0xfbbfbfff irq 18 at device 19.0 on pci0 ohci2: [ITHREAD] usbus3: on ohci2 ohci3: mem 0xfbbfc000-0xfbbfcfff irq 18 at device 19.1 on pci0 ohci3: [ITHREAD] usbus4: on ohci3 ehci1: mem 0xfbbff400-0xfbbff4ff irq 19 at device 19.2 on pci0 ehci1: [ITHREAD] usbus5: EHCI version 1.0 usbus5: on ehci1 pci0: at device 20.0 (no driver attached) atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0 ata0: on atapci0 ata0: [ITHREAD] hdac1: mem 0xfbbf4000-0xfbbf7fff irq 16 at device 20.2 on pci0 hdac1: HDA Driver Revision: 20100226_0142 hdac1: [ITHREAD] isab0: at device 20.3 on pci0 isa0: on isab0 pcib4: at device 20.4 on pci0 pci4: on pcib4 ohci4: mem 0xfbbfa000-0xfbbfafff irq 18 at device 20.5 on pci0 ohci4: [ITHREAD] usbus6: on ohci4 acpi_button0: on acpi0 atrtc0: port 0x70-0x71 irq 8 on acpi0 acpi_hpet1: iomem 0xfed00000-0xfed003ff on acpi0 device_attach: acpi_hpet1 attach returned 12 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart0: [FILTER] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] ppc0: cannot reserve I/O port range acpi_throttle0: on cpu0 hwpstate0: on cpu0 Timecounters tick every 1.000 msec vboxdrv: fAsync=0 offMin=0x15a offMax=0x29f firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0) (me) firewire0: bus manager 0 hdac0: HDA Codec #0: ATI RS690/780 HDMI pcm0: at cad 0 nid 1 on hdac0 hdac1: HDA Codec #0: VIA VT1708S_0 pcm1: at cad 0 nid 1 on hdac1 pcm2: at cad 0 nid 1 on hdac1 pcm3: at cad 0 nid 1 on hdac1 usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 480Mbps High Speed USB v2.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 480Mbps High Speed USB v2.0 usbus6: 12Mbps Full Speed USB v1.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ugen2.1: at usbus2 uhub2: on usbus2 ugen3.1: at usbus3 uhub3: on usbus3 ugen4.1: at usbus4 uhub4: on usbus4 ugen5.1: at usbus5 uhub5: on usbus5 ugen6.1: at usbus6 uhub6: on usbus6 ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 476940MB (976773168 512 byte sectors: 16H 63S/T 16383C) ada1 at ahcich2 bus 0 scbus2 target 0 lun 0 ada1: ATA-7 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) cd0 at ahcich3 bus 0 scbus3 target 0 lun 0 cd0: <...> Removable CD-ROM SCSI-0 device cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes) cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed SMP: AP CPU #2 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #1 Launched! GEOM: ada0: partition 2 does not start on a track boundary. GEOM: ada0: partition 2 does not end on a track boundary. GEOM: ada0: partition 1 does not start on a track boundary. GEOM: ada0: partition 1 does not end on a track boundary. GEOM: ada1: partition 2 does not start on a track boundary. GEOM: ada1: partition 2 does not end on a track boundary. uhub6: 2 ports with 2 removable, self powered uhub0: 3 ports with 3 removable, self powered uhub1: 3 ports with 3 removable, self powered uhub3: 3 ports with 3 removable, self powered uhub4: 3 ports with 3 removable, self powered GEOM_JOURNAL: Journal 3658629072: ada1s1d contains data. GEOM_JOURNAL: Journal 3658629072: ada1s1e contains journal. GEOM_JOURNAL: Journal ada1s1d clean. Root mount waiting for: GJOURNAL usbus5 usbus2 GEOM: ada1s1d.journal: invalid disklabel. GEOM: ufsid/4d540ab0e4fef3cb: invalid disklabel. GEOM: ufs/usrgjournal: invalid disklabel. Root mount waiting for: usbus5 usbus2 Root mount waiting for: usbus5 usbus2 uhub2: 6 ports with 6 removable, self powered uhub5: 6 ports with 6 removable, self powered ugen5.2: at usbus5 umass0: on usbus5 umass0: SCSI over Bulk-Only; quirks = 0x0000 Root mount waiting for: usbus5 usbus2 ugen0.2: at usbus0 ukbd0: on usbus0 ugen2.2: at usbus2 uhub7: on usbus2 kbd2 at ukbd0 uhid0: on usbus0 umass0:6:0:-1: Attached to scbus6 uhub7: 4 ports with 4 removable, self powered (probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error (probe0:umass-sim0:0:0:0): SCSI status: Check Condition (probe0:umass-sim0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present) da0 at umass-sim0 bus 0 scbus6 target 0 lun 0 da0: Removable Direct Access SCSI-0 device da0: 40.000MB/s transfers da0: Attempt to query device size failed: NOT READY, Medium not present ugen2.3: at usbus2 ulpt0: on usbus2 ulpt0: using bi-directional mode Root mount waiting for: usbus2 (probe0:umass-sim0:0:0:1): TEST UNIT READY. CDB: 0 20 0 0 0 0 (probe0:umass-sim0:0:0:1): CAM status: SCSI Status Error (probe0:umass-sim0:0:0:1): SCSI status: Check Condition (probe0:umass-sim0:0:0:1): SCSI sense: NOT READY asc:3a,0 (Medium not present) da1 at umass-sim0 bus 0 scbus6 target 0 lun 1 da1: Removable Direct Access SCSI-0 device da1: 40.000MB/s transfers da1: Attempt to query device size failed: NOT READY, Medium not present ugen2.4: ... ugen2.5: ... ulpt1: ... ulpt1: using bi-directional mode Trying to mount root from ufs:/dev/ada0s4a ugen0.3: ... ums0: ... ums0: 8 buttons and [XYZT] coordinates ID=0 cryptosoft0: on motherboard GEOM_ELI: Device ada0s4b.eli created. GEOM_ELI: Encryption: AES-XTS 256 GEOM_ELI: Crypto: software vboxnet0: Ethernet address: ... ale0: link state changed to UP fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.8 drm0: on vgapci0 info: [drm] MSI enabled 1 message(s) info: [drm] Initialized radeon 1.31.0 20080613 info: [drm] Setting GART location based on new memory map info: [drm] Loading RS780/RS880 Microcode info: [drm] Resetting GPU info: [drm] writeback test succeeded in 1 usecs drm0: [ITHREAD] info: [drm] Resetting GPU First failure episode: ahcich2: Timeout on slot 28 ahcich2: is 00000000 cs 8fffffff ss ffffffff rs ffffffff tfd c0 serr 00000000 ahcich2: Timeout on slot 26 ahcich2: is 00000000 cs e3ffffff ss ffffffff rs ffffffff tfd c0 serr 00000000 ahcich2: Timeout on slot 24 ahcich2: is 00000000 cs f8ffffff ss ffffffff rs ffffffff tfd c0 serr 00000000 Computer has recovered. Several minutes later a second episode: ahcich2: Timeout on slot 24 ahcich2: is 00000000 cs f80000ff ss ff0000ff rs ff0000ff tfd c0 serr 00000000 ahcich2: Timeout on slot 23 ahcich2: is 00000000 cs fc00003f ss ff80003f rs ff80003f tfd c0 serr 00000000 ahcich2: Timeout on slot 19 ahcich2: is 00000000 cs ffc00007 ss fff80007 rs fff80007 tfd c0 serr 00000000 ahcich2: Timeout on slot 18 ahcich2: is 00000000 cs ffe00007 ss fffc0007 rs fffc0007 tfd c0 serr 00000000 ahcich2: Timeout on slot 18 ahcich2: is 00000000 cs ffe00007 ss fffc0007 rs fffc0007 tfd c0 serr 00000000 ahcich2: Timeout on slot 18 ahcich2: is 00000000 cs ffe00007 ss fffc0007 rs fffc0007 tfd c0 serr 00000000 ahcich2: Timeout on slot 18 ahcich2: is 00000000 cs ffe0000f ss fffc000f rs fffc000f tfd c0 serr 00000000 ahcich2: Timeout on slot 25 ahcich2: is 00000000 cs f1ffffff ss ffffffff rs ffffffff tfd c0 serr 00000000 >How-To-Repeat: >Fix: >Release-Note: >Audit-Trail: >Unformatted: