From: BugsGrief@bugsgrief.net
To: freebsd-stable@freebsd.org
Date: Sat, 31 Aug 2002 18:11:00 +0900 (JST)
Subject: ata problem(s)

I have been using the following IBM ATA disk since late 4.4-STABLE.

1. 4.4-STABLE

  % dmesg|grep ata
  atapci0: port 0xffa0-0xffaf at device 13.1 on pci0
  atapci0: Busmastering DMA not supported
  ata0: at 0x1f0 irq 14 on atapci0
  ata1: at 0x170 irq 15 on atapci0
  ad0: 58644MB [119150/16/63] at ata0-master BIOSPIO

(full dmesg attached)

The disk occasionally caused a read error (about once every one or two days), sometimes making the victim process segfault. However, the system itself has never frozen or crashed since I removed the swap space from this disk. The tag value had no effect on the errors.

2. 4.6.2-RELEASE with tag=0

After moving to 4.6.2, read errors have become very serious. The driver prints a reset message and the system freezes.

  login: ad0: READ command timeout tag=0 serv=0 - resetting
  ata0: resetting devices .. done

On the other hand, as far as I tested, the system survives write errors, although recovery took at least 2 x 3 retries. (I am not sure whether a series of retry attempts belongs to a single write event, but the errors happened when I checked the progress of a copy onto the disk with du.)

  login: ad0: WRITE command timeout tag=0 serv=0 - resetting
  ata0: resetting devices .. done
  ad0: timeout waiting for DRQ - resetting
  ata0: resetting devices .. done
  ad0: timeout waiting for DRQ - resetting
  ata0: resetting devices .. done
  ad0: timeout waiting for DRQ - resetting
  ata0: resetting devices .. done
  ad0: WRITE command timeout tag=0 serv=0 - resetting
  ata0: resetting devices .. done
  ad0: timeout waiting for DRQ - resetting
  ata0: resetting devices .. done
  ad0: timeout waiting for DRQ - resetting
  ata0: resetting devices .. done
  ad0: timeout waiting for DRQ - resetting
  ata0: resetting devices .. done
  ad0: WRITE command timeout tag=0 serv=0 - resetting
  ata0: resetting devices .. done

3. 4.6.2-RELEASE with tag=1

This is more stable than tag=0, but freezes still occur. The messages always claim tag=0.

  login: ad0: READ command timeout tag=0 serv=0 - resetting
  ata0: resetting devices .. done

4. Non-technical observations.

  o A READ timeout never recovers.
  o Speaking conservatively, tag=1 is slightly better than 4.4 with respect to stability.
  o With tag=1, the READ command timeout message says tag=0, although as far as I can otherwise observe, tag=1 is in effect.
  o Besides heavy load, the disk very much dislikes "abruptness". For example, a freeze occurred when I typed man atacontrol on the machine after it had been quiescent for a while; the screen showed "Formatting page, please wait...Done." Similar freezes occurred with ls, sysctl, grep, top and reboot (some of these may be wrong, since at first I was careless about the relationship between freezes and command input).
  o 40- and 80-conductor cables make no difference.
  o 'di apm0' causes an almost immediate hang at the first login, while 'en apm0' is much better; giving neither is best.

horio shoichi
http://http.bugsgrief.net/

---D-M-E-S-G----------------------------------------------------
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.6.2-RELEASE #1: Thu Aug 20 21:32:03 JST 2002
    horio@ghost.near.this:/usr/obj/usr/src/sys/GHOST
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium Pro (199.43-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x617 Stepping = 7
  Features=0xfbff
real memory = 67108864 (65536K bytes)
config> q
avail memory = 61571072 (60128K bytes)
Changing APIC ID for IO APIC #0 from 0 to 2 on chip
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc03b8000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc03b809c.
Pentium Pro MTRR support enabled
apm0: on motherboard
apm: found APM BIOS v1.2, connected at v1.2
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
IOAPIC #0 intpin 16 -> irq 2
IOAPIC #0 intpin 17 -> irq 16
pci0: on pcib0
isab0: at device 13.0 on pci0
isa0: on isab0
atapci0: port 0xffa0-0xffaf at device 13.1 on pci0
atapci0: Busmastering DMA not supported
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pcib1: at device 14.0 on pci0
IOAPIC #0 intpin 18 -> irq 17
pci1: on pcib1
rl0: port 0xec00-0xecff mem 0xfcfffc00-0xfcfffcff irq 17 at device 10.0 on pci1
rl0: Ethernet address: 00:40:95:20:19:04
miibus0: on rl0
rlphy0: on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: at 16.0 irq 2
ahc0: port 0xd800-0xd8ff mem 0xfe810000-0xfe810fff irq 16 at device 17.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
orm0: