From owner-freebsd-mobile Mon Oct 6 15:17:11 1997
Return-Path:
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id PAA03590
          for mobile-outgoing; Mon, 6 Oct 1997 15:17:11 -0700 (PDT)
          (envelope-from owner-freebsd-mobile)
Received: from pinot.eecs.harvard.edu (pinot.eecs.harvard.edu [140.247.60.65])
          by hub.freebsd.org (8.8.7/8.8.7) with SMTP id PAA03563
          for ; Mon, 6 Oct 1997 15:16:28 -0700 (PDT)
          (envelope-from karp@eecs.harvard.edu)
Received: (from karp@localhost)
          by pinot.eecs.harvard.edu (8.6.12/8.6.12) id SAA21955
          for freebsd-mobile@freebsd.org; Mon, 6 Oct 1997 18:16:18 -0400
Date: Mon, 6 Oct 1997 18:16:18 -0400
From: Brad Karp
Message-Id: <199710062216.SAA21955@pinot.eecs.harvard.edu>
To: freebsd-mobile@freebsd.org
Subject: wd interrupt timeouts w/2.2.2, PAO, IBM 380
Sender: owner-freebsd-mobile@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I'm running 2.2.2-RELEASE with the latest PAO from makefile.org on a new
IBM 380. I can find almost no mention of experience reports with the 380
on the web and in mailing list archives, probably because this particular
model is so new.

At any rate, I sporadically get messages like the following on the console:

    wd0: interrupt timeout:
    wd0: status 58 error 0
    wd0: interrupt timeout:
    wd0: status 58 error 1

When these messages occur, the system hangs while it retries a disk
operation. The retries frequently go on for up to two minutes.

After a restart, I often go through long periods without these messages.
But once they start, they occur quite frequently. What's more, even if I
reboot, they continue to occur frequently after rebooting. While it sounds
implausible, I find that leaving the laptop powered off (not sleeping, but
fully off) for > 30 minutes returns it to a state where the timeout
messages vanish for a while.

I suspect an interaction between APM and the wd driver, naturally...it
appears the wd driver is intolerant of disk spin-downs.
At Poul-Henning Kamp's suggestion, I folded in the following code from
3.0-current into my 2.2.2 wd.c:wdcommand():

	if (du->cfg_flags & WDOPT_SLEEPHACK) {
		/* OK, so the APM bios has put the disk into SLEEP mode,
		 * how can we tell ?  Uhm, we can't.  There is no
		 * standardized way of finding out, and the only way to
		 * wake it up is to reset it.  Bummer.
		 *
		 * All the many and varied versions of the IDE/ATA standard
		 * explicitly tells us not to look at these registers if
		 * the disk is in SLEEP mode.  Well, too bad really, we
		 * have to find out if it's in sleep mode before we can
		 * avoid reading the registers.
		 *
		 * I have reason to believe that most disks will return
		 * either 0xff or 0x00 in all but the status register
		 * when in SLEEP mode, but I have yet to see one return
		 * 0x00, so we don't check for that yet.
		 *
		 * The check for WDCS_BUSY is for the case where the
		 * bios spins up the disk for us, but doesn't initialize
		 * it correctly					/phk
		 */
		if (inb(wdc + wd_precomp) + inb(wdc + wd_cyl_lo) +
		    inb(wdc + wd_cyl_hi) + inb(wdc + wd_sdh) +
		    inb(wdc + wd_sector) + inb(wdc + wd_seccnt) == 6 * 0xff) {
			if (bootverbose)
				printf("wd(%d,%d): disk aSLEEP\n",
				    du->dk_ctrlr, du->dk_unit);
			wdunwedge(du);
		} else if (inb(wdc + wd_status) == WDCS_BUSY) {
			if (bootverbose)
				printf("wd(%d,%d): disk is BUSY\n",
				    du->dk_ctrlr, du->dk_unit);
			wdunwedge(du);
		}
	}

I also built a kernel with flag 0x4000 for my laptop's disk, so that
SLEEPHACK is active (see below). I still get the timeouts and console
messages with this code added, though. :-(

Relevant lines from my kernel config file:

    options         LAPTOP
    options         APM_PCCARD_RESUME
    options         PCIC_RESUME_RESET
    options         "APM_NOSUSPEND_IMMEDIATE=3"
    controller      wdc0 at isa? port "IO_WD1" bio irq 14 vector wdintr
    disk            wd0 at wdc0 drive 0 flags 0x4000
    device          apm0 at isa?
    #options        APM_BROKEN_STATCLOCK

When I boot, I'm told the following about my disk and the wd driver:

    wdc0 at 0x1f0-0x1f7 irq 14 on isa
    wdc0: unit 0 (wd0): , sleep-hack
    wd0: 1033MB (2116800 sectors), 2100 cyls, 16 heads, 63 S/T, 512 B/S

So the kernel is definitely turning on the sleep-specific code.

My question is: has anyone out there seen similar behavior on any model
of laptop? If so, what did you do to correct it? The machine is more or
less unusable when it goes away for minutes at a time in disk retries...

Many thanks,
-Brad, karp@eecs.harvard.edu