Date: Wed, 25 Jun 1997 18:00:03 +1000 (EST)
From: Mike McGaughey <mmcg@heraclitus.cs.monash.edu.au>
To: FreeBSD-gnats-submit@FreeBSD.ORG
Cc: mmcg@heraclitus.cs.monash.edu.au
Subject: kern/3949: Erroneous wdc probe failure and possible fix
Message-ID: <199706250800.SAA00488@mjolnir.cs.monash.edu.au>
Resent-Message-ID: <199706250900.CAA11812@hub.freebsd.org>

>Number:         3949
>Category:       kern
>Synopsis:       The WD controller probe can fail when it shouldn't (and a plausible fix)
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 25 02:00:01 PDT 1997
>Last-Modified:
>Originator:     Mike McGaughey
>Organization:   Monash University
>Release:        FreeBSD 2.2.2-RELEASE i386
>Environment:

	3 IDE disk drives, including one as master on wdc1.

	Diamond Data 16x CDrom, as slave on wdc1.  Interesting `feature'
	of this drive - it seems to return an error condition (0x04 - I
	have no idea what this means) until it has been properly
	initialised by the ATAPI code.  The same thing happens under DOS,
	as evidenced by the fact that the disk light behaves the same
	under both operating systems - it's on permanently until the CD
	driver is loaded (not a lot of evidence, but different to my
	other ATAPI drives).

	Here's the relevant parts of dmesg for my (now working) system:

FreeBSD 2.2.2-RELEASE #5: Wed Jun 25 17:05:39 EST 1997
    mmcg@mjolnir.cs.monash.edu.au:/usr/src/sys/compile/MJOLNIR
CPU: Pentium (119.75-MHz 586-class CPU)
[...]
chip2 <Intel 82371FB IDE interface> rev 2 on pci0:7:1
[...]
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <QUANTUM FIREBALL_TM2550A>, 32-bit, multi-block-16
wd0: 2445MB (5008752 sectors), 4969 cyls, 16 heads, 63 S/T, 512 B/S
wdc0: unit 1 (wd1): <NEC Corporation DSE1700A>, 32-bit, multi-block-16
wd1: 1627MB (3332448 sectors), 3306 cyls, 16 heads, 63 S/T, 512 B/S

[These next two messages were obtained by uncommenting the two
 printf's in sys/i386/isa/wd.c:wdprobe() - MMCG]

WDC1 - Error : 81
WDC1 - Error (drv 1) : 4
wdc1 at 0x170-0x177 irq 15 on isa
wdc1: unit 0 (wd2): <QUANTUM SIROCCO2550A>
wd2: 2445MB (5008752 sectors), 4969 cyls, 16 heads, 63 S/T, 512 B/S
wdc1: unit 1 (atapi): </P61A>, removable, dma, iordy
wcd0: 1757Kb/sec, 120Kb cache, audio play, 255 volume levels, ejectable tray
wcd0: no disc inside, unlocked

>Description:

	src/sys/i386/isa/wd.c:wdprobe() attempts to determine whether a
	controller is present.  According to the comments in the source,
	there are some controllers which return 0x81 (indicating drive 0
	OK, drive 1 bad), but which will return `good' status for the
	second drive if it is probed directly.

	The code in wdprobe() attempts to get around this by directly
	probing the second drive if 0x81 is returned.  Then, if the
	second drive returns an error status, it merrily assumes there is
	no device, and (for some bizarre reason) no controller.  Thus,
	those of us running two devices off wdc1 cannot use either.

	Now, if you got an 0x81 status return in the first place, there
	probably *is* a controller present (and it was simply the probe
	for disk 1 that failed) - if there were no controller, surely the
	controller reset (earlier in the function) would have failed
	instead?

>How-To-Repeat:

	Attach an IDE CDrom that is known not to work with FreeBSD as a
	slave drive on either controller.

	In sys/i386/isa/wd.c:wdprobe(), uncomment the two `error' print
	statements; compile and install a new kernel.  Reboot.

	If you're lucky, when probing the controller with the ATAPI
	drive, you'll see something like:

		Error : 81
		Error (drv 1) : 4

	where the (drv 1) error is anything other than 0x81 or 0x01, and
	the *controller* probe will fail.

>Fix:

	I'm not altogether clear why the direct probe for drive 1 on the
	controller is in there in the first place - as far as I can see,
	we are not looking for a drive, but rather, for a controller (do
	things fail if we have a controller with no attached drives?).

	In any case, the quick fix for me was to put #if 0/#endif around
	the test described above (and a comment).  I've included enough
	trailing context here to patch it by hand; this is the last 30
	lines of the modified wdprobe():

			/*
			 * If drive 1 fails, why do we simply go to nodevice here?  Drive
			 * 0 may have been OK, because of the return status of 0x8x (and
			 * it not being due to an ATAPI slave), but the ATAPI itself
			 * could have failed for any number of reasons.  My Intel 82371FB
			 * reports 0x81 before my ATAPI drive has been correctly
			 * initialised (the ATAPI drive isn't initialised until the
			 * ATAPI code probes it!).  And, as
			 * far as I'm concerned, getting a valid status return at
			 * all (0x81) implies we had a controller... - MMCG
			 */
#if 0
			if(du->dk_error != 0x01 && du->dk_error != 0x81)
				goto nodevice;
#endif
		} else	/* drive 0 fail */
			goto nodevice;
	}

	free(du, M_TEMP);
	return (IO_WDCSIZE);

nodevice:
	free(du, M_TEMP);
	return (0);
}

>Audit-Trail:
>Unformatted:
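
For readers who want to see the effect of the change without booting a kernel, the decision described above can be modelled in user space. This is only a sketch under assumptions: the function names probe_strict() and probe_relaxed() and the two macros are hypothetical and do not appear in wd.c, and the first-level tests (0x01 means OK, bit 0x80 means drive 1 reported bad) are inferred from the description in this report, so the exact conditions in the real wdprobe() may differ. The second-level test is the one shown in the quoted excerpt.

```c
#include <stdbool.h>

/* Diagnostic codes as quoted in the report (names are hypothetical). */
#define WD_DIAG_OK       0x01u /* drive 0 OK */
#define WD_DIAG_DRV1_BAD 0x80u /* set in 0x81: drive 0 OK, drive 1 bad */

/*
 * Model of the probe with the drive 1 test left enabled.
 * 'first' is the error value read after the diagnostic,
 * 'second' is the value read after selecting drive 1.
 * Returns true if the controller would be reported as present.
 */
static bool
probe_strict(unsigned first, unsigned second)
{
	if (first == WD_DIAG_OK)
		return true;
	if ((first & WD_DIAG_DRV1_BAD) == 0)
		return false;	/* drive 0 fail */
	/* The test that the fix wraps in #if 0 / #endif. */
	return second == 0x01u || second == 0x81u;
}

/*
 * Model of the probe with the drive 1 test disabled: any status
 * with the drive 1 bit set still counts as "controller present".
 */
static bool
probe_relaxed(unsigned first, unsigned second)
{
	(void)second;	/* drive 1 status no longer affects the result */
	if (first == WD_DIAG_OK)
		return true;
	return (first & WD_DIAG_DRV1_BAD) != 0;
}
```

With the values from the dmesg output above (first = 0x81, second = 0x04), probe_strict() reports no controller while probe_relaxed() reports one, which matches the behaviour before and after the fix. A failed drive 0 (no 0x80 bit and not 0x01) is rejected by both.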