Date: 11 Aug 2002 17:44:50 -0400
From: Jim Frost <jimf@frostbytes.com>
To: freebsd-stable@freebsd.org
Subject: FreeBSD 4.6 rl0 and xl0 watchdog timeout problems (and solution)
Message-ID: <1029102290.9472.188.camel@snowball.frostbytes.com>
I'm posting this mostly so that someone who runs into the same problem can perhaps find the message.

Short version: Try a different slot. For some reason BSD doesn't see interrupts in slots that share PCI interrupts.

Long version:

I recently bought a new PC to use as a server to replace an aging Cobalt Qube2. The Qube was a great little box ... or it was until the last security update which broke something in the kernel such that the box hangs regularly. Sun's "support", even their paid "support", has a couple of workarounds that reduce the frequency but they are clearly not interested in fixing it, and the whole reason I bought the box was so that I could manage it with point-and-click; I don't really feel like tracking down sources and building my own kernels anymore.

Anyway, one of the reasons for using the Qube2 is that it's not Windows and it's not Intel so almost nobody's attack scripts will work even if the machine has a hole I haven't patched yet on a service that the firewall and machine configuration exposes.

Not wanting to spend the money on a newer Cobalt box given the crappy support I got with the one I have, I decided to give in and run an Intel box again. No way was I going to run Windows on an exposed box, and I'd prefer not to run Red Hat (as I do on my laptop) because it's the first target of the script kiddies. BSD seemed like a good solution and one which I'm fairly familiar with from days past. Besides, my pro-BSD buddies raved about how fast and stable it is.

So I bought some fairly generic PC from a local supplier: an MSI board of some sort with a hunk o' RAM and disk, a 1.6GHz P4, and a DLink DFE-530TX+ ethercard. Nothing special these days, but not junk either, and way more capability than I really need on my home server (hey, that Qube2 was working just fine until Sun broke it).
The local PC company couldn't guarantee the system would run FreeBSD but they burned it in with WinXP so at least I knew the parts worked, and the net tells me that all this stuff should work on BSD. Besides, at this point the UNIXen have pretty much got the PC hardware figured out, right?

I ordered up a FreeBSD 4.6 subscription from bsdmall and got to installing it. First impression: That installer sucks ass. I mean, sucks like the stuff we used to get from Sun in the 3.x days. Sucks worse than SysVR3 did. Sucks sucks sucks. Never mind that the X11 configuration hung and I had to give up on that and rerun the install and skip it (Red Hat has got that /nailed/ at this point), the thing that really pisses me off is that I just wanted it to install everything on the disk. What the hell, the disk space is cheap and I am not sure what I'm going to want. So far as I can tell there's no way to do that, so I had to check off like a thousand packages one at a time. That SUCKS. Primitive, irritating, and gawdawful easy to fix. Wassup with that?

Not an auspicious start, but I still managed to get the whole install done in about half the time of any Windows product I've installed in the last seven or eight years, so it's not /that/ big a deal. It just looks way lame relative to any Linux release we've seen since like 1997.

Anyway, I fired it up and got "rl0: watchdog timeout" errors. Shit. I've seen those before from waaaay back when SunOS was my favorite system, and it meant that the ethernet cable fell out. The man page for the rl driver says that that's probably what it is. Problem is, the cabling checks out: it was showing good connection lights on both ends. Just to be sure I pulled known-good cabling from other stuff. Still no go.

I thought maybe the thing was incorrectly sensing the media; I still run 10baseT because it's here and it works and I don't see why I should spend money on a new hub.
ifconfig said it autoconfigured to 10baseT/UTP but just to be sure I forced the config. Same problem.

Ok, I've used the various UNIXen enough to know that they're often sensitive to card firmware versions; maybe the 530TX+ has new firmware that screwed it up. So I picked up a 3c905 card and threw it in. Same problem.

That didn't leave much. At this point I figured it's an interrupt problem of some sort and started looking at the PCI configuration in the BIOS. I remember something about NT et al needing something or other disabled to work on new motherboards and figure that maybe the PC vendor set that up, but I don't see anything out of the ordinary. But while I was in there I noticed that four slots share two interrupt configurations: Slots 1 and 3 share one, and slots 2 and 6 share another. Hmm. The ethercard is in slot 3, one of the shared slots. On a hunch I move the ethercard to slot 4 and reboot. Voila, works like a champ.

I'd be interested in an explanation if someone has one, and if nobody does then I'd be willing to help track down some details to fix it so some other poor schmuck doesn't waste a lot of time tracking it down.

So far this has been way more effort than it should have been and I haven't even gotten to configuring the services I need. The only reason I didn't just dump it in favor of RH7.3 was that my discs are at work right now. But, now that it's working, I'm going to proceed and hope for the best. Hell, if nothing else it boots a lot faster than Linux.

jim frost
jimf@frostbytes.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message