Date: Sat, 27 Jun 1998 11:54:50 +0930
From: Greg Lehey <grog@lemis.com>
To: Donald Burr <dburr@POBoxes.com>, FreeBSD Hackers <hackers@FreeBSD.ORG>
Subject: Re: odd problems with AMD K6
Message-ID: <19980627115450.O16259@freebie.lemis.com>
In-Reply-To: <XFMail.980626105721.dburr@POBoxes.com>; from Donald Burr on Fri, Jun 26, 1998 at 10:57:21AM -0700
References: <XFMail.980626105721.dburr@POBoxes.com>

On Friday, 26 June 1998 at 10:57:21 -0700, Donald Burr wrote (to -hardware):

After reading this message, I don't think any of these are hardware
problems, so I'm following up to -hackers.

> I have just updated my system to an AMD K6/233. Chip serial number is
> "C 9818 FPJW". And, of course, my OS is FreeBSD 2.2.6-RELEASE.
>
> Some other details on my setup:
> Mobo: EFA E5TX-AT-5 "Pegasus I"
> Chipset: Intel 430TX (Triton); MTXC (82439TX)*1, PIIX4 (82371AB)* 1; I/O
> chipset is ALI M5135 (Yes, I am using the IWill sio patches...)
> OS: FreeBSD 2.2.6-RELEASE, with the IWill sio patches.
>
> I heard a while back about serious problems with the K6, so I searched the
> mailing list archives. There were indeed problems, but AMD reportedly
> fixed them as of chip revision "B 9729 xxxx". Since I have heard no new
> problem reports since then, I am assuming my chip is one of the "good"
> revisions.

I think this is a safe assumption. Since the VM problems in early
versions, I haven't heard of any problems, and my own K6/233 is
running fine since I found an adequate cooling fan. Your dmesg output
indicates that you have the same stepping as my chip.

> Anyway, the system appears to be working fine and all, that is, for normal
> usage. However, the other day I tried a "make world" on the 2.2.6-RELEASE
> sources, and got an error. Which leads into...
>
> 1. Make world fails at *exactly* the same file, with *exactly* the same
> error.
>
> I've run three consecutive 'make world's, and all of them fail, but they
> fail at *exactly* the same file, with *exactly* the same error. Here is
> the log from one such session:
>
> ===> share/doc/papers/memfs
> indxbib -c
> /usr/src/share/doc/papers/memfs/../../../../contrib/groff/indxbib/eign -o
> ref.bib /usr/src/share/doc/papers/memfs/ref.bib
> indxbib in free(): warning: page is already free.
> indxbib in free(): warning: page is already free.
> vgrind -f < /usr/src/share/doc/papers/memfs/A.t > A.gt
> refer -n -e -l -s -p /usr/src/share/doc/papers/memfs/ref.bib
> /usr/src/share/doc/papers/memfs/0.t /usr/src/share/doc/papers/memfs/1.t
> A.gt > paper.t
> Failed assertion at line 161, file
> `/usr/src/gnu/usr.bin/groff/refer/../../../../contrib/groff/refer/token.cc'
> .
> Abort trap - core dumped
> *** Error code 134
>
> In all cases, the "refer" process dies with signal 6, according to the
> system logs (dmesg).

More to the point, it's voluntary.

> Since the problem occurs at exactly the same spot, with exactly the same
> error, I am leaning towards suspecting a problem on my installed system,
> rather than a hardware problem (since hardware trouble generally produces
> random, unpredictable errors).

A good assumption, up to a point. If you built an executable with
flaky hardware, and the executable is broken as a result, you can
frequently get repeatable problems like this. In this case, it can
also mean that the input file to refer is corrupted, which is what I
think is meant by the line:

    A.gt > paper.t

I'd take a look at /usr/src/share/doc/papers/memfs/A.t if I were you.
It should have a size of 5077 bytes.

> 2. Odd system crash -- once.
>
> When I said before that "the system appears to be working fine and all,"
> that was sort of a lie. The system did crash, *ONCE*. I have *NOT* been
> able to reproduce this crash, however.
>
> What happened is this: I was in X, doing a *LOT* of things simultaneously
> (i.e. the system was heavily loaded) -- a bunch of usenet articles were
> being spooled in, I was running a make world, encoding some mp3's, the
> usual Netscape and email client, etc. Then I started up XV to view some
> graphics that just came in. The system froze, and rebooted.
> The system did dump core, however, and this is the result (using kgdb):
>
> IdlePTD 279000
> current pcb at 25325c
> panic: vref used where vget required
> #0 0xf0116c5e in boot ()
> (kgdb) bt
> #0 0xf0116c5e in boot ()
> #1 0xf0116f4a in panic ()
> #2 0xf013cc07 in vref ()
> #3 0xf01043d0 in iso_iget ()
> #4 0xf010686a in cd9660_root ()
> #5 0xf013b6c0 in lookup ()
> #6 0xf013b04d in namei ()
> #7 0xf013fa04 in stat ()
> #8 0xf01f59a6 in syscall ()
> #9 0x2fc5 in ?? ()
> #10 0x107e in ?? ()
>
> Again, I have not been able to reproduce this. I've run the system
> ragged since then, and it hasn't crashed a single time.

Doesn't look like hardware. There have been some software problems in
this area. Do you still have the dump?

> Now, last, but not least, I have a not-so-serious (but cosmetically
> ugly) problem:
>
> 3. Dmesg output is slightly screwy.
>
> If I boot the GENERIC kernel, the CPU type is properly detected and prints
> out properly:
>
> CPU: AMD-K6tm w/ multimedia extensions (233.86-MHz 586-class CPU)
> Origin = "AuthenticAMD" Id = 0x562 Stepping=2
> Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
>
> however, if I boot my own custom kernel, it shows something really odd.
>
> CPU: \^E (233.86-MHz 586-class CPU)
> Origin = "AuthenticAMD" Id = 0x562 Stepping=2
> Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
>
> Note the odd-looking CPU name...

Strange.

> Perhaps I'm doing something slightly wrong in my config? A copy of it is
> attached for your viewing pleasure.

I can't think that this is something that could be influenced by the
kernel config. I'd guess that you have some data corruption
somewhere.

Greg
--
See complete headers for address and phone numbers
finger grog@lemis.com for PGP public key

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message