Date: Wed, 28 Jun 95 19:45 CDT
From: uhclem%nemesis@fw.ast.com (Frank Durda IV)
To: hackers@freebsd.org
Subject: 2.0.5 caches, lockups and crashes - oh my!
Message-ID: <m0sR7j9-0004vyC@nemesis.lonestar.org>
> > > > On Sun, 25 Jun 1995, Jan Isley wrote:
> > > > The first try was just installing bin. After printing xxxx blocks
> > > > on debug window I got Fatal trap 12: page fault while in kernel mode.
> > > Somebody wrote:
> > > Sounds like hardware. Try disabling the cache.
> > Somebody wrote:
> > What is the problem with FreeBSD and caches - I could not install, and
> > cannot rebuild a kernel with my cache turned on ! Turning it off is a
> > pretty easy fix - but I also take a pretty big performance hit.
> > What does it mean if your cache is causing these problems ?
> From: "Rodney W. Grimes" <rgrimes@gndrsh.aac.dev.com>
> Date: Wed, 28 Jun 1995 01:40:53 -0700 (PDT)
> It means more than likely one of two things, either you have a marginal
> cache SRAM that when pounded on as hard as FreeBSD pounds on a cache
> it fails and corrupts data, or you have a bus master DMA cache coherency
> problem that fails to invalidate data in the cache.

Well, I have support for over ten different systems that all loved FreeBSD 1.1.5.1, loved all the SNAPs, and simply will not run worth a hoot on 2.0.5A or 2.0.5R. I plan to try the post-2.0.5R SNAPs shortly, but am not optimistic.

The original symptom (which was reported on these lists during 2.0.5A testing) was that these systems (all 486 systems of various speeds, made by different manufacturers, with different cache designs and cache makers) would ALL refuse to boot from the boot floppy if the cache was turned on. But if you turn the cache off (or remove it, in the case of Intel 485-T Turbocache modules), you can boot 2.0.5 and even install successfully.

A considerable amount of reinstalling was done and various things were tried, and all we learned was that the problem had something to do with the kernel being compressed; that is what caused the boot to nuke. The uncompressed kernel.MFS could be booted (from hard disk) without incident. Well, nothing really got done, and 2.0.5R went out as it was.
So when I install it on these Tandy, GRiD, DEC and AST systems, I have to shut the cache off before the boot from floppy will work. Now remember that all of these systems worked fine on earlier versions of FreeBSD. Some systems ran 1.1.5.1 and early SNAPs 24 hours a day for several months with nothing more than the occasional crash every other week (usually from a power failure). And this was with the cache enabled (and/or installed).

Once installed, you could put the cache back in or turn it on, and the systems would boot and run from the HD OK. Or so I thought at the time.

Then the other users and I got around to actually doing more with 2.0.5R than just installing the system over and over again (and driving from location to location to do this), and we found that although 2.0.5 would boot and run with the cache enabled and/or installed (using the uncompressed kernel on the hard disk), any significant computing load, such as compiling something, would make the system lock up or crash. The lock-up was most common.

I could do a make world, and it would go off and churn, cleaning up old files, making directories and such, and look like things were going great. But when it hit the first one or two compiles, the system would die. Every time. If you compiled some other program in a completely different part of the tree, the system would also die.

So I removed the cache on three of the systems again and re-ran the make world. 20 to 30 hours later, they had all completed the builds. Great, but nobody wants to touch these machines now because they run so slowly. Again, these systems used to routinely rebuild SNAP and 1.1.5.1 systems with make world, and did hundreds of kernel builds without incident.
I find it impossible to believe that months of operation under 1.1.5.1 and the SNAPs didn't beat on the system as hard as 2.0.5 does, or that the cache hardware on all of these different systems is so sensitive and selective that it can detect 2.0.5 versus SNAP 04xx and earlier versions and crash accordingly. A better explanation is needed.

I have been forced to put 1.1.5.1 back on a couple of systems since it works with the cache enabled. One client was annoyed enough to suggest I install the "L"-system instead. Ugh.

I still believe some significant problem was introduced in the blast of major structural modifications that occurred between the 04xx SNAP and 2.0.5A. I have no hard suspects. It may be the same problem that causes compressed kernels to malfunction, but it may not be.

In fact, I am looking for a site that still has the 04xx SNAP online and would be willing to let me FTP a copy, so I can run my more performance-critical systems with the cache enabled rather than go back to 1.1.5.1. (In my haste to prepare for rapid testing of 2.0.5A, I wiped all the copies of the SNAPs I had. Anyone who can help me out on this, please EMAIL me directly. Thanks.)

I would be happy to attempt a staged upgrade from 04xx to 2.0.5 to try to determine which set of modifications is causing the problem, particularly if someone could suggest groups of modifications that go together or must be applied together. This would be limited to kernel changes only. I would also run any other tests that would help locate the true cause of these problems.

Thanks for any input on this.

Frank Durda IV
uhclem%nemesis@fw.ast.com
"What would you rather be running: 'FreeBSD', or 'Bob-Pro' (aka Windows '95)?" :-)
(C) (TM) 1995 FDIV