From owner-freebsd-bugs Mon Jan 29 13:04:47 1996
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
        id NAA03082 for bugs-outgoing; Mon, 29 Jan 1996 13:04:47 -0800 (PST)
Received: from obelix.cica.es (obelix.cica.es [150.214.1.10])
        by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id NAA03068
        for ; Mon, 29 Jan 1996 13:04:37 -0800 (PST)
Received: (from amora@localhost) by obelix.cica.es (8.7.1/8.7.1)
        id WAA02694 for bugs@freebsd.org; Mon, 29 Jan 1996 22:01:53 +0100 (GMT-1:00)
Date: Mon, 29 Jan 1996 22:01:53 +0100 (GMT-1:00)
From: "Jesus A. Mora Marin"
Message-Id: <199601292101.WAA02694@obelix.cica.es>
To: bugs@freebsd.org
Sender: owner-bugs@freebsd.org
Precedence: bulk

Hi, world!

On the road again. I apologize for my delay in answering, but I was
given the replies to my previous message some days after they were
posted, and I was busy last week.

Thomas Graichen (graichen@omega.physik.fu-berlin.de) said:

> is this a joke or truth ... ?

I like jokes, but I also DO hate to waste bandwidth just for hoaxing.
Sending endless bug reports causes me no ethical concerns :)

> ... - you must have been sitting for hours to
> write this bug-report (?) ...

Yep. I spent a full Sunday afternoon, mostly collecting data and trying
to figure out what /dev/<#?_^@! stood for in my hand-written notes.
Also, trying to polish my awful English was not a piece of cake.

Jordan -jkh@time.cdrom.com- said:

> Yes, Jesus, we will indeed do our best to help you with this problem!

Nice to meet you, Jordan. Well, in fact I didn't mean to ask for a hint
about something that is not yet clearly a real bug somewhere in the
FreeBSD code rather than a peculiarity in my bitty-box's guts. When I
got interested in FreeBSD -or in Linux or whatever free stuff-, I knew
that no support should be expected. In writing that report, my aim was
only to report a *possible* bug and to lend a hand, if possible. Just an
ACK would suffice, but I see I've got much more. Thanks.

Now, replying to Frank Durda IV -uhclem@nemesis.lonestar.org-.

First things first: many, many thanks, Frank, for your suggestions and
ideas. I am very pleased to be working with you and, now, we'll review
some results:

> This version of firmware is newer than any I have seen, but I don't
> think this is a problem if you can do something like
> dd if=/dev/rmatcd0a of=/dev/null bs=100k
> and let that run for ten minutes or so without any crashes or data errors.

Ok, I've run a command like this, using block sizes ranging from 64k up
to 256k. After transferring more than 100MB, no problem.

> Does the crash occur with the GENERIC kernel, ie, the one that
> came on the CD-ROM? If that version also crashes, it will help
> eliminate the numerous differences between the GENERIC kernel and
> your custom kernel.

Yes. It happens all the time. I've seen it with kernel.GENERIC, and with
some previous versions of customized kernels. It doesn't seem to be
related to any option I can imagine: you can use DDB, KTRACE, XSERVER,
and so on, or leave them out, but the nasty crash remains. The panic
message with the GENERIC kernel looks like this:

    Fatal trap 12: page fault while in kernel mode
    fault virtual address   = 0xf1dff000
    fault code              = supervisor write, page not present
    instruction pointer     = 0x8:0xf01cffce
    code segment            = base 0x0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, def32 1, gran 1
    processor eflags        = interrupt enabled, resume, IOPL=0
    current process         = Idle
    interrupt mask          =
    panic: page fault

That is, exactly the same as the one obtained with the custom kernel,
except for the fault address (JAMMBSD: 0xf1e17000) and the eip
(JAMMBSD: 0x8:0xf01839de). Of course: different kernels -> different
addresses. Note that with the same kernel, you'll always get the same
addresses.

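The two fault addresses quoted above can be taken apart by hand. The sketch below assumes plain i386 paging with 4 KB pages and two-level tables (10 + 10 + 12 bits); the function name split_vaddr is made up for illustration and is not part of FreeBSD.

```shell
#!/bin/sh
# Split a 32-bit i386 linear address into its three paging fields.
# Assumes 4 KB pages and two-level page tables (10 + 10 + 12 bits).
split_vaddr() {
    addr=$(( $1 ))
    pde=$(( (addr >> 22) & 0x3ff ))   # index into the page directory
    pte=$(( (addr >> 12) & 0x3ff ))   # index into the page table
    off=$((  addr        & 0xfff ))   # byte offset inside the page
    echo "pde=$pde pte=$pte off=$off"
}

split_vaddr 0xf1dff000   # fault address reported by GENERIC
split_vaddr 0xf1e17000   # fault address reported by JAMMBSD
```

Both addresses have a zero offset, i.e. each faulting write hit the very first byte of a page that was not present. The split says nothing about the physical frame behind either address: that depends on the contents of the page tables, which cannot be recovered from the panic message alone.
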
I am not sure whether different virtual addresses, for different
kernels, can be translated to the same physical address (apologies if I
am talking nonsense, but I've never seen a good text clearly describing
the inner workings of the 80x86 MMU). I wonder about this because I am
thinking of a possibly broken RAM SIMM. More about this later.

> I assume you issued a umount before this command since the system attempted
> to mount the CD automatically when it came up. FreeBSD will let you mount
> on top of mounts, although it isn't a real good idea.
> If you did not do a umount first, please do so and try again, OR
> don't do the mount at all since the media should be mounted.

Good. Even with the CD-ROM in the drive before booting, it seems that it
is not mounted until you do it -if `mount' is to be believed-. Further,
trying to umount /dev/matcd0a just after the boot finishes gets an
error, i.e., 'device not mounted'. Anyway, I verified this, and an
explicit mount was required to access the CD-ROM. And, of course, the
crash was there.

> It would be nice if you could cause a failure with some utility that
> is part of the bin distribution (/bin /usr/bin /sbin /usr/sbin, etc)
> and that would let me look at it right away.

That's the really funny side of this story! Frank, I have tried dd, cat,
less, more, cp,..., on files in /cdrom/ports -where the offending file
appears to be-. To be sure that the I/O was not using blocks in the
buffer cache, each command issued was preceded by a full dismount-mount
cycle of matcd0a (I think this suffices to return the blocks to the free
list, though I am not sure of the implementation). They all worked! I
cannot believe that this is a bug related to a specific user app, but it
only happens with Midnight Commander. Of course, I've also tried running
it under accounts other than superuser, and verified that it has no
SUID/SGID bits. Still the same...

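The dismount-mount cycle described above could be scripted roughly as follows. This is only a sketch: the device /dev/matcd0a and the /cdrom mount point come from this message, while the cd9660 filesystem type, the 64k block size as the default, the helper names and the DRY_RUN switch are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: remount the CD before every read so that no block can be
# served from the buffer cache.
DEV=${DEV:-/dev/matcd0a}   # device named in this message
MNT=${MNT:-/cdrom}         # mount point named in this message
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remount_and_read FILE: one full dismount-mount cycle, then one read.
remount_and_read() {
    run umount "$MNT"
    run mount -t cd9660 "$DEV" "$MNT"   # cd9660 type is an assumption
    run dd if="$1" of=/dev/null bs=64k
}
```

With DRY_RUN left at 1 the function only prints the three commands, so the sequence can be checked before it is run as root with DRY_RUN=0 on the file under suspicion.
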
> The code in question should have been reading from the CD (does the light
> on the drive stay on when the panic occurs?), but some of the other
> state doesn't make sense right now. If the drive light is out when the
> panic occurs, the processor has somehow wandered into this section of
> code by accident.

Yes, Frank: the light on the CD drive is on just before the panic occurs
and then goes off. Checked.

> As to all the settings of your BIOS, I really can't advise except to
> recommend you go with the settings that were present when the board
> was purchased, rather than any accelerated values you may be using now.

I've tried it this way, with the original settings, and this doesn't
change the picture.

> Because of what I see in the rest of your description, you might make
> sure you don't have a memory problem. This is easy to try...

This is a point to check! I have been wondering about this because, when
I bought this 486 board, I had to buy new SIMMs as well. I had a lot of
trouble with Windows, Doom -THIS broke my heart- and even a panic in SCO
Unix 386 (a trap 0x0e, i.e., exactly the same: a page fault in kernel
mode). I identified the damned SIMM and got rid of it, and all has been
working great since. Nevertheless, a faulty SIMM has to be ruled out. I
have rotated the three 4MB SIMMs in this scheme: 123 -> 312, so no SIMM
remained in its original position. But, alas, this didn't fix the
problem: the crash reproduced exactly the same.

Well, perhaps there is some obscure hardware incompatibility causing the
problem -I think this can never be ruled out when dealing with PC
clones-. Now I'll try to do some hacking -I cannot promise anything but
I will do my best :) -. I'll re-`config -g' the kernel, turn the CD-ROM
driver debugging options on, and so on. Again, I'll try to get a kernel
dump after the crash.

Good, I must think again about all this and plan carefully. Now, time to
finish. Any contribution, idea or suggestion will be welcome.

Thanks,

Jesus A. Mora Marin
amora@obelix.cica.es