From owner-freebsd-hackers  Tue Mar 12 13:21:46 2002
Message-ID: <3C8E703D.430CEC4E@mindspring.com>
Date: Tue, 12 Mar 2002 13:16:45 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: "Clark C . Evans" <cce@clarkevans.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: panic: pmap_enter
References: <20020311210332.A38510@doublegemini.com> <1015919910.4901.5.camel@blackbox.pacbell.net> <3C8DBC98.508D76A9@mindspring.com> <20020312100850.A41104@doublegemini.com>

"Clark C . Evans" wrote:
> | > It seems to me that you are showing only the last part of the trace,
> | > which shows where a second panic occurred. While that may also be an
> | > issue the real reason for the panic occurred earlier. Please post the
> | > complete trace.
> 
> Thank you Mike, I'll do my best to duplicate it, I'm not an
> expert.  It seems that I have the problem (panic) whenever
> a program core-dumps.  But that said, I'm getting core dumps
> fairly easily... is the memory file system stable?
> 
> | You faulted on a 4M page mapping for which backing store was
> | not assigned.
> 
> By "backing store" you mean "swap"?  I don't have a swap space,
> although I do have 1GB memory and I'm not using much memory.

No, I mean allocated kernel memory pages.  As I said, 4M pages
are not permitted to be swapped, so they should never go through
the swap code.

Because of this, a fault indicating that there is not a page
present behind some of the memory is fatal.
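For reference, on IA-32 without PAE a 4M page is described by a
single page-directory entry with its PS bit (bit 7) set; there is
no second-level page table under it for the pager to fill in, and
the processor ignores PS entirely when CR4.PSE is clear.  The
sketch below decodes such an entry and translates an address
through it.  The helper names are hypothetical, and the PSE-36
high physical address bits are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA-32 page-directory-entry bits (non-PAE paging). */
#define PDE_PRESENT	0x001u		/* bit 0: entry is valid */
#define PDE_PS		0x080u		/* bit 7: entry maps a 4M page directly */
#define FRAME_4M	0xFFC00000u	/* bits 31..22: 4M physical frame base */
#define OFFSET_4M	0x003FFFFFu	/* bits 21..0 of the VA: offset in page */

/* Hypothetical helper: true if this PDE directly maps a present 4M page. */
static bool
pde_is_4m(uint32_t pde)
{
	return (pde & (PDE_PRESENT | PDE_PS)) == (PDE_PRESENT | PDE_PS);
}

/* Hypothetical helper: translate a virtual address through a 4M PDE. */
static uint32_t
translate_4m(uint32_t pde, uint32_t va)
{
	assert(pde_is_4m(pde));
	/* Frame base from the PDE, low 22 bits straight from the VA. */
	return (pde & FRAME_4M) | (va & OFFSET_4M);
}
```

Every address in the 4M range resolves through that one entry,
so there is no per-4K "not present" state that could be paged in
on demand.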


> | You are not permitted to create 4M pages without assigned
> | backing store (basically, you can't page them in and out).
> 
> Ok.  This is swap related... I'm running with just a read-only
> CD-ROM and a MFS.  Must I have a swap?  How do I tell FreeBSD
> not to use a swap?  I have /var and /tmp as a MFS.

Right now, there are two possibilities, one of which is extremely ugly.

As I told you before:

	Add DISABLE_PSE to your config file and try it again
	and tell us what happens.
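In the kernel config file this is a single line,
`options DISABLE_PSE`.  As a rough, hypothetical model (the
function name and the page-size constants are invented for
illustration, not taken from the FreeBSD sources), the option acts
as a compile-time switch that keeps 4M pages out of the picture no
matter what the CPU reports:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_4K	((size_t)4096)
#define PAGE_4M	((size_t)4096 * 1024)

static_assert(PAGE_4M == 1024 * PAGE_4K, "a 4M page spans 1024 4K pages");

/*
 * Hypothetical model: the largest page size the kernel would use.
 * Building with -DDISABLE_PSE removes the 4M path at compile time,
 * so only 4K pages are ever used, even on a CPU that supports PSE.
 */
static size_t
max_page_size(bool cpu_has_pse)
{
#ifdef DISABLE_PSE
	(void)cpu_has_pse;	/* CPU support is ignored when the option is set */
	return PAGE_4K;
#else
	return cpu_has_pse ? PAGE_4M : PAGE_4K;
#endif
}
```

Because the choice is made at build time, a panic that goes away
under the option points at the 4M page path rather than at the
code that happens to be running when it fires.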

This will actually diagnose which of the two is the problem.

Basically, your posted Python program and the fdisk program
down in /usr/src/sbin/i386/fdisk do nothing to exercise the
4M page path (i.e. they don't mmap a device).

It would be useful to know the virtual address of the panic,
to know whether it was caused by the physical page backing for
one of the MFS file systems, or for the kernel.  Personally, I
don't use the MFS code enough (and have no idea which version of
it you are using anyway) to know whether or not it tries to use
4M pages (basically, if the allocation starts on a 4M boundary
in KVA space and extends for at least 4M, then it's possible; an
allocation that is not 4M-aligned needs up to another 4M of space
before it is guaranteed to contain an aligned 4M region).

From my reading of the non-kernel 4M page mapping code, it seems
to me that it's not possible to end up without backing store
in the 4M mapping case for devices, which is basically the
only other place that 4M pages get used.

So my guess is that you are exhausting memory, and that your
memory reference is what causes it to blow up.

If DISABLE_PSE fixes your problem (by making the system
use only 4K pages), then it's most likely that you are
running into a kernel image that's smaller than 4M being
used with a 4M mapping, so the memory at the end of the 4M
page is not given physical pages as backing for the mapping
(i.e. the mapping is bogus), and when you go to access that

There are also some subtle bugs in the AMD and Intel CPUs
having to do with 4M pages.  Disabling the page size extension
with DISABLE_PSE will rule these bugs out as the cause of your
problem, so you should try this first.

If you are personally using 4M pages in the kernel, and getting
panics as a result, then you probably don't know what you are
doing with the 4M page allocations, and need to back out that
code until you do understand.

In one of the postings someone (maybe you?) suggested that they
were having panics of this sort on an SMP system.  The use of
4M pages in the presence of MESI cache coherency as implemented
by Intel and AMD is particularly problematic.  Again, DISABLE_PSE
would be a good diagnostic.

In the worst case, you may just be running code from the small
period of time when Peter Wemm had reordered some of the assembly
code to do some needed cleanup, which tickled the Intel/AMD 4M page
bugs, so he had to back it out.  If you are running -current, make
sure you are running recent code, and not some old "stable snapshot
of 5.0" that could contain these bugs (and not be as stable as you
thought it was at the time you picked the date as your snapshot
date).

So...

Again: add DISABLE_PSE to your config, and tell us if that fixes
the problem for you.

-- Terry
