From owner-freebsd-current  Sat Jul 11 19:11:49 1998
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199807120211.TAA29647@usr08.primenet.com>
Subject: Re: Arrgh ! resubscribing again again again....
To: dg@root.com
Date: Sun, 12 Jul 1998 02:11:36 +0000 (GMT)
Cc: tlambert@primenet.com, current@FreeBSD.ORG
In-Reply-To: <199807120115.SAA28466@implode.root.com> from "David Greenman" at Jul 11, 98 06:15:45 pm

> >I see in the VM code where a SIGKILL could result, but it seems to me that
> >the page table entry exists, it just doesn't have pages to back it, and
> >when the page to back the entry fails allocation, you get SIGSEGV, since
> >it isn't mapped when you do the reference.
> >
> >Am I reading this code wrong?
> 
>    Yes, you are reading the code wrong. A SIGSEGV will only occur when there
> is no mapping. If this happened for you, then either there was no mapping (a
> programmatic error), or there is a bug in the kernel.
>    In fact, one of the VM system test programs that John and I used frequently
> is called "testswap", which does something similar to that suggested above; it
> never exited with SIGSEGV in the past. Source attached.

[ ... code elided ... ]

This code doesn't use a shared memory segment, it uses heap memory.

I believe the failure is specific to shared memory and/or mmap.

By using dbm, which mmaps its files, I can read the clean pages
out of the database file backing the object.

Then I make the system swap heavily, causing the page to be LRU'ed out
from the vnode that is backing the object, but *not* dissociated
from the process address space.  I keep the database open for a very
long time.  This is typical behaviour for some programs that use the
password file and don't explicitly call endpwent.

Then the page gets marked dirty by another process.

Then I write the page (modify the database), and the page gets written
to the wrong file.
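The first scenario can be approximated without dbm or cron.  What
follows is only a minimal sketch under stated assumptions: it models
a dbm-style, long-lived MAP_SHARED mapping with a plain mmap(2), reads
one byte per page so every page is faulted in clean, then modifies one
whole page and flushes it with msync(2).  The helper names
(map_and_read_clean, write_page_and_sync, file_page_matches) are
hypothetical and not part of dbm or any FreeBSD API.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file MAP_SHARED (standing in for a
   dbm-style mapping) and read one byte per page so every page is
   faulted in clean.  The caller keeps the mapping for a long time. */
static unsigned char *map_and_read_clean(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;

    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    for (size_t off = 0; off < len; off += pg)
        sink ^= ((volatile unsigned char *)p)[off];   /* read-only touch */
    (void)sink;
    *len_out = len;
    return p;
}

/* Hypothetical helper: dirty one whole page through the mapping and
   force it back to its backing file. */
static int write_page_and_sync(unsigned char *map, size_t pg, size_t page,
                               unsigned char fill)
{
    assert(map != NULL);
    memset(map + page * pg, fill, pg);
    return msync(map + page * pg, pg, MS_SYNC);
}

/* Hypothetical helper: read the page back with pread(2), bypassing the
   mapping, to check that the data landed in *this* file. */
static bool file_page_matches(const char *path, size_t pg, size_t page,
                              unsigned char fill)
{
    unsigned char buf[65536];
    if (pg > sizeof buf)
        return false;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    ssize_t n = pread(fd, buf, pg, (off_t)(page * pg));
    close(fd);
    if (n != (ssize_t)pg)
        return false;
    for (size_t i = 0; i < pg; i++)
        if (buf[i] != fill)
            return false;
    return true;
}
```

On a correct kernel, file_page_matches succeeds for the page just
written; the report above is that, under heavy paging, the data
instead turned up in an unrelated file (crontab).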

I can get it to corrupt crontab fairly consistently by running
cron and having it do something (newsyslog, in my test case) once a
minute.  What ends up written to the crontab is almost always part of
the password dbm contents.


This example is just to show that there are bugs in the mmap code.
Unfortunately, this is not a set of test programs; it's a production
system that behaves this way, fairly reliably.


Now, for a second test case: I map a very large file, and then
constantly rotor through all of its pages except one.

Alongside it I run a second program that repeatedly grows its heap
with sbrk and then gives the memory back, one page more each time,
sleeping 20 seconds between iterations.  It touches the memory that it
sbrk's in before giving it back.
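A minimal sketch of this memory-pressure program, assuming sbrk(2) is
available (it is deprecated or missing on some newer platforms).  The
names sbrk_touch_release and sbrk_stress are hypothetical; the
20-second delay from the description is the delay_seconds argument.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: grow the heap by npages pages with sbrk, store
   one byte per page-sized stride so the memory must really be backed,
   then shrink the break back.  Returns 0 on success, -1 on error. */
static int sbrk_touch_release(size_t npages, size_t pagesz)
{
    assert(npages > 0);
    intptr_t len = (intptr_t)(npages * pagesz);
    char *p = sbrk(len);
    if (p == (char *)-1)
        return -1;
    for (size_t i = 0; i < npages; i++)
        p[i * pagesz] = (char)i;   /* presumably where a lost page would fault */
    if (sbrk(-len) == (void *)-1)
        return -1;
    return 0;
}

/* Hypothetical driver: one more page per iteration, sleeping between. */
static int sbrk_stress(size_t max_pages, size_t pagesz, unsigned delay_seconds)
{
    for (size_t n = 1; n <= max_pages; n++) {
        if (sbrk_touch_release(n, pagesz) != 0)
            return -1;
        if (delay_seconds > 0)
            sleep(delay_seconds);
    }
    return 0;
}
```

For the test described here, sbrk_stress would be called with
delay_seconds set to 20 and a page limit large enough to push the
machine into swap; the original gives no specific limit.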

Once every 20 rotors in the first program, I touch the page I skipped,
causing it to be dirtied and written.  I *only* write the page, and
I *only* write it with a full page of data on a page boundary,
so there is no read-before-write.
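The first program of this second test case might look roughly like the
sketch below.  It is an illustration under assumptions, not the elided
original: rotor_pass, dirty_page and rotor_loop are hypothetical names,
the mapping is assumed to be a MAP_SHARED mmap(2) of the large file,
and "writing" the skipped page is modeled as overwriting the whole
page through the mapping and calling msync(2).  Because a store through
a mapping can fault a non-resident page in first, treat this as an
approximation of the page-aligned, full-page write described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read one byte in every page except `skip`; the checksum keeps the
   compiler from discarding the reads. */
static unsigned long rotor_pass(const volatile unsigned char *base,
                                size_t npages, size_t pagesz, size_t skip)
{
    unsigned long sum = 0;
    assert(skip < npages);
    for (size_t i = 0; i < npages; i++) {
        if (i == skip)
            continue;
        sum += base[i * pagesz];
    }
    return sum;
}

/* Dirty the skipped page with a full, page-aligned page of data. */
static void dirty_page(unsigned char *base, size_t pagesz, size_t skip,
                       unsigned char fill)
{
    memset(base + skip * pagesz, fill, pagesz);
}

/* Run `rotors` passes; every `every` passes, dirty the skipped page and
   flush it.  Returns how many times the page was written, or -1. */
static int rotor_loop(unsigned char *base, size_t npages, size_t pagesz,
                      size_t skip, int rotors, int every)
{
    int writes = 0;
    for (int r = 1; r <= rotors; r++) {
        (void)rotor_pass(base, npages, pagesz, skip);
        if (r % every == 0) {
            dirty_page(base, pagesz, skip, (unsigned char)r);
            if (msync(base + skip * pagesz, pagesz, MS_SYNC) != 0)
                return -1;
            writes++;
        }
    }
    return writes;
}
```

In the real test the loop would run indefinitely over a file much
larger than RAM, with `every` set to 20.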

Eventually, the program doing the sbrk calls dies with SIGSEGV (signal
11, logged to the console).

It's not logical, given the code, but it happens.

I suspect that the page is marked as being in core, but isn't, because
it has been improperly reused out from under it.  That is, there get to
be two mappings; the page is written out as a result of a write and,
having been written, is discarded, leaving the other page mapping hanging.
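To make the suspected failure concrete, here is a toy model.  It is
not the FreeBSD VM code, and every name in it (struct toy_vm,
toy_discard_after_write and so on) is hypothetical: each frame counts
the mappings that reference it, and a mapping is "hanging" when it
still references a frame that is back on the free list.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NFRAMES 4

/* Toy model only; not FreeBSD's vm_page structures. */
struct toy_frame {
    int refs;    /* mappings that currently reference this frame */
    bool free;   /* true once the frame is back on the free list */
};

struct toy_vm {
    struct toy_frame frame[TOY_NFRAMES];
};

static void toy_init(struct toy_vm *vm)
{
    for (int i = 0; i < TOY_NFRAMES; i++) {
        vm->frame[i].refs = 0;
        vm->frame[i].free = true;
    }
}

/* Add a mapping of frame f; the frame number doubles as the handle. */
static int toy_map(struct toy_vm *vm, int f)
{
    assert(f >= 0 && f < TOY_NFRAMES);
    vm->frame[f].refs++;
    vm->frame[f].free = false;
    return f;
}

/* Correct release: only the last mapping frees the frame. */
static void toy_unmap_refcounted(struct toy_vm *vm, int f)
{
    assert(vm->frame[f].refs > 0);
    if (--vm->frame[f].refs == 0)
        vm->frame[f].free = true;
}

/* The suspected bug: after write-out, the frame is discarded even
   though another mapping still refers to it. */
static void toy_discard_after_write(struct toy_vm *vm, int f)
{
    assert(vm->frame[f].refs > 0);
    vm->frame[f].refs--;
    vm->frame[f].free = true;
}

/* A mapping hangs if its frame is free but still referenced. */
static bool toy_mapping_hanging(const struct toy_vm *vm, int f)
{
    return vm->frame[f].refs > 0 && vm->frame[f].free;
}
```

In the model, a freed-but-referenced frame can be handed to the next
allocation (such as the sbrk program's), which is the kind of reuse
suspected above.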


I first noticed this problem in 2.2.6-stable.

I'm at work right now; the code is pretty obvious, but I can send it
to you when I get home, if you want.  You will need enough disk to hold
the large mapped file, which the shell script creates via dd of /dev/zero.

I generally run this on a 16M system with 48M of swap, making the file
approximately 100M.  If you have more memory than this, you will need a
bigger file; the point is to cause thrashing.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
