From owner-cvs-sys  Sat Jun 29 22:17:16 1996
Return-Path: owner-cvs-sys
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id WAA01256
          for cvs-sys-outgoing; Sat, 29 Jun 1996 22:17:16 -0700 (PDT)
Received: (from davidg@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id WAA01238;
          Sat, 29 Jun 1996 22:17:10 -0700 (PDT)
Date: Sat, 29 Jun 1996 22:17:10 -0700 (PDT)
From: David Greenman <davidg>
Message-Id: <199606300517.WAA01238@freefall.freebsd.org>
To: CVS-committers, cvs-all, cvs-sys
Subject: cvs commit:  src/sys/kern vfs_bio.c
Sender: owner-cvs-sys@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

davidg      96/06/29 22:17:09

  Modified:    sys/kern  vfs_bio.c
  Log:
  Fixed a major bug that caused various pmap related panics, hangs, and reboots.
  
  The i386 pmap module uses a special area of kernel virtual memory for mapping
  of page table pages when it needs to modify another process's virtual
  address space. It's called the 'alternate page table map'. There is only one
  of them and it's expected that only one process will be using it at once and
  that the operation is atomic.
  When the merged VM/buffer cache was implemented over a year ago, it became
  necessary to rundown VM pages at I/O completion. The unfortunate and
  unforeseen side effect of this is that pmap functions are now called at bio
  interrupt time. If there happened to be a process using the alternate page
  table map when this I/O completion occurred, it was possible for a different
  process's address space to be switched into the alternate page table map -
  leaving the current pmap process with the wrong address space mapped when
  the interrupt completed. This resulted in BAD things happening like pages
  being mapped into or removed from the wrong address space, etc. Since a very
  common case of a process modifying another process's address space is during
  fork when the kernel stack is inserted, one of the most common manifestations
  of this bug was the kernel stack not being mapped properly, resulting in a
  silent hang or reboot. This made it VERY difficult to troubleshoot this bug
  (I've been trying to figure out the cause of this for >6 months). Fortunately,
  the set of conditions that must be true before this problem occurs is
  sufficiently rare that most people never saw the bug occur. As I/O
  rates increase, however, so does the frequency of the crashes. This problem
  used to kill wcarchive about every 10 days, but in more recent times when
  the traffic exceeded 100GB/day, the machine could barely manage 6 hours of
  uptime.
  The fix is to make certain that no process has the pages mapped that are
  involved in the I/O, before the I/O is started. The pages are made busy, so
  no process will be able to map them, either, until the I/O has finished.
  This side-steps the issue by still allowing the pmap functions to be called
  at interrupt time, but also ensuring that the alternate page table map won't
  be switched.
  Unfortunately, this appears to not be the only cause of this problem. :-(
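
  The race and the fix described above can be illustrated with a small
  user-space model. This is only a sketch under stated assumptions, not the
  actual change to vfs_bio.c: every name in it (alt_map, alt_map_switch,
  bio_done, fork_insert_kstack, and the SPACE_* constants) is hypothetical,
  and the "interrupt" is modelled as an ordinary function call placed in the
  window where a real interrupt could arrive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names come from the FreeBSD source. */
enum { NO_SPACE = 0, SPACE_CHILD = 1, SPACE_OTHER = 2 };

/* The single alternate page table map: which address space it points at. */
static int alt_map = NO_SPACE;

/* Point the alternate map at another process's address space. */
static void alt_map_switch(int space)
{
    assert(space != NO_SPACE);
    alt_map = space;
}

/*
 * Models I/O completion at interrupt time. If some process still has the
 * page mapped, running the page down means modifying that process's page
 * tables, which switches the alternate map.
 */
static void bio_done(bool page_still_mapped)
{
    if (page_still_mapped)
        alt_map_switch(SPACE_OTHER);
}

/*
 * Models fork inserting the child's kernel stack. The I/O completion
 * "interrupt" arrives after the alternate map has been switched to the
 * child but before the mapping is written. Returns the address space that
 * actually receives the kernel stack mapping.
 */
static int fork_insert_kstack(bool fix_applied)
{
    /* With the fix, the pages were unmapped and made busy before the I/O. */
    bool page_still_mapped = !fix_applied;

    alt_map_switch(SPACE_CHILD);
    bio_done(page_still_mapped);   /* interrupt lands in the window */
    return alt_map;                /* the mapping goes wherever this points */
}
```

  In this model, fork_insert_kstack(false) returns SPACE_OTHER, meaning the
  kernel stack would be inserted into the wrong address space, while
  fork_insert_kstack(true) returns SPACE_CHILD because the completion path
  no longer has any reason to touch the alternate map.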
  
  Reviewed by:	dyson
  
  Revision  Changes    Path
  1.94      +2 -2      src/sys/kern/vfs_bio.c