Date: Tue, 17 Apr 2012 09:49:59 -0500
From: Alan Cox <alc@rice.edu>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: alc@freebsd.org, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Tony Lanza <tlanza@juniper.net>, Ewart Tempest <etempest@juniper.net>
Subject: Re: Corrupted pmap pm_vlist - pmap_remove_pte()
Message-ID: <4F8D8317.3060001@rice.edu>
In-Reply-To: <20120417094820.GK2358@deviant.kiev.zoral.com.ua>
References: <A4C6A166C36F5F40A5767E6F66358FC0BB8B0B9681@EMBX01-WF.jnpr.net> <20120417094820.GK2358@deviant.kiev.zoral.com.ua>
On 4/17/2012 4:48 AM, Konstantin Belousov wrote:
> On Mon, Apr 16, 2012 at 03:08:25PM -0400, Ewart Tempest wrote:
>> In FreeBSD 6.*, we have been seeing crashes in pmap_remove_pages() that only seem to occur in scaling scenarios:
>>
>> 2564 #ifdef PMAP_REMOVE_PAGES_CURPROC_ONLY
>> 2565 		pte = vtopte(pv->pv_va);
>> 2566 #else
>> 2567 		pte = pmap_pte(pmap, pv->pv_va);
>> 2568 #endif
>> 2569 		tpte = *pte;	<===================== page fault here
>>
>> The suspicion is that the pmap's pm_pvlist is getting corrupted. To this end, I have a question on the following logic in pmap_remove_pte() (see in-line comment):
>>
>> 1533 static int
>> 1534 pmap_remove_pte(pmap_t pmap, pt_entry_t *ptq, vm_offset_t va, pd_entry_t ptepde)
>> 1535 {
>> 1536 	pt_entry_t oldpte;
>> 1537 	vm_page_t m;
>> 1538
>> 1539 	PMAP_LOCK_ASSERT(pmap, MA_OWNED);
>> 1540 	oldpte = pte_load_clear(ptq);
>> 1541 	if (oldpte & PG_W)
>> 1542 		pmap->pm_stats.wired_count -= 1;
>> 1543 	/*
>> 1544 	 * Machines that don't support invlpg, also don't support
>> 1545 	 * PG_G.
>> 1546 	 */
>> 1547 	if (oldpte & PG_G)
>> 1548 		pmap_invalidate_page(kernel_pmap, va);
>> 1549 	pmap->pm_stats.resident_count -= 1;
>> 1550 	if (oldpte & PG_MANAGED) {
>> 1551 		m = PHYS_TO_VM_PAGE(oldpte & PG_FRAME);
>> 1552 		if (oldpte & PG_M) {
>> 1553 #if defined(PMAP_DIAGNOSTIC)
>> 1554 			if (pmap_nw_modified((pt_entry_t) oldpte)) {
>> 1555 				printf(
>> 1556 	"pmap_remove: modified page not writable: va: 0x%lx, pte: 0x%lx\n",
>> 1557 				    va, oldpte);
>> 1558 			}
>> 1559 #endif
>> 1560 			if (pmap_track_modified(va))
>> 1561 				vm_page_dirty(m);
>> 1562 		}
>> 1563 		if (oldpte & PG_A)
>> 1564 			vm_page_flag_set(m, PG_REFERENCED);
>> 1565 		pmap_remove_entry(pmap, m, va);
>> 1566 	}
>> 1567 	return (pmap_unuse_pt(pmap, va, ptepde));	<======= *** under what circumstances is it valid to free the page but not remove it from the pmap's pm_pvlist? Even the code comment for pmap_unuse_pt() commences "After removing a page table entry ...". ***
> It is valid not to remove a pv_entry when no pv_entry exists for the mapping.
> The pv_entry is created if the page is managed; see the pmap_enter() code.
> The block above the return is executed when the page is managed, or at
> least when pmap thinks so.
>
> The HEAD code will panic in pmap_pvh_free() if pmap_pvh_remove() cannot
> find the pv entry for the given page and pmap/va.
>
>> 1568 }
>>
>> If the tail end of the above function is changed as follows:
>>
>> 1565 		pmap_remove_entry(pmap, m, va);
>> 1565.5 		return (pmap_unuse_pt(pmap, va, ptepde));
>> 1566 	}
>> 1567 	return (0);
>>
>> then we don't see any crashes ... but is it the right thing to do?
> It should not be. Try to test this with some unmanaged mapping, like
> /dev/mem pages mapped into the exiting process address space.
>
> I am too new to know about any nuances of the RELENG_6 code.

The RELENG_6 code does essentially the same things as newer versions.  Crashes in this specific place are usually caused by DRAM errors.

Alan