From owner-dev-commits-src-all@freebsd.org Mon Jan 11 20:58:27 2021
Date: Mon, 11 Jan 2021 20:58:26 GMT
Message-Id: <202101112058.10BKwQbS041873@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
    dev-commits-src-main@FreeBSD.org
From: Konstantin Belousov
Subject: git: 4174e45fb432 - main - amd64 pmap: do not sleep in
    pmap_allocpte_alloc() with zero referenced page table page.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Git-Committer: kib
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 4174e45fb4320dc27969c3fcbdd7d06e3b1150b8
List-Id: Commit messages for all branches of the src repository

The branch main has been updated by kib:

URL: https://cgit.FreeBSD.org/src/commit/?id=4174e45fb4320dc27969c3fcbdd7d06e3b1150b8

commit 4174e45fb4320dc27969c3fcbdd7d06e3b1150b8
Author:     Konstantin Belousov
AuthorDate: 2021-01-02 22:27:20 +0000
Commit:     Konstantin Belousov
CommitDate: 2021-01-11 20:57:58 +0000

    amd64 pmap: do not sleep in pmap_allocpte_alloc() with zero referenced
    page table page.

    Otherwise a parallel pmap_allocpte_alloc() for a nearby va might also
    fail to allocate a page table page and free the page under us.  The end
    result is that we could dereference an unmapped pte when doing cleanup
    after the sleep.

    Instead, on allocation failure, first free everything; only then can we
    drop the pmap mutex and sleep safely, right before returning to the
    caller.  Split the inner non-sleepable part of pmap_allocpte_alloc()
    into a new helper, pmap_allocpte_nosleep().
Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D27956
---
 sys/amd64/amd64/pmap.c | 66 ++++++++++++++++++++++++++------------------------
 1 file changed, 35 insertions(+), 31 deletions(-)

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 20aed31a2098..0e1d1c02d1fc 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -1275,10 +1275,12 @@
 static void pmap_update_pde(pmap_t pmap, vm_offset_t va, pd_entry_t *pde,
 		pd_entry_t newpde);
 static void pmap_update_pde_invalidate(pmap_t, vm_offset_t va, pd_entry_t pde);
 
-static vm_page_t pmap_allocpte_alloc(pmap_t pmap, vm_pindex_t ptepindex,
-    struct rwlock **lockp, vm_offset_t va);
 static pd_entry_t *pmap_alloc_pde(pmap_t pmap, vm_offset_t va, vm_page_t *pdpgp,
     struct rwlock **lockp);
+static vm_page_t pmap_allocpte_alloc(pmap_t pmap, vm_pindex_t ptepindex,
+    struct rwlock **lockp, vm_offset_t va);
+static vm_page_t pmap_allocpte_nosleep(pmap_t pmap, vm_pindex_t ptepindex,
+    struct rwlock **lockp, vm_offset_t va);
 static vm_page_t pmap_allocpte(pmap_t pmap, vm_offset_t va,
     struct rwlock **lockp);
@@ -4294,7 +4296,7 @@ pmap_allocpte_getpml4(pmap_t pmap, struct rwlock **lockp, vm_offset_t va,
 	pml5index = pmap_pml5e_index(va);
 	pml5 = &pmap->pm_pmltop[pml5index];
 	if ((*pml5 & PG_V) == 0) {
-		if (pmap_allocpte_alloc(pmap, pmap_pml5e_pindex(va), lockp,
+		if (pmap_allocpte_nosleep(pmap, pmap_pml5e_pindex(va), lockp,
 		    va) == NULL)
 			return (NULL);
 		allocated = true;
@@ -4331,7 +4333,7 @@ pmap_allocpte_getpdp(pmap_t pmap, struct rwlock **lockp, vm_offset_t va,
 
 	if ((*pml4 & PG_V) == 0) {
 		/* Have to allocate a new pdp, recurse */
-		if (pmap_allocpte_alloc(pmap, pmap_pml4e_pindex(va), lockp,
+		if (pmap_allocpte_nosleep(pmap, pmap_pml4e_pindex(va), lockp,
 		    va) == NULL) {
 			if (pmap_is_la57(pmap))
 				pmap_allocpte_free_unref(pmap, va,
@@ -4355,16 +4357,6 @@ pmap_allocpte_getpdp(pmap_t pmap, struct rwlock **lockp, vm_offset_t va,
 }
 
 /*
- * This routine is called if the desired page table page does not exist.
- *
- * If page table page allocation fails, this routine may sleep before
- * returning NULL.  It sleeps only if a lock pointer was given.
- *
- * Note: If a page allocation fails at page table level two, three, or four,
- * up to three pages may be held during the wait, only to be released
- * afterwards.  This conservative approach is easily argued to avoid
- * race conditions.
- *
  * The ptepindexes, i.e. page indices, of the page table pages encountered
  * while translating virtual address va are defined as follows:
  * - for the page table page (last level),
@@ -4398,7 +4390,7 @@ pmap_allocpte_getpdp(pmap_t pmap, struct rwlock **lockp, vm_offset_t va,
  * since it is statically allocated by pmap_pinit() and not by pmap_allocpte().
  */
 static vm_page_t
-pmap_allocpte_alloc(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp,
+pmap_allocpte_nosleep(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp,
     vm_offset_t va)
 {
 	vm_pindex_t pml5index, pml4index;
@@ -4420,21 +4412,8 @@ pmap_allocpte_alloc(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp,
 	 * Allocate a page table page.
 	 */
 	if ((m = vm_page_alloc(NULL, ptepindex, VM_ALLOC_NOOBJ |
-	    VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL) {
-		if (lockp != NULL) {
-			RELEASE_PV_LIST_LOCK(lockp);
-			PMAP_UNLOCK(pmap);
-			PMAP_ASSERT_NOT_IN_DI();
-			vm_wait(NULL);
-			PMAP_LOCK(pmap);
-		}
-
-		/*
-		 * Indicate the need to retry.  While waiting, the page table
-		 * page may have been allocated.
-		 */
+	    VM_ALLOC_WIRED | VM_ALLOC_ZERO)) == NULL)
 		return (NULL);
-	}
 
 	if ((m->flags & PG_ZERO) == 0)
 		pmap_zero_page(m);
@@ -4509,8 +4488,8 @@
 		}
 		if ((*pdp & PG_V) == 0) {
 			/* Have to allocate a new pd, recurse */
-			if (pmap_allocpte_alloc(pmap, pmap_pdpe_pindex(va),
-			    lockp, va) == NULL) {
+			if (pmap_allocpte_nosleep(pmap, pmap_pdpe_pindex(va),
+			    lockp, va) == NULL) {
 				pmap_allocpte_free_unref(pmap, va,
 				    pmap_pml4e(pmap, va));
 				vm_page_unwire_noq(m);
@@ -4532,7 +4511,32 @@
 	}
 
 	pmap_resident_count_inc(pmap, 1);
+	return (m);
+}
+
+/*
+ * This routine is called if the desired page table page does not exist.
+ *
+ * If page table page allocation fails, this routine may sleep before
+ * returning NULL.  It sleeps only if a lock pointer was given.  Sleep
+ * occurs right before returning to the caller.  This way, we never
+ * drop pmap lock to sleep while a page table page has ref_count == 0,
+ * which prevents the page from being freed under us.
+ */
+static vm_page_t
+pmap_allocpte_alloc(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp,
+    vm_offset_t va)
+{
+	vm_page_t m;
+
+	m = pmap_allocpte_nosleep(pmap, ptepindex, lockp, va);
+	if (m == NULL && lockp != NULL) {
+		RELEASE_PV_LIST_LOCK(lockp);
+		PMAP_UNLOCK(pmap);
+		PMAP_ASSERT_NOT_IN_DI();
+		vm_wait(NULL);
+		PMAP_LOCK(pmap);
+	}
 	return (m);
 }