Subject: Re: svn commit: r335171 - head/sys/vm
To: Konstantin Belousov, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
From: Steven Hartland
Date: Thu, 14 Jun 2018 23:54:52 +0100
Out of interest, how would this exhibit itself?

On 14/06/2018 20:41, Konstantin Belousov wrote:
> Author: kib
> Date: Thu Jun 14 19:41:02 2018
> New Revision: 335171
> URL: https://svnweb.freebsd.org/changeset/base/335171
>
> Log:
>   Handle the race between fork/vm_object_split() and faults.
>
>   If a fault started before vmspace_fork() locked the map, and then, during
>   the fork, vm_map_copy_entry()->vm_object_split() is executed, it is
>   possible that the fault instantiates the page into the original object
>   after the page was already copied into the new object (see
>   vm_map_split() for the orig/new objects terminology).  This can happen
>   if the split found a busy page (e.g. from the fault) and slept, dropping
>   the object lock, which allows the swap pager to instantiate
>   read-behind pages for the fault.  Then the restart of the scan can see
>   a page in the scanned range that was already copied to the upper
>   object.
>
>   Fix it by instantiating the read-ahead pages before the
>   swap_pager_getpages() method drops the lock to allocate the pbuf.  The
>   object scan then sees the whole range prefilled with busy pages
>   and does not proceed into the range.
>
>   Note that vm_fault rechecks the map generation count after the object
>   unlock, so it restarts the handling if it raced with the split, and
>   looks up the right page from the upper object again.
>
>   In collaboration with:  alc
>   Tested by:      pho
>   Sponsored by:   The FreeBSD Foundation
>   MFC after:      1 week
>
> Modified:
>   head/sys/vm/swap_pager.c
>
> Modified: head/sys/vm/swap_pager.c
> ==============================================================================
> --- head/sys/vm/swap_pager.c    Thu Jun 14 19:01:40 2018    (r335170)
> +++ head/sys/vm/swap_pager.c    Thu Jun 14 19:41:02 2018    (r335171)
> @@ -1096,21 +1096,24 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
>      int *rahead)
>  {
>      struct buf *bp;
> -    vm_page_t mpred, msucc, p;
> +    vm_page_t bm, mpred, msucc, p;
>      vm_pindex_t pindex;
>      daddr_t blk;
> -    int i, j, maxahead, maxbehind, reqcount, shift;
> +    int i, maxahead, maxbehind, reqcount;
>
>      reqcount = count;
>
> -    VM_OBJECT_WUNLOCK(object);
> -    bp = getpbuf(&nsw_rcount);
> -    VM_OBJECT_WLOCK(object);
> -
> -    if (!swap_pager_haspage(object, ma[0]->pindex, &maxbehind, &maxahead)) {
> -        relpbuf(bp, &nsw_rcount);
> +    /*
> +     * Determine the final number of read-behind pages and
> +     * allocate them BEFORE releasing the object lock.  Otherwise,
> +     * there can be a problematic race with vm_object_split().
> +     * Specifically, vm_object_split() might first transfer pages
> +     * that precede ma[0] in the current object to a new object,
> +     * and then this function incorrectly recreates those pages as
> +     * read-behind pages in the current object.
> +     */
> +    if (!swap_pager_haspage(object, ma[0]->pindex, &maxbehind, &maxahead))
>          return (VM_PAGER_FAIL);
> -    }
>
>      /*
>       * Clip the readahead and readbehind ranges to exclude resident pages.
> @@ -1132,35 +1135,31 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
>          *rbehind = pindex - mpred->pindex - 1;
>      }
>
> +    bm = ma[0];
> +    for (i = 0; i < count; i++)
> +        ma[i]->oflags |= VPO_SWAPINPROG;
> +
>      /*
>       * Allocate readahead and readbehind pages.
>       */
> -    shift = rbehind != NULL ? *rbehind : 0;
> -    if (shift != 0) {
> -        for (i = 1; i <= shift; i++) {
> +    if (rbehind != NULL) {
> +        for (i = 1; i <= *rbehind; i++) {
>              p = vm_page_alloc(object, ma[0]->pindex - i,
>                  VM_ALLOC_NORMAL);
> -            if (p == NULL) {
> -                /* Shift allocated pages to the left. */
> -                for (j = 0; j < i - 1; j++)
> -                    bp->b_pages[j] =
> -                        bp->b_pages[j + shift - i + 1];
> +            if (p == NULL)
>                  break;
> -            }
> -            bp->b_pages[shift - i] = p;
> +            p->oflags |= VPO_SWAPINPROG;
> +            bm = p;
>          }
> -        shift = i - 1;
> -        *rbehind = shift;
> +        *rbehind = i - 1;
>      }
> -    for (i = 0; i < reqcount; i++)
> -        bp->b_pages[i + shift] = ma[i];
>      if (rahead != NULL) {
>          for (i = 0; i < *rahead; i++) {
>              p = vm_page_alloc(object,
>                  ma[reqcount - 1]->pindex + i + 1, VM_ALLOC_NORMAL);
>              if (p == NULL)
>                  break;
> -            bp->b_pages[shift + reqcount + i] = p;
> +            p->oflags |= VPO_SWAPINPROG;
>          }
>          *rahead = i;
>      }
> @@ -1171,15 +1170,18 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
>
>      vm_object_pip_add(object, count);
>
> -    for (i = 0; i < count; i++)
> -        bp->b_pages[i]->oflags |= VPO_SWAPINPROG;
> -
> -    pindex = bp->b_pages[0]->pindex;
> +    pindex = bm->pindex;
>      blk = swp_pager_meta_ctl(object, pindex, 0);
>      KASSERT(blk != SWAPBLK_NONE,
>          ("no swap blocking containing %p(%jx)", object, (uintmax_t)pindex));
>
>      VM_OBJECT_WUNLOCK(object);
> +    bp = getpbuf(&nsw_rcount);
> +    /* Pages cannot leave the object while busy. */
> +    for (i = 0, p = bm; i < count; i++, p = TAILQ_NEXT(p, listq)) {
> +        MPASS(p->pindex == bm->pindex + i);
> +        bp->b_pages[i] = p;
> +    }
>
>      bp->b_flags |= B_PAGING;
>      bp->b_iocmd = BIO_READ;
>
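For anyone who wants to see the ordering argument outside the kernel, below is a minimal userland sketch of the idea, not the kernel code itself: a pthread mutex stands in for the object lock, an array of slots stands in for the object's pages, and every name here (getpages_thread, split_thread, alloc_unlocked_resource, NSLOTS) is made up for illustration. It shows the pattern the log describes: mark every slot you intend to fill as busy before dropping the lock, so a concurrent scan finds the range in-flight rather than absent.

/*
 * Userland sketch only, NOT the kernel code: a pthread mutex stands in
 * for the object lock, slots stand in for pages, and BUSY plays the
 * role of the busy/VPO_SWAPINPROG state.  All names are illustrative.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NSLOTS 8

enum slot_state { ABSENT, BUSY, VALID };

static struct {
        pthread_mutex_t lock;
        enum slot_state slot[NSLOTS];
} obj = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Stand-in for getpbuf(): an allocation that must happen unlocked. */
static void *
alloc_unlocked_resource(void)
{
        usleep(1000);           /* widen the window for the scanner */
        return (malloc(64));
}

/* Rough analogue of swap_pager_getpages() after the change. */
static void *
getpages_thread(void *arg)
{
        void *pbuf;
        int i;

        (void)arg;
        pthread_mutex_lock(&obj.lock);
        /*
         * Reserve (busy) every slot we intend to fill BEFORE dropping
         * the lock, so a concurrent scanner sees the whole range as
         * in-flight rather than absent.
         */
        for (i = 0; i < NSLOTS; i++)
                if (obj.slot[i] == ABSENT)
                        obj.slot[i] = BUSY;
        pthread_mutex_unlock(&obj.lock);

        pbuf = alloc_unlocked_resource();       /* safe: slots are busy */

        pthread_mutex_lock(&obj.lock);
        for (i = 0; i < NSLOTS; i++)
                if (obj.slot[i] == BUSY)
                        obj.slot[i] = VALID;    /* the "I/O" completed */
        pthread_mutex_unlock(&obj.lock);

        free(pbuf);
        return (NULL);
}

/* Rough analogue of the vm_object_split() scan. */
static void *
split_thread(void *arg)
{
        int i;

        (void)arg;
        pthread_mutex_lock(&obj.lock);
        for (i = 0; i < NSLOTS; i++) {
                /*
                 * A busy slot belongs to an in-flight operation; the
                 * real code would sleep and rescan.  It can no longer
                 * observe an absent slot that is about to be recreated
                 * behind its back.
                 */
                if (obj.slot[i] == BUSY)
                        printf("slot %d busy, would wait and rescan\n", i);
        }
        pthread_mutex_unlock(&obj.lock);
        return (NULL);
}

int
main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, getpages_thread, NULL);
        pthread_create(&t2, NULL, split_thread, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return (0);
}

Build with "cc -pthread sketch.c". With the old ordering, where the unlocked allocation happened before the slots were marked busy, the scanner could run in that unlocked window and find the slots absent, and they would later be filled in behind it, which is the analogue of the read-behind pages being recreated in the original object after vm_object_split() had already copied that range.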