From: Alan Cox
Date: Thu, 14 Jun 2018 18:13:37 -0500
Subject: Re: svn commit: r335171 - head/sys/vm
To: Steven Hartland
Cc: Konstantin Belousov, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Message-Id: <2DA5B980-1703-463C-80E1-B8430BFA2A38@rice.edu>
References: <201806141941.w5EJf2qa069373@repo.freebsd.org>
List-Id: SVN commit messages for the src tree for head/-current

> On Jun 14, 2018, at 6:07 PM, Alan Cox wrote:
>
>
>> On Jun 14, 2018, at 5:54 PM, Steven Hartland wrote:
>>
>> Out of interest, how would this exhibit itself?
>>
>
> A panic in vm_page_insert_after().
>

I should add that a non-debug kernel will panic a little later in
vm_radix_insert().

>> On 14/06/2018 20:41, Konstantin Belousov wrote:
>>> Author: kib
>>> Date: Thu Jun 14 19:41:02 2018
>>> New Revision: 335171
>>> URL: https://svnweb.freebsd.org/changeset/base/335171
>>>
>>> Log:
>>>   Handle the race between fork/vm_object_split() and faults.
>>>
>>>   If fault started before vmspace_fork() locked the map, and then during
>>>   fork, vm_map_copy_entry()->vm_object_split() is executed, it is
>>>   possible that the fault instantiate the page into the original object
>>>   when the page was already copied into the new object (see
>>>   vm_map_split() for the orig/new objects terminology).  This can happen
>>>   if split found a busy page (e.g. from the fault) and slept dropping
>>>   the objects lock, which allows the swap pager to instantiate
>>>   read-behind pages for the fault.  Then the restart of the scan can see
>>>   a page in the scanned range, where it was already copied to the upper
>>>   object.
>>>
>>>   Fix it by instantiating the read-ahead pages before
>>>   swap_pager_getpages() method drops the lock to allocate pbuf.  The
>>>   object scan would see the whole range prefilled with the busy pages
>>>   and not proceed the range.
>>>
>>>   Note that vm_fault rechecks the map generation count after the object
>>>   unlock, so that it restarts the handling if raced with split, and
>>>   re-lookups the right page from the upper object.
>>>
>>>   In collaboration with:	alc
>>>   Tested by:	pho
>>>   Sponsored by:	The FreeBSD Foundation
>>>   MFC after:	1 week
>>>
>>> Modified:
>>>   head/sys/vm/swap_pager.c
>>>
>>> Modified: head/sys/vm/swap_pager.c
>>> ==============================================================================
>>> --- head/sys/vm/swap_pager.c	Thu Jun 14 19:01:40 2018	(r335170)
>>> +++ head/sys/vm/swap_pager.c	Thu Jun 14 19:41:02 2018	(r335171)
>>> @@ -1096,21 +1096,24 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
>>>      int *rahead)
>>>  {
>>>  	struct buf *bp;
>>> -	vm_page_t mpred, msucc, p;
>>> +	vm_page_t bm, mpred, msucc, p;
>>>  	vm_pindex_t pindex;
>>>  	daddr_t blk;
>>> -	int i, j, maxahead, maxbehind, reqcount, shift;
>>> +	int i, maxahead, maxbehind, reqcount;
>>>
>>>  	reqcount = count;
>>>
>>> -	VM_OBJECT_WUNLOCK(object);
>>> -	bp = getpbuf(&nsw_rcount);
>>> -	VM_OBJECT_WLOCK(object);
>>> -
>>> -	if (!swap_pager_haspage(object, ma[0]->pindex, &maxbehind, &maxahead)) {
>>> -		relpbuf(bp, &nsw_rcount);
>>> +	/*
>>> +	 * Determine the final number of read-behind pages and
>>> +	 * allocate them BEFORE releasing the object lock.  Otherwise,
>>> +	 * there can be a problematic race with vm_object_split().
>>> +	 * Specifically, vm_object_split() might first transfer pages
>>> +	 * that precede ma[0] in the current object to a new object,
>>> +	 * and then this function incorrectly recreates those pages as
>>> +	 * read-behind pages in the current object.
>>> +	 */
>>> +	if (!swap_pager_haspage(object, ma[0]->pindex, &maxbehind, &maxahead))
>>>  		return (VM_PAGER_FAIL);
>>> -	}
>>>
>>>  	/*
>>>  	 * Clip the readahead and readbehind ranges to exclude resident pages.
>>> @@ -1132,35 +1135,31 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
>>>  			*rbehind = pindex - mpred->pindex - 1;
>>>  	}
>>>
>>> +	bm = ma[0];
>>> +	for (i = 0; i < count; i++)
>>> +		ma[i]->oflags |= VPO_SWAPINPROG;
>>> +
>>>  	/*
>>>  	 * Allocate readahead and readbehind pages.
>>>  	 */
>>> -	shift = rbehind != NULL ? *rbehind : 0;
>>> -	if (shift != 0) {
>>> -		for (i = 1; i <= shift; i++) {
>>> +	if (rbehind != NULL) {
>>> +		for (i = 1; i <= *rbehind; i++) {
>>>  			p = vm_page_alloc(object, ma[0]->pindex - i,
>>>  			    VM_ALLOC_NORMAL);
>>> -			if (p == NULL) {
>>> -				/* Shift allocated pages to the left. */
>>> -				for (j = 0; j < i - 1; j++)
>>> -					bp->b_pages[j] =
>>> -					    bp->b_pages[j + shift - i + 1];
>>> +			if (p == NULL)
>>>  				break;
>>> -			}
>>> -			bp->b_pages[shift - i] = p;
>>> +			p->oflags |= VPO_SWAPINPROG;
>>> +			bm = p;
>>>  		}
>>> -		shift = i - 1;
>>> -		*rbehind = shift;
>>> +		*rbehind = i - 1;
>>>  	}
>>> -	for (i = 0; i < reqcount; i++)
>>> -		bp->b_pages[i + shift] = ma[i];
>>>  	if (rahead != NULL) {
>>>  		for (i = 0; i < *rahead; i++) {
>>>  			p = vm_page_alloc(object,
>>>  			    ma[reqcount - 1]->pindex + i + 1, VM_ALLOC_NORMAL);
>>>  			if (p == NULL)
>>>  				break;
>>> -			bp->b_pages[shift + reqcount + i] = p;
>>> +			p->oflags |= VPO_SWAPINPROG;
>>>  		}
>>>  		*rahead = i;
>>>  	}
>>> @@ -1171,15 +1170,18 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
>>>
>>>  	vm_object_pip_add(object, count);
>>>
>>> -	for (i = 0; i < count; i++)
>>> -		bp->b_pages[i]->oflags |= VPO_SWAPINPROG;
>>> -
>>> -	pindex = bp->b_pages[0]->pindex;
>>> +	pindex = bm->pindex;
>>>  	blk = swp_pager_meta_ctl(object, pindex, 0);
>>>  	KASSERT(blk != SWAPBLK_NONE,
>>>  	    ("no swap blocking containing %p(%jx)", object, (uintmax_t)pindex));
>>>
>>>  	VM_OBJECT_WUNLOCK(object);
>>> +	bp = getpbuf(&nsw_rcount);
>>> +	/* Pages cannot leave the object while busy. */
>>> +	for (i = 0, p = bm; i < count; i++, p = TAILQ_NEXT(p, listq)) {
>>> +		MPASS(p->pindex == bm->pindex + i);
>>> +		bp->b_pages[i] = p;
>>> +	}
>>>
>>>  	bp->b_flags |= B_PAGING;
>>>  	bp->b_iocmd = BIO_READ;
>>>
>>
>