From owner-freebsd-hackers@FreeBSD.ORG Wed Oct  8 15:14:51 2014
From: Dmitry Sivachenko
To: Konstantin Belousov
Cc: hackers@freebsd.org
Date: Wed, 8 Oct 2014 19:14:45 +0400
Subject: Re: mmap() question
Message-Id: <5C10922E-7030-4C89-9FD3-DA770E462067@gmail.com>
In-Reply-To: <20131012095919.GI41229@kib.kiev.ua>
References: <95E0B821-BF9B-4EBF-A1E5-1DDCBB1C3D1B@gmail.com>
 <20131011051702.GE41229@kib.kiev.ua> <20131012095919.GI41229@kib.kiev.ua>
List-Id: Technical Discussions relating to FreeBSD

On 12 Oct 2013, at 13:59, Konstantin Belousov wrote:

> I was not able to reproduce the situation locally.  I even tried to
> start a lot of threads accessing the mapped regions, to try to outrun
> the pagedaemon.  The user threads sleep on the disk read, while the
> pagedaemon has a lot of time to rebalance the queues.  It might be a
> case where an SSD indeed makes a difference.
> 
> Still, I see how this situation could appear.  The code which triggers
> OOM never fires if there is free space in the swapfile, so the absence
> of swap is a necessary condition to trigger the bug.  Next, the OOM
> calculation does not account for the possibility that almost all pages
> on the queues can be reused.  It just fires if free pages are depleted
> too much or the free target cannot be reached.
> 
> IMO one possible solution is to account for the queued pages in
> addition to the swap space.  This is not entirely accurate, since some
> pages on the queues cannot be reused, at least transiently.  The most
> precise algorithm would count the held and busy pages globally, and
> subtract this count from the queue lengths, but that is probably too
> costly.
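
(A rough userland sketch of that accounting idea, with made-up names,
not the kernel code: with no swap left, declare OOM only when the
queued pages, minus the ones the daemon failed to process, no longer
cover the free-page minimum.)

    #include <stdbool.h>

    struct vm_counters {
            unsigned int active;     /* pages on the active queue */
            unsigned int inactive;   /* pages on the inactive queue */
            unsigned int sticky;     /* queued pages that failed to process */
            unsigned int free_min;   /* minimum free pages threshold */
            unsigned int swap_avail; /* free swap space, in pages */
    };

    static bool
    should_declare_oom(const struct vm_counters *c)
    {
            unsigned int reclaimable;

            /* While swap remains, the pagedaemon can still make progress. */
            if (c->swap_avail > 0)
                    return (false);
            /* Queued pages minus the stuck ones are reclaimable head-room. */
            reclaimable = c->active + c->inactive - c->sticky;
            return (reclaimable <= c->free_min);
    }

    int
    main(void)
    {
            /* 100 reclaimable pages left against a floor of 256: OOM. */
            struct vm_counters c = { 1000, 5000, 5900, 256, 0 };

            return (should_declare_oom(&c) ? 0 : 1);
    }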
> 
> Instead, I think we could rely on the numbers which are counted by the
> pagedaemon threads during their passes.  Due to the transient nature
> of the pagedaemon failures, this should be fine.
> 
> Below is the prototype patch, against HEAD.  It is not applicable to
> stable; please use a HEAD kernel for the test.

Hello, any chance to commit this patch?

Thanks!
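
(The corresponding bookkeeping in vm_pageout_scan() in the quoted patch
below publishes only the delta of each pass's failed-scan count into the
global counter, so the global value stays the sum over the per-domain
pagedaemons.  A simplified userland sketch of that scheme, with
hypothetical names:)

    #include <stdatomic.h>

    static atomic_int queue_sticky_global;  /* cf. cnt.v_queue_sticky */

    struct dom {
            int queue_sticky;       /* previous pass's count, this domain */
    };

    static void
    publish_failed_scan(struct dom *d, int failed_scan)
    {
            /* Add only the change since this domain's last pass. */
            atomic_fetch_add(&queue_sticky_global,
                failed_scan - d->queue_sticky);
            d->queue_sticky = failed_scan;
    }

    int
    main(void)
    {
            struct dom d = { 0 };

            publish_failed_scan(&d, 42);    /* global becomes 42 */
            publish_failed_scan(&d, 10);    /* delta -32, global is 10 */
            return (atomic_load(&queue_sticky_global) == 10 ? 0 : 1);
    }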
> 
> diff --git a/sys/sys/vmmeter.h b/sys/sys/vmmeter.h
> index d2ad920..ee5159a 100644
> --- a/sys/sys/vmmeter.h
> +++ b/sys/sys/vmmeter.h
> @@ -93,9 +93,10 @@ struct vmmeter {
>  	u_int v_free_min;	/* (c) pages desired free */
>  	u_int v_free_count;	/* (f) pages free */
>  	u_int v_wire_count;	/* (a) pages wired down */
> -	u_int v_active_count;	/* (q) pages active */
> +	u_int v_active_count;	/* (a) pages active */
>  	u_int v_inactive_target; /* (c) pages desired inactive */
> -	u_int v_inactive_count;	/* (q) pages inactive */
> +	u_int v_inactive_count;	/* (a) pages inactive */
> +	u_int v_queue_sticky;	/* (a) pages on queues but cannot process */
>  	u_int v_cache_count;	/* (f) pages on cache queue */
>  	u_int v_cache_min;	/* (c) min pages desired on cache queue */
>  	u_int v_cache_max;	/* (c) max pages in cached obj (unused) */
> diff --git a/sys/vm/vm_meter.c b/sys/vm/vm_meter.c
> index 713a2be..4bb1f1f 100644
> --- a/sys/vm/vm_meter.c
> +++ b/sys/vm/vm_meter.c
> @@ -316,6 +316,7 @@ VM_STATS_VM(v_active_count, "Active pages");
>  VM_STATS_VM(v_inactive_target, "Desired inactive pages");
>  VM_STATS_VM(v_inactive_count, "Inactive pages");
>  VM_STATS_VM(v_cache_count, "Pages on cache queue");
> +VM_STATS_VM(v_queue_sticky, "Pages which cannot be moved from queues");
>  VM_STATS_VM(v_cache_min, "Min pages on cache queue");
>  VM_STATS_VM(v_cache_max, "Max pages on cached queue");
>  VM_STATS_VM(v_pageout_free_min, "Min pages reserved for kernel");
> diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
> index 7846702..6943a0e 100644
> --- a/sys/vm/vm_page.h
> +++ b/sys/vm/vm_page.h
> @@ -226,6 +226,7 @@ struct vm_domain {
>  	long vmd_segs;	/* bitmask of the segments */
>  	boolean_t vmd_oom;
>  	int vmd_pass;	/* local pagedaemon pass */
> +	int vmd_queue_sticky;	/* pages on queues which cannot be processed */
>  	struct vm_page vmd_marker; /* marker for pagedaemon private use */
>  };
> 
> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
> index 5660b56..a62cf97 100644
> --- a/sys/vm/vm_pageout.c
> +++ b/sys/vm/vm_pageout.c
> @@ -896,7 +896,7 @@ vm_pageout_scan(struct vm_domain *vmd, int pass)
>  {
>  	vm_page_t m, next;
>  	struct vm_pagequeue *pq;
> -	int page_shortage, maxscan, pcount;
> +	int failed_scan, page_shortage, maxscan, pcount;
>  	int addl_page_shortage;
>  	vm_object_t object;
>  	int act_delta;
> @@ -960,6 +960,7 @@ vm_pageout_scan(struct vm_domain *vmd, int pass)
>  	 */
>  	pq = &vmd->vmd_pagequeues[PQ_INACTIVE];
>  	maxscan = pq->pq_cnt;
> +	failed_scan = 0;
>  	vm_pagequeue_lock(pq);
>  	queues_locked = TRUE;
>  	for (m = TAILQ_FIRST(&pq->pq_pl);
> @@ -1012,6 +1013,7 @@ vm_pageout_scan(struct vm_domain *vmd, int pass)
>  			vm_page_unlock(m);
>  			VM_OBJECT_WUNLOCK(object);
>  			addl_page_shortage++;
> +			failed_scan++;
>  			continue;
>  		}
> 
> @@ -1075,6 +1077,7 @@ vm_pageout_scan(struct vm_domain *vmd, int pass)
>  			 * loop over the active queue below.
>  			 */
>  			addl_page_shortage++;
> +			failed_scan++;
>  			goto relock_queues;
>  		}
> 
> @@ -1229,6 +1232,7 @@ vm_pageout_scan(struct vm_domain *vmd, int pass)
>  		 */
>  		if (vm_page_busied(m)) {
>  			vm_page_unlock(m);
> +			failed_scan++;
>  			goto unlock_and_continue;
>  		}
> 
> @@ -1241,6 +1245,7 @@ vm_pageout_scan(struct vm_domain *vmd, int pass)
>  				vm_page_requeue_locked(m);
>  				if (object->flags & OBJ_MIGHTBEDIRTY)
>  					vnodes_skipped++;
> +				failed_scan++;
>  				goto unlock_and_continue;
>  			}
>  			vm_pagequeue_unlock(pq);
> @@ -1386,6 +1391,11 @@ relock_queues:
>  		m = next;
>  	}
>  	vm_pagequeue_unlock(pq);
> +
> +	atomic_add_int(&cnt.v_queue_sticky, failed_scan -
> +	    vmd->vmd_queue_sticky);
> +	vmd->vmd_queue_sticky = failed_scan;
> +
>  #if !defined(NO_SWAPPING)
>  	/*
>  	 * Idle process swapout -- run once per second.
> @@ -1433,10 +1443,15 @@ static int vm_pageout_oom_vote;
>  static void
>  vm_pageout_mightbe_oom(struct vm_domain *vmd, int pass)
>  {
> +	u_int queues_count;
>  	int old_vote;
> 
> -	if (pass <= 1 || !((swap_pager_avail < 64 && vm_page_count_min()) ||
> -	    (swap_pager_full && vm_paging_target() > 0))) {
> +	queues_count = cnt.v_active_count + cnt.v_inactive_count -
> +	    cnt.v_queue_sticky;
> +	if (pass <= 1 || !((swap_pager_avail < 64 && vm_page_count_min() &&
> +	    queues_count <= cnt.v_free_min) ||
> +	    (swap_pager_full && vm_paging_target() > 0 &&
> +	    queues_count <= vm_paging_target()))) {
>  		if (vmd->vmd_oom) {
>  			vmd->vmd_oom = FALSE;
>  			atomic_subtract_int(&vm_pageout_oom_vote, 1);
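
(With the patch applied, the new counter should be readable as the
sysctl vm.stats.vm.v_queue_sticky, created by the VM_STATS_VM() line
above.  A minimal userland reader for watching it during a test run,
assuming that sysctl name:)

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            u_int sticky;
            size_t len = sizeof(sticky);

            /* Read the count of pages the pagedaemon could not process. */
            if (sysctlbyname("vm.stats.vm.v_queue_sticky", &sticky, &len,
                NULL, 0) == -1) {
                    perror("sysctlbyname");
                    return (1);
            }
            printf("pages stuck on queues: %u\n", sticky);
            return (0);
    }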