From owner-freebsd-hackers@freebsd.org  Sun Jul  3 00:09:07 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DEA76B81222
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Jul 2016 00:09:07 +0000 (UTC)
 (envelope-from nwhitehorn@freebsd.org)
Received: from d.mail.sonic.net (d.mail.sonic.net [64.142.111.50])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id CC37D20C3
 for <freebsd-hackers@freebsd.org>; Sun,  3 Jul 2016 00:09:07 +0000 (UTC)
 (envelope-from nwhitehorn@freebsd.org)
Received: from zeppelin.tachypleus.net ([75.104.66.200]) (authenticated bits=0)
 by d.mail.sonic.net (8.15.1/8.15.1) with ESMTPSA id u6308oKv001358
 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);
 Sat, 2 Jul 2016 17:08:56 -0700
Subject: Re: Review request: sparse CPU ID maps
To: outro pessoa <outro.pessoa@gmail.com>
References: <57761101.3030101@freebsd.org>
 <CAD9=5Xw-MmVVSSo6nRvSRvGaLbd1Z1YRyVKyF9JfmucNKMGBZg@mail.gmail.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
From: Nathan Whitehorn <nwhitehorn@freebsd.org>
Message-ID: <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
Date: Sat, 2 Jul 2016 17:08:54 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <CAD9=5Xw-MmVVSSo6nRvSRvGaLbd1Z1YRyVKyF9JfmucNKMGBZg@mail.gmail.com>
X-Sonic-CAuth: UmFuZG9tSVYKmi8BVaO5vdwkFcy3d3W2aVNoXDMOt/jA28xtNafKqHjwt8UBhPWQ+fp0+rdq8bMkuG8QZshgnKtV8sUFzvflfsO0Q/6f0hY=
X-Sonic-ID: C;TiMlUrJA5hGo5pNwxPCmMQ== M;dKzDVLJA5hGo5pNwxPCmMQ==
X-Spam-Flag: No
X-Sonic-Spam-Details: 0.0/5.0 by cerberusd
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jul 2016 00:09:08 -0000

A reasonable first pass at checking for this kind of bug is to run
grep -lR '< mp_ncpus'. Running that on sys/arm and sys/arm64 lists the
following files:
arm/mv/armadaxp/armadaxp_mp.c
arm/include/counter.h
arm/broadcom/bcm2835/bcm2836.c
arm/broadcom/bcm2835/bcm2836_mp.c
arm/freescale/imx/imx6_mp.c
arm/allwinner/aw_mp.c
arm/rockchip/rk30xx_mp.c
arm/amlogic/aml8726/aml8726_mp.c
arm/samsung/exynos/exynos5_mp.c
arm/arm/mp_machdep.c
arm/nvidia/tegra124/tegra124_mp.c
arm64/include/counter.h
arm64/arm64/gic_v3.c
arm64/arm64/gic_v3_its.c
arm64/arm64/gicv3_its.c

All of them should, in some sense, use CPU_FOREACH(), but it may not
matter: sparse CPU IDs may not be possible on some or all of those
SoCs. At least the generic ones (counter.h, mp_machdep.c, and the GIC
code) should be changed, I think. (Why are there both gic_v3_its.c and
gicv3_its.c?)
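
To illustrate the difference between the two loop shapes, here is a
small userspace sketch (not kernel code). The names demo_all_cpus,
demo_ncpus, DEMO_MAXID and the DEMO_* macros are made up for this
example; they only mimic the roles of all_cpus, mp_ncpus, mp_maxid,
CPU_ABSENT() and CPU_FOREACH(). The CPU mask is an assumed layout in
which every other ID is missing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's mp_maxid, all_cpus and mp_ncpus. */
#define DEMO_MAXID 7                          /* highest valid CPU ID */
static const uint32_t demo_all_cpus = 0x55;   /* CPUs 0, 2, 4, 6 present */
static const int demo_ncpus = 4;              /* number of CPUs present */

#define DEMO_CPU_ABSENT(id) \
	((demo_all_cpus & (UINT32_C(1) << (id))) == 0)

/* Modelled on CPU_FOREACH(): walk [0, maxid] and skip absent IDs. */
#define DEMO_CPU_FOREACH(i) \
	for ((i) = 0; (i) <= DEMO_MAXID; (i)++) \
		if (!DEMO_CPU_ABSENT(i))

/*
 * Buggy pattern: assumes IDs are dense in [0, ncpus).
 * Returns how many absent CPU IDs the loop touched.
 */
static int
dense_loop_bad_visits(void)
{
	int i, bad = 0;

	assert(DEMO_MAXID < 32);	/* the demo mask is only 32 bits wide */
	for (i = 0; i < demo_ncpus; i++)
		if (DEMO_CPU_ABSENT(i))
			bad++;
	return (bad);
}

/* Fixed pattern: returns how many CPUs the FOREACH loop visited. */
static int
foreach_visits(void)
{
	int i, n = 0;

	DEMO_CPU_FOREACH(i)
		n++;
	return (n);
}
```

With this assumed mask, dense_loop_bad_visits() returns 2 (IDs 1 and 3
are absent) and the dense loop never reaches CPUs 4 and 6, while
foreach_visits() returns 4.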
-Nathan

On 07/02/16 10:31, outro pessoa wrote:
> Nathan,
> What type of ARM hardware do you need?
>
> On Fri, Jul 1, 2016 at 2:43 AM, Nathan Whitehorn 
> <nwhitehorn@freebsd.org <mailto:nwhitehorn@freebsd.org>> wrote:
>
>     I have been working on fixing PR 210106
>     (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210106) and
>     have run into the fact that several pieces of the kernel, notably
>     parts of subr_taskqueue.c, require that CPU IDs be dense in the
>     range [0, mp_ncpus), which the kernel does not guarantee, for
>     example in the case of CPUs with hyperthreading in which the
>     threading is disabled. This is leading to hangs in late boot in
>     -CURRENT.
>
>     I've prepared the following patch, which fixes PR 210106, but I
>     would like a few more eyeballs on it before committing it. It
>     fixes most of the bogus uses of mp_ncpus in the kernel, but not
>     all: doing grep -R '< mp_ncpus' /sys | wc -l gives 52 remaining
>     instances of loops in [0, mp_ncpus) or [1, mp_ncpus), most or all
>     of which should instead be CPU_FOREACH(), but none of which I feel
>     comfortable changing (36 are in ARM code for hardware I don't have
>     access to).
>
>     The patch lives here:
>     http://people.freebsd.org/~nwhitehorn/sparse_cpu_ids.diff
>     <http://people.freebsd.org/%7Enwhitehorn/sparse_cpu_ids.diff>
>     -Nathan
>     _______________________________________________
>     freebsd-hackers@freebsd.org <mailto:freebsd-hackers@freebsd.org>
>     mailing list
>     https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>     To unsubscribe, send any mail to
>     "freebsd-hackers-unsubscribe@freebsd.org
>     <mailto:freebsd-hackers-unsubscribe@freebsd.org>"
>
>


From owner-freebsd-hackers@freebsd.org  Sun Jul  3 02:30:19 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3160BB8FD8E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Jul 2016 02:30:19 +0000 (UTC)
 (envelope-from paul.koch137@gmail.com)
Received: from mail-pa0-x22b.google.com (mail-pa0-x22b.google.com
 [IPv6:2607:f8b0:400e:c03::22b])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0888228AC
 for <freebsd-hackers@freebsd.org>; Sun,  3 Jul 2016 02:30:19 +0000 (UTC)
 (envelope-from paul.koch137@gmail.com)
Received: by mail-pa0-x22b.google.com with SMTP id zl15so8648703pab.3
 for <freebsd-hackers@freebsd.org>; Sat, 02 Jul 2016 19:30:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=PuTrRoDEsNQYFAUu5pKW9A9MTimqKttEYHuTr+cldgY=;
 b=kFzUvI8nHwSI+GJ0/geizmv7pFpNtoDzYxiO965F/zvJLoWIFZ0mqNzjkQNOdK1QlO
 AeevwnC5vhojS1ghzv4OuGzSWsHU9tKiJXLhbzBODjehDrLSnAMiZQwsoGGC6KiI0nli
 21U8qN2EoqoFX1yCtpCBBJoRH1XA0QR/ipq2ksqnnGMgiiFFqCeSedkuM5BnX3qljPUZ
 jepSkOAYnDB4ylmwIFvH+HMvrue9nnnnnaxq8E7wMJFaIEwzUaGjTSAhRKRWGStrt8V8
 LLpGc/XZacY/IvD2reoKNTUbWPjeOSvQmfr7U6gTs6993DZSBKT8N/GkEnXUjADHD7bP
 lLpQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=PuTrRoDEsNQYFAUu5pKW9A9MTimqKttEYHuTr+cldgY=;
 b=QYhEG2O3X9qSfIhDn6nhuTQUZguJ7KntyrjejbAdZoJ8TRmF5xmnI8Pq1cPnhMoXx7
 W6dR9Sq6dUaK0rL1fGtr1Xxa2UebsM12yZAjlabLtuitNYeG/LXFcHQJoO/kWtGdBBsh
 2bUql57/KADXsr5MpUMZyl+LfudM+ZDCHhgeJ62q8lkDfE6MKxTDXy7CAX70dLGtoWL8
 N7Qtwiic1VSqUXfBJzxg+53RDwGCpsijnufoa3ghdKEIPx99bTzJvbITcM9D8F1KieS2
 4mkjAhemDI3XzYBuC54m/Gqx1J/b5r7T9h+jb0RZLYQ6y+OS94EpPV7jInTdUyndGm9K
 y89Q==
X-Gm-Message-State: ALyK8tL6asMcrCUtgocJ11GuYyKKAdUMti2wUf4j0CS7/W+jBbKw0mDr6/y3j6Z+vs4d6w==
X-Received: by 10.66.193.231 with SMTP id hr7mr10761945pac.28.1467513018439;
 Sat, 02 Jul 2016 19:30:18 -0700 (PDT)
Received: from splash.akips.com (CPE-120-146-191-2.static.qld.bigpond.net.au.
 [120.146.191.2])
 by smtp.gmail.com with ESMTPSA id o68sm698740pfb.18.2016.07.02.19.30.16
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Sat, 02 Jul 2016 19:30:17 -0700 (PDT)
Date: Sun, 3 Jul 2016 12:30:04 +1000
From: Paul Koch <paul.koch137@gmail.com>
To: Cedric Blancher <cedric.blancher@gmail.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: ZFS ARC and mmap/page cache coherency question
Message-ID: <20160703123004.74a7385a@splash.akips.com>
In-Reply-To: <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd10.2)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jul 2016 02:30:19 -0000


Is there a "long story", or is mmap() performance on ZFS doomed for the
foreseeable future ?

	Paul.

> Short story: ZFS was tacked on the kernel and was never properly
> integrated into the VM page management, which leads to DRAMATIC poor
> performance for anything which uses mmap() for write IO. This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after Opensolaris was made closed source again.
> 
> Without a complete rewrite of the VM system this problem is unsolvable.
> 
> Ced
> 
> On 30 June 2016 at 06:06, Paul Koch <paul.koch137@gmail.com> wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large mmap'ed
> > files on ZFS.
> >
> > Example test box setup:
> >  FreeBSD 10.3-p5
> >  Intel i7-5820K 3.30GHz with 64G RAM
> >  6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle.  In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M).  All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
> >
> > When the 10 minute fsync() occurs, gstat typically shows very little disk
> > reads and very high write speeds, which is what we expect.  But, every 80
> > minutes we process the data in the large mmap'ed files and store it in
> > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not
> > mmap'ed). After that, the performance of the next fsync() of the mmap'ed
> > files falls off a cliff.  We are assuming it is because the ARC has
> > thrown away the cached data of the mmap'ed files.  gstat shows lots of
> > read/write contention and lots of things tend to stall waiting for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency ??
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync() ?
> >
> > We've tried cat and read() on the mmap'ed files but doesn't seem to touch
> > the disk at all and the fsync() performance is still poor, so it looks
> > like the ARC is not being filled.  msync() doesn't seem to be much
> > different. mincore() stats show the mmap'ed data is entirely incore and
> > referenced.
> >
> >         Paul.
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to
> > "freebsd-hackers-unsubscribe@freebsd.org"  
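
For readers who want to reproduce the access pattern described in the
quoted message, the following is a minimal sketch of it, not the actual
application code. The function name update_and_sync() and its
parameters are invented for illustration. MAP_NOSYNC is FreeBSD-specific,
so the sketch defines it to 0 where it is missing purely so that it
still compiles elsewhere.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0	/* not FreeBSD: make the flag a no-op */
#endif

/*
 * Hypothetical helper: map len bytes of path shared, dirty every byte
 * with fill, then flush with fsync().  Returns 0 on success, -1 on error.
 */
static int
update_and_sync(const char *path, size_t len, unsigned char fill)
{
	int fd, rc = -1;
	void *p;

	assert(len > 0);	/* mmap() rejects zero-length mappings */
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return (-1);
	if (ftruncate(fd, (off_t)len) == -1)
		goto out;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_NOSYNC, fd, 0);
	if (p == MAP_FAILED)
		goto out;
	memset(p, fill, len);	/* touch every page, as in the workload */
	rc = fsync(fd);		/* the periodic flush of the dirty pages */
	if (munmap(p, len) == -1)
		rc = -1;
out:
	close(fd);
	return (rc);
}
```

Calling this in a loop on a large file, with a big pread()/pwrite()
workload on another file in between, should approximate the scenario
above; the sizes and timing needed to trigger the slowdown are not
something this sketch can promise.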

From owner-freebsd-hackers@freebsd.org  Sun Jul  3 07:45:28 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62A4AB902AE
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Jul 2016 07:45:28 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 52B3C2D32
 for <freebsd-hackers@freebsd.org>; Sun,  3 Jul 2016 07:45:27 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467531924887811.5045019290758;
 Sun, 3 Jul 2016 00:45:24 -0700 (PDT)
Date: Sun, 03 Jul 2016 00:45:24 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Paul Koch" <paul.koch137@gmail.com>
Cc: "Cedric Blancher" <cedric.blancher@gmail.com>, 
 "freebsd-hackers" <freebsd-hackers@freebsd.org>
Message-ID: <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
In-Reply-To: <20160703123004.74a7385a@splash.akips.com>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
Subject: Re: ZFS ARC and mmap/page cache coherency question
MIME-Version: 1.0
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jul 2016 07:45:28 -0000


Cedric greatly overstates the intractability of resolving it.
Nonetheless, since the initial import very little has been done to
improve integration, and I don't know of anyone who is up to the task
taking an interest in it. Consequently, mmap() performance is likely
"doomed" for the foreseeable future.

-M

---- On Sat, 02 Jul 2016 19:30:04 -0700 Paul Koch <paul.koch137@gmail.com> wrote ----

Is there a "long story", or is mmap() performance on ZFS doomed for the
foreseeable future ?

    Paul.

> Short story: ZFS was tacked on the kernel and was never properly
> integrated into the VM page management, which leads to DRAMATIC poor
> performance for anything which uses mmap() for write IO. This was
> solved in Oracle Solaris with the great VM allocator rewrite which
> landed after Opensolaris was made closed source again.
>
> Without a complete rewrite of the VM system this problem is unsolvable.
>
> Ced
>
> On 30 June 2016 at 06:06, Paul Koch <paul.koch137@gmail.com> wrote:
> >
> > Posted this to -stable on the 15th June, but no feedback...
> >
> > We are trying to understand a performance issue when syncing large mmap'ed
> > files on ZFS.
> >
> > Example test box setup:
> >  FreeBSD 10.3-p5
> >  Intel i7-5820K 3.30GHz with 64G RAM
> >  6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe
> >
> > Read performance of a sequentially written large file on the pool is
> > typically around 950Mbytes/sec using dd.
> >
> > Our software mmap's some large database files using MAP_NOSYNC, and we
> > call fsync() every 10 minutes when we know the file system is mostly
> > idle.  In our test setup, the database files are 1.1G, 2G, 1.4G, 12G,
> > 4.7G and ~20 small files (under 10M).  All of the memory pages in the
> > mmap'ed files are updated every minute with new values, so the entire
> > mmap'ed file needs to be synced to disk, not just fragments.
> >
> > When the 10 minute fsync() occurs, gstat typically shows very little disk
> > reads and very high write speeds, which is what we expect.  But, every 80
> > minutes we process the data in the large mmap'ed files and store it in
> > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not
> > mmap'ed). After that, the performance of the next fsync() of the mmap'ed
> > files falls off a cliff.  We are assuming it is because the ARC has
> > thrown away the cached data of the mmap'ed files.  gstat shows lots of
> > read/write contention and lots of things tend to stall waiting for disk.
> >
> > Is this just a lack of ZFS ARC and page cache coherency ??
> >
> > Is there a way to prime the ARC with the mmap'ed files again before we
> > call fsync() ?
> >
> > We've tried cat and read() on the mmap'ed files but doesn't seem to touch
> > the disk at all and the fsync() performance is still poor, so it looks
> > like the ARC is not being filled.  msync() doesn't seem to be much
> > different. mincore() stats show the mmap'ed data is entirely incore and
> > referenced.
> >
> >         Paul.
> > _______________________________________________
> > freebsd-hackers@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to
> > "freebsd-hackers-unsubscribe@freebsd.org"
_______________________________________________
freebsd-hackers@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to
"freebsd-hackers-unsubscribe@freebsd.org"


From owner-freebsd-hackers@freebsd.org  Sun Jul  3 15:50:58 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 96842B8FCEF
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Jul 2016 15:50:58 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3D14F28D7
 for <freebsd-hackers@freebsd.org>; Sun,  3 Jul 2016 15:50:57 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 9D8A121E527
 for <freebsd-hackers@freebsd.org>; Sun,  3 Jul 2016 10:43:34 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
From: Karl Denninger <karl@denninger.net>
Message-ID: <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
Date: Sun, 3 Jul 2016 10:43:19 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms010805070601040608020308"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jul 2016 15:50:58 -0000

This is a cryptographically signed message in MIME format.

--------------ms010805070601040608020308
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


On 7/3/2016 02:45, Matthew Macy wrote:
> Cedric greatly overstates the intractability of resolving it.
> Nonetheless, since the initial import very little has been done to
> improve integration, and I don't know of anyone who is up to the task
> taking an interest in it. Consequently, mmap() performance is likely
> "doomed" for the foreseeable future.
> -M

Wellllll....

I've done a fair bit of work here (see
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
political issues are at least as bad as the coding ones.

In short, what Cedric says about the root of the issue is real.  VM is
really well implemented for what it handles, but while the UFS data
cache is part of VM, and thus VM "knows" about it, ZFS is not, because
it is a "bolt-on."  UMA leads to further (severe) complications for
certain workloads.

Finally the underlying ZFS dmu_tx sizing code is just plain wrong and in
fact this is one of the biggest issues as when the system runs into
trouble it can take a bad situation and make it a *lot* worse.  There is
only one write-back cache maintained instead of one per zvol, and that's
flat-out broken.  Being able to re-order async writes to disk (where
fsync() has not been called) and minimizing seek latency is excellent.
Sadly rotating media these days sabotages much of this due to opacity
introduced at the drive level (e.g. varying sector counts per track,
etc) but it can still help.  But where things go dramatically wrong is
on a system where a large write-back cache is allocated relative to the
underlying zvol I/O performance (this occurs on moderately-large and
bigger RAM systems) with moderate numbers of modest-performance rotating
media; in this case it is entirely possible for a flush of the write
buffers to require upwards of a *minute* to complete, during which all
other writes block.  If this happens during periods of high RAM demand
and you manage to trigger a page-out at the same time, system
performance will go straight into the toilet.  I have seen instances
where simply trying to edit a text file with vi (or a "select" against
a database table) will hang for upwards of a minute, leading you to
believe the system has crashed when in fact it has not.

The interaction of VM with the above can lead to severe pathological
behavior because the VM system has no way to tell the ZFS subsystem to
pare back ARC (and, at least as important and perhaps more so, unused but
allocated UMA) when memory pressure exists *before* it pages.  ZFS tries
to detect memory pressure and do this itself but it winds up competing
with the VM system.  This leads to demonstrably wrong behavior because
you never want to hold disk cache in preference to RSS; if you have a
block of data from the disk the best case is you avoid one I/O (to
re-read it); if you page you are *guaranteed* to take one I/O (to write
the paged-out RSS to disk) and *might* take two (if you then must read
it back in.)

In short, trading the avoidance of one *possible* I/O for a *guaranteed*
I/O and a second possible one is *always* a net loss.
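
The accounting behind that claim can be written down as a small
sketch. It assumes, for simplicity, that the cached block is clean and
that every I/O costs the same; the function names and the probability
parameters are invented here for illustration and are not part of any
ZFS or VM interface.

```c
#include <assert.h>

/*
 * Expected extra I/Os from dropping one clean cached disk block:
 * at most one re-read, and only if the block is wanted again.
 */
static double
cost_drop_cache(double p_reread)
{
	assert(p_reread >= 0.0 && p_reread <= 1.0);
	return (p_reread);
}

/*
 * Expected extra I/Os from paging out one resident page instead:
 * one guaranteed write, plus a read if the page is faulted back in.
 */
static double
cost_page_out(double p_fault_back)
{
	assert(p_fault_back >= 0.0 && p_fault_back <= 1.0);
	return (1.0 + p_fault_back);
}
```

Under these assumptions cost_drop_cache() never exceeds 1 and
cost_page_out() is never below 1, so dropping the cached block is never
the more expensive choice whatever the two probabilities are.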

To "fix" all of this "correctly" (for all cases, instead of certain
cases) VM would have to "know" about ARC and its use of UMA, along with
being able to police both.  ZFS also must have the dmu_tx writeback
cache sized per-zvol with its size chosen by the actual I/O performance
characteristics of the disks in the zvol itself.  I've looked into doing
both and it's fairly complex, and what's worse is that it would
effectively "marry" VM and ZFS, removing the "bolt-on" aspect of
things.  This then leads to a lot of maintenance work over time because
any time ZFS code changes (and it does, quite a bit) you then have to go
back through that process in order to become coherent with Illumos.

The PR above resolved (completely) the issues I was having along with a
number of other people on 10.x and before (I've not yet rolled it
forward to 11.) but it's quite clearly a hack of sorts, in that it
detects and treats symptoms (e.g. dynamic TX cache size modification,
etc) rather than integrating VM and ZFS cache management.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms010805070601040608020308--

From owner-freebsd-hackers@freebsd.org  Sun Jul  3 19:37:28 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5939B90B8E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Jul 2016 19:37:28 +0000 (UTC)
 (envelope-from adrian.chadd@gmail.com)
Received: from mail-io0-x230.google.com (mail-io0-x230.google.com
 [IPv6:2607:f8b0:4001:c06::230])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 936D521BC;
 Sun,  3 Jul 2016 19:37:28 +0000 (UTC)
 (envelope-from adrian.chadd@gmail.com)
Received: by mail-io0-x230.google.com with SMTP id g13so137407693ioj.1;
 Sun, 03 Jul 2016 12:37:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:from:date:message-id
 :subject:to:cc;
 bh=BDvzoTm8PMvSDcU2jfKDHIPOKMsfMHdc1yat19v2TpQ=;
 b=0WHFKMVgw+UY27TyyQt2SySIU6TN6gRN2azg8VppIH3Yb75GZraA15/S0j4ewRGVTe
 dnXRd0QmW1a2jUWRngAmgzJPKraF1DEqmcmqr2GNnjKmM6v85csTuFWgwaPNAdF1dorW
 Xf396ee6Ogy6T3oIFV88zAgsIRQ+CpoM0O0nfnRlPOqiROsDJdHGJdDl44Ixzmijzu1l
 EUnaXeXSEzQ9XTdsbsqgqElxJqxpz8vak7UUCCKwXDMbDFZNceNN2l8XO5CaX84rBAsC
 gtgGg+U4GHxuzjfH6Kmhd49LWfzCIQa527GAu9SI9Lw08LwIKlQb661eBOvmDTY71XvH
 Z2IQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:sender:in-reply-to:references:from
 :date:message-id:subject:to:cc;
 bh=BDvzoTm8PMvSDcU2jfKDHIPOKMsfMHdc1yat19v2TpQ=;
 b=KbLZlwB8PoEgvteTd42ej0f67uLArmoLLrmcepScHr17nqJR3mdKmg611TYpo6yiM2
 swGJp4NReoGp51rzn77IMdBuGQGXiF8wenoOylVWJDHe3lCWTaGZ3XjLT+suxYRImd3j
 KXxuch/IXObZP8udMgMa7MEhSbbiZmQpPe5dEC1L9FCSXub36X3P4EhhPVDxPmYASP/c
 6Eio4RG5mmrmwXCsUZZmLvakYkMnu8mdf70HZVoZ4jI3PYpBBWT1109IEEyi2BewTRtR
 D/z/Fqj5QUySN7x5hV4R1WKQvwxvIcVPWM4NSv9VXA/0uMVMzR/bu4vEqRluJYbdoEiL
 sLUA==
X-Gm-Message-State: ALyK8tLIFAWUXJ7d3EK1bNeVEMstpb/gkvyxs1DE2MWwWHH1uLc0/0jJDKsvgFY68UW+bv+3+5fievJsUzbVGw==
X-Received: by 10.107.144.86 with SMTP id s83mr5793822iod.165.1467574647939;
 Sun, 03 Jul 2016 12:37:27 -0700 (PDT)
MIME-Version: 1.0
Sender: adrian.chadd@gmail.com
Received: by 10.36.210.212 with HTTP; Sun, 3 Jul 2016 12:37:27 -0700 (PDT)
In-Reply-To: <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
References: <57761101.3030101@freebsd.org>
 <CAD9=5Xw-MmVVSSo6nRvSRvGaLbd1Z1YRyVKyF9JfmucNKMGBZg@mail.gmail.com>
 <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
From: Adrian Chadd <adrian@freebsd.org>
Date: Sun, 3 Jul 2016 12:37:27 -0700
X-Google-Sender-Auth: hmGAQPKUi-aXkQZh1CznYJTojpg
Message-ID: <CAJ-Vmon4kRNc5LiwibtiPi_FQ1v5w_MQEjP+OfcC7J74iTKs0A@mail.gmail.com>
Subject: Re: Review request: sparse CPU ID maps
To: Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc: outro pessoa <outro.pessoa@gmail.com>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jul 2016 19:37:28 -0000

On 2 July 2016 at 17:08, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
> A reasonable first pass at checking for this kind of bug is doing grep -lR
> '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows the following
> files:
> arm/mv/armadaxp/armadaxp_mp.c
> arm/include/counter.h
> arm/broadcom/bcm2835/bcm2836.c
> arm/broadcom/bcm2835/bcm2836_mp.c
> arm/freescale/imx/imx6_mp.c
> arm/allwinner/aw_mp.c
> arm/rockchip/rk30xx_mp.c
> arm/amlogic/aml8726/aml8726_mp.c
> arm/samsung/exynos/exynos5_mp.c
> arm/arm/mp_machdep.c
> arm/nvidia/tegra124/tegra124_mp.c
> arm64/include/counter.h
> arm64/arm64/gic_v3.c
> arm64/arm64/gic_v3_its.c
> arm64/arm64/gicv3_its.c
>
> All of them should, in some sense, be CPU_FOREACH(), but it may not matter.
> For example, it may not be possible to have sparse CPU IDs on some or all of
> those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are
> there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think.
> -Nathan

I think converting all the users over to the CPU_FOREACH thing is the
right way to go, even if the SOC doesn't require it. People do bring
up new systems by copy/pasta'ing an existing similar system, so we're
best served by having all the consumers migrated.

But, I'd do it in head/12. Early in head/12. :-P
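
To make the failure mode concrete, here is a minimal user-space sketch,
not code from the tree. The values of mp_ncpus, mp_maxid and all_cpus
are stand-ins I made up for the demo (CPU IDs 0, 1, 4 and 5 present),
the macros are only modeled loosely on the ones in sys/smp.h, and
visit_dense()/visit_foreach() are hypothetical helpers:

```c
#include <assert.h>

/*
 * User-space stand-ins (assumptions for this demo) for the kernel's
 * mp_ncpus, mp_maxid and all_cpus. CPU IDs 0, 1, 4 and 5 are present.
 */
static const int mp_ncpus = 4;		/* number of CPUs present */
static const int mp_maxid = 5;		/* highest CPU ID in use */
static const unsigned all_cpus = 0x33;	/* bitmask: IDs 0, 1, 4, 5 */

#define	CPU_ABSENT(id)	((all_cpus & (1u << (id))) == 0)
#define	CPU_FOREACH(i)						\
	for ((i) = 0; (i) <= mp_maxid; (i)++)			\
		if (!CPU_ABSENT(i))

/* Dense-ID assumption: walks IDs 0..mp_ncpus-1 and returns a mask of them. */
static unsigned
visit_dense(void)
{
	unsigned visited = 0;
	int i;

	for (i = 0; i < mp_ncpus; i++)
		visited |= 1u << i;
	return (visited);
}

/* Sparse-safe walk: visits only the IDs that are set in all_cpus. */
static unsigned
visit_foreach(void)
{
	unsigned visited = 0;
	int i;

	assert(mp_maxid < 32);	/* the demo mask is a single unsigned */
	CPU_FOREACH(i)
		visited |= 1u << i;
	return (visited);
}
```

With these made-up values, visit_dense() touches the absent IDs 2 and 3
and never reaches 4 and 5, while visit_foreach() returns exactly the
all_cpus mask.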

-adrian

From owner-freebsd-hackers@freebsd.org  Sun Jul  3 20:11:01 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4F26B9055E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sun,  3 Jul 2016 20:11:01 +0000 (UTC)
 (envelope-from wlosh@bsdimp.com)
Received: from mail-it0-x229.google.com (mail-it0-x229.google.com
 [IPv6:2607:f8b0:4001:c0b::229])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6C8FC20BC
 for <freebsd-hackers@freebsd.org>; Sun,  3 Jul 2016 20:11:01 +0000 (UTC)
 (envelope-from wlosh@bsdimp.com)
Received: by mail-it0-x229.google.com with SMTP id f6so14268444ith.0
 for <freebsd-hackers@freebsd.org>; Sun, 03 Jul 2016 13:11:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=bsdimp-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:sender:in-reply-to:references:from:date:message-id
 :subject:to:cc;
 bh=/qpGdDeYnkbKhUu/2jW0D/26XASzkXIAXyt6OyfN9yc=;
 b=KqshAoCtzHR4UgEwmVQU7WNVDhIp4DiB/LOzsM3e2mL4Vgbn9AEAMF0i6/TBpJ8v1N
 kTibrhgut9PhReo641/VDtyHIQ7NoqupHlK0EVAkNUXevPJIO3Z3yYDblA/d29y9oS3C
 eEoYwkY4jWb2BIiXyVjJ9/E8gMjJgLrTwF6PdLhKKN77Yn7zPkY7IqIVdINn+4FtCjG8
 PFtRbGUUWmVvP+8v3sG95THfg/TqvFD2TgEDQYGQoCk+y+URT/JUZwq4kGZFR3+fenTu
 geu+i1UG9venGzwW3ZQsneK2AjI5dYLDz0S4vHBXzYsG+F/EOx9YKuQjVjY3Dhs6Pzsy
 FGHw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:sender:in-reply-to:references:from
 :date:message-id:subject:to:cc;
 bh=/qpGdDeYnkbKhUu/2jW0D/26XASzkXIAXyt6OyfN9yc=;
 b=fwmk7cRNNpGzSEeQWRnoZlV7XdgpN4Qf6F5mZvwbfKVk4zRIxjK4Ms8YJ4npI+7kIt
 0+/QiB7qmJk2kj8oovaHIYYzPZshfSGR/YLinJG366HGD5B/78cLIm54a4rXgsne7s47
 GXUySfqUgz0oGBI7Si3Go3jQuVTScNVcD97aOgwaAnRb9QAcHmcfq5FPavUUoCllCgPM
 mdrUZflvdetREw6GOjjAyw0jVjT0hhoSwFmVVzs8t6/Chp7wGX6VA7dUL+S3e4YFSxpk
 O3rhywqfs4TVSwDqh+DHEjhoOPE3EBzfveoYMznWLQEHI16tJT6hJQRTt05N83fhsdIx
 Uarw==
X-Gm-Message-State: ALyK8tIBj97vpcdC/5CFnStUU24/dfh53uWj9+wyu/xmv7RMnt2PBv64GlFq/rHWY67GwZZMJQRwxQZbNYAnxw==
X-Received: by 10.36.41.16 with SMTP id p16mr7184342itp.60.1467576660661; Sun,
 03 Jul 2016 13:11:00 -0700 (PDT)
MIME-Version: 1.0
Sender: wlosh@bsdimp.com
Received: by 10.79.137.131 with HTTP; Sun, 3 Jul 2016 13:11:00 -0700 (PDT)
X-Originating-IP: [69.53.245.200]
In-Reply-To: <CAJ-Vmon4kRNc5LiwibtiPi_FQ1v5w_MQEjP+OfcC7J74iTKs0A@mail.gmail.com>
References: <57761101.3030101@freebsd.org>
 <CAD9=5Xw-MmVVSSo6nRvSRvGaLbd1Z1YRyVKyF9JfmucNKMGBZg@mail.gmail.com>
 <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
 <CAJ-Vmon4kRNc5LiwibtiPi_FQ1v5w_MQEjP+OfcC7J74iTKs0A@mail.gmail.com>
From: Warner Losh <imp@bsdimp.com>
Date: Sun, 3 Jul 2016 14:11:00 -0600
X-Google-Sender-Auth: 6dKLTeRARZHGUyjQWLKVe2aJG7k
Message-ID: <CANCZdfpJJLoKxB-ZdMRQyHq9eT1uihg4UGeBvRgBEOOC1pt_Yg@mail.gmail.com>
Subject: Re: Review request: sparse CPU ID maps
To: Adrian Chadd <adrian@freebsd.org>
Cc: Nathan Whitehorn <nwhitehorn@freebsd.org>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 outro pessoa <outro.pessoa@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Jul 2016 20:11:01 -0000

On Sun, Jul 3, 2016 at 1:37 PM, Adrian Chadd <adrian@freebsd.org> wrote:
> On 2 July 2016 at 17:08, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
>> A reasonable first pass at checking for this kind of bug is doing grep -lR
>> '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows the following
>> files:
>> arm/mv/armadaxp/armadaxp_mp.c
>> arm/include/counter.h
>> arm/broadcom/bcm2835/bcm2836.c
>> arm/broadcom/bcm2835/bcm2836_mp.c
>> arm/freescale/imx/imx6_mp.c
>> arm/allwinner/aw_mp.c
>> arm/rockchip/rk30xx_mp.c
>> arm/amlogic/aml8726/aml8726_mp.c
>> arm/samsung/exynos/exynos5_mp.c
>> arm/arm/mp_machdep.c
>> arm/nvidia/tegra124/tegra124_mp.c
>> arm64/include/counter.h
>> arm64/arm64/gic_v3.c
>> arm64/arm64/gic_v3_its.c
>> arm64/arm64/gicv3_its.c
>>
>> All of them should, in some sense, be CPU_FOREACH(), but it may not matter.
>> For example, it may not be possible to have sparse CPU IDs on some or all of
>> those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are
>> there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think.
>> -Nathan
>
> I think converting all the users over to the CPU_FOREACH thing is the
> right way to go, even if the SOC doesn't require it. People do bring
> up new systems by copy/pasta'ing an existing similar system, so we're
> best served by having all the consumers migrated.
>
> But, I'd do it in head/12. Early in head/12. :-P

The conversion to CPU_FOREACH, at least, is a mergeable change too,
since it wouldn't change any APIs. We don't want too many sweeping
changes that can't be merged landing too early (that way leads to
lots of maintenance issues), but we can do something like this. Merging
would be optional, but possible, for those bits of the tree that need it.
Though, for something like this, there's little against doing a full merge
and a lot for it...

Warner

From owner-freebsd-hackers@freebsd.org  Mon Jul  4 23:45:54 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8C369B9135A
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Mon,  4 Jul 2016 23:45:54 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7DFF72E23
 for <freebsd-hackers@freebsd.org>; Mon,  4 Jul 2016 23:45:54 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467675943133327.7328959933959;
 Mon, 4 Jul 2016 16:45:43 -0700 (PDT)
Date: Mon, 04 Jul 2016 16:45:43 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Karl Denninger" <karl@denninger.net>
Cc: "" <freebsd-hackers@freebsd.org>
Message-ID: <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
In-Reply-To: <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
Subject: Re: ZFS ARC and mmap/page cache coherency question
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jul 2016 23:45:54 -0000


 ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ---- 
 >  
 > On 7/3/2016 02:45, Matthew Macy wrote: 
 > >          
 > >             Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future. -M
 >  
 > Wellllll.... 
 >  
 > I've done a fair bit of work here (see 
 > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the 
 > political issues are at least as bad as the coding ones. 
 >  
 

Strictly speaking, the root of the problem is the ARC, not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU-only is? I'm not really convinced the ARC's benefits justify its cost.

-M


From owner-freebsd-hackers@freebsd.org  Tue Jul  5 02:26:32 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4733FB92403
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 02:26:32 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 027F129DF
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:26:31 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 49DE7220E9A
 for <freebsd-hackers@freebsd.org>; Mon,  4 Jul 2016 21:26:22 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
To: freebsd-hackers@freebsd.org
From: Karl Denninger <karl@denninger.net>
Message-ID: <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
Date: Mon, 4 Jul 2016 21:26:06 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms050507020202060509060304"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 02:26:32 -0000

This is a cryptographically signed message in MIME format.

--------------ms050507020202060509060304
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


On 7/4/2016 18:45, Matthew Macy wrote:
>
>
>  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger=
=2Enet> wrote ----=20
>  > =20
>  > On 7/3/2016 02:45, Matthew Macy wrote:=20
>  > >         =20
>  > >             Cedric greatly overstates the intractability of resolv=
ing it. Nonetheless, since the initial import very little has been done t=
o improve integration, and I don't know of anyone who is up to the task t=
aking an interest in it. Consequently, mmap() performance is likely "doom=
ed" for the foreseeable future.-M---- =20
>  > =20
>  > Wellllll....=20
>  > =20
>  > I've done a fair bit of work here (see=20
>  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) and the =

>  > political issues are at least as bad as the coding ones.=20
>  > =20
> =20
>
> Strictly speaking, the root of the problem is the ARC. Not ZFS per se. =
Have you ever tried disabling MFU caching to see how much worse LRU only =
is? I'm not really convinced the ARC's benefits justify its cost.
>
> -M
>

The ARC is very useful when it gets a hit as it avoids an I/O that would
otherwise take place.

Where it sucks is when the system evicts working set to preserve ARC.=20
That's always wrong in that you're trading a speculative I/O (if the
cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
(to page back in.)

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms050507020202060509060304
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G
A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl
bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND
dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM
TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD
ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg
XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp
3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f
IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO
aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ
Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5
vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq
yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/
o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l
eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI
KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw
CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB
DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX
RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw
FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6
eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf
G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO
sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb
An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+
JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ
3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat
HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0
FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG
1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT
n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH
RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD
MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5
c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI
hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA3MDUwMjI2MDZaME8GCSqGSIb3DQEJBDFCBEB1
WHQWd/psHthhOmx/UBbTVc/rRuUJykgCh15FTom2W0LKTiXE9vmkdvRia04S+F+p55k/9neE
8y9/BNIXfflYMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK
BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI
KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV
BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z
IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk
YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT
AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1
ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG
9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAXmPT56G/
zjai5XBUmaTMX9oWvJ8MVZbiI7QZkjTW9daqQKWw2zmCuzWn3LanfCeFahHiCnGjl+N+mmdg
tcdM59FD0wG0zEgU84n4fk3A3IowIg7iijKPB7B5lIJ7rby2jZ0ZJalJDePhhZBQAUwhC17t
OaJ5bVrC0qGkwZLbbUqRZwWaH0ADCLav41CrXyVq1JwN+AJcMnq650Hr9m8Jj39rHR1S4r42
fmFz2QGKlE5E9JcfnOWg/RJnMe2KrpMUbwTMVHihyVm60Gi3ovOu73tuawHTgE83Wk/kB02R
GrFc844M9HQm5FZ/jsOk+XxeK925HoK7ifocSHILrGmX3TRb7DYE+QpUVBACtu0RupzB/c8Q
GNNB4MzwJX33x0eAVqRodHqG59F5GKpQWka3/KYMHzInk8jokKd6uxsvrRJ4TzGfOLypYyw2
MJ+qAyw3zGfyJDRh+ii9K+H3F7sK2R+vOUD4n5DrGUUtRYR7udk0TKI5/QS0x37qW3GWepRh
exSAeWBiJSwVKc8NoMDjNgDQPLUwuhL2k4hlPD9osFCXn78m4s/rHMPaxNgqrGmu8JNwRONL
+fJsdHWC2BEHvVFv/BHaRo2Ku0ZBE70e4Wk9R7jqDL+lbqEyricLKbqpxRxC7PxmkOrX7xB+
rK9jJU0+bA6IOhqqwK5SzAEqXEMAAAAAAAA=
--------------ms050507020202060509060304--

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 02:28:29 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 266ABB924BA
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 02:28:29 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6])
 by mx1.freebsd.org (Postfix) with ESMTP id 0993B2B18
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:28:28 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from [10.1.1.2] (unknown [10.1.1.2])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id D92A1DBF5
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:28:27 +0000 (UTC)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
From: Allan Jude <allanjude@freebsd.org>
Message-ID: <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
Date: Mon, 4 Jul 2016 22:28:27 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 02:28:29 -0000

On 2016-07-04 22:26, Karl Denninger wrote:
> 
> 
> On 7/4/2016 18:45, Matthew Macy wrote:
>>
>>
>>  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ---- 
>>  >  
>>  > On 7/3/2016 02:45, Matthew Macy wrote: 
>>  > >          
>>  > >             Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future. -M
>>  >  
>>  > Wellllll.... 
>>  >  
>>  > I've done a fair bit of work here (see 
>>  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the 
>>  > political issues are at least as bad as the coding ones. 
>>  >  
>>  
>>
>> Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost.
>>
>> -M
>>
> 
> The ARC is very useful when it gets a hit as it avoid an I/O that would
> otherwise take place.
> 
> Where it sucks is when the system evicts working set to preserve ARC. 
> That's always wrong in that you're trading a speculative I/O (if the
> cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
> (to page back in.)
> 

ZFS is better behaved in 11.x: there is a sysctl, vfs.zfs.arc_free_target,
that makes sure the ARC is reined in when there is memory pressure, by
ensuring a minimum number of actually free pages.
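
As a rough illustration of the idea only (a simplified model, not the
kernel's code), the check can be thought of as comparing the count of
free pages against the target and asking the ARC to give back the
difference. The helper name arc_shortfall_bytes() and its parameters
are hypothetical, and the real ARC reclaim logic is more involved:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (an assumption, not the kernel's code): return how
 * many bytes the ARC would have to give back so that at least
 * free_target_pages pages are actually free. Returns 0 when there is
 * no shortfall.
 */
static uint64_t
arc_shortfall_bytes(uint64_t free_pages, uint64_t free_target_pages,
    uint64_t page_size)
{
	assert(page_size > 0);
	if (free_pages >= free_target_pages)
		return (0);
	return ((free_target_pages - free_pages) * page_size);
}
```

For example, with 4096-byte pages, 500 free pages and a target of 800,
this model reports a shortfall of 300 pages; with 1000 free pages it
reports none. On a running system the current target can be inspected
with "sysctl vfs.zfs.arc_free_target".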

-- 
Allan Jude

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 02:33:07 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5D6EB9261C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 02:33:07 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 99EBA2E8D
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:33:07 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 62742220F1C
 for <freebsd-hackers@freebsd.org>; Mon,  4 Jul 2016 21:33:05 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
From: Karl Denninger <karl@denninger.net>
Message-ID: <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
Date: Mon, 4 Jul 2016 21:32:49 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms010208080905010008010306"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 02:33:07 -0000

This is a cryptographically signed message in MIME format.

--------------ms010208080905010008010306
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 7/4/2016 21:28, Allan Jude wrote:
> On 2016-07-04 22:26, Karl Denninger wrote:
>>
>> On 7/4/2016 18:45, Matthew Macy wrote:
>>>
>>>  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denning=
er.net> wrote ----=20
>>>  > =20
>>>  > On 7/3/2016 02:45, Matthew Macy wrote:=20
>>>  > >         =20
>>>  > >             Cedric greatly overstates the intractability of reso=
lving it. Nonetheless, since the initial import very little has been done=
 to improve integration, and I don't know of anyone who is up to the task=
 taking an interest in it. Consequently, mmap() performance is likely "do=
omed" for the foreseeable future.-M---- =20
>>>  > =20
>>>  > Wellllll....=20
>>>  > =20
>>>  > I've done a fair bit of work here (see=20
>>>  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) and th=
e=20
>>>  > political issues are at least as bad as the coding ones.=20
>>>  > =20
>>> =20
>>>
>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per se=
=2E Have you ever tried disabling MFU caching to see how much worse LRU o=
nly is? I'm not really convinced the ARC's benefits justify its cost.
>>>
>>> -M
>>>
>> The ARC is very useful when it gets a hit as it avoid an I/O that woul=
d
>> otherwise take place.
>>
>> Where it sucks is when the system evicts working set to preserve ARC. =

>> That's always wrong in that you're trading a speculative I/O (if the
>> cache is hit later) for a *guaranteed* one (to page out) and maybe *tw=
o*
>> (to page back in.)
>>
> ZFS is better behaved in 11.x, there is a sysctl vfs.zfs.arc_free_targe=
t
> that makes sure the ARC is reined in when there is memory pressure, by
> ensuring a minimum amount of actually free pages.
>
Oh, but.....

Again, go read the PR I linked (and the current version of the patch
against 10-STABLE.)  The issues are far more intertwined than that.=20
Specifically, the dmu_tx cache decision (size of the write-back cache)
is flat-out broken and inappropriate in essentially all cases, and the
interaction of UMA and ARC is very destructive under a wide variety of
workloads.  The patch has a hack-around for the dmu_tx problem and a
reasonably-effective fix for the UMA issues.  Actually fixing dmu_tx,
however, is nowhere near that easy since it really needs to be computed
per-zvol on an actual bytes moved per-unit-of-time basis.

Note that one of the patches in the set I developed is indeed
arc_free_target (indeed it was the first approach I took) -- but without
addressing the other two issues it doesn't solve the problem.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms010208080905010008010306
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G
A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl
bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND
dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM
TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD
ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg
XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp
3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f
IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO
aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ
Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5
vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq
yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/
o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l
eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI
KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw
CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB
DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX
RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw
FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6
eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf
G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO
sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb
An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+
JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ
3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat
HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0
FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG
1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT
n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH
RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD
MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5
c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI
hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA3MDUwMjMyNDlaME8GCSqGSIb3DQEJBDFCBEAV
KMg4AHaoLMMe3So8i486K5oIjQpUdwmN9cYYgIYYLy9rftciot7s+0SuIHL4A7n80GJv/oWg
9uzBwLxM8hbqMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK
BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI
KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV
BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z
IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk
YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT
AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1
ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG
9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAojhtlyEO
eUqXIVLILPLpP1kqX5MZ+8QUJ+R/qE2CT/k59jwiQpxlfKinfuUoSJ/FIHjSlJX9Ky4x7bHg
9vulhAjnGvogIxmbdDLGGvbLhYb+hmWxTvugcLP8L3PrQ3Jya5hnmraZsU/ty84R40sO54Q1
ANySPdu0MtLhF8eMYPO2RmDk3I23WrSnQJpQlBwaBvO5hwPn5IZEVFJ2Cp3j0U02+O8Mvw2x
MgiwWAE9uHfqVwRXwMVBfP+rnigZw03ocWf/GfLFvu7/Jz/Ce4KVuwx5Xt/jPIS2VRrLmL0F
1PTCH1MVgGsmDGn0EkwGIe6MyXbGoa1Ra/SoCAo9ROpGk/HlH1KPUlwGltJYtn8TzJBIHvWC
nQ1kywEmPqU/8TXB4PgmcXqq6Wn9rR0rSi6cuwJzmswenV/UbD7pMhWHCOXK+23PQXxfv1vM
rMmQKYrXSbyBDn8qfBMjenVwvAYt9S2wcz3JpDGCd+xRKV8AAXUGhRAe3uFmWTXn+qiAPF+k
feMeDzeQ7UbCb2dM4JCuqtuef94qFAICgFzOg/FYcwQbcQu8blKvLDEDimhJEoUIk2nVgGG6
y+kxXbuGR3cNwu/mg4LoOx8DW9C8N9HNP9QSl4U4bG4VsGVSA+Vxd3eOw87TTGWOboavKUZM
yhrT2cK0KifkNEBu5OyNwMEaOssAAAAAAAA=
--------------ms010208080905010008010306--

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 02:36:35 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B7357B926E8
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 02:36:35 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6])
 by mx1.freebsd.org (Postfix) with ESMTP id 7F469105D
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:36:35 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from [192.168.1.10] (unknown [192.168.1.10])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id 9633DDC0A
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:36:34 +0000 (UTC)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
 <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
From: Allan Jude <allanjude@freebsd.org>
Message-ID: <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org>
Date: Mon, 4 Jul 2016 22:36:31 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 02:36:35 -0000

On 2016-07-04 22:32, Karl Denninger wrote:
> On 7/4/2016 21:28, Allan Jude wrote:
>> On 2016-07-04 22:26, Karl Denninger wrote:
>>>
>>> On 7/4/2016 18:45, Matthew Macy wrote:
>>>>
>>>>  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ----
>>>>  >
>>>>  > On 7/3/2016 02:45, Matthew Macy wrote:
>>>>  > >
>>>>  > >             Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M----
>>>>  >
>>>>  > Wellllll....
>>>>  >
>>>>  > I've done a fair bit of work here (see
>>>>  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
>>>>  > political issues are at least as bad as the coding ones.
>>>>  >
>>>>
>>>>
>>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost.
>>>>
>>>> -M
>>>>
>>> The ARC is very useful when it gets a hit as it avoid an I/O that would
>>> otherwise take place.
>>>
>>> Where it sucks is when the system evicts working set to preserve ARC.
>>> That's always wrong in that you're trading a speculative I/O (if the
>>> cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
>>> (to page back in.)
>>>
>> ZFS is better behaved in 11.x, there is a sysctl vfs.zfs.arc_free_target
>> that makes sure the ARC is reined in when there is memory pressure, by
>> ensuring a minimum amount of actually free pages.
>>
> Oh, but.....
>
> Again, go read the PR I linked (and the current version of the patch
> against 10-STABLE.)  The issues are far more intertwined than that.
> Specifically, the dmu_tx cache decision (size of the write-back cache)
> is flat-out broken and inappropriate in essentially all cases, and the
> interaction of UMA and ARC is very destructive under a wide variety of
> workloads.  The patch has hack-around for the dmu_tx problem and a
> reasonably-effective fix for the UMA issues.  Actually fixing dmu_tx,
> however, is nowhere near that easy since it really needs to be computed
> per-zvol on an actual bytes moved per-unit-of-time basis.
>
> Note that one of the patches in the set I developed is indeed
> arc_free_target (indeed it was the first approach I took) -- but without
> addressing the other two issues it doesn't solve the problem.
>

You keep saying per-zvol. Do you mean per-vdev? I am under the 
impression that no zvols are involved in the use case this thread is 
about.

Improving the way ZFS frees memory, specifically UMA and the 'kmem 
caches', will help a lot as well.

In addition, another patch just went in to allow you to change the 
arc_max and arc_min on a running system.
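
As a rough illustration of what that looks like in practice, here is a
small Python sketch that only builds the sysctl command lines. I am
assuming the knobs meant here are vfs.zfs.arc_max and vfs.zfs.arc_min;
the helper name and the byte values are made up for the example.

```python
GIB = 1024 ** 3


def arc_limit_commands(arc_min_bytes, arc_max_bytes):
    """Build (but do not run) sysctl commands that set new ARC limits.

    The sysctl names are an assumption about which knobs are meant;
    check `sysctl -a | grep arc` on the target system first.
    """
    if not 0 < arc_min_bytes <= arc_max_bytes:
        raise ValueError("need 0 < arc_min <= arc_max")
    return [
        ["sysctl", f"vfs.zfs.arc_max={arc_max_bytes}"],
        ["sysctl", f"vfs.zfs.arc_min={arc_min_bytes}"],
    ]


# Print the commands so they can be reviewed before running them as root.
for argv in arc_limit_commands(1 * GIB, 4 * GIB):
    print(" ".join(argv))
```

Nothing is executed here; the printed lines can be pasted into a root
shell once the values have been checked against the machine's RAM.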

-- 
Allan Jude

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 02:46:47 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52EEFB92892
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 02:46:47 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 017CD151C
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 02:46:46 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 19B8B220FDA
 for <freebsd-hackers@freebsd.org>; Mon,  4 Jul 2016 21:46:44 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
 <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
 <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org>
From: Karl Denninger <karl@denninger.net>
Message-ID: <ec4685b2-bdaf-c18d-8aff-38b17edf4ebb@denninger.net>
Date: Mon, 4 Jul 2016 21:46:29 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms010409020104080906010506"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 02:46:47 -0000

This is a cryptographically signed message in MIME format.

--------------ms010409020104080906010506
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 7/4/2016 21:36, Allan Jude wrote:
> On 2016-07-04 22:32, Karl Denninger wrote:
>> On 7/4/2016 21:28, Allan Jude wrote:
>>> On 2016-07-04 22:26, Karl Denninger wrote:
>>>>
>>>> On 7/4/2016 18:45, Matthew Macy wrote:
>>>>>
>>>>>  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger
>>>>> <karl@denninger.net> wrote ----
>>>>>  >
>>>>>  > On 7/3/2016 02:45, Matthew Macy wrote:
>>>>>  > >
>>>>>  > >             Cedric greatly overstates the intractability of
>>>>> resolving it. Nonetheless, since the initial import very little
>>>>> has been done to improve integration, and I don't know of anyone
>>>>> who is up to the task taking an interest in it. Consequently,
>>>>> mmap() performance is likely "doomed" for the foreseeable
>>>>> future.-M----
>>>>>  >
>>>>>  > Wellllll....
>>>>>  >
>>>>>  > I've done a fair bit of work here (see
>>>>>  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) and =
the
>>>>>  > political issues are at least as bad as the coding ones.
>>>>>  >
>>>>>
>>>>>
>>>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per
>>>>> se. Have you ever tried disabling MFU caching to see how much
>>>>> worse LRU only is? I'm not really convinced the ARC's benefits
>>>>> justify its cost.
>>>>>
>>>>> -M
>>>>>
>>>> The ARC is very useful when it gets a hit as it avoid an I/O that
>>>> would
>>>> otherwise take place.
>>>>
>>>> Where it sucks is when the system evicts working set to preserve ARC=
=2E
>>>> That's always wrong in that you're trading a speculative I/O (if the=

>>>> cache is hit later) for a *guaranteed* one (to page out) and maybe
>>>> *two*
>>>> (to page back in.)
>>>>
>>> ZFS is better behaved in 11.x, there is a sysctl
>>> vfs.zfs.arc_free_target
>>> that makes sure the ARC is reined in when there is memory pressure, b=
y
>>> ensuring a minimum amount of actually free pages.
>>>
>> Oh, but.....
>>
>> Again, go read the PR I linked (and the current version of the patch
>> against 10-STABLE.)  The issues are far more intertwined than that.
>> Specifically, the dmu_tx cache decision (size of the write-back cache)=

>> is flat-out broken and inappropriate in essentially all cases, and the=

>> interaction of UMA and ARC is very destructive under a wide variety of=

>> workloads.  The patch has hack-around for the dmu_tx problem and a
>> reasonably-effective fix for the UMA issues.  Actually fixing dmu_tx,
>> however, is nowhere near that easy since it really needs to be compute=
d
>> per-zvol on an actual bytes moved per-unit-of-time basis.
>>
>> Note that one of the patches in the set I developed is indeed
>> arc_free_target (indeed it was the first approach I took) -- but witho=
ut
>> addressing the other two issues it doesn't solve the problem.
>>
>
> You keep saying per zvol. Do you mean per vdev? I am under the
> impression that no zvol's are involved in the use case this thread is
> about.
Sorry, per-vdev.  The problem with dmu_tx is that it's system-wide.=20
This is wildly inappropriate for several reasons -- first, it is
computed on size-of-RAM with a hard cap (which is stupid on its face)
and second, it is entirely insensitive to the performance of the vdevs in
question.  Specifically, it is very common for a system to have very
fast (e.g. SSD) disks, perhaps in a mirror configuration, and then
spinning rust in a RAID-Z2 config for bulk storage.  Those are very, very
different performance-wise and they should have wildly different
write-back cache sizes.  At present there is exactly one such write-back
cache and it's both system-wide and pays exactly zero attention to the
throughput of the underlying vdevs it is talking to.

This is why you can provoke minute-long stalls on a system with moderate
(e.g. 32GB) amounts of RAM if there are spinning rust devices in the
configuration.

>
> Improving the way ZFS frees memory, specifically UMA and the 'kmem
> caches' will help a lot as well.
>
Well, yeah.  But that means you have to police up the size of the UMA
vs. how much is actually in use in the UMA.  What the PR does is get
pretty aggressive with that whenever RAM is tight, and before the pager
can start playing hell with system performance.

> In addition, another patch just went in to allow you to change the
> arc_max and arc_min on a running system.
>
Yes, the PR I did a long time ago made that "active" on a running
system.... so I've had that for quite some time.  Not that you really
ought to need to play with that (if you feel a need to then you're still
at step 1 or 2 of what I went through with analyzing and working on this
in the 10.x code.....)

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms010409020104080906010506
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G
A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl
bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND
dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM
TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD
ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg
XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp
3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f
IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO
aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ
Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5
vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq
yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/
o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l
eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI
KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw
CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB
DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX
RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw
FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6
eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf
G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO
sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb
An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+
JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ
3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat
HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0
FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG
1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT
n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH
RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD
MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5
c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI
hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA3MDUwMjQ2MjlaME8GCSqGSIb3DQEJBDFCBEAD
QidOIbJLVCn4JDVQQmXLjHXBkph1n3i81pzVT6ckttaROoPA/2MTZQH3Bp6qaMZEHVS6RevL
xClQSpCqnvEhMGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK
BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI
KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV
BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z
IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk
YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT
AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1
ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG
9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAC47VkM+m
ZQ2YAs6GfFwHC/bP3nsNN2feyRwnZMJ90eF4AL0Qm2H9KPNhoa0kDoNFQDEWl6AGeVj2gyxL
Rk1HEX3m3f2RqZQqanMdBtIPe8P/AZxqMWOUErWBUES1ee1YMz50mqqAOUEcxBiYNFDMbFCN
vwsqwHlIJdn2Rz+IYoUlUKlanTbSBXaODgKh7UjD4hAi917A7E67bOqwiAb9tp3cDjNRMEo4
dciyujK3tEHyEXmupTYvnXVOqT2kLjDxcxfiPDQF3B7tzTbHcStVCloTHCxSuvpZK3lfZhCB
Xu84S3ZW/MmJF8CCl50b+Te0NWNJbc7yTRKHvS3b1Upb9U1jcXlbJF5OlFNJ3umazSTJoPoB
TYKPkJBS8j3yfTnN4w+v5evrYaYpIFXSQ5KvAuMT87A7dDGUWpVx8EmrisTP2ZMYI4qSAPxb
FwAeUTwxeI2hJ237gukNoNMb+eXDoMyn0FgAz6i4ngp2cpA6YAIghLYjhVYeaRMGSJj3ESSL
d60a1QziYTAl2fbG644SoKBufKmQ43zMTFW0DdprnthW2S07K9NHXCVIDOxV4cun1yZMv54i
3zgGFXEdUaakTjUn4kF3F1vFuskPomi2ipZOyQwXngTH5molosR23Iwj9cSaPWho4jVY4wQG
dRWjt/a65kWboAbhCuw+YMdXkiwAAAAAAAA=
--------------ms010409020104080906010506--

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 03:01:20 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2ABD5B92ADF
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 03:01:20 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6])
 by mx1.freebsd.org (Postfix) with ESMTP id 0B39E20FF
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 03:01:19 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from [192.168.1.10] (unknown [192.168.1.10])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id E363BDC4E
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 03:01:18 +0000 (UTC)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
 <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
 <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org>
 <ec4685b2-bdaf-c18d-8aff-38b17edf4ebb@denninger.net>
From: Allan Jude <allanjude@freebsd.org>
Message-ID: <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>
Date: Mon, 4 Jul 2016 23:01:16 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <ec4685b2-bdaf-c18d-8aff-38b17edf4ebb@denninger.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 03:01:20 -0000

On 2016-07-04 22:46, Karl Denninger wrote:
> On 7/4/2016 21:36, Allan Jude wrote:
>> On 2016-07-04 22:32, Karl Denninger wrote:
>>> On 7/4/2016 21:28, Allan Jude wrote:
>>>> On 2016-07-04 22:26, Karl Denninger wrote:
>>>>>
>>>>> On 7/4/2016 18:45, Matthew Macy wrote:
>>>>>>
>>>>>>  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger
>>>>>> <karl@denninger.net> wrote ----
>>>>>>  >
>>>>>>  > On 7/3/2016 02:45, Matthew Macy wrote:
>>>>>>  > >
>>>>>>  > >             Cedric greatly overstates the intractability of
>>>>>> resolving it. Nonetheless, since the initial import very little
>>>>>> has been done to improve integration, and I don't know of anyone
>>>>>> who is up to the task taking an interest in it. Consequently,
>>>>>> mmap() performance is likely "doomed" for the foreseeable
>>>>>> future.-M----
>>>>>>  >
>>>>>>  > Wellllll....
>>>>>>  >
>>>>>>  > I've done a fair bit of work here (see
>>>>>>  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
>>>>>>  > political issues are at least as bad as the coding ones.
>>>>>>  >
>>>>>>
>>>>>>
>>>>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per
>>>>>> se. Have you ever tried disabling MFU caching to see how much
>>>>>> worse LRU only is? I'm not really convinced the ARC's benefits
>>>>>> justify its cost.
>>>>>>
>>>>>> -M
>>>>>>
>>>>> The ARC is very useful when it gets a hit as it avoid an I/O that
>>>>> would
>>>>> otherwise take place.
>>>>>
>>>>> Where it sucks is when the system evicts working set to preserve ARC.
>>>>> That's always wrong in that you're trading a speculative I/O (if the
>>>>> cache is hit later) for a *guaranteed* one (to page out) and maybe
>>>>> *two*
>>>>> (to page back in.)
>>>>>
>>>> ZFS is better behaved in 11.x, there is a sysctl
>>>> vfs.zfs.arc_free_target
>>>> that makes sure the ARC is reined in when there is memory pressure, by
>>>> ensuring a minimum amount of actually free pages.
>>>>
>>> Oh, but.....
>>>
>>> Again, go read the PR I linked (and the current version of the patch
>>> against 10-STABLE.)  The issues are far more intertwined than that.
>>> Specifically, the dmu_tx cache decision (size of the write-back cache)
>>> is flat-out broken and inappropriate in essentially all cases, and the
>>> interaction of UMA and ARC is very destructive under a wide variety of
>>> workloads.  The patch has hack-around for the dmu_tx problem and a
>>> reasonably-effective fix for the UMA issues.  Actually fixing dmu_tx,
>>> however, is nowhere near that easy since it really needs to be computed
>>> per-zvol on an actual bytes moved per-unit-of-time basis.
>>>
>>> Note that one of the patches in the set I developed is indeed
>>> arc_free_target (indeed it was the first approach I took) -- but without
>>> addressing the other two issues it doesn't solve the problem.
>>>
>>
>> You keep saying per zvol. Do you mean per vdev? I am under the
>> impression that no zvol's are involved in the use case this thread is
>> about.
> Sorry, per-vdev.  The problem with dmu_tx is that it's system-wide.
> This is wildly inappropriate for several reasons -- first, it is
> computed on size-of-RAM with a hard cap (which is stupid on its face)
> and it entirely insensitive to the performance of the vdev's in
> question.  Specifically, it is very common for a system to have very
> fast (e.g. SSD) disks, perhaps in a mirror configuration, and then
> spinning rust in a RaidZ2 config for bulk storage.  Those are very, very
> different performance wise and they should have wildly different
> write-back cache sizes.  At present there is exactly one such write-back
> cache and it's both system-wide and pays exactly zero attention to the
> throughput of the underlying vdevs it is talking to.
>
> This is why you can provoke minute-long stalls on a system with moderate
> (e.g. 32GB) amounts of RAM if there are spinning rust devices in the
> configuration.
>
>>
>> Improving the way ZFS frees memory, specifically UMA and the 'kmem
>> caches' will help a lot as well.
>>
> Well, yeah.  But that means you have to police up the size of the UMA
> .vs. how much is actually in use in the UMA.  What the PR does is get
> pretty aggressive with that whenever RAM is tight, and before the pager
> can start playing hell with system performance.
>
>> In addition, another patch just went in to allow you to change the
>> arc_max and arc_min on a running system.
>>
> Yes, the PR I did a long time ago made that "active" on a running
> system.... so I've had that for quite some time.  Not that you really
> ought to need to play with that (if you feel a need to then you're still
> at step 1 or 2 of what I went through with analyzing and working on this
> in the 10.x code.....)
>

Have you looked into the ZFS 'Write Throttle'? It seems like it was 
meant to solve the writeback problem you are describing. It starts 
sending back pressure up to the application by introducing larger and 
larger delays in the write() call until your disks can keep up with your 
applications.

http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/

http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
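
To make the "larger and larger delays" concrete, here is a rough Python
model of the delay curve as I understand it from the first post linked
above. The function name, the abstract units and the default constants
(60% threshold, 500 us scale, 100 ms ceiling) are illustrative
assumptions, not values read out of any particular FreeBSD tree.

```python
def write_delay_ns(dirty, dirty_max, min_dirty_pct=60,
                   delay_scale_ns=500_000, delay_max_ns=100_000_000):
    """Simplified model of a per-write delay that grows with dirty data.

    Below the threshold there is no delay. Above it the delay rises
    hyperbolically as dirty data approaches the limit, up to a ceiling.
    """
    delay_min = dirty_max * min_dirty_pct // 100
    if dirty <= delay_min:
        return 0
    if dirty >= dirty_max:
        return delay_max_ns  # at the limit: charge the maximum delay
    delay = delay_scale_ns * (dirty - delay_min) // (dirty_max - dirty)
    return min(delay, delay_max_ns)


# Show how the delay changes as the amount of dirty data grows.
for dirty in (500, 700, 800, 900, 990):
    print(dirty, write_delay_ns(dirty, dirty_max=1000))
```

The point of the shape is that a writer that outruns the disks is slowed
gradually instead of running at full speed until it hits a wall.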


-- 
Allan Jude

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 05:26:27 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9A52DB855D3
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 05:26:27 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 79E1218E4
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 05:26:27 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 7274FB855D1; Tue,  5 Jul 2016 05:26:27 +0000 (UTC)
Delivered-To: hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 721FBB855D0
 for <hackers@mailman.ysv.freebsd.org>; Tue,  5 Jul 2016 05:26:27 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: from mail-it0-x22e.google.com (mail-it0-x22e.google.com
 [IPv6:2607:f8b0:4001:c0b::22e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 45BE718E1
 for <hackers@freebsd.org>; Tue,  5 Jul 2016 05:26:27 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: by mail-it0-x22e.google.com with SMTP id j185so7707757ith.1
 for <hackers@freebsd.org>; Mon, 04 Jul 2016 22:26:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=sippysoft-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:sender:from:date:message-id:subject:to;
 bh=cEuLcSck0eUC3ETcsKybUeZuj1h1cE82SGaPpeXsozw=;
 b=OBUujVrFLBLTktJQpJGrjIQMCw7UscFvYmwt7KbLsJT4p4x11E4lHJWuEQ1Dp7QxC+
 oya5gvOmKhwp0EPyuWAMRo/mjSz5Sg8SutyhwpNlLkeUfOrpINIuB8etQgUtY/WhngJF
 wt6HQWusvyDAtmJh1DTT5RxZSBUHlxo5hSQAdaPvb9n4r9JSZaEtVbdPFjDaSbrpcdOk
 DPWFPO440lZfqIzOO3sNzT0Vd136u9YmML6E21WvUzUbjgZFu9WebUbUru5trpfCWK9u
 VeotiqfZW5OtZ8NqSojSf+Z6q/KagZACAxbVorZejJZ2g0HoymUMrjce1FuD1F8kNH6q
 GQBA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:sender:from:date:message-id:subject
 :to; bh=cEuLcSck0eUC3ETcsKybUeZuj1h1cE82SGaPpeXsozw=;
 b=hNfcxSYFCRBwR1PyZfnOSZ5Y8ADW75hle0K1m6sOIpSSUkeu0T4CFqx065insmI14T
 8mCBK5/k1macOPFeByZHV+yrc9F/xGraV21oyig3WHa9MRvGE0eXsXHdl/l+0KZWEaH0
 14koeUqV32GnW2gk12boXFYRd0EvSz5Em8MwSYCl7KbhVxTz3C4Fju4Xp5ik+gK5CWJf
 5dwD9RUfP5dW3b13J6MS75sSUG2hdcFQN173rf6GYmYVYXP3lNuUtmaSUqlGDh4AHYh2
 gYcD5WmZm2XZmaJWqYniLPypoqgxLtjSVPjMDUo5ZiCdT4gM78N6+iDlUYJIzv+ZAK6L
 mvHw==
X-Gm-Message-State: ALyK8tJLSDKS0xyr7b3alrR2X9sI1Fkf+EpqkZEf0P9inGtB1vtTQTihmnhKaSTSFbVH2z9QxlpgyFQn+THcKRQG
X-Received: by 10.36.91.66 with SMTP id g63mr11055580itb.16.1467696386364;
 Mon, 04 Jul 2016 22:26:26 -0700 (PDT)
MIME-Version: 1.0
Sender: sobomax@sippysoft.com
Received: by 10.36.59.193 with HTTP; Mon, 4 Jul 2016 22:26:25 -0700 (PDT)
From: Maxim Sobolev <sobomax@freebsd.org>
Date: Mon, 4 Jul 2016 22:26:25 -0700
X-Google-Sender-Auth: DIV2CM4kakl33DY5WwTrB3JuOKg
Message-ID: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>
Subject: A faulty program corrupts some its data preventing correct core
 generation (Failed to write core file for process postgres (error 14))
To: stable@freebsd.org, hackers@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 05:26:27 -0000

Hi all, we are investigating some random postgresql-9.1.21 server crashes on
FreeBSD 10.3. We started seeing them after upgrading from postgres 9.1.18,
and on more than one system, so a hardware problem (e.g. bad RAM) is very
unlikely. I suspect that postgres is at fault, but I am also curious how
the kernel can be unable to generate a core file when an application does
something silly. Is it that some ELF-related data structures got corrupted,
or something else? Are we protecting the page where the ELF header is mapped
with the R/O flag? I am looking at recreating this by poking around in the
ELF header(s) to see if I can corrupt it reliably in a similar manner. Any
pointers or suggestions are appreciated.

Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11

#define EFAULT          14              /* Bad address */

The resulting files are truncated and not really usable for anything.
We've seen the same issue

-rw-------    1 pgsql     wheel     1310720 Jun 27 04:10 postgres.41361.core
-rw-------    1 pgsql     wheel     1310720 Jul  1 05:21 postgres.1722.core

[ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd10.3".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...(no debugging symbols found)...done.
BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720.
[New LWP 100261]
Core was generated by `postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
(gdb) where
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
(gdb) q

-Max

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 11:14:22 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B1ACB21827
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 11:14:22 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0823F1F5A
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 11:14:21 +0000 (UTC)
 (envelope-from wjw@digiware.nl)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
 by smtp.digiware.nl (Postfix) with ESMTP id 8ED22153413;
 Tue,  5 Jul 2016 13:14:18 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.com
Received: from smtp.digiware.nl ([127.0.0.1])
 by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id Vdgp9rAVpD23; Tue,  5 Jul 2016 13:13:49 +0200 (CEST)
Received: from [192.168.10.67] (opteron [192.168.10.67])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by smtp.digiware.nl (Postfix) with ESMTPSA id 3475D15340A
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 13:13:49 +0200 (CEST)
To: FreeBSD Hackers <freebsd-hackers@freebsd.org>
From: Willem Jan Withagen <wjw@digiware.nl>
Subject: Problem during dlopen()
Message-ID: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
Date: Tue, 5 Jul 2016 13:13:42 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 11:14:22 -0000

Hi,

I'm banging my head against the wall because I cannot seem to get this
working.

The problem appeared after switching the build from automake to cmake.

All my dlopens now fail with something like:
load failed dlopen(build/lib/compressor/libceph_snappy.so) or
dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"

If I look up the name:
nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl

it gives me:
                 U _ZN4ceph6buffer4list8iterator7advanceEl

The parent/calling executable, however, has:
	0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl

Clearly dlopen is not able to match these two and succeed.

Question:
Which switch is missing, and in which part of the build?

Thanx,
--WjW

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 11:45:25 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AB1FB21E74;
 Tue,  5 Jul 2016 11:45:25 +0000 (UTC)
 (envelope-from gahr@FreeBSD.org)
Received: from mail.ptrcrt.ch (gahr.cloud.tilaa.com [84.22.109.158])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5EF211C9F;
 Tue,  5 Jul 2016 11:45:22 +0000 (UTC)
 (envelope-from gahr@FreeBSD.org)
Received: from webmail.ptrcrt.ch (www.gahr.ch [192.168.1.2])
 by mail.ptrcrt.ch (OpenSMTPD) with ESMTPSA id c381a7a4
 TLS version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO;
 Tue, 5 Jul 2016 11:45:14 +0000 (UTC)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
Date: Tue, 05 Jul 2016 13:45:13 +0200
From: Pietro Cerutti <gahr@FreeBSD.org>
To: Willem Jan Withagen <wjw@digiware.nl>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 owner-freebsd-hackers@freebsd.org
Subject: Re: Problem during dlopen()
Organization: The FreeBSD Project
In-Reply-To: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
Message-ID: <416028b6b2a1dffe4e010b5792c56100@gahr.ch>
X-Sender: gahr@FreeBSD.org
User-Agent: Roundcube Webmail/1.2.0
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 11:45:25 -0000

On 2016-07-05 13:13, Willem Jan Withagen wrote:
> Hi,
> 
> I'm banging my head agains the wall because I cannot seem to get this
> working.
> 
> The problem is due to changing from automake to cmake building.
> 
> But all my dlopens start failing with something like:
> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
> 
> If do a lookup for  the name:
> nm build/lib/libceph_snappy.so |grep 
> ceph6buffer4list8iterator7advanceEl
> 
> if give me:
>                  U _ZN4ceph6buffer4list8iterator7advanceEl
> 
> The parent/calling executable however has:
> 	0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
> 
> Clearly dlopen is not able to match these 2 and succeed.
> 
> Question:
> So on which part of the building is what switch missing.

Wild guess: -Wl,-E linking the executable.

-- 
Pietro Cerutti
gahr@FreeBSD.org

PGP Public Key:
http://gahr.ch/pgp

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 11:48:16 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6F91DB21F9C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 11:48:16 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 57DF01DF9
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 11:48:16 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 53318B21F98; Tue,  5 Jul 2016 11:48:16 +0000 (UTC)
Delivered-To: hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52C25B21F97;
 Tue,  5 Jul 2016 11:48:16 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id F11141DF8;
 Tue,  5 Jul 2016 11:48:15 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u65Bm9b6022894
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Tue, 5 Jul 2016 14:48:09 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u65Bm9b6022894
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u65Bm8AJ022893;
 Tue, 5 Jul 2016 14:48:08 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Tue, 5 Jul 2016 14:48:08 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Maxim Sobolev <sobomax@freebsd.org>
Cc: stable@freebsd.org, hackers@freebsd.org
Subject: Re: A faulty program corrupts some its data preventing correct core
 generation (Failed to write core file for process postgres (error 14))
Message-ID: <20160705114808.GN38613@kib.kiev.ua>
References: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 11:48:16 -0000

On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
> Hi all, investigating some random postgresql-9.1.21 server crashes on
> FreeBSD 10.3, we've started seeing those after upgrading from postgres
> 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very
> unlikely. I suspect that postgres is at fault, however I am also curious
> how could it be that kernel is not capable of generating core file when
> application does something silly? Is it that some ELF-related data
> structures got corrupted or something else? Are we protecting the page
> where ELF header is mapped with R/O flag? I am looking at possibly
> recreating this by poking around elf header(s), seeing if I can corrupt it
> in a similar manner reliably, any pointers or suggestions are appreciated.
> 
> Jun 27 04:10:18 dal12 kernel: Failed to write core file for process
> postgres (error 14)
> Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on
> signal 11
> Jul  1 05:21:46 dal12 kernel: Failed to write core file for process
> postgres (error 14)
> Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal
> 11
> 
> #define EFAULT          14              /* Bad address */
> 
> The resulting files are truncated and is not really usable for anything.
> We've seen the same issue
> 
> -rw-------    1 pgsql     wheel     1310720 Jun 27 04:10 postgres.41361.core
> -rw-------    1 pgsql     wheel     1310720 Jul  1 05:21 postgres.1722.core
> 
> [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
> GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html
> >
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd10.3".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from postgres...(no debugging symbols found)...done.
> BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file
> size >= 517120000, found: 1310720.
> [New LWP 100261]
> Core was generated by `postgres'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> (gdb) where
> #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
> (gdb) q
> 
https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 11:51:08 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DEBE1B21180
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 11:51:08 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5F39F122E
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 11:51:08 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u65BowoR023945
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Tue, 5 Jul 2016 14:50:58 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u65BowoR023945
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u65BovOp023944;
 Tue, 5 Jul 2016 14:50:57 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Tue, 5 Jul 2016 14:50:57 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Willem Jan Withagen <wjw@digiware.nl>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Problem during dlopen()
Message-ID: <20160705115057.GO38613@kib.kiev.ua>
References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 11:51:09 -0000

On Tue, Jul 05, 2016 at 01:13:42PM +0200, Willem Jan Withagen wrote:
> Hi,
> 
> I'm banging my head agains the wall because I cannot seem to get this
> working.
> 
> The problem is due to changing from automake to cmake building.
> 
> But all my dlopens start failing with something like:
> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
> 
> If do a lookup for  the name:
> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
> 
> if give me:
>                  U _ZN4ceph6buffer4list8iterator7advanceEl
> 
> The parent/calling executable however has:
> 	0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
Are you sure?  In which symbol table (dynamic or debug) does the referenced
symbol appear?  Check with nm -D.  If it is only in the debug table, you need
the --export-dynamic linker switch when creating the binary that defines the
symbols.

> 
> Clearly dlopen is not able to match these 2 and succeed.
Why are you sure that the dynamic linker is at fault?

> 
> Question:
> So on which part of the building is what switch missing.
> 
> Thanx,
> --WjW

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 11:53:29 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0768BB21348;
 Tue,  5 Jul 2016 11:53:29 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id C3B2115B1;
 Tue,  5 Jul 2016 11:53:28 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
 by smtp.digiware.nl (Postfix) with ESMTP id 128CE1534C7;
 Tue,  5 Jul 2016 13:53:26 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.com
Received: from smtp.digiware.nl ([127.0.0.1])
 by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 5ToSpg2XROY4; Tue,  5 Jul 2016 13:53:16 +0200 (CEST)
Received: from [192.168.10.67] (opteron [192.168.10.67])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by smtp.digiware.nl (Postfix) with ESMTPSA id 3A9B3153413;
 Tue,  5 Jul 2016 13:53:16 +0200 (CEST)
Subject: Re: Problem during dlopen()
To: Pietro Cerutti <gahr@FreeBSD.org>
References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
 <416028b6b2a1dffe4e010b5792c56100@gahr.ch>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 owner-freebsd-hackers@freebsd.org
From: Willem Jan Withagen <wjw@digiware.nl>
Message-ID: <0aa90be4-4a10-9477-e550-a0e399d97216@digiware.nl>
Date: Tue, 5 Jul 2016 13:53:09 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <416028b6b2a1dffe4e010b5792c56100@gahr.ch>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 11:53:29 -0000

On 5-7-2016 13:45, Pietro Cerutti wrote:
> On 2016-07-05 13:13, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm banging my head agains the wall because I cannot seem to get this
>> working.
>>
>> The problem is due to changing from automake to cmake building.
>>
>> But all my dlopens start failing with something like:
>> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
>> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
>> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>>
>> If do a lookup for  the name:
>> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>>
>> if give me:
>>                  U _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> The parent/calling executable however has:
>>     0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> Clearly dlopen is not able to match these 2 and succeed.
>>
>> Question:
>> So on which part of the building is what switch missing.
> 
> Wild guess: -Wl,-E linking the executable.
> 

Any guess is a good guess to try. :)
Will give it a shot.

--WjW


From owner-freebsd-hackers@freebsd.org  Tue Jul  5 12:00:36 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5761DB21960;
 Tue,  5 Jul 2016 12:00:36 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1BD9F19DC;
 Tue,  5 Jul 2016 12:00:35 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
 by smtp.digiware.nl (Postfix) with ESMTP id 7E98A1534C7;
 Tue,  5 Jul 2016 14:00:33 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.com
Received: from smtp.digiware.nl ([127.0.0.1])
 by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id o8hmxt6eZzFK; Tue,  5 Jul 2016 14:00:06 +0200 (CEST)
Received: from [192.168.10.67] (opteron [192.168.10.67])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by smtp.digiware.nl (Postfix) with ESMTPSA id 6868E153413;
 Tue,  5 Jul 2016 14:00:06 +0200 (CEST)
Subject: Re: Problem during dlopen()
To: Pietro Cerutti <gahr@FreeBSD.org>
References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
 <416028b6b2a1dffe4e010b5792c56100@gahr.ch>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>,
 owner-freebsd-hackers@freebsd.org
From: Willem Jan Withagen <wjw@digiware.nl>
Message-ID: <206facb4-cb1c-e52c-b387-3344c41c12e7@digiware.nl>
Date: Tue, 5 Jul 2016 13:59:59 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <416028b6b2a1dffe4e010b5792c56100@gahr.ch>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 12:00:36 -0000

On 5-7-2016 13:45, Pietro Cerutti wrote:
> On 2016-07-05 13:13, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm banging my head agains the wall because I cannot seem to get this
>> working.
>>
>> The problem is due to changing from automake to cmake building.
>>
>> But all my dlopens start failing with something like:
>> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
>> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
>> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>>
>> If do a lookup for  the name:
>> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>>
>> if give me:
>>                  U _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> The parent/calling executable however has:
>>     0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> Clearly dlopen is not able to match these 2 and succeed.
>>
>> Question:
>> So on which part of the building is what switch missing.
> 
> Wild guess: -Wl,-E linking the executable.
> 

And a bonus point for Pietro.
I did have that switch, but I had it on the lib that needed to be loaded,
not on the executable.

So: too many switches/options/flags in CMake for my taste.

--WjW


From owner-freebsd-hackers@freebsd.org  Tue Jul  5 12:23:27 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7EAD9B715A6
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 12:23:27 +0000 (UTC) (envelope-from wjw@digiware.nl)
Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4A44B17CA
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 12:23:27 +0000 (UTC)
 (envelope-from wjw@digiware.nl)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
 by smtp.digiware.nl (Postfix) with ESMTP id ED757153402;
 Tue,  5 Jul 2016 14:23:15 +0200 (CEST)
X-Virus-Scanned: amavisd-new at digiware.com
Received: from smtp.digiware.nl ([127.0.0.1])
 by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id S7BUwGztjqfR; Tue,  5 Jul 2016 14:22:48 +0200 (CEST)
Received: from [192.168.10.67] (opteron [192.168.10.67])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by smtp.digiware.nl (Postfix) with ESMTPSA id D9AA315340A;
 Tue,  5 Jul 2016 14:22:48 +0200 (CEST)
Subject: Re: Problem during dlopen()
To: Konstantin Belousov <kostikbel@gmail.com>
References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl>
 <20160705115057.GO38613@kib.kiev.ua>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
From: Willem Jan Withagen <wjw@digiware.nl>
Message-ID: <3570efcb-f106-95ba-52da-972a55d2fc33@digiware.nl>
Date: Tue, 5 Jul 2016 14:22:42 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <20160705115057.GO38613@kib.kiev.ua>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 12:23:27 -0000

On 5-7-2016 13:50, Konstantin Belousov wrote:
> On Tue, Jul 05, 2016 at 01:13:42PM +0200, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm banging my head agains the wall because I cannot seem to get this
>> working.
>>
>> The problem is due to changing from automake to cmake building.
>>
>> But all my dlopens start failing with something like:
>> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
>> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
>> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>>
>> If I do a lookup for the name:
>> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>>
>> it gives me:
>>                  U _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> The parent/calling executable however has:
>> 	0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
> Are you sure?  In which symbol table (dynamic or debug) does the
> referenced symbol appear?  Check with nm -D.  If it is only in the debug
> table, you need the --export-dynamic linker switch when creating the
> binary that defines the symbols.

You have hit the other half of the point that Pietro made.
I did a plain nm <file>, without restricting it to any one symbol table...
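
To make the distinction concrete, here is a minimal sketch (not part of
the Ceph build) of the check Konstantin describes. The two sample
listings are hypothetical stand-ins for what "nm <file>" and
"nm -D <file>" might print for the executable, and the helper names
symbol_types() and is_exported() are made up for illustration.

```python
SYM = "_ZN4ceph6buffer4list8iterator7advanceEl"

# Hypothetical "nm <executable>" output: the regular symbol table has it.
PLAIN_NM = "0000000000513de0 T " + SYM + "\n"

# Hypothetical "nm -D <executable>" output: the symbol is absent here.
DYNAMIC_NM = "                 U dlopen\n                 U printf\n"


def symbol_types(nm_output, symbol):
    """Return the nm type letters listed for `symbol` in nm-style output."""
    types = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols look like "<address> T <name>"; undefined ones
        # have no address column and look like "U <name>".
        if len(fields) >= 2 and fields[-1] == symbol:
            types.append(fields[-2])
    return types


def is_exported(nm_dynamic_output, symbol):
    """True if `symbol` is listed as defined (anything but 'U')."""
    return any(t != "U" for t in symbol_types(nm_dynamic_output, symbol))


print(symbol_types(PLAIN_NM, SYM))   # ['T']
print(is_exported(PLAIN_NM, SYM))    # True, but this is the wrong table
print(is_exported(DYNAMIC_NM, SYM))  # False
```

If the last check is False for the real "nm -D" output, the runtime
linker has nothing to resolve the plugin's reference against, and the
executable needs to be linked with --export-dynamic (-rdynamic when
going through the compiler driver).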

Thanx,
--WjW


From owner-freebsd-hackers@freebsd.org  Tue Jul  5 14:31:12 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D26F8B7382F
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 14:31:12 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 850161967
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 14:31:11 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id A758122073F
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 09:31:08 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org>
 <768b6169-70d9-5500-c455-563d8340972e@denninger.net>
 <b03f73a1-95c9-c753-3464-74fcb45351e5@freebsd.org>
 <ec4685b2-bdaf-c18d-8aff-38b17edf4ebb@denninger.net>
 <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>
From: Karl Denninger <karl@denninger.net>
Message-ID: <d00e30d1-7c99-8f30-835e-705fbf3d00e8@denninger.net>
Date: Tue, 5 Jul 2016 09:30:51 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms050604050603000003050505"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 14:31:12 -0000

This is a cryptographically signed message in MIME format.

--------------ms050604050603000003050505
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


On 7/4/2016 22:01, Allan Jude wrote:
> On 2016-07-04 22:46, Karl Denninger wrote:
>>
>>> You keep saying per zvol. Do you mean per vdev? I am under the
>>> impression that no zvol's are involved in the use case this thread is
>>> about.
>> Sorry, per-vdev.  The problem with dmu_tx is that it's system-wide.
>> This is wildly inappropriate for several reasons -- first, it is
>> computed on size-of-RAM with a hard cap (which is stupid on its face)
>> and it is entirely insensitive to the performance of the vdevs in
>> question.  Specifically, it is very common for a system to have very
>> fast (e.g. SSD) disks, perhaps in a mirror configuration, and then
>> spinning rust in a RaidZ2 config for bulk storage.  Those are very, very
>> different performance wise and they should have wildly different
>> write-back cache sizes.  At present there is exactly one such write-back
>> cache and it's both system-wide and pays exactly zero attention to the
>> throughput of the underlying vdevs it is talking to.
>>
>> This is why you can provoke minute-long stalls on a system with moderate
>> (e.g. 32GB) amounts of RAM if there are spinning rust devices in the
>> configuration.
>>
>>>
>>> Improving the way ZFS frees memory, specifically UMA and the 'kmem
>>> caches' will help a lot as well.
>>>
>> Well, yeah.  But that means you have to police up the size of the UMA
>> vs. how much is actually in use in the UMA.  What the PR does is get
>> pretty aggressive with that whenever RAM is tight, and before the pager
>> can start playing hell with system performance.
>>
>>> In addition, another patch just went in to allow you to change the
>>> arc_max and arc_min on a running system.
>>>
>> Yes, the PR I did a long time ago made that "active" on a running
>> system.... so I've had that for quite some time.  Not that you really
>> ought to need to play with that (if you feel a need to then you're still
>> at step 1 or 2 of what I went through with analyzing and working on this
>> in the 10.x code.....)
>>
>
> Have you looked into the ZFS 'Write Throttle'? It seems like it
> was meant to solve the writeback problem you are describing. It starts
> sending back pressure up to the application by introducing larger and
> larger delays in the write() call until your disks can keep up with
> your applications.
>
> http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/
>
> http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
>

I believe this has been brought into FreeBSD's implementation; I recall
going through it.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms050604050603000003050505--

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 14:43:56 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16DDEB73F28
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 14:43:56 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id E733E16B6
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 14:43:55 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id E328DB73F23; Tue,  5 Jul 2016 14:43:55 +0000 (UTC)
Delivered-To: hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E2D94B73F22
 for <hackers@mailman.ysv.freebsd.org>; Tue,  5 Jul 2016 14:43:55 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com
 [IPv6:2a00:1450:400c:c09::22e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6EE4416B4
 for <hackers@freebsd.org>; Tue,  5 Jul 2016 14:43:55 +0000 (UTC)
 (envelope-from sobomax@sippysoft.com)
Received: by mail-wm0-x22e.google.com with SMTP id a66so155961516wme.0
 for <hackers@freebsd.org>; Tue, 05 Jul 2016 07:43:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=sippysoft-com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:sender:in-reply-to:references:from:date:message-id
 :subject:to:cc;
 bh=2UxvES/Nt7XSNT8IovIwmSfj/iLLw0k+Fi0kBHPpE6I=;
 b=IFLFUwDx8zKlMBfILlOHHuCZXbEzUwnoVgM6s/xCae0Zn6Cs8RctxH8G56cOU3Yl1Q
 JW4KXOS6chooPaGgE7MMdk9+CfEAzIBIBDn7SVxnLApXv7v1Mje6kXVI4q4FU+4roQjI
 2tiJyp2BLf+iRaZ5W+EJYuyY8dGpSjVb8z23HqxR+8QI380vxS62n8oNKH35JC+KtpXO
 y7pyo9L/86xb/ASgSG6q0ANQRUA5fprNfN4zvu1zWaYT3ZI6AmMzkCEBoUQ9QtPn8kCs
 vsEBVGGrSBHqGI8/TmgumYS0nn3q/wU4QcArcSM5Uo1OtOEbojuXeuORV33j2YxcBfnb
 kiaA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:sender:in-reply-to:references:from
 :date:message-id:subject:to:cc;
 bh=2UxvES/Nt7XSNT8IovIwmSfj/iLLw0k+Fi0kBHPpE6I=;
 b=Jr607GVVhqjXPop7ThIuWcnZ+9nuuX4JxDz+Zvfpsgng9CVO+BDce8Xx6RpewJJbXa
 h4/pYdcOIjk20mW5tDtc3cW634Wym8P24LClDAhv5uvmYmjNuaLNyZ6BW/wJ7QJpioed
 bAgZ9t4rGHuywafEOHTnNrQ5khHYuuqI6Qlf3rKKqimbSVkyNwmyi4mt9qlP8R8+it3c
 iw1PMKDvSEwnIP1217312KNUGHjSfdoIl5PTNlEl0yHUMqKn+c3x8/y6AmSai+KHbtXx
 qGzaOABXIhLCvOG7ZSuHTu1/Wt63G9VJS8yNet7sVt1S46z08e+XA8ZfpljaY3fc2XED
 dk3Q==
X-Gm-Message-State: ALyK8tJ+XCb3KwTX6qedYY/A3/2H8ZhlUDgH2BpZflTvqhjYYnNZCGAYc9RcLSVr4mA/4BaS2pL1/ZtQ+3bGaJRI
X-Received: by 10.194.150.167 with SMTP id uj7mr16004821wjb.168.1467729833520; 
 Tue, 05 Jul 2016 07:43:53 -0700 (PDT)
MIME-Version: 1.0
Sender: sobomax@sippysoft.com
Received: by 10.194.96.173 with HTTP; Tue, 5 Jul 2016 07:43:52 -0700 (PDT)
In-Reply-To: <20160705114808.GN38613@kib.kiev.ua>
References: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>
 <20160705114808.GN38613@kib.kiev.ua>
From: Maxim Sobolev <sobomax@freebsd.org>
Date: Tue, 5 Jul 2016 07:43:52 -0700
X-Google-Sender-Auth: E-lV_8x0x9_v8XbroxQzrseI7GE
Message-ID: <CAH7qZfvKt7b__M_tM9eBD7VjxbaAQPj5kgurrkFkY36eR3qrAg@mail.gmail.com>
Subject: Re: A faulty program corrupts some its data preventing correct core
 generation (Failed to write core file for process postgres (error 14))
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: stable@freebsd.org, hackers@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 14:43:56 -0000

Seems like a candidate for an MFC into releng/10.3 and an appropriate
errata entry?

-Max

On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
> > Hi all, we are investigating some random postgresql-9.1.21 server
> > crashes on FreeBSD 10.3. We've started seeing those after upgrading
> > from postgres 9.1.18 on more than one system, so hardware problems
> > (e.g. RAM issues) are very unlikely. I suspect that postgres is at
> > fault; however, I am also curious how it could be that the kernel is
> > not capable of generating a core file when an application does
> > something silly. Is it that some ELF-related data structures got
> > corrupted, or something else? Are we protecting the page where the ELF
> > header is mapped with the R/O flag? I am looking at possibly
> > recreating this by poking around the ELF header(s) and seeing if I can
> > corrupt it in a similar manner reliably. Any pointers or suggestions
> > are appreciated.
> >
> > Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
> > Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
> > Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
> > Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11
> >
> > #define EFAULT          14              /* Bad address */
> >
> > The resulting files are truncated and are not really usable for anything.
> > We've seen the same issue
> >
> > -rw-------    1 pgsql     wheel     1310720 Jun 27 04:10 postgres.41361.core
> > -rw-------    1 pgsql     wheel     1310720 Jul  1 05:21 postgres.1722.core
> >
> > [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
> > GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
> > Copyright (C) 2016 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-portbld-freebsd10.3".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> > <http://www.gnu.org/software/gdb/documentation/>.
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from postgres...(no debugging symbols found)...done.
> > BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file
> > size >= 517120000, found: 1310720.
> > [New LWP 100261]
> > Core was generated by `postgres'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> > (gdb) where
> > #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> > Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
> > (gdb) q
> >
> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html
>
>

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 17:19:45 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 655BDB72D29
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 17:19:45 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 55E061DDD
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 17:19:44 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467739169055520.7792599844186;
 Tue, 5 Jul 2016 10:19:29 -0700 (PDT)
Date: Tue, 05 Jul 2016 10:19:28 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Karl Denninger" <karl@denninger.net>
Cc: "" <freebsd-hackers@freebsd.org>
Message-ID: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
In-Reply-To: <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
Subject: Re: ZFS ARC and mmap/page cache coherency question
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 17:19:45 -0000


 ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denninger.net> wrote ---- 
 >  
 >  
 > On 7/4/2016 18:45, Matthew Macy wrote: 
 > > 
 > > 
 > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ----  
 > >  >   
 > >  > On 7/3/2016 02:45, Matthew Macy wrote:  
 > >  > >           
 > >  > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.
 > >  > > -M
 > >  >   
 > >  > Wellllll....  
 > >  >   
 > >  > I've done a fair bit of work here (see  
 > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the  
 > >  > political issues are at least as bad as the coding ones.  
 > >  >   
 > >   
 > > 
 > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. 
 > > 
 > > -M 
 > > 
 >  
 > The ARC is very useful when it gets a hit as it avoids an I/O that would
 > otherwise take place.
 >  
 > Where it sucks is when the system evicts working set to preserve ARC.  
 > That's always wrong in that you're trading a speculative I/O (if the 
 > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* 
 > (to page back in.) 
 
The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. If one could resolve that and have the page cache manage these blocks, life would be much, much better. However, you'd lose MFU. Hence my question.
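
To make the LRU-only vs LRU + MFU question concrete, here is a toy
sketch. It is not the actual ARC, which adapts the split between its
two lists and also keeps ghost lists; both are omitted here. The class
names, the fixed half/half split and the access trace are all made up
for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Single recency list: the least recently used block is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0

    def access(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)
            self.hits += 1
            return
        self.blocks[key] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


class TwoListCache:
    """Blocks seen once and blocks seen again live on separate lists."""

    def __init__(self, capacity):
        self.recent_cap = capacity // 2
        self.frequent_cap = capacity - self.recent_cap
        self.recent = OrderedDict()    # blocks referenced exactly once
        self.frequent = OrderedDict()  # blocks referenced at least twice
        self.hits = 0

    def access(self, key):
        if key in self.frequent:
            self.frequent.move_to_end(key)
            self.hits += 1
        elif key in self.recent:
            # Second reference: promote to the frequency list.
            del self.recent[key]
            self.frequent[key] = True
            if len(self.frequent) > self.frequent_cap:
                self.frequent.popitem(last=False)
            self.hits += 1
        else:
            self.recent[key] = True
            if len(self.recent) > self.recent_cap:
                self.recent.popitem(last=False)


def run(cache, trace):
    """Replay `trace` against `cache` and return the number of hits."""
    for key in trace:
        cache.access(key)
    return cache.hits


# Hot set touched twice, then a one-pass scan, then the hot set again.
TRACE = list(range(4)) * 2 + list(range(100, 120)) + list(range(4))

print(run(LRUCache(8), TRACE))      # 4
print(run(TwoListCache(8), TRACE))  # 8
```

With a single recency list the scan flushes the hot set; with a
separate list for re-referenced blocks the scan only churns the
"seen once" half. That scan resistance is what would be lost by
handing these blocks to a purely recency-based page cache.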

-M


From owner-freebsd-hackers@freebsd.org  Tue Jul  5 17:35:14 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 264E9B73113
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 17:35:14 +0000 (UTC)
 (envelope-from fjwcash@gmail.com)
Received: from mail-qt0-x22e.google.com (mail-qt0-x22e.google.com
 [IPv6:2607:f8b0:400d:c0d::22e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id D131617E2
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 17:35:13 +0000 (UTC)
 (envelope-from fjwcash@gmail.com)
Received: by mail-qt0-x22e.google.com with SMTP id c34so104445492qte.0
 for <freebsd-hackers@freebsd.org>; Tue, 05 Jul 2016 10:35:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=jnvYUDvcX2tBjz/VzmuPZ861AVn6KjURI8n5MId/F6I=;
 b=RmLU+AsQD2KGtVn22pW5mYwDHbUvoap4cVNPkQRXsCmEghE2Z2n/Pcv6suEJZNjgqE
 gWwXIVlQVAictZYBGKa/Vbyt/CRlcSFet6XpEvS2vn/vmoKXuovj/q0ff6dmlhGFMPIq
 Q5icKwweiJE8N1IHi4Gvnz2gT/bXgJkh/3BFvvy7UtmFh1R8bF6Gv5Uf6X3ouBlWzzZR
 Uhygu4p1G5kfzozHLOaAZMf33k1lniOaZLTnmY0BJl1DTerJHTv8uPdfCqqi6tZr60fI
 SS6PKmgkJtKghdyIHkc4m8oIrfreWu/hHdV87DlBsFnLniwSJnbPa3oeiAguuaYPuGj/
 bGWg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=jnvYUDvcX2tBjz/VzmuPZ861AVn6KjURI8n5MId/F6I=;
 b=MyzLwKNccu/2bmMT54hAP5wMKrhPUIZH2m5ztG58FbvCCSI9x/E1ACuJg4k6SqfFhy
 levqIBF1Qc3mVyNtxPFfOSQFh8j+aC18p+YbYPel94StVaB/bUqg0NgozHixxxPUveRx
 qL8+e8xIf3h2maXJxRb1IxSNhYlXRsHCEmtp72VnK7g1yPHKnqr+IkmZRT2HaNsHxmv7
 uYey2C1I5eGWTaefcHMJMv8VWMLnDGaSC5xJUJvTJL6e7FtQPZ060KKc90Mbq436GFBb
 xmr9MeRWoRXhmiX831DaI3m9JaV8Za1QImDiv3Zfo6H6PMajB5CGkDEdyRql3sJ35Y6A
 8wHg==
X-Gm-Message-State: ALyK8tLEJ91rC1DNAhGpIkZgv3OERrk/Y0sxu5y6/rZjSgcbk25gX7O2QA5u2pSPMqlkXNrvYKGGeym0gX2S0A==
X-Received: by 10.200.52.197 with SMTP id x5mr28376658qtb.41.1467740112904;
 Tue, 05 Jul 2016 10:35:12 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.200.56.93 with HTTP; Tue, 5 Jul 2016 10:35:12 -0700 (PDT)
In-Reply-To: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
From: Freddie Cash <fjwcash@gmail.com>
Date: Tue, 5 Jul 2016 10:35:12 -0700
Message-ID: <CAOjFWZ6qrE_JZUiXAojU3vOGmKHZB_kq7VAR4EhkkX2u2NSUHA@mail.gmail.com>
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: Matthew Macy <mmacy@nextbsd.org>
Cc: Karl Denninger <karl@denninger.net>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 17:35:14 -0000

On Tue, Jul 5, 2016 at 10:19 AM, Matthew Macy <mmacy@nextbsd.org> wrote:

>  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denninger.net> wrote ----
>  > On 7/4/2016 18:45, Matthew Macy wrote:
>  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denninger.net> wrote ----
>  > >  >
>  > >  > On 7/3/2016 02:45, Matthew Macy wrote:
>  > >  > >
>  > >  > > Cedric greatly overstates the intractability of resolving it.
>  > >  > > Nonetheless, since the initial import very little has been done
>  > >  > > to improve integration, and I don't know of anyone who is up to
>  > >  > > the task taking an interest in it. Consequently, mmap()
>  > >  > > performance is likely "doomed" for the foreseeable future.
>  > >  > > -M
>  > >  >
>  > >  > Wellllll....
>  > >  >
>  > >  > I've done a fair bit of work here (see
>  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
>  > >  > political issues are at least as bad as the coding ones.
>  > >
>  > > Strictly speaking, the root of the problem is the ARC. Not ZFS per
>  > > se. Have you ever tried disabling MFU caching to see how much worse
>  > > LRU only is? I'm not really convinced the ARC's benefits justify its
>  > > cost.
>  >
>  > The ARC is very useful when it gets a hit as it avoids an I/O that would
>  > otherwise take place.
>  >
>  > Where it sucks is when the system evicts working set to preserve ARC.
>  > That's always wrong in that you're trading a speculative I/O (if the
>  > cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
>  > (to page back in.)
>
> The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU.
> There are a lot of issues stemming from the fact that ZFS is a
> transactional object store with a POSIX FS on top. One is that it caches
> disk blocks as opposed to file blocks. If one could resolve that and have
> the page cache manage these blocks, life would be much, much better.
> However, you'd lose MFU. Hence my question.
>

Are you confusing terms here?

Pretty sure the ARC uses MRU (Most Recently Used) and MFU (Most Frequently
Used) caches.  Not LRU (Least Recently Used).

Or am I misunderstanding what you're trying to say?


-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 17:43:06 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 96298B7330E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 17:43:06 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6FCCC1C16
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 17:43:06 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467740580472931.4090951466359;
 Tue, 5 Jul 2016 10:43:00 -0700 (PDT)
Date: Tue, 05 Jul 2016 10:43:00 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Freddie Cash" <fjwcash@gmail.com>
Cc: "FreeBSD Hackers" <freebsd-hackers@freebsd.org>, 
 "Karl Denninger" <karl@denninger.net>
Message-ID: <155bc27ea44.c75d1029200540.4499688981397092064@nextbsd.org>
In-Reply-To: <CAOjFWZ6qrE_JZUiXAojU3vOGmKHZB_kq7VAR4EhkkX2u2NSUHA@mail.gmail.com>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
 <CAOjFWZ6qrE_JZUiXAojU3vOGmKHZB_kq7VAR4EhkkX2u2NSUHA@mail.gmail.com>
Subject: Re: ZFS ARC and mmap/page cache coherency question
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 17:43:06 -0000


 ---- On Tue, 05 Jul 2016 10:35:12 -0700 Freddie Cash <fjwcash@gmail.com> w=
rote ----=20
 > On Tue, Jul 5, 2016 at 10:19 AM, Matthew Macy <mmacy@nextbsd.org> wrote:
 >=20
 > >  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <
 > > karl@denninger.net> wrote ----
 > >  > On 7/4/2016 18:45, Matthew Macy wrote:
 > >  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <
 > > karl@denninger.net> wrote ----
 > >  > >  >
 > >  > >  > On 7/3/2016 02:45, Matthew Macy wrote:
 > >  > >  > >
 > >  > >  > >             Cedric greatly overstates the intractability of
 > > resolving it. Nonetheless, since the initial import very little has be=
en
 > > done to improve integration, and I don't know of anyone who is up to t=
he
 > > task taking an interest in it. Consequently, mmap() performance is lik=
ely
 > > "doomed" for the foreseeable future.-M----
 > >  > >  >
 > >  > >  > Wellllll....
 > >  > >  >
 > >  > >  > I've done a fair bit of work here (see
 > >  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) an=
d the
 > >  > >  > political issues are at least as bad as the coding ones.
 > >  > >
 > >  > > Strictly speaking, the root of the problem is the ARC. Not ZFS pe=
r
 > > se. Have you ever tried disabling MFU caching to see how much worse LR=
U
 > > only is? I'm not really convinced the ARC's benefits justify its cost.
 > >  >
 > >  > The ARC is very useful when it gets a hit as it avoid an I/O that w=
ould
 > >  > otherwise take place.
 > >  >
 > >  > Where it sucks is when the system evicts working set to preserve AR=
C.
 > >  > That's always wrong in that you're trading a speculative I/O (if th=
e
 > >  > cache is hit later) for a *guaranteed* one (to page out) and maybe =
*two*
 > >  > (to page back in.)
 > >
 > > The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU.
 > > There are a lot of issues stemming from the fact that ZFS is a
 > > transactional object store with a POSIX FS on top. One is that it cach=
es
 > > disk blocks as opposed to file blocks. However, if one could resolve t=
hat
 > > and have the page cache manage these blocks life would be much much be=
tter.
 > > However, you'd lose MFU. Hence my question.
 > >
 >=20
 > =E2=80=8BAre you confusing terms here?
 >=20
 > Pretty sure the ARC uses MRU (Most Recently Used) and MFU (Most Frequent=
ly
 > Used) caches.  Not LRU (Least Recently Used).
 >=20
 > Or am I misunderstanding what you're trying to say?
=20
If it caches based on MRU, then by definition it evicts LRU. I did mix
caching policy with eviction policy in the same sentence, which was
incorrect. Nonetheless, it should be clear that I meant MFU+MRU caching
vs. MRU caching only.
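
To make the comparison concrete, here is a minimal Python sketch. It is a
toy model, not ZFS code: the class names, the fixed 50/50 split and the
workload are all assumptions for illustration, and the real ARC also
adapts the split using ghost lists. It contrasts an MRU-only cache, which
evicts LRU, with a cache that keeps a second list for blocks touched more
than once:

```python
from collections import OrderedDict


class LRUCache:
    """MRU-ordered cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, key):
        """Touch a block; return True on a hit, False on a miss."""
        hit = key in self.blocks
        if hit:
            self.blocks.move_to_end(key)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[key] = True
        return hit


class TwoListCache:
    """Simplified MRU+MFU cache (fixed split, no ghost lists)."""

    def __init__(self, capacity):
        self.mru = LRUCache(capacity // 2)
        self.mfu = LRUCache(capacity - capacity // 2)

    def access(self, key):
        if key in self.mfu.blocks:
            return self.mfu.access(key)
        if key in self.mru.blocks:
            # Second access: promote the block from the MRU to the MFU list.
            del self.mru.blocks[key]
            self.mfu.access(key)
            return True
        self.mru.access(key)
        return False


def hits_after_scan(cache):
    """Warm a small hot set, run a one-off scan, then count hot-set hits."""
    hot = range(4)
    for _ in range(2):
        for key in hot:
            cache.access(key)
    for key in range(100, 120):  # sequential scan larger than the cache
        cache.access(key)
    return sum(cache.access(key) for key in hot)
```

In this toy workload the one-off scan flushes the hot set out of the
MRU-only cache, while the MRU+MFU variant keeps it:
hits_after_scan(LRUCache(8)) is 0 and hits_after_scan(TwoListCache(8))
is 4.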

Thanks.
-M


From owner-freebsd-hackers@freebsd.org  Tue Jul  5 17:50:37 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 36FF7B73515
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 17:50:37 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id D2CD61EEF
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 17:50:36 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 013CB220426
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 12:50:33 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
From: Karl Denninger <karl@denninger.net>
Message-ID: <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net>
Date: Tue, 5 Jul 2016 12:50:16 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms040109070705040203000606"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 17:50:37 -0000

This is a cryptographically signed message in MIME format.

--------------ms040109070705040203000606
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


On 7/5/2016 12:19, Matthew Macy wrote:
>
>
>  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denninger=
=2Enet> wrote ----=20
>  > =20
>  > =20
>  > On 7/4/2016 18:45, Matthew Macy wrote:=20
>  > >=20
>  > >=20
>  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denn=
inger.net> wrote ---- =20
>  > >  >  =20
>  > >  > On 7/3/2016 02:45, Matthew Macy wrote: =20
>  > >  > >          =20
>  > >  > >             Cedric greatly overstates the intractability of r=
esolving it. Nonetheless, since the initial import very little has been d=
one to improve integration, and I don't know of anyone who is up to the t=
ask taking an interest in it. Consequently, mmap() performance is likely =
"doomed" for the foreseeable future.-M----  =20
>  > >  >  =20
>  > >  > Wellllll.... =20
>  > >  >  =20
>  > >  > I've done a fair bit of work here (see =20
>  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) and=
 the =20
>  > >  > political issues are at least as bad as the coding ones. =20
>  > >  >  =20
>  > >  =20
>  > >=20
>  > > Strictly speaking, the root of the problem is the ARC. Not ZFS per=
 se. Have you ever tried disabling MFU caching to see how much worse LRU =
only is? I'm not really convinced the ARC's benefits justify its cost.=20
>  > >=20
>  > > -M=20
>  > >=20
>  > =20
>  > The ARC is very useful when it gets a hit as it avoid an I/O that wo=
uld=20
>  > otherwise take place.=20
>  > =20
>  > Where it sucks is when the system evicts working set to preserve ARC=
=2E =20
>  > That's always wrong in that you're trading a speculative I/O (if the=
=20
>  > cache is hit later) for a *guaranteed* one (to page out) and maybe *=
two*=20
>  > (to page back in.)=20
> =20
> The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. T=
here are a lot of issues stemming from the fact that ZFS is a transaction=
al object store with a POSIX FS on top. One is that it caches disk blocks=
 as opposed to file blocks. However, if one could resolve that and have t=
he page cache manage these blocks life would be much much better. However=
, you'd lose MFU. Hence my question.
>
> -M
>
I suspect there's an argument to be made there but the present problems
make determining the impact of that difficult or impossible as those
effects are swamped by the other issues.

I can fairly easily create workloads on the base code where simply
typing "vi <some file>", making a change and hitting ":w" will result in
a stall of tens of seconds or more while the requested cache flush is
run down.  I've resolved a good part (but not all instances) of this
through my work.

My understanding is that 11- has had additional work done to the base
code, but three underlying issues are not, from what I can see in the
commit logs and discussions, addressed: (1) the VM system will page out
working set while leaving ARC alone; (2) UMA reserved-but-not-in-use
space is not policed adequately when memory pressure exists *before* the
pager starts considering evicting working set; and (3) the write-back
cache is, for many machine configurations, grossly inappropriate and
cannot be tuned adequately by hand (this is particularly true on a system
with vdevs that have materially-varying performance levels).

I have more-or-less stopped work on the tree on a forward basis, since
with 10.2 I (1) got to a place that works for my production requirements,
resolving the problems, and (2) ran into what I deemed to be intractable
political issues within core on progress toward eradicating the root of
the problem.

I will probably revisit the situation with 11- at some point, as I'll
want to roll my production systems forward.  However, I don't know when
that will be -- right now 11- is stable enough for some of my embedded
work (e.g. on the Raspberry Pi2) but not for my server- and client-class
machines.  Indeed, just yesterday I got a lock-order reversal panic while
doing a shutdown after a kernel update on one of my lab boxes running a
just-updated 11- codebase.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms040109070705040203000606--

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 18:26:53 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id CF911B201E9
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 18:26:53 +0000 (UTC)
 (envelope-from nwhitehorn@freebsd.org)
Received: from c.mail.sonic.net (c.mail.sonic.net [64.142.111.80])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id BB1E11ABB;
 Tue,  5 Jul 2016 18:26:53 +0000 (UTC)
 (envelope-from nwhitehorn@freebsd.org)
Received: from zeppelin.tachypleus.net (c-50-139-166-237.hsd1.ma.comcast.net
 [50.139.166.237]) (authenticated bits=0)
 by c.mail.sonic.net (8.15.1/8.15.1) with ESMTPSA id u65IQif4011024
 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT);
 Tue, 5 Jul 2016 11:26:45 -0700
Subject: Re: Review request: sparse CPU ID maps
To: Warner Losh <imp@bsdimp.com>, Adrian Chadd <adrian@freebsd.org>
References: <57761101.3030101@freebsd.org>
 <CAD9=5Xw-MmVVSSo6nRvSRvGaLbd1Z1YRyVKyF9JfmucNKMGBZg@mail.gmail.com>
 <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
 <CAJ-Vmon4kRNc5LiwibtiPi_FQ1v5w_MQEjP+OfcC7J74iTKs0A@mail.gmail.com>
 <CANCZdfpJJLoKxB-ZdMRQyHq9eT1uihg4UGeBvRgBEOOC1pt_Yg@mail.gmail.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 outro pessoa <outro.pessoa@gmail.com>
From: Nathan Whitehorn <nwhitehorn@freebsd.org>
Message-ID: <59222776-45b4-640c-b5e4-5f8b8d6c45e5@freebsd.org>
Date: Tue, 5 Jul 2016 11:26:43 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <CANCZdfpJJLoKxB-ZdMRQyHq9eT1uihg4UGeBvRgBEOOC1pt_Yg@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Sonic-CAuth: UmFuZG9tSVYJgx+TKnWdwLYYgGpUKMH24lQEzyNrm2WuvLTkkvMD93V4HNeLJsFBsFz+JFeBv5pVObXO6FwdyMW4zdnjoesyEkATbPUrXbI=
X-Sonic-ID: C;1EfABt5C5hG6rZtMTlz00w== M;tjdIB95C5hG6rZtMTlz00w==
X-Spam-Flag: No
X-Sonic-Spam-Details: 0.0/5.0 by cerberusd
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 18:26:53 -0000


On 07/03/16 13:11, Warner Losh wrote:
> On Sun, Jul 3, 2016 at 1:37 PM, Adrian Chadd <adrian@freebsd.org> wrote:
>> On 2 July 2016 at 17:08, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
>>> A reasonable first pass at checking for this kind of bug is doing grep -lR
>>> '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows the following
>>> files:
>>> arm/mv/armadaxp/armadaxp_mp.c
>>> arm/include/counter.h
>>> arm/broadcom/bcm2835/bcm2836.c
>>> arm/broadcom/bcm2835/bcm2836_mp.c
>>> arm/freescale/imx/imx6_mp.c
>>> arm/allwinner/aw_mp.c
>>> arm/rockchip/rk30xx_mp.c
>>> arm/amlogic/aml8726/aml8726_mp.c
>>> arm/samsung/exynos/exynos5_mp.c
>>> arm/arm/mp_machdep.c
>>> arm/nvidia/tegra124/tegra124_mp.c
>>> arm64/include/counter.h
>>> arm64/arm64/gic_v3.c
>>> arm64/arm64/gic_v3_its.c
>>> arm64/arm64/gicv3_its.c
>>>
>>> All of them should, in some sense, be CPU_FOREACH(), but it may not matter.
>>> For example, it may not be possible to have sparse CPU IDs on some or all of
>>> those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are
>>> there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think.
>>> -Nathan
>> I think converting all the users over to the CPU_FOREACH thing is the
>> right way to go, even if the SOC doesn't require it. People do bring
>> up new systems by copy/pasta'ing an existing similar system, so we're
>> best served by having all the consumers migrated.
>>
>> But, I'd do it in head/12. Early in head/12. :-P
> It is a mergeable change too, since it wouldn't change any APIs.
> At least the conversion to CPU_FOREACH. We don't want too many
> sweeping changes that can't be merged too early (that way leads to
> lots of maintenance issues), but we can do something like this. Merging
> would be optional, but possible, for those bits of the tree that need it.
> Though, for something like this, there's little against doing a full merge
> and a lot for it...
>
> Warner
>

That sounds like the right approach. Since the original patch fixes bugs 
in 11, rather than niceties, I will send it to re@ tomorrow. After the 
branch, I'll do a sweep of other obviously wrong code for 12 with an MFC 
timer.
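
As a rough illustration of the bug class discussed above, here is a small
userland model in Python. It is not kernel code: the CPU ID set is a
made-up example, and the function names are hypothetical. Only mp_ncpus,
mp_maxid and the idea of CPU_FOREACH() skipping absent IDs come from the
kernel, as I understand it:

```python
# Hypothetical sparse topology: four CPUs present, with IDs 0, 1, 4 and 5.
present_cpus = {0, 1, 4, 5}
mp_ncpus = len(present_cpus)  # number of CPUs: 4
mp_maxid = max(present_cpus)  # highest CPU ID: 5


def dense_loop():
    """Buggy pattern: 'for (i = 0; i < mp_ncpus; i++)' assumes dense IDs."""
    return list(range(mp_ncpus))


def cpu_foreach():
    """Models CPU_FOREACH(): walk 0..mp_maxid and skip absent IDs."""
    return [cpu for cpu in range(mp_maxid + 1) if cpu in present_cpus]
```

With this ID set, dense_loop() visits [0, 1, 2, 3], touching two absent
IDs and missing CPUs 4 and 5, while cpu_foreach() visits [0, 1, 4, 5].
On a system with dense IDs the two are identical, which is why the bug
tends to go unnoticed.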
-Nathan

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 18:40:32 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E9DD6B2047C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 18:40:32 +0000 (UTC)
 (envelope-from lionelcons1972@gmail.com)
Received: from mail-yw0-x230.google.com (mail-yw0-x230.google.com
 [IPv6:2607:f8b0:4002:c05::230])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A52791226
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 18:40:32 +0000 (UTC)
 (envelope-from lionelcons1972@gmail.com)
Received: by mail-yw0-x230.google.com with SMTP id i12so68519410ywa.1
 for <freebsd-hackers@freebsd.org>; Tue, 05 Jul 2016 11:40:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-transfer-encoding;
 bh=bPHVmSWMxMML7KaxWY4/HrDQ6m4E8ABHFTh6rWXUyYM=;
 b=vBgOQy5Ic2H1eLXr9vQeHdHKc0D3MaSbb2wBkoS34hHFVbySfDlysBU+wU/LRBNIcG
 z3DbsKcJR7l1nTWsdrZebd0vVT3+RsLvJ5KM34hiZ+sE48RQnHfwuUcC3gJdGptLyk3t
 idWPbroBB6VWMbP+S6hEgzXnjW7hmR5+84ix0nowTCc1JUiuW+w0yJsWpoS5aPfOhJd6
 DGybS7JVV/NeyKwAZUdrWpgJC/g7H37sLpURdA14XXYcBdogccNLksxBMdZHRQAqLOmY
 LCGw2ohbP982gLLvhRk+C/0gvOBBXQjwXhtVOi2utVPpUdExlAOa2Pf25cuuBUNPP6zZ
 pSsw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=bPHVmSWMxMML7KaxWY4/HrDQ6m4E8ABHFTh6rWXUyYM=;
 b=GuQ/1vvDZxiLpzxEhiym7ArkS9LGSx0f5g3s7e2yu6ay/wXhRSPCXK8ezTl1KH2VI2
 1u6kpvCn8vckoeKmNiPtvSCh8JuIIjiQY5OB8pTWkXKyg+SwA3e1gdAHKWawTdxzWbrD
 D8i8fiRTUnKImWKklGnQhWOUnbTquKVi1dHRA4YCGVRdrgZUzc541ECsJexmQM0xKWQn
 rX4bNjHvL1AfMJRAARMNJynwQNTmE2L9sBbh/75d9lsVlghuk5wIgDfbUSubc0cJVRUc
 o7lPf42PYbyPwEr2tjwINMl81RIvpcDlcVs9m/rj9qOmJpNJlPFAkvW8qXMzCph6HlNI
 9d2g==
X-Gm-Message-State: ALyK8tIpkSB5wRy0WwfVPm7zk7XBIVFsxMn1GNaYgN/Djr/vGNc35akJCJCPa16PyfqHyFmVAgnYDRuUyREnsQ==
X-Received: by 10.129.50.83 with SMTP id y80mr12096377ywy.305.1467744031740;
 Tue, 05 Jul 2016 11:40:31 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.193.194 with HTTP; Tue, 5 Jul 2016 11:40:30 -0700 (PDT)
In-Reply-To: <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
 <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net>
From: Lionel Cons <lionelcons1972@gmail.com>
Date: Tue, 5 Jul 2016 20:40:30 +0200
Message-ID: <CAPJSo4VtJ1+txt4s13nKSWrj9fDTv5VsLVyMsX+DarBUVYMbOQ@mail.gmail.com>
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: Karl Denninger <karl@denninger.net>
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 18:40:33 -0000

So what Oracle did (based on work done by Sun for OpenSolaris) was to:
1. Modify ZFS to prevent *ANY* double/multi caching [this is
considered a design defect]
2. Introduce a new VM subsystem which scales a lot better and provides
hooks for [1] so there are never two or more copies of the same data
in the system

Given that this was a huge, paid, multiyear effort, it's not likely
that the design defects in open-source ZFS will ever go away.

Lionel

On 5 July 2016 at 19:50, Karl Denninger <karl@denninger.net> wrote:
>
> On 7/5/2016 12:19, Matthew Macy wrote:
>>
>>
>>  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denninger.=
net> wrote ----
>>  >
>>  >
>>  > On 7/4/2016 18:45, Matthew Macy wrote:
>>  > >
>>  > >
>>  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@denni=
nger.net> wrote ----
>>  > >  >
>>  > >  > On 7/3/2016 02:45, Matthew Macy wrote:
>>  > >  > >
>>  > >  > >             Cedric greatly overstates the intractability of re=
solving it. Nonetheless, since the initial import very little has been done=
 to improve integration, and I don't know of anyone who is up to the task t=
aking an interest in it. Consequently, mmap() performance is likely "doomed=
" for the foreseeable future.-M----
>>  > >  >
>>  > >  > Wellllll....
>>  > >  >
>>  > >  > I've done a fair bit of work here (see
>>  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) and =
the
>>  > >  > political issues are at least as bad as the coding ones.
>>  > >  >
>>  > >
>>  > >
>>  > > Strictly speaking, the root of the problem is the ARC. Not ZFS per =
se. Have you ever tried disabling MFU caching to see how much worse LRU onl=
y is? I'm not really convinced the ARC's benefits justify its cost.
>>  > >
>>  > > -M
>>  > >
>>  >
>>  > The ARC is very useful when it gets a hit as it avoid an I/O that wou=
ld
>>  > otherwise take place.
>>  >
>>  > Where it sucks is when the system evicts working set to preserve ARC.
>>  > That's always wrong in that you're trading a speculative I/O (if the
>>  > cache is hit later) for a *guaranteed* one (to page out) and maybe *t=
wo*
>>  > (to page back in.)
>>
>> The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. Th=
ere are a lot of issues stemming from the fact that ZFS is a transactional =
object store with a POSIX FS on top. One is that it caches disk blocks as o=
pposed to file blocks. However, if one could resolve that and have the page=
 cache manage these blocks life would be much much better. However, you'd l=
ose MFU. Hence my question.
>>
>> -M
>>
> I suspect there's an argument to be made there but the present problems
> make determining the impact of that difficult or impossible as those
> effects are swamped by the other issues.
>
> I can fairly-easily create workloads on the base code where simply
> typing "vi <some file>", making a change and hitting ":w" will result in
> a stall of tens of seconds or more while the cache flush that gets
> requested is run down.  I've resolved a good part (but not all
> instances) of this through my work.
>
> My understanding is that 11- has had additional work done to the base
> code, but three underlying issues are not, from what I can see in the
> commit logs and discussions, addressed: The VM system will page out
> working set while leaving ARC alone, UMA reserved-but-not-in-use space
> is not policed adequately when memory pressure exists *before* the pager
> starts considering evicting working set and the write-back cache is for
> many machine configurations grossly inappropriate and cannot be tuned
> adequately by hand (particularly being true on a system with vdevs that
> have materially-varying performance levels.)
>
> I have more-or-less stopped work on the tree on a forward basis since I
> got to a place with 10.2 that (1) works for my production requirements,
> resolving the problems and (2) ran into what I deemed to be intractable
> political issues within core on progress toward eradicating the root of
> the problem.
>
> I will probably revisit the situation with 11- at some point, as I'll
> want to roll my production systems forward.  However, I don't know when
> that will be -- right now 11- is stable enough for some of my embedded
> work (e.g. on the Raspberry Pi2) but is not on my server and
> client-class machines.  Indeed just yesterday I got a lock-order
> reversal panic while doing a shutdown after a kernel update on one of my
> lab boxes running a just-updated 11- codebase.
>
> --
> Karl Denninger
> karl@denninger.net <mailto:karl@denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/


--=20
Lionel

From owner-freebsd-hackers@freebsd.org  Tue Jul  5 19:09:14 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3218BB20EDE
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Tue,  5 Jul 2016 19:09:14 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id E58741AFF
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 19:09:13 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 7C3CC2209A0
 for <freebsd-hackers@freebsd.org>; Tue,  5 Jul 2016 14:09:12 -0500 (CDT)
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: freebsd-hackers@freebsd.org
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
 <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org>
 <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
 <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org>
 <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net>
 <CAPJSo4VtJ1+txt4s13nKSWrj9fDTv5VsLVyMsX+DarBUVYMbOQ@mail.gmail.com>
From: Karl Denninger <karl@denninger.net>
Message-ID: <2be70811-add4-d630-7f5a-a5a53ee2a5d4@denninger.net>
Date: Tue, 5 Jul 2016 14:08:55 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <CAPJSo4VtJ1+txt4s13nKSWrj9fDTv5VsLVyMsX+DarBUVYMbOQ@mail.gmail.com>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms070700000603020002030909"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jul 2016 19:09:14 -0000

This is a cryptographically signed message in MIME format.

--------------ms070700000603020002030909
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

You'd get most of the way to what Oracle did, I suspect, if the system:

1. Dynamically resized the write cache on a per-vdev basis so as to
prevent a flush from stalling all write I/O for a material amount of
time (which can and *does* happen now)

2. Made VM aware of UMA "committed-but-free" on an ongoing basis and
policed it on a sliding basis (that is, as RAM pressure rises VM
considers it more-important to reap UMA so as to prevent
marked-used-but-in-fact-free RAM from accumulating when RAM is under
pressure.)

3. Bi-directionally hooked VM so that it initiates and cooperates with
ZFS on ARC size management.  Specifically, if ZFS decides ARC is to be
reaped then it must notify VM so that (1) UMA can be reaped first, if
necessary, and (2) if ARC *still* needs to be reaped, that occurs
*before* VM pages anything out.  If and only if ARC is at minimum should
the VM system evict working set to the pagefile.

#1 is entirely within ZFS but is fairly hard to do well, and neither
the Illumos nor the FreeBSD team has taken a serious crack at it.

#2 I've taken a fairly decent look at but not implemented code on the VM
side to do it.  What I *have* done is implemented code on the ZFS side
to do it within the ZFS paradigm, which is technically in the wrong
place but works pretty well -- so long as the UMA fragmentation is
coming from ZFS.

#3 is a bear, especially if you don't move that code into VM (which
intimately "marries" the ZFS and VM code; that's very bad from a
maintainability perspective.)  What I've implemented is somewhat of a
hack in that regard in that it has ZFS triggering before VM does, it
gets aggressive with reaping its own UMA areas and the writeback cache
when there is RAM pressure and thus *most* of the time avoids the paging
pathology while allowing the ARC to use the truly-free RAM.  It ought to
be in the VM code however, because the pressure sometimes does not come
from ZFS.

This is what one of my production machines looks like right now with the
patch in -- this system runs a quite-active Postgres database along with
a material number of other things at the same time; this doesn't look
bad at all in terms of efficiency.

[karl@NewFS ~]$ zfs-stats -A

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Jul  5 14:05:06 2016
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                29.11m
        Recycle Misses:                         0
        Mutex Misses:                           67.14k
        Evict Skips:                            72.84m

ARC Size:                               72.10%  16.10   GiB
        Target Size: (Adaptive)         83.00%  18.53   GiB
        Min Size (Hard Limit):          12.50%  2.79    GiB
        Max Size (High Water):          8:1     22.33   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       81.84%  15.17   GiB
        Frequently Used Cache Size:     18.16%  3.37    GiB

ARC Hash Breakdown:
        Elements Max:                           1.84m
        Elements Current:               33.47%  614.39k
        Collisions:                             41.78m
        Chain Max:                              6
        Chains:                                 39.45k

------------------------------------------------------------------------

ARC Efficiency:                                 1.88b
        Cache Hit Ratio:                78.45%  1.48b
        Cache Miss Ratio:               21.55%  405.88m
        Actual Hit Ratio:               77.46%  1.46b

        Data Demand Efficiency:         77.97%  1.45b
        Data Prefetch Efficiency:       24.82%  9.07m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.52%   7.62m
          Most Recently Used:           8.38%   123.87m
          Most Frequently Used:         90.36%  1.34b
          Most Recently Used Ghost:     0.18%   2.65m
          Most Frequently Used Ghost:   0.56%   8.30m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  76.71%  1.13b
          Prefetch Data:                0.15%   2.25m
          Demand Metadata:              21.82%  322.33m
          Prefetch Metadata:            1.33%   19.58m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  78.91%  320.29m
          Prefetch Data:                1.68%   6.82m
          Demand Metadata:              16.70%  67.79m
          Prefetch Metadata:            2.70%   10.97m

------------------------------------------------------------------------

The system currently has 20GB wired, ~3GB free and ~1GB inactive with a
tiny amount in the cache bucket (~46MB).

On 7/5/2016 13:40, Lionel Cons wrote:
> So what Oracle did (based on work done by SUN for Opensolaris) was to:
> 1. Modify ZFS to prevent *ANY* double/multi caching [this is
> considered a design defect]
> 2. Introduce a new VM subsystem which scales a lot better and provides
> hooks for [1] so there are never two or more copies of the same data
> in the system
>
> Given that this was a huge, paid, multiyear effort its not likely
> going to happen that the design defects in opensource ZFS will ever go
> away.
>
> Lionel
>
> On 5 July 2016 at 19:50, Karl Denninger <karl@denninger.net> wrote:
>> On 7/5/2016 12:19, Matthew Macy wrote:
>>>
>>>  ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl@denning=
er.net> wrote ----
>>>  >
>>>  >
>>>  > On 7/4/2016 18:45, Matthew Macy wrote:
>>>  > >
>>>  > >
>>>  > >  ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl@de=
nninger.net> wrote ----
>>>  > >  >
>>>  > >  > On 7/3/2016 02:45, Matthew Macy wrote:
>>>  > >  > >
>>>  > >  > >             Cedric greatly overstates the intractability of=
 resolving it. Nonetheless, since the initial import very little has been=
 done to improve integration, and I don't know of anyone who is up to the=
 task taking an interest in it. Consequently, mmap() performance is likel=
y "doomed" for the foreseeable future.-M----
>>>  > >  >
>>>  > >  > Wellllll....
>>>  > >  >
>>>  > >  > I've done a fair bit of work here (see
>>>  > >  > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D187594) a=
nd the
>>>  > >  > political issues are at least as bad as the coding ones.
>>>  > >  >
>>>  > >
>>>  > >
>>>  > > Strictly speaking, the root of the problem is the ARC. Not ZFS p=
er se. Have you ever tried disabling MFU caching to see how much worse LR=
U only is? I'm not really convinced the ARC's benefits justify its cost.
>>>  > >
>>>  > > -M
>>>  > >
>>>  >
>>>  > The ARC is very useful when it gets a hit as it avoid an I/O that =
would
>>>  > otherwise take place.
>>>  >
>>>  > Where it sucks is when the system evicts working set to preserve A=
RC.
>>>  > That's always wrong in that you're trading a speculative I/O (if t=
he
>>>  > cache is hit later) for a *guaranteed* one (to page out) and maybe=
 *two*
>>>  > (to page back in.)
>>>
>>> The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU.=
 There are a lot of issues stemming from the fact that ZFS is a transacti=
onal object store with a POSIX FS on top. One is that it caches disk bloc=
ks as opposed to file blocks. However, if one could resolve that and have=
 the page cache manage these blocks life would be much much better. Howev=
er, you'd lose MFU. Hence my question.
>>>
>>> -M
>>>
>> I suspect there's an argument to be made there but the present problem=
s
>> make determining the impact of that difficult or impossible as those
>> effects are swamped by the other issues.
>>
>> I can fairly-easily create workloads on the base code where simply
>> typing "vi <some file>", making a change and hitting ":w" will result =
in
>> a stall of tens of seconds or more while the cache flush that gets
>> requested is run down.  I've resolved a good part (but not all
>> instances) of this through my work.
>>
>> My understanding is that 11- has had additional work done to the base
>> code, but three underlying issues are not, from what I can see in the
>> commit logs and discussions, addressed: The VM system will page out
>> working set while leaving ARC alone, UMA reserved-but-not-in-use space=

>> is not policed adequately when memory pressure exists *before* the pag=
er
>> starts considering evicting working set and the write-back cache is fo=
r
>> many machine configurations grossly inappropriate and cannot be tuned
>> adequately by hand (particularly being true on a system with vdevs tha=
t
>> have materially-varying performance levels.)
>>
>> I have more-or-less stopped work on the tree on a forward basis since =
I
>> got to a place with 10.2 that (1) works for my production requirements=
,
>> resolving the problems and (2) ran into what I deemed to be intractabl=
e
>> political issues within core on progress toward eradicating the root o=
f
>> the problem.
>>
>> I will probably revisit the situation with 11- at some point, as I'll
>> want to roll my production systems forward.  However, I don't know whe=
n
>> that will be -- right now 11- is stable enough for some of my embedded=

>> work (e.g. on the Raspberry Pi2) but is not on my server and
>> client-class machines.  Indeed just yesterday I got a lock-order
>> reversal panic while doing a shutdown after a kernel update on one of =
my
>> lab boxes running a just-updated 11- codebase.
>>
>> --
>> Karl Denninger
>> karl@denninger.net <mailto:karl@denninger.net>
>> /The Market Ticker/
>> /[S/MIME encrypted email preferred]/
>
>

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms070700000603020002030909--

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 15:18:36 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1C89FB75E29;
 Wed,  6 Jul 2016 15:18:36 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id A13B21865;
 Wed,  6 Jul 2016 15:18:35 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u66FIMY6079547
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Wed, 6 Jul 2016 18:18:23 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u66FIMY6079547
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u66FIM1Y079519;
 Wed, 6 Jul 2016 18:18:22 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 6 Jul 2016 18:18:22 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: David Cross <dcrosstech@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
Message-ID: <20160706151822.GC38613@kib.kiev.ua>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 15:18:36 -0000

On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote:
> Ok.. to reply to my own message, I using ktr and debugging printfs I have
> found the culprit.. but I am still at a loss to 'why', or what the
> appropriate fix is.
> 
> Lets go back to the panic (simplified)
> 
> #0 0xffffffff8043f160 at kdb_backtrace+0x60
> #1 0xffffffff80401454 at vpanic+0x124
> #2 0xffffffff804014e3 at panic+0x43
> #3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
> #4 0xffffffff80499cc1 at brelse+0x151
> #5 0xffffffff804979b1 at bufwrite+0x81
> #6 0xffffffff80623c80 at ffs_write+0x4b0
> #7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> #8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> #9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
> #11 0xffffffff806588e6 at vm_pageout_flush+0x116
> #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> #13 0xffffffff80651519 at vm_object_page_clean+0x179
> #14 0xffffffff80651102 at vm_object_terminate+0xa2
> #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> #16 0xffffffff8062a52f at ufs_reclaim+0x1f
> #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> 
> Via KTR logging I determined that the dangling dependedency was on a
> freshly allocated buf, *after* vinvalbuf in the vgonel() (so in VOP_RECLAIM
> itself), called by the vnode lru cleanup process; I further noticed that it
> was in a newbuf that recycled a bp (unimportant, except it let me narrow
> down my logging to something managable), from there I get this stacktrace
> (simplified)
> 
> #0 0xffffffff8043f160 at kdb_backtrace+0x60
> #1 0xffffffff8049c98e at getnewbuf+0x4be
> #2 0xffffffff804996a0 at getblk+0x830
> #3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
> #4 0xffffffff80623b0b at ffs_write+0x33b
> #5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> #6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> #7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> #8 0xffffffff80661cc1 at vnode_pager_putpages+0x91
> #9 0xffffffff806588e6 at vm_pageout_flush+0x116
> #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> #11 0xffffffff80651519 at vm_object_page_clean+0x179
> #12 0xffffffff80651102 at vm_object_terminate+0xa2
> #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> #14 0xffffffff8062a52f at ufs_reclaim+0x1f
> #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> #16 0xffffffff804b6c6e at vgonel+0x2ee
> #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5
> 
> addr2line on the ffs_balloc_ufs2 gives:
> /usr/src/sys/ufs/ffs/ffs_balloc.c:778:
> 
>                         bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
>                         bp->b_blkno = fsbtodb(fs, newb);
>                         if (flags & BA_CLRBUF)
>                                 vfs_bio_clrbuf(bp);
>                         if (DOINGSOFTDEP(vp))
>                                 softdep_setup_allocdirect(ip, lbn, newb, 0,
>                                     nsize, 0, bp);
> 
> 
> Boom, freshly allocated buffer with a dependecy; nothing in VOP_RECLAIM
> handles this, this is after vinvalbuf is called, it expects that everything
> is flushed to disk and its just about releasing structures (is my read of
> the code).
> 
> Now, perhaps this is a good assumption?  the question then is how is this
> buffer hanging out there surviving a a vinvalbuf.  I will note that my
> test-case that finds this runs and terminates *minutes* before... its not
> just hanging out there in a race, its surviving background sync, fsync,
> etc... wtf?  Also, I *can* unmount the FS without an error, so that
> codepath is either ignoring this buffer, or its forcing a sync in a way
> that doesn't panic?
The most typical cause of buffer dependencies not being flushed is a
buffer write error.  You could at least provide the printout of the
buffer to confirm or reject this assumption.

Were there any kernel messages right before the panic?  Just in case,
did you fsck the volume before using it, after the previous panic?

> 
> Anyone have next steps?  I am making progress here, but its really slow
> going, this is probably the most complex portion of the kernel and some
> pointers would be helpful.
> 
> On Sat, Jul 2, 2016 at 2:31 PM, David Cross <dcrosstech@gmail.com> wrote:
> 
> > Ok, I have been trying to trace this down for awhile..I know quite a bit
> > about it.. but there's a lot I don't know, or I would have a patch.  I have
> > been trying to solve this on my own, but bringing in some outside
> > assistance will let me move on with my life.
> >
> > First up:  The stacktrace (from a debugging kernel, with coredump
> >
> > #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
> > #1  0xffffffff8071018a in kern_reboot (howto=260)
> >     at /usr/src/sys/kern/kern_shutdown.c:486
> > #2  0xffffffff80710afc in vpanic (
> >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
> > ap=0xfffffe023ae5cf40)
> >     at /usr/src/sys/kern/kern_shutdown.c:889
> > #3  0xffffffff807108c0 in panic (
> >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
> >     at /usr/src/sys/kern/kern_shutdown.c:818
> > #4  0xffffffff80a7c841 in softdep_deallocate_dependencies (
> >     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
> > #5  0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at
> > buf.h:428
> > #6  0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
> >     at /usr/src/sys/kern/vfs_bio.c:1599
> > #7  0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
> >     at /usr/src/sys/kern/vfs_bio.c:1180
> > #8  0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
> > #9  0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
> >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
> > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
> >     a=0xfffffe023ae5d2b8) at vnode_if.c:999
> > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
> >     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
> >     at vnode_if.h:413
> > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
> > (vp=0xfffff80077e7a000,
> >     ma=0xfffffe023ae5d660, bytecount=16384, flags=1,
> > rtvals=0xfffffe023ae5d580)
> >     at /usr/src/sys/vm/vnode_pager.c:1138
> > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
> >     at /usr/src/sys/kern/vfs_default.c:760
> > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
> >     a=0xfffffe023ae5d478) at vnode_if.c:2861
> > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
> >     m=0xfffffe023ae5d660, count=16384, sync=1, rtvals=0xfffffe023ae5d580,
> >     offset=0) at vnode_if.h:1189
> > #16 0xffffffff80b196f3 in vnode_pager_putpages (object=0xfffff8014a1fce00,
> >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> >     at /usr/src/sys/vm/vnode_pager.c:1016
> > #17 0xffffffff80b0a605 in vm_pager_put_pages (object=0xfffff8014a1fce00,
> >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> >     at vm_pager.h:144
> > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
> > count=4,
> >     flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8, eio=0xfffffe023ae5d77c)
> >     at /usr/src/sys/vm/vm_pageout.c:533
> > #19 0xffffffff80afec76 in vm_object_page_collect_flush (
> >     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
> > flags=1,
> >     clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
> >     at /usr/src/sys/vm/vm_object.c:971
> > #20 0xffffffff80afe91e in vm_object_page_clean (object=0xfffff8014a1fce00,
> >     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
> > #21 0xffffffff80afe1fa in vm_object_terminate (object=0xfffff8014a1fce00)
> >     at /usr/src/sys/vm/vm_object.c:735
> > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000)
> >     at /usr/src/sys/vm/vnode_pager.c:164
> > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
> >     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
> > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
> >     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
> > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
> >     a=0xfffffe023ae5d968) at vnode_if.c:2019
> > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
> >     td=0xfffff80008931960) at vnode_if.h:830
> > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
> >     at /usr/src/sys/kern/vfs_subr.c:2943
> > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
> >     at /usr/src/sys/kern/vfs_subr.c:882
> > #29 0xffffffff80828ea9 in vnlru_proc () at
> > /usr/src/sys/kern/vfs_subr.c:1000
> > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
> > <vnlru_proc>,
> >     arg=0x0, frame=0xfffffe023ae5dc00) at
> > /usr/src/sys/kern/kern_fork.c:1027
> > #31 0xffffffff80b21dce in fork_trampoline ()
> >     at /usr/src/sys/amd64/amd64/exception.S:611
> > #32 0x0000000000000000 in ?? ()
> >
> > This is a kernel compiled -O -g, its "almost" GENERIC; the only difference
> > is some removed drivers, I have reproduced this on a few different kernels,
> > including a BHYVE one so I can poke at it and not take out the main
> > machine.  The reproduction as it currently stands needs to have jails
> > running, but I don't believe this is a jail interaction, I think its just
> > that the process that sets up the problem happens to be running in a jail.
> > The step is "start jail; run "find /mountpoint -xdev >/dev/null" on the
> > filesystem, when the vnlru forces the problem vnode out the system panics.
> >
> > I made a few modifications to the kernel to spit out information about the
> > buf that causes the issue, but that is it.
> >
> > Information about the buf in question; it has a single softdependency
> > worklist for direct allocation:
> > (kgdb) print *bp->b_dep->lh_first
> > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
> >   wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
> >
> > The file that maps to that buffer:
> > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > -rw-------  1 cyrus  cyrus    24K Jul  1 20:32
> > MOUNTPOINT/jails/mail/var/imap/db/__db.002
> >
> > Any help is appreciated, until then I will keep banging my head against
> > the proverbial wall on this :)
> >
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 15:49:29 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 726DDB75447
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Jul 2016 15:49:29 +0000 (UTC)
 (envelope-from andrew@fubar.geek.nz)
Received: from kif.fubar.geek.nz (kif.fubar.geek.nz [178.62.119.249])
 by mx1.freebsd.org (Postfix) with ESMTP id 31C321812;
 Wed,  6 Jul 2016 15:49:28 +0000 (UTC)
 (envelope-from andrew@fubar.geek.nz)
Received: from zapp (global-5-141.nat-2.net.cam.ac.uk [131.111.5.141])
 by kif.fubar.geek.nz (Postfix) with ESMTPSA id 0F1CBD78E6;
 Wed,  6 Jul 2016 15:49:28 +0000 (UTC)
Date: Wed, 6 Jul 2016 16:49:26 +0100
From: Andrew Turner <andrew@fubar.geek.nz>
To: Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Review request: sparse CPU ID maps
Message-ID: <20160706164926.7c3d116c@zapp>
In-Reply-To: <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
References: <57761101.3030101@freebsd.org>
 <CAD9=5Xw-MmVVSSo6nRvSRvGaLbd1Z1YRyVKyF9JfmucNKMGBZg@mail.gmail.com>
 <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org>
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd11.0)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 15:49:29 -0000

On Sat, 2 Jul 2016 17:08:54 -0700
Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:

> A reasonable first pass at checking for this kind of bug is doing
> grep -lR '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows
> the following files:
> arm/mv/armadaxp/armadaxp_mp.c
> arm/include/counter.h
> arm/broadcom/bcm2835/bcm2836.c
> arm/broadcom/bcm2835/bcm2836_mp.c
> arm/freescale/imx/imx6_mp.c
> arm/allwinner/aw_mp.c
> arm/rockchip/rk30xx_mp.c
> arm/amlogic/aml8726/aml8726_mp.c
> arm/samsung/exynos/exynos5_mp.c
> arm/arm/mp_machdep.c
> arm/nvidia/tegra124/tegra124_mp.c

I'm planning on forcing people to clean up the arm code in 12. I can add
this to the list of things that need to be fixed.

> arm64/include/counter.h
> arm64/arm64/gic_v3.c
> arm64/arm64/gic_v3_its.c
> arm64/arm64/gicv3_its.c

I'll look at these in a few days when the code freeze is lifted.

> 
> All of them should, in some sense, be CPU_FOREACH(), but it may not 
> matter. For example, it may not be possible to have sparse CPU IDs on 
> some or all of those SOCs. At least the generic ones (counter, 
> mp_machdep.c, gic (why are there both gic_v3_its.c and gicv3_its.c?)) 
> should be changed, I think.

On arm it depends on the SoC. As far as I know no arm SoCs support
sparse CPU IDs as they assign the ID based on the internal ID and, on a
single cluster of CPUs, this seems to be contiguous. To boot on all
CPUs on a multi-cluster SoC (e.g. big.LITTLE) we would need to
rework the assignment of cpuids. As such I would expect us to keep
a contiguous space.

The place I would expect us to get a non-contiguous range on arm is if
we grew support for offlining CPUs. I think this will be needed on a few
SoCs if we wish to run on many of the mobile chips. This may be needed
for thermal and power reasons as many are only able to run for a short
length of time before thermal throttling.

On arm64 I'm planning on reworking the cpuid allocation code such that
we may get sparse values, however I don't expect to have time for this
in the next few months.

Andrew

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 14:51:30 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7F8CFB75771;
 Wed,  6 Jul 2016 14:51:30 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x244.google.com (mail-yw0-x244.google.com
 [IPv6:2607:f8b0:4002:c05::244])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3E4401869;
 Wed,  6 Jul 2016 14:51:30 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x244.google.com with SMTP id i12so11388258ywa.0;
 Wed, 06 Jul 2016 07:51:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to;
 bh=2W18IY4L2vU7WGhVf+mqioV0jxeyxVx64fPGK62GrME=;
 b=iCgXzL70stA25xEQyFCWP3+nAdvJukcQQs/2wUHO5DkQ9IEcDWjzk84DgGpfVa9Ozn
 i2d5iawMcB9mh0HySZxJlAot+5dpANnDjQ1pndjmN52XxtGzdq5wdHFoQ1LNs7MtbEmh
 2lxYR75dPBYe8OQkfClyr1ab7kihFeQaKc5NPkolgYPnTzH63URwFtu0hx3rHDBSTrvR
 oMMrm0cuHeuQnCAvwcnrdLnKZl+t/2zmbJh/evJrpK08lZXCX1mgCi1tDhyC97pzJKjb
 GpcnkFYpqufmUjRo/e8liNrceyO8TWLB6Af10yHnwskeTM1fAtjbe3pKf2uSabCtmOTI
 +ojg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to;
 bh=2W18IY4L2vU7WGhVf+mqioV0jxeyxVx64fPGK62GrME=;
 b=S8oHXY36g1Lz8Z2ZnqUng0181qQrcbFuXTc94l4y/xq+dceO/rNEPHJDvOixd26tuy
 2ujJ8EeRQrB8+RTLL4Lwl80FXEOU6ho/mTKJXpxyGR9VELfxlJUlV3xxLhrbS5qa6iEI
 kbPj6zI/h+jsimkqy9Hk4u8xkoHyVG77ZwTuEwcAyvPBYip2c8a3bSnmCSTEviD28Xe3
 1lFk8SF3+Pm2Nb1wBoI+d7oiyr389acOYLAIMJDnPtRq3hRulLwn50ZaCpKDCKEzP6er
 uHAYvfNdghNwceftJ3Z7H3Vgxx0j/PnuFnCfJYcl4BCINHiYSqnJ8VMD8kr1iQDPNLWX
 lURg==
X-Gm-Message-State: ALyK8tI1ddVDAcxk5v/zAHO7K/J42IhO+O7ZMtHS9NnoxJ8u6zcAPmxX1GmqD9yWvPyCfSSYjtn+I67NlXRvKQ==
X-Received: by 10.129.82.21 with SMTP id g21mr14762175ywb.66.1467816688930;
 Wed, 06 Jul 2016 07:51:28 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 07:51:28 -0700 (PDT)
In-Reply-To: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
From: David Cross <dcrosstech@gmail.com>
Date: Wed, 6 Jul 2016 10:51:28 -0400
Message-ID: <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Wed, 06 Jul 2016 16:16:01 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 14:51:30 -0000

OK, to reply to my own message: using KTR and debugging printfs I have
found the culprit, but I am still at a loss as to 'why', or what the
appropriate fix is.

Let's go back to the panic (simplified):

#0 0xffffffff8043f160 at kdb_backtrace+0x60
#1 0xffffffff80401454 at vpanic+0x124
#2 0xffffffff804014e3 at panic+0x43
#3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
#4 0xffffffff80499cc1 at brelse+0x151
#5 0xffffffff804979b1 at bufwrite+0x81
#6 0xffffffff80623c80 at ffs_write+0x4b0
#7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
#8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
#9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
#10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
#11 0xffffffff806588e6 at vm_pageout_flush+0x116
#12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
#13 0xffffffff80651519 at vm_object_page_clean+0x179
#14 0xffffffff80651102 at vm_object_terminate+0xa2
#15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
#16 0xffffffff8062a52f at ufs_reclaim+0x1f
#17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142

Via KTR logging I determined that the dangling dependency was on a
freshly allocated buf, *after* vinvalbuf in vgonel() (so in VOP_RECLAIM
itself), called by the vnode LRU cleanup process. I further noticed that it
was in a newbuf that recycled a bp (unimportant, except it let me narrow
down my logging to something manageable). From there I get this stacktrace
(simplified):

#0 0xffffffff8043f160 at kdb_backtrace+0x60
#1 0xffffffff8049c98e at getnewbuf+0x4be
#2 0xffffffff804996a0 at getblk+0x830
#3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
#4 0xffffffff80623b0b at ffs_write+0x33b
#5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
#6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
#7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
#8 0xffffffff80661cc1 at vnode_pager_putpages+0x91
#9 0xffffffff806588e6 at vm_pageout_flush+0x116
#10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
#11 0xffffffff80651519 at vm_object_page_clean+0x179
#12 0xffffffff80651102 at vm_object_terminate+0xa2
#13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
#14 0xffffffff8062a52f at ufs_reclaim+0x1f
#15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
#16 0xffffffff804b6c6e at vgonel+0x2ee
#17 0xffffffff804ba6f5 at vnlru_proc+0x4b5

addr2line on the ffs_balloc_ufs2 gives:
/usr/src/sys/ufs/ffs/ffs_balloc.c:778:

                        bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
                        bp->b_blkno = fsbtodb(fs, newb);
                        if (flags & BA_CLRBUF)
                                vfs_bio_clrbuf(bp);
                        if (DOINGSOFTDEP(vp))
                                softdep_setup_allocdirect(ip, lbn, newb, 0,
                                    nsize, 0, bp);


Boom, a freshly allocated buffer with a dependency. Nothing in VOP_RECLAIM
handles this: it runs after vinvalbuf is called, so it expects that
everything is flushed to disk and it's just about releasing structures
(that is my read of the code).
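
The ordering problem can be modeled in a few lines of userland C. This is
a toy under stated assumptions, not the kernel code: toy_buf, toy_dep,
toy_attach_dep, toy_invalidate and toy_release are hypothetical names, and
the real softdep machinery tracks far more state. It only shows that a
dependency attached after the invalidation pass is still present when the
buffer is released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a softdep worklist item. */
struct toy_dep {
	struct toy_dep *next;
	int type;
};

/* Hypothetical stand-in for a buffer carrying a dependency list. */
struct toy_buf {
	struct toy_dep *deps;	/* pending dependencies, NULL when clean */
};

/* Attach a dependency, as an allocation path would for a new block. */
static void
toy_attach_dep(struct toy_buf *bp, struct toy_dep *dep)
{
	assert(bp != NULL && dep != NULL);
	dep->next = bp->deps;
	bp->deps = dep;
}

/* Model of the invalidation pass: whatever is pending now gets resolved. */
static void
toy_invalidate(struct toy_buf *bp)
{
	bp->deps = NULL;
}

/* Model of release: 0 if clean, -1 where the real code would panic. */
static int
toy_release(const struct toy_buf *bp)
{
	return (bp->deps == NULL ? 0 : -1);
}
```

Attaching a dependency and then invalidating gives a clean release;
attaching one after the invalidation pass makes toy_release() return -1,
which corresponds to the "dangling deps" case in the panic above.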

Now, perhaps this is a good assumption?  The question then is how this
buffer is hanging out there, surviving a vinvalbuf.  I will note that my
test case that finds this runs and terminates *minutes* before... it's not
just hanging out there in a race, it's surviving background sync, fsync,
etc... wtf?  Also, I *can* unmount the FS without an error, so that
codepath is either ignoring this buffer, or it's forcing a sync in a way
that doesn't panic?

Anyone have next steps?  I am making progress here, but it's really slow
going; this is probably the most complex portion of the kernel and some
pointers would be helpful.

On Sat, Jul 2, 2016 at 2:31 PM, David Cross <dcrosstech@gmail.com> wrote:

> Ok, I have been trying to trace this down for awhile..I know quite a bit
> about it.. but there's a lot I don't know, or I would have a patch.  I have
> been trying to solve this on my own, but bringing in some outside
> assistance will let me move on with my life.
>
> First up:  The stacktrace (from a debugging kernel, with coredump
>
> #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
> #1  0xffffffff8071018a in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:486
> #2  0xffffffff80710afc in vpanic (
>     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
> ap=0xfffffe023ae5cf40)
>     at /usr/src/sys/kern/kern_shutdown.c:889
> #3  0xffffffff807108c0 in panic (
>     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
>     at /usr/src/sys/kern/kern_shutdown.c:818
> #4  0xffffffff80a7c841 in softdep_deallocate_dependencies (
>     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
> #5  0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at
> buf.h:428
> #6  0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
>     at /usr/src/sys/kern/vfs_bio.c:1599
> #7  0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
>     at /usr/src/sys/kern/vfs_bio.c:1180
> #8  0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
> #9  0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
>     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
> #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
>     a=0xfffffe023ae5d2b8) at vnode_if.c:999
> #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
>     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
>     at vnode_if.h:413
> #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
> (vp=0xfffff80077e7a000,
>     ma=0xfffffe023ae5d660, bytecount=16384, flags=1,
> rtvals=0xfffffe023ae5d580)
>     at /usr/src/sys/vm/vnode_pager.c:1138
> #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
>     at /usr/src/sys/kern/vfs_default.c:760
> #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
>     a=0xfffffe023ae5d478) at vnode_if.c:2861
> #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
>     m=0xfffffe023ae5d660, count=16384, sync=1, rtvals=0xfffffe023ae5d580,
>     offset=0) at vnode_if.h:1189
> #16 0xffffffff80b196f3 in vnode_pager_putpages (object=0xfffff8014a1fce00,
>     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
>     at /usr/src/sys/vm/vnode_pager.c:1016
> #17 0xffffffff80b0a605 in vm_pager_put_pages (object=0xfffff8014a1fce00,
>     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
>     at vm_pager.h:144
> #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
> count=4,
>     flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8, eio=0xfffffe023ae5d77c)
>     at /usr/src/sys/vm/vm_pageout.c:533
> #19 0xffffffff80afec76 in vm_object_page_collect_flush (
>     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
> flags=1,
>     clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
>     at /usr/src/sys/vm/vm_object.c:971
> #20 0xffffffff80afe91e in vm_object_page_clean (object=0xfffff8014a1fce00,
>     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
> #21 0xffffffff80afe1fa in vm_object_terminate (object=0xfffff8014a1fce00)
>     at /usr/src/sys/vm/vm_object.c:735
> #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000)
>     at /usr/src/sys/vm/vnode_pager.c:164
> #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
>     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
> #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
>     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
> #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
>     a=0xfffffe023ae5d968) at vnode_if.c:2019
> #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
>     td=0xfffff80008931960) at vnode_if.h:830
> #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
>     at /usr/src/sys/kern/vfs_subr.c:2943
> #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
>     at /usr/src/sys/kern/vfs_subr.c:882
> #29 0xffffffff80828ea9 in vnlru_proc () at
> /usr/src/sys/kern/vfs_subr.c:1000
> #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
> <vnlru_proc>,
>     arg=0x0, frame=0xfffffe023ae5dc00) at
> /usr/src/sys/kern/kern_fork.c:1027
> #31 0xffffffff80b21dce in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:611
> #32 0x0000000000000000 in ?? ()
>
> This is a kernel compiled -O -g, its "almost" GENERIC; the only difference
> is some removed drivers, I have reproduced this on a few different kernels,
> including a BHYVE one so I can poke at it and not take out the main
> machine.  The reproduction as it currently stands needs to have jails
> running, but I don't believe this is a jail interaction, I think its just
> that the process that sets up the problem happens to be running in a jail.
> The step is "start jail; run "find /mountpoint -xdev >/dev/null" on the
> filesystem, when the vnlru forces the problem vnode out the system panics.
>
> I made a few modifications to the kernel to spit out information about the
> buf that causes the issue, but that is it.
>
> Information about the buf in question; it has a single softdependency
> worklist for direct allocation:
> (kgdb) print *bp->b_dep->lh_first
> $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
>   wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
>
> The file that maps to that buffer:
> ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
> -rw-------  1 cyrus  cyrus    24K Jul  1 20:32
> MOUNTPOINT/jails/mail/var/imap/db/__db.002
>
> Any help is appreciated, until then I will keep banging my head against
> the proverbial wall on this :)
>

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 15:30:21 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 334C8B750BB;
 Wed,  6 Jul 2016 15:30:21 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x233.google.com (mail-yw0-x233.google.com
 [IPv6:2607:f8b0:4002:c05::233])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id D78D31E15;
 Wed,  6 Jul 2016 15:30:20 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x233.google.com with SMTP id v77so90432476ywg.0;
 Wed, 06 Jul 2016 08:30:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=2zIgEdwfw7Z1V6OGww1ImYgIa1uYbU7N6obh6XYYBrc=;
 b=SmIhSb1Q+w7zEtvd+HQHM5Jf2FdhefWaJtwsUrujd0SJQPQ/GSwrYx2Ej+50rrx0j+
 zejoHuQ5OCkdAtQdZnKGZel/vG8ByU4C3DGxkbsl4+l2Iezv91Eiz/AF66KbR2guBvGL
 o4HPsnmR10zGDnIEx9mm2w4dIeSdxL0FZjObOI2Zbq14fope0PoG+0hxbaG1JU15XTWu
 MWzDC/KPChrc963y51oiMD3ODba3zJjz17CmEbk7Kezzd5OEvJHy8cWh2jdgRiyDChMs
 vAGt9CB+bMO2cCX7Kqw11Qu3zPHsaiAgxNV4T1Fwa/3ErRq4nGZbtqi9ym6ZsrgjwlPH
 ZQrg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=2zIgEdwfw7Z1V6OGww1ImYgIa1uYbU7N6obh6XYYBrc=;
 b=NUF5ViTbPB4FlQLzdJfWKQKzfdUDM/gbs3NTvoLWQH4BjCypvYMhw8+JCJf9fzjq4C
 M6plTM7uZJHcj9U9Q+zn3/ulilXFsuXf4aPzTklGhXOofPE8pBnJ6IHGZPkKwFDPdrw+
 /tesYSccOJEX6VNX55XgzlAnCPXPorx3vkdgMgqIQtpYuIYhE9QjMWNFUcAz7iJaOo+3
 wW9dkyIfHN00SK1WrxCrYNfeJQIfAKPDlO76ZZ5QtPb7Fi36w83gsB32RDWkQk00qkXA
 1lOs7A+RR5SSUtFAPjH9VYDzMKDJ77SrzpYZFVM+/ej4XJfLCW/9mm25pgn7bBVcgZaW
 kApA==
X-Gm-Message-State: ALyK8tIuoJs71uy2FHLo/dBN0f99pH1lgDg8vOu1OjvbsArcnGfAi0qnvKZs8vC99ZYPpUTOrHE8AKV5B/HxVA==
X-Received: by 10.129.5.215 with SMTP id 206mr16078637ywf.210.1467819019999;
 Wed, 06 Jul 2016 08:30:19 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 08:30:19 -0700 (PDT)
In-Reply-To: <20160706151822.GC38613@kib.kiev.ua>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
 <20160706151822.GC38613@kib.kiev.ua>
From: David Cross <dcrosstech@gmail.com>
Date: Wed, 6 Jul 2016 11:30:19 -0400
Message-ID: <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Wed, 06 Jul 2016 16:23:31 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 15:30:21 -0000

No kernel messages before (if there were, I would have written this off a
long time ago).
And as of right now, this is probably the most fsck-ed filesystem on the
planet! I have an 'image' that I am working on that is ggate mounted, so I
can access it in a bhyve VM to ease debugging, so I am not crashing my real
machine (with the real filesystem) all the time.

One of my initial guesses was that this was a cylinder group (CG)
allocation error, but a dumpfs seems to show plenty of blocks in the CG to
meet this need.

Quick note on the test case: I haven't totally isolated it yet, but the
minimal reproduction is 'ctl_cyrusdb -r', which runs a bdb5 recover op. A
ktrace on that shows that it unlinks 3 files, opens them, lseeks, then
writes a block, and then mmaps them (but leaves them open).  At process
termination it munmaps, and then closes.  I have tried to write a shorter
reproduction that opens, seeks, mmaps (with the same flags), writes the
mmapped memory, munmaps, closes and exits, but this has been insufficient to
reproduce the issue. There is likely some specific pattern in the bdb5 code
tickling this, and behind the mmap-ed interface it is all opaque, and the
bdb5 code is pretty complex itself.
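
For reference, the shorter sequence described above looks roughly like the
sketch below. It is an approximation under assumptions: the function name
toy_repro is hypothetical, the open and mmap flags (O_RDWR | O_CREAT,
PROT_READ | PROT_WRITE, MAP_SHARED) are guesses rather than the flags seen
in the ktrace, and a single byte is written where the real program writes
a block. As noted, a sequence like this did not trigger the panic.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * unlink, open, lseek, write, mmap, dirty the mapping, munmap, close.
 * Returns 0 on success and -1 on any failure.
 */
static int
toy_repro(const char *path, off_t len)
{
	char byte = 0;
	void *p;
	int fd;

	assert(len > 0);
	(void)unlink(path);		/* a missing file is fine here */
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return (-1);

	/* Seek to the last byte and write it, extending the file to len. */
	if (lseek(fd, len - 1, SEEK_SET) == -1 ||
	    write(fd, &byte, 1) != 1) {
		(void)close(fd);
		return (-1);
	}

	/* Assumed flags: a shared, writable mapping of the whole file. */
	p = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		(void)close(fd);
		return (-1);
	}

	/* Dirty every page through the mapping, not through write(2). */
	memset(p, 0xa5, (size_t)len);

	if (munmap(p, (size_t)len) == -1) {
		(void)close(fd);
		return (-1);
	}
	return (close(fd));
}
```

Calling toy_repro() with a path on the affected filesystem and a length of
24576 would mirror the 24K size of __db.002; whatever bdb5 does beyond
this sequence is not captured here.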

On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote:
> > Ok.. to reply to my own message, I using ktr and debugging printfs I have
> > found the culprit.. but I am still at a loss to 'why', or what the
> > appropriate fix is.
> >
> > Lets go back to the panic (simplified)
> >
> > #0 0xffffffff8043f160 at kdb_backtrace+0x60
> > #1 0xffffffff80401454 at vpanic+0x124
> > #2 0xffffffff804014e3 at panic+0x43
> > #3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
> > #4 0xffffffff80499cc1 at brelse+0x151
> > #5 0xffffffff804979b1 at bufwrite+0x81
> > #6 0xffffffff80623c80 at ffs_write+0x4b0
> > #7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> > #8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> > #9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
> > #11 0xffffffff806588e6 at vm_pageout_flush+0x116
> > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> > #13 0xffffffff80651519 at vm_object_page_clean+0x179
> > #14 0xffffffff80651102 at vm_object_terminate+0xa2
> > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> > #16 0xffffffff8062a52f at ufs_reclaim+0x1f
> > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> >
> > Via KTR logging I determined that the dangling dependedency was on a
> > freshly allocated buf, *after* vinvalbuf in the vgonel() (so in
> VOP_RECLAIM
> > itself), called by the vnode lru cleanup process; I further noticed that
> it
> > was in a newbuf that recycled a bp (unimportant, except it let me narrow
> > down my logging to something managable), from there I get this stacktrace
> > (simplified)
> >
> > #0 0xffffffff8043f160 at kdb_backtrace+0x60
> > #1 0xffffffff8049c98e at getnewbuf+0x4be
> > #2 0xffffffff804996a0 at getblk+0x830
> > #3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
> > #4 0xffffffff80623b0b at ffs_write+0x33b
> > #5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
> > #6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
> > #7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
> > #8 0xffffffff80661cc1 at vnode_pager_putpages+0x91
> > #9 0xffffffff806588e6 at vm_pageout_flush+0x116
> > #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
> > #11 0xffffffff80651519 at vm_object_page_clean+0x179
> > #12 0xffffffff80651102 at vm_object_terminate+0xa2
> > #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
> > #14 0xffffffff8062a52f at ufs_reclaim+0x1f
> > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
> > #16 0xffffffff804b6c6e at vgonel+0x2ee
> > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5
> >
> > addr2line on the ffs_balloc_ufs2 gives:
> > /usr/src/sys/ufs/ffs/ffs_balloc.c:778:
> >
> >                         bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
> >                         bp->b_blkno = fsbtodb(fs, newb);
> >                         if (flags & BA_CLRBUF)
> >                                 vfs_bio_clrbuf(bp);
> >                         if (DOINGSOFTDEP(vp))
> >                                 softdep_setup_allocdirect(ip, lbn, newb,
> 0,
> >                                     nsize, 0, bp);
> >
> >
> > Boom, freshly allocated buffer with a dependecy; nothing in VOP_RECLAIM
> > handles this, this is after vinvalbuf is called, it expects that
> everything
> > is flushed to disk and its just about releasing structures (is my read of
> > the code).
> >
> > Now, perhaps this is a good assumption?  the question then is how is this
> > buffer hanging out there surviving a a vinvalbuf.  I will note that my
> > test-case that finds this runs and terminates *minutes* before... its not
> > just hanging out there in a race, its surviving background sync, fsync,
> > etc... wtf?  Also, I *can* unmount the FS without an error, so that
> > codepath is either ignoring this buffer, or its forcing a sync in a way
> > that doesn't panic?
> Most typical cause for the buffer dependencies not flushed is a buffer
> write error.  At least you could provide the printout of the buffer to
> confirm or reject this assumption.
>
> Were there any kernel messages right before the panic ?  Just in case,
> did you fsck the volume before using it, after the previous panic ?
>
> >
> > Anyone have next steps?  I am making progress here, but its really slow
> > going, this is probably the most complex portion of the kernel and some
> > pointers would be helpful.
> >
> > On Sat, Jul 2, 2016 at 2:31 PM, David Cross <dcrosstech@gmail.com>
> wrote:
> >
> > > Ok, I have been trying to trace this down for awhile..I know quite a
> bit
> > > about it.. but there's a lot I don't know, or I would have a patch.  I
> have
> > > been trying to solve this on my own, but bringing in some outside
> > > assistance will let me move on with my life.
> > >
> > > First up:  The stacktrace (from a debugging kernel, with coredump
> > >
> > > #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
> > > #1  0xffffffff8071018a in kern_reboot (howto=260)
> > >     at /usr/src/sys/kern/kern_shutdown.c:486
> > > #2  0xffffffff80710afc in vpanic (
> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling
> deps
> > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
> > > ap=0xfffffe023ae5cf40)
> > >     at /usr/src/sys/kern/kern_shutdown.c:889
> > > #3  0xffffffff807108c0 in panic (
> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling
> deps
> > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
> > >     at /usr/src/sys/kern/kern_shutdown.c:818
> > > #4  0xffffffff80a7c841 in softdep_deallocate_dependencies (
> > >     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
> > > #5  0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at
> > > buf.h:428
> > > #6  0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
> > >     at /usr/src/sys/kern/vfs_bio.c:1599
> > > #7  0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
> > >     at /usr/src/sys/kern/vfs_bio.c:1180
> > > #8  0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
> > > #9  0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
> > >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
> > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
> > >     a=0xfffffe023ae5d2b8) at vnode_if.c:999
> > > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
> > >     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
> > >     at vnode_if.h:413
> > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
> > > (vp=0xfffff80077e7a000,
> > >     ma=0xfffffe023ae5d660, bytecount=16384, flags=1,
> > > rtvals=0xfffffe023ae5d580)
> > >     at /usr/src/sys/vm/vnode_pager.c:1138
> > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
> > >     at /usr/src/sys/kern/vfs_default.c:760
> > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
> > >     a=0xfffffe023ae5d478) at vnode_if.c:2861
> > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
> > >     m=0xfffffe023ae5d660, count=16384, sync=1,
> rtvals=0xfffffe023ae5d580,
> > >     offset=0) at vnode_if.h:1189
> > > #16 0xffffffff80b196f3 in vnode_pager_putpages
> (object=0xfffff8014a1fce00,
> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> > >     at /usr/src/sys/vm/vnode_pager.c:1016
> > > #17 0xffffffff80b0a605 in vm_pager_put_pages
> (object=0xfffff8014a1fce00,
> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
> > >     at vm_pager.h:144
> > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
> > > count=4,
> > >     flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8,
> eio=0xfffffe023ae5d77c)
> > >     at /usr/src/sys/vm/vm_pageout.c:533
> > > #19 0xffffffff80afec76 in vm_object_page_collect_flush (
> > >     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
> > > flags=1,
> > >     clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
> > >     at /usr/src/sys/vm/vm_object.c:971
> > > #20 0xffffffff80afe91e in vm_object_page_clean
> (object=0xfffff8014a1fce00,
> > >     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
> > > #21 0xffffffff80afe1fa in vm_object_terminate
> (object=0xfffff8014a1fce00)
> > >     at /usr/src/sys/vm/vm_object.c:735
> > > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/vm/vnode_pager.c:164
> > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
> > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
> > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
> > >     a=0xfffffe023ae5d968) at vnode_if.c:2019
> > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
> > >     td=0xfffff80008931960) at vnode_if.h:830
> > > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
> > >     at /usr/src/sys/kern/vfs_subr.c:2943
> > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
> > >     at /usr/src/sys/kern/vfs_subr.c:882
> > > #29 0xffffffff80828ea9 in vnlru_proc () at
> > > /usr/src/sys/kern/vfs_subr.c:1000
> > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
> > > <vnlru_proc>,
> > >     arg=0x0, frame=0xfffffe023ae5dc00) at
> > > /usr/src/sys/kern/kern_fork.c:1027
> > > #31 0xffffffff80b21dce in fork_trampoline ()
> > >     at /usr/src/sys/amd64/amd64/exception.S:611
> > > #32 0x0000000000000000 in ?? ()
> > >
> > > This is a kernel compiled -O -g, its "almost" GENERIC; the only
> difference
> > > is some removed drivers, I have reproduced this on a few different
> kernels,
> > > including a BHYVE one so I can poke at it and not take out the main
> > > machine.  The reproduction as it currently stands needs to have jails
> > > running, but I don't believe this is a jail interaction, I think its
> just
> > > that the process that sets up the problem happens to be running in a
> jail.
> > > The step is "start jail; run "find /mountpoint -xdev >/dev/null" on the
> > > filesystem, when the vnlru forces the problem vnode out the system
> panics.
> > >
> > > I made a few modifications to the kernel to spit out information about
> the
> > > buf that causes the issue, but that is it.
> > >
> > > Information about the buf in question; it has a single softdependency
> > > worklist for direct allocation:
> > > (kgdb) print *bp->b_dep->lh_first
> > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
> > >   wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
> > >
> > > The file that maps to that buffer:
> > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > > -rw-------  1 cyrus  cyrus    24K Jul  1 20:32
> > > MOUNTPOINT/jails/mail/var/imap/db/__db.002
> > >
> > > Any help is appreciated, until then I will keep banging my head against
> > > the proverbial wall on this :)
> > >
>

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 16:02:01 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8688B7591D;
 Wed,  6 Jul 2016 16:02:01 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x22c.google.com (mail-yw0-x22c.google.com
 [IPv6:2607:f8b0:4002:c05::22c])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 932C61F1F;
 Wed,  6 Jul 2016 16:02:01 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x22c.google.com with SMTP id b72so91393961ywa.3;
 Wed, 06 Jul 2016 09:02:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=UrrFRO5z0hKpEZcOOJfaXV2i/GLH5kmJDlYFWxfHa3I=;
 b=Aafi8DHqDhuc+7z7gRdCLn2ReKH2MV6Q5RM6C+ZkAkj1O8P69MIv0ZFkd7hm/8irqt
 SqgBXWU8b8PLkvArerhesHh173Ti+sWa4urHdC4y1M49t+W4bEuMnGplMNKNAXlHKmBW
 wJVBLG1FjWoP34noF541b/905BK3ncs4ip0vWEWJQ7/Lykf1SNHOiWsEMbEA1I33JE5B
 Ksxnad2Ri5GN7ENHPWP/rtRilTA5w3wZdIWVVinJsR0Ff0dRmQxkK3CqqOZpda5n+fcB
 P/fC3dHbbwk22a4zjXxE+msb6Rgfs1V7sN7gVtkkojDvvmxpwe38xFk3MKi0cPUiKroI
 2+cQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=UrrFRO5z0hKpEZcOOJfaXV2i/GLH5kmJDlYFWxfHa3I=;
 b=Qm53g9XgoLFPaBGYQcC8U2dV8pnTnkT9ZA3pgLoZzRswVRVgrg7v5fBR+tqdmGIg7w
 DdyOlTzWxW5HG0ewTRMrNvVXqx6owoh//rYD4tOUgHJZK3l09okrjo1FNNOIe+Bx9yCS
 W1kpaxFgR0qyQeBsQgCSKFRYg34RsBekvcJJ0QSo4+1gMdSAHJTClInWX0BuvI1AidH0
 UwMGaLkztwjnwVPj3/PJ+RC1e2UcsuURmN7zaktzaTqpXAYFdwOzH9KNAIrrCwRrUYBR
 Uj+rbzzrO9TtdXVSLKETpsseJAlXPh2FobLDvpiur2gw2lNvPALF+cya1CmONLfgGbeb
 t4Ww==
X-Gm-Message-State: ALyK8tLO6d+wAhAfnlW3xCfaX06+1tMks4/1IsIFgXqZQlDEggugt5biuqRApBxGzyB0NkflEqA8fdP21FKnhQ==
X-Received: by 10.37.211.136 with SMTP id e130mr15105100ybf.62.1467820920698; 
 Wed, 06 Jul 2016 09:02:00 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 09:02:00 -0700 (PDT)
In-Reply-To: <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
 <20160706151822.GC38613@kib.kiev.ua>
 <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
From: David Cross <dcrosstech@gmail.com>
Date: Wed, 6 Jul 2016 12:02:00 -0400
Message-ID: <CAM9edePfMxm26yYC=o10CGhRSDUHXTTNosFc_T89v4Pxt0JM0g@mail.gmail.com>
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Wed, 06 Jul 2016 16:35:55 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 16:02:01 -0000

Oh, whoops; how do I print out the buffer?

On Wed, Jul 6, 2016 at 11:30 AM, David Cross <dcrosstech@gmail.com> wrote:

> No kernel messages before (if there were, I would have written this off a
> long time ago).
> And as of right now, this is probably the most fsck-ed filesystem on the
> planet! I have an 'image' that I am working on that is ggate mounted, so I
> can access it in a bhyve VM to ease debugging, so I am not crashing my real
> machine (with the real filesystem) all the time.
>
> One of my initial guesses was that this was a CG allocation error, but a
> dumpfs seems to show plenty of blocks in the CG to meet this need.
>
> Quick note on the testcase: I haven't totally isolated it yet, but the
> minimal reproduction is a 'ctl_cyrusdb -r', which runs a bdb5 recover op. A
> ktrace on that shows that it unlinks 3 files, opens them, lseeks them,
> writes a block, and then mmaps them (but leaves them open).  At process
> termination it munmaps, and then closes.  I have tried to write a shorter
> reproduction that opens, seeks, mmaps (with the same flags), writes the
> mmapped memory, munmaps, closes and exits, but this has been insufficient to
> reproduce the issue. There is likely some specific pattern in the bdb5 code
> tickling this, and behind the mmap-ed interface it is all opaque, and the
> bdb5 code is pretty complex itself.
>
> On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov <kostikbel@gmail.com>
> wrote:
>
>> On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote:
>> > OK, to reply to my own message: using KTR and debugging printfs I have
>> > found the culprit, but I am still at a loss as to 'why', or what the
>> > appropriate fix is.
>> >
>> > Let's go back to the panic (simplified)
>> >
>> > #0 0xffffffff8043f160 at kdb_backtrace+0x60
>> > #1 0xffffffff80401454 at vpanic+0x124
>> > #2 0xffffffff804014e3 at panic+0x43
>> > #3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
>> > #4 0xffffffff80499cc1 at brelse+0x151
>> > #5 0xffffffff804979b1 at bufwrite+0x81
>> > #6 0xffffffff80623c80 at ffs_write+0x4b0
>> > #7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
>> > #8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
>> > #9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
>> > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
>> > #11 0xffffffff806588e6 at vm_pageout_flush+0x116
>> > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
>> > #13 0xffffffff80651519 at vm_object_page_clean+0x179
>> > #14 0xffffffff80651102 at vm_object_terminate+0xa2
>> > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
>> > #16 0xffffffff8062a52f at ufs_reclaim+0x1f
>> > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
>> >
>> > Via KTR logging I determined that the dangling dependency was on a
>> > freshly allocated buf, *after* vinvalbuf in vgonel() (so in VOP_RECLAIM
>> > itself), called by the vnode LRU cleanup process. I further noticed that
>> > it was in a newbuf that recycled a bp (unimportant, except it let me
>> > narrow down my logging to something manageable); from there I get this
>> > stacktrace (simplified)
>> >
>> > #0 0xffffffff8043f160 at kdb_backtrace+0x60
>> > #1 0xffffffff8049c98e at getnewbuf+0x4be
>> > #2 0xffffffff804996a0 at getblk+0x830
>> > #3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
>> > #4 0xffffffff80623b0b at ffs_write+0x33b
>> > #5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
>> > #6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
>> > #7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
>> > #8 0xffffffff80661cc1 at vnode_pager_putpages+0x91
>> > #9 0xffffffff806588e6 at vm_pageout_flush+0x116
>> > #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
>> > #11 0xffffffff80651519 at vm_object_page_clean+0x179
>> > #12 0xffffffff80651102 at vm_object_terminate+0xa2
>> > #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
>> > #14 0xffffffff8062a52f at ufs_reclaim+0x1f
>> > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
>> > #16 0xffffffff804b6c6e at vgonel+0x2ee
>> > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5
>> >
>> > addr2line on the ffs_balloc_ufs2 gives:
>> > /usr/src/sys/ufs/ffs/ffs_balloc.c:778:
>> >
>> >                         bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
>> >                         bp->b_blkno = fsbtodb(fs, newb);
>> >                         if (flags & BA_CLRBUF)
>> >                                 vfs_bio_clrbuf(bp);
>> >                         if (DOINGSOFTDEP(vp))
>> >                                 softdep_setup_allocdirect(ip, lbn,
>> >                                     newb, 0, nsize, 0, bp);
>> >
>> >
>> > Boom, freshly allocated buffer with a dependency; nothing in VOP_RECLAIM
>> > handles this. This is after vinvalbuf is called; it expects that
>> > everything is flushed to disk and it's just about releasing structures
>> > (is my read of the code).
>> >
>> > Now, perhaps this is a good assumption?  The question then is how this
>> > buffer is hanging out there, surviving a vinvalbuf.  I will note that my
>> > test-case that finds this runs and terminates *minutes* before... it's
>> > not just hanging out there in a race, it's surviving background sync,
>> > fsync, etc... wtf?  Also, I *can* unmount the FS without an error, so
>> > that codepath is either ignoring this buffer, or it's forcing a sync in
>> > a way that doesn't panic?
>> Most typical cause for the buffer dependencies not flushed is a buffer
>> write error.  At least you could provide the printout of the buffer to
>> confirm or reject this assumption.
>>
>> Were there any kernel messages right before the panic ?  Just in case,
>> did you fsck the volume before using it, after the previous panic ?
>>
>> >
>> > Anyone have next steps?  I am making progress here, but it's really slow
>> > going; this is probably the most complex portion of the kernel and some
>> > pointers would be helpful.
>> >
>> > On Sat, Jul 2, 2016 at 2:31 PM, David Cross <dcrosstech@gmail.com>
>> wrote:
>> >
>> > > OK, I have been trying to trace this down for a while. I know quite a
>> > > bit about it, but there's a lot I don't know, or I would have a patch.
>> > > I have been trying to solve this on my own, but bringing in some
>> > > outside assistance will let me move on with my life.
>> > >
>> > > First up:  The stacktrace (from a debugging kernel, with coredump)
>> > >
>> > > #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
>> > > #1  0xffffffff8071018a in kern_reboot (howto=260)
>> > >     at /usr/src/sys/kern/kern_shutdown.c:486
>> > > #2  0xffffffff80710afc in vpanic (
>> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling
>> deps
>> > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
>> > > ap=0xfffffe023ae5cf40)
>> > >     at /usr/src/sys/kern/kern_shutdown.c:889
>> > > #3  0xffffffff807108c0 in panic (
>> > >     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling
>> deps
>> > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
>> > >     at /usr/src/sys/kern/kern_shutdown.c:818
>> > > #4  0xffffffff80a7c841 in softdep_deallocate_dependencies (
>> > >     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
>> > > #5  0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at
>> > > buf.h:428
>> > > #6  0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
>> > >     at /usr/src/sys/kern/vfs_bio.c:1599
>> > > #7  0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
>> > >     at /usr/src/sys/kern/vfs_bio.c:1180
>> > > #8  0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
>> > > #9  0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
>> > >     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
>> > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
>> > >     a=0xfffffe023ae5d2b8) at vnode_if.c:999
>> > > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
>> > >     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
>> > >     at vnode_if.h:413
>> > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
>> > > (vp=0xfffff80077e7a000,
>> > >     ma=0xfffffe023ae5d660, bytecount=16384, flags=1,
>> > > rtvals=0xfffffe023ae5d580)
>> > >     at /usr/src/sys/vm/vnode_pager.c:1138
>> > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
>> > >     at /usr/src/sys/kern/vfs_default.c:760
>> > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
>> > >     a=0xfffffe023ae5d478) at vnode_if.c:2861
>> > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
>> > >     m=0xfffffe023ae5d660, count=16384, sync=1,
>> rtvals=0xfffffe023ae5d580,
>> > >     offset=0) at vnode_if.h:1189
>> > > #16 0xffffffff80b196f3 in vnode_pager_putpages
>> (object=0xfffff8014a1fce00,
>> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
>> > >     at /usr/src/sys/vm/vnode_pager.c:1016
>> > > #17 0xffffffff80b0a605 in vm_pager_put_pages
>> (object=0xfffff8014a1fce00,
>> > >     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
>> > >     at vm_pager.h:144
>> > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
>> > > count=4,
>> > >     flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8,
>> eio=0xfffffe023ae5d77c)
>> > >     at /usr/src/sys/vm/vm_pageout.c:533
>> > > #19 0xffffffff80afec76 in vm_object_page_collect_flush (
>> > >     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
>> > > flags=1,
>> > >     clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
>> > >     at /usr/src/sys/vm/vm_object.c:971
>> > > #20 0xffffffff80afe91e in vm_object_page_clean
>> (object=0xfffff8014a1fce00,
>> > >     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
>> > > #21 0xffffffff80afe1fa in vm_object_terminate
>> (object=0xfffff8014a1fce00)
>> > >     at /usr/src/sys/vm/vm_object.c:735
>> > > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject
>> (vp=0xfffff80077e7a000)
>> > >     at /usr/src/sys/vm/vnode_pager.c:164
>> > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
>> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
>> > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
>> > >     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
>> > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
>> > >     a=0xfffffe023ae5d968) at vnode_if.c:2019
>> > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
>> > >     td=0xfffff80008931960) at vnode_if.h:830
>> > > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
>> > >     at /usr/src/sys/kern/vfs_subr.c:2943
>> > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
>> > >     at /usr/src/sys/kern/vfs_subr.c:882
>> > > #29 0xffffffff80828ea9 in vnlru_proc () at
>> > > /usr/src/sys/kern/vfs_subr.c:1000
>> > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
>> > > <vnlru_proc>,
>> > >     arg=0x0, frame=0xfffffe023ae5dc00) at
>> > > /usr/src/sys/kern/kern_fork.c:1027
>> > > #31 0xffffffff80b21dce in fork_trampoline ()
>> > >     at /usr/src/sys/amd64/amd64/exception.S:611
>> > > #32 0x0000000000000000 in ?? ()
>> > >
>> > > This is a kernel compiled -O -g; it's "almost" GENERIC, the only
>> > > difference is some removed drivers. I have reproduced this on a few
>> > > different kernels, including a bhyve one so I can poke at it and not
>> > > take out the main machine.  The reproduction as it currently stands
>> > > needs to have jails running, but I don't believe this is a jail
>> > > interaction; I think it's just that the process that sets up the
>> > > problem happens to be running in a jail.  The step is: start the jail,
>> > > then run "find /mountpoint -xdev >/dev/null" on the filesystem; when
>> > > vnlru forces the problem vnode out, the system panics.
>> > >
>> > > I made a few modifications to the kernel to spit out information about
>> > > the buf that causes the issue, but that is it.
>> > >
>> > > Information about the buf in question; it has a single softdependency
>> > > worklist for direct allocation:
>> > > (kgdb) print *bp->b_dep->lh_first
>> > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
>> > >   wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
>> > >
>> > > The file that maps to that buffer:
>> > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
>> > > -rw-------  1 cyrus  cyrus    24K Jul  1 20:32
>> > > MOUNTPOINT/jails/mail/var/imap/db/__db.002
>> > >
>> > > Any help is appreciated; until then I will keep banging my head
>> > > against the proverbial wall on this :)
>> > >
>> > _______________________________________________
>> > freebsd-stable@freebsd.org mailing list
>> > https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> > To unsubscribe, send any mail to "
>> freebsd-stable-unsubscribe@freebsd.org"
>>
>
>
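
For reference, the shorter reproduction described in the quoted message
above (open, seek, write a block, mmap, write the mapped memory, munmap,
close) can be sketched roughly as below. This is only a minimal sketch: the
helper name mmap_dirty_file, the 4096-byte scratch block, and MAP_SHARED are
assumptions, since the thread does not give the actual flags or sizes, and
the thread reports that a test along these lines was NOT sufficient to
trigger the panic.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: recreate `path`, extend it to `len` bytes by seeking
 * to the last block and writing it, then dirty the whole file through a
 * shared mapping.  Returns 0 on success, -1 on any failure.
 */
int
mmap_dirty_file(const char *path, size_t len, size_t blksz)
{
	char block[4096];
	void *p;
	int fd;

	assert(path != NULL);
	if (blksz == 0 || blksz > sizeof(block) || len < blksz)
		return (-1);
	memset(block, 0, sizeof(block));

	(void)unlink(path);		/* the file may not exist yet */
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return (-1);

	/* Seek to the final block and write it, leaving a hole before it. */
	if (lseek(fd, (off_t)(len - blksz), SEEK_SET) == -1 ||
	    write(fd, block, blksz) != (ssize_t)blksz) {
		close(fd);
		return (-1);
	}

	/* MAP_SHARED is an assumption; the thread does not list the flags. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	memset(p, 0xa5, len);		/* dirty every page via the mapping */

	if (munmap(p, len) == -1) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}
```

Calling it as mmap_dirty_file("/tmp/test.db", 24 * 1024, 4096) would mimic
the 24K __db.002 file mentioned earlier in the thread; the path is made up.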

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 17:38:04 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 429DBB75D61;
 Wed,  6 Jul 2016 17:38:04 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id CF32B151D;
 Wed,  6 Jul 2016 17:38:03 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u66Hbwgs063744
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Wed, 6 Jul 2016 20:37:59 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u66Hbwgs063744
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u66HbwNk063743;
 Wed, 6 Jul 2016 20:37:58 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 6 Jul 2016 20:37:58 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: David Cross <dcrosstech@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
Message-ID: <20160706173758.GF38613@kib.kiev.ua>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
 <20160706151822.GC38613@kib.kiev.ua>
 <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
 <CAM9edePfMxm26yYC=o10CGhRSDUHXTTNosFc_T89v4Pxt0JM0g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAM9edePfMxm26yYC=o10CGhRSDUHXTTNosFc_T89v4Pxt0JM0g@mail.gmail.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 17:38:04 -0000

On Wed, Jul 06, 2016 at 12:02:00PM -0400, David Cross wrote:
> Oh, whoops; how do I print out the buffer?

In kgdb, p/x *(struct buf *)address

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 18:21:22 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 21921B758E1;
 Wed,  6 Jul 2016 18:21:22 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x22b.google.com (mail-yw0-x22b.google.com
 [IPv6:2607:f8b0:4002:c05::22b])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id D0B001E4D;
 Wed,  6 Jul 2016 18:21:21 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x22b.google.com with SMTP id i12so93695696ywa.1;
 Wed, 06 Jul 2016 11:21:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=ikdf9Rq5KobUSGbfr4AZiaSUJp18yozYm7mOPTwnuUY=;
 b=DEqHrWCXbjcIYFI1pGyosuciVRwkrQH/ArI4mdJjdasWdIr0xdtZkpcO2QXD+o0gnS
 WJAnoNND6plO8Njl2BVvxNjasYKWt986YwtpFF6GCHvzNSwWo1zqeW7EzQcnd6qmMHq9
 YXG7IKYGyh1g0GtgfahKUwnUzPX5c6T7dI9+E23Og7/cj3VnXCtrGc8Q3P0UHcrq/Y5T
 kFAO0StGgoskh2HGWPezYJYTCCcwoYdraqtEoCWw60/ej6cGN9J7ptl+44SP0gP4Qsxk
 AH/WWryVPoJhrweV9ctEkk7Rr8hsUyOB3yRYwCLAlSarduyTrsm0BuUYF5k+ra111xp1
 Yw5g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=ikdf9Rq5KobUSGbfr4AZiaSUJp18yozYm7mOPTwnuUY=;
 b=Lk7j6Afh5UxY+3bMfhdFN1pnRP12mP0tDr46r4X182IMQQcmzIrGXERSXOOprYAtnn
 omdBR/azu5KChkAoKsJKhJFg0zjVYCMxeiW7QQ4Ht2i9DVNAXDAVwpgjcxf2H0NuXwff
 8Dplat+eaD4PJn2Fh4aca+X8oGDVmj8BHKaa7U/jXZ0PnE9Db8hFFazTDJ9g6kBq1Ocl
 lkTkOdE4kPzkqYj5dHZCmi8NECuanjCTVqRO0VVJpmV0189E5/d1/iw70FeOG9ijh40S
 444rcMhrfz+p9V0amPdNyMdC5qF55zYb8VPczP++N8L2ZzQF1fuso8dgMeHG+29QlhYS
 3Opw==
X-Gm-Message-State: ALyK8tIr8B4SmAoSIsmDAFIZfEaoMsXeq1XVYe3MzxjanVoR6pZWrN0u6Q8gz3QCxRffJO4rwwMiHuCp5JuTQQ==
X-Received: by 10.129.102.195 with SMTP id a186mr15678812ywc.76.1467829281073; 
 Wed, 06 Jul 2016 11:21:21 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 11:21:20 -0700 (PDT)
In-Reply-To: <20160706173758.GF38613@kib.kiev.ua>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
 <20160706151822.GC38613@kib.kiev.ua>
 <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
 <CAM9edePfMxm26yYC=o10CGhRSDUHXTTNosFc_T89v4Pxt0JM0g@mail.gmail.com>
 <20160706173758.GF38613@kib.kiev.ua>
From: David Cross <dcrosstech@gmail.com>
Date: Wed, 6 Jul 2016 14:21:20 -0400
Message-ID: <CAM9edeOb0yUqaXbTMGBJVFqgJ++yaDr4tGV1TQ_UPOYmv4p2fw@mail.gmail.com>
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Wed, 06 Jul 2016 18:22:55 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 18:21:22 -0000

(kgdb) up 5
#5  0xffffffff804aafa1 in brelse (bp=0xfffffe00f77457d0) at buf.h:428
428                     (*bioops.io_deallocate)(bp);
Current language:  auto; currently minimal
(kgdb) p/x *(struct buf *)0xfffffe00f77457d0
$1 = {b_bufobj = 0xfffff80002e88480, b_bcount = 0x4000, b_caller1 = 0x0,
  b_data = 0xfffffe00f857b000, b_error = 0x0, b_iocmd = 0x0, b_ioflags = 0x0,
  b_iooffset = 0x0, b_resid = 0x0, b_iodone = 0x0, b_blkno = 0x115d6400,
  b_offset = 0x0, b_bobufs = {tqe_next = 0x0, tqe_prev = 0xfffff80002e884d0},
  b_vflags = 0x0, b_freelist = {tqe_next = 0xfffffe00f7745a28,
    tqe_prev = 0xffffffff80c2afc0}, b_qindex = 0x0, b_flags = 0x20402800,
  b_xflags = 0x2, b_lock = {lock_object = {lo_name = 0xffffffff8075030b,
      lo_flags = 0x6730000, lo_data = 0x0, lo_witness = 0xfffffe0000602f00},
    lk_lock = 0xfffff800022e8000, lk_exslpfail = 0x0, lk_timo = 0x0,
    lk_pri = 0x60}, b_bufsize = 0x4000, b_runningbufspace = 0x0,
  b_kvabase = 0xfffffe00f857b000, b_kvaalloc = 0x0, b_kvasize = 0x4000,
  b_lblkno = 0x0, b_vp = 0xfffff80002e883b0, b_dirtyoff = 0x0,
  b_dirtyend = 0x0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0x0, b_pager = {
    pg_reqpage = 0x0}, b_cluster = {cluster_head = {tqh_first = 0x0,
      tqh_last = 0x0}, cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}},
  b_pages = {0xfffff800b99b30b0, 0xfffff800b99b3118, 0xfffff800b99b3180,
    0xfffff800b99b31e8, 0x0 <repeats 28 times>}, b_npages = 0x4, b_dep = {
    lh_first = 0xfffff800023d8c00}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
  b_fsprivate3 = 0x0, b_pin_count = 0x0}


This is the freshly allocated buf that causes the panic; is this what is
needed?  I "know" which vnode will cause the panic on vnlru cleanup, but I
don't know how to walk the memory list without a 'hook'. As in, I can
set up the kernel in a state that I know will panic when the vnode is
cleaned up, I can force a panic 'early' (kill -9 1), and then I could get
that vnode, if I could get the vnode list to walk.
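
To illustrate the kind of list walk meant here: as far as I understand the
10.x headers (worth checking against sys/mount.h and sys/vnode.h), each
struct mount keeps its vnodes on a sys/queue.h TAILQ, so given a mount
pointer as the 'hook' the walk is a plain TAILQ traversal. The userland
sketch below uses toy stand-in structures; every toy_* name is made up for
the example and nothing here is actual kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Toy stand-ins for the kernel structures; all names are illustrative. */
struct toy_vnode {
	unsigned long		ino;	/* stand-in for an inode number */
	TAILQ_ENTRY(toy_vnode)	link;	/* linkage on the per-mount list */
};
TAILQ_HEAD(toy_vnodelst, toy_vnode);

struct toy_mount {
	struct toy_vnodelst	vnodes;	/* per-mount list of vnodes */
};

/* Walk the mount's vnode list and return the entry for `ino`, or NULL. */
struct toy_vnode *
toy_find_vnode(struct toy_mount *mp, unsigned long ino)
{
	struct toy_vnode *vp;

	assert(mp != NULL);
	TAILQ_FOREACH(vp, &mp->vnodes, link) {
		if (vp->ino == ino)
			return (vp);
	}
	return (NULL);
}
```

In a debugger the same walk would be done by hand, following the list
head's first pointer and then each entry's next pointer until it is NULL.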

On Wed, Jul 6, 2016 at 1:37 PM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Wed, Jul 06, 2016 at 12:02:00PM -0400, David Cross wrote:
> > Oh, whoops; how do I print out the buffer?
>
> In kgdb, p/x *(struct buf *)address
>

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 18:49:54 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B39FB75DCC
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Jul 2016 18:49:54 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 0560A1CFC
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 18:49:53 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 8C21622309B
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 13:49:45 -0500 (CDT)
To: freebsd-hackers@freebsd.org
From: Karl Denninger <karl@denninger.net>
Subject: Huh?
Message-ID: <fc54f394-c90d-00c8-4214-ceeb43de97ac@denninger.net>
Date: Wed, 6 Jul 2016 13:49:26 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms030607060708050606010802"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 18:49:54 -0000

This is a cryptographically signed message in MIME format.

--------------ms030607060708050606010802
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Ok, what did I break...

On my development box with 11-Alpha6:

root@Dbms2:/usr/src # svn update .
Updating '.':
svn: E170013: Unable to connect to a repository at URL
'https://svn.freebsd.org/base/head'
svn: E000065: Error running context: No route to host

svnlite works..... so yeah, that path is good (obviously)


--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms030607060708050606010802--

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 19:52:07 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05D14B75C0E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Jul 2016 19:52:07 +0000 (UTC)
 (envelope-from yaneurabeya@gmail.com)
Received: from mail-qt0-x233.google.com (mail-qt0-x233.google.com
 [IPv6:2607:f8b0:400d:c0d::233])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id B70B51986
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 19:52:06 +0000 (UTC)
 (envelope-from yaneurabeya@gmail.com)
Received: by mail-qt0-x233.google.com with SMTP id m2so122191195qtd.1
 for <freebsd-hackers@freebsd.org>; Wed, 06 Jul 2016 12:52:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=0I+Pibr4lZIuxzTF/6utmkLulpZ2CRz3WbaWfv7lj6g=;
 b=Vx3s9mjLmko1Nzq9TdbVXbEZN9HY6eJef5ykBthiVtrkRtt7duH11G/zRB6zcnP5UQ
 RNXqyQFD0Sl55mwbh0eCdK1jtHflOIZSApu3PBO1tc2xneZP4bgNGUyXRzZhA+2m+ca+
 Ai+EyspSieqGcWmSTB8shuxTo3Im3lvzUjIXCQyj2VdgY2kvktvIUkd50SOIul/oysmD
 jT2Py3oXbWTuq19f/ii2aLjf/JHGSNN0Cd3AU4o7lIcQx/99VkQNlsYv+gVxWmPUlPHC
 LF54gScWUtI8A8D4KecavLPBQ8wVMHSyJsAbkB91Erp05qTZuaEOmxZxbniYZD0UBS81
 xjYw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=0I+Pibr4lZIuxzTF/6utmkLulpZ2CRz3WbaWfv7lj6g=;
 b=WdEFSoPGSwrGPi9S5htbdf2BwGJr8jQrqSw+kKPdK/uxAdM13taLgZbwEK6CRinola
 tg+wvXujWxSbrlVXg/vlda3lMQYlcncN9Ke2tKVIjAmeqEp0dplrB4SQnShWQfyOyK7+
 4yumEH7toU/zp9FIlMRNleywbIsVqrrzG+uSVLgHYlodaq+LfVRKFC2wAAtaBo68d8ID
 6YKF/N6NGSTMkvIWYeDz8CxOEJmv/9iKhjqBbGYXwuI99Omo7HvLFSVG3a10bJWgRyRP
 iAZ2QyXZUzgUOxsJYwz7JR0MUdl3G2NVDzOiOcuq8mojcKCY6Yn/2YuqAa427wy53uRi
 GG9A==
X-Gm-Message-State: ALyK8tK/6uYI9eAyiMNyLn46Del7OfTjm17FKvDXRMuvWB27X66huPCgxTrW52aWD91Su8PwfTqmpCQw1No0Fg==
X-Received: by 10.200.34.157 with SMTP id f29mr38321468qta.46.1467834725869;
 Wed, 06 Jul 2016 12:52:05 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.55.148.131 with HTTP; Wed, 6 Jul 2016 12:52:05 -0700 (PDT)
In-Reply-To: <fc54f394-c90d-00c8-4214-ceeb43de97ac@denninger.net>
References: <fc54f394-c90d-00c8-4214-ceeb43de97ac@denninger.net>
From: Ngie Cooper <yaneurabeya@gmail.com>
Date: Wed, 6 Jul 2016 12:52:05 -0700
Message-ID: <CAGHfRMA5RvmRG=_Xh9KMjG+eYq+zR4u7KYJst32LbHrc1sb-Xg@mail.gmail.com>
Subject: Re: Huh?
To: Karl Denninger <karl@denninger.net>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 19:52:07 -0000

On Wed, Jul 6, 2016 at 11:49 AM, Karl Denninger <karl@denninger.net> wrote:
> Ok, what did I break...
>
> On my development box with 11-Alpha6:
>
> root@Dbms2:/usr/src # svn update .
> Updating '.':
> svn: E170013: Unable to connect to a repository at URL
> 'https://svn.freebsd.org/base/head'
> svn: E000065: Error running context: No route to host
>
> svnlite works..... so yeah, that path is good (obviously)

Could you please run "svn --version" and put the output here?
Thanks,
-Ngie

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 20:24:55 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7A48B751C8
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Jul 2016 20:24:55 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6])
 by mx1.freebsd.org (Postfix) with ESMTP id 9FE1018B6
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 20:24:55 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from [10.1.1.2] (unknown [10.1.1.2])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id 7D096D169
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 20:24:54 +0000 (UTC)
Subject: Re: Huh?
To: freebsd-hackers@freebsd.org
References: <fc54f394-c90d-00c8-4214-ceeb43de97ac@denninger.net>
From: Allan Jude <allanjude@freebsd.org>
Message-ID: <5b1f6b13-e3b9-d4ad-6909-5fd728b94482@freebsd.org>
Date: Wed, 6 Jul 2016 16:24:54 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <fc54f394-c90d-00c8-4214-ceeb43de97ac@denninger.net>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 20:24:55 -0000

On 2016-07-06 14:49, Karl Denninger wrote:
> Ok, what did I break...
> 
> On my development box with 11-Alpha6:
> 
> root@Dbms2:/usr/src # svn update .
> Updating '.':
> svn: E170013: Unable to connect to a repository at URL
> 'https://svn.freebsd.org/base/head'
> svn: E000065: Error running context: No route to host
> 
> svnlite works..... so yeah, that path is good (obviously)
> 
> 

Do you have broken ipv6?

svn will try v6 first, and you may not have a route.
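
For comparison, here is a minimal sketch of the fallback one would
normally expect from a client: walk every address that getaddrinfo(3)
returns and stop at the first one that connects. This is not svn's
actual code; connect_any() is a made-up name, and a plain blocking TCP
connect is assumed.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not svn or serf code): try every address returned
 * by getaddrinfo(3), in order, until one of them connects.
 * Returns a connected socket, or -1 if no address was reachable.
 */
int
connect_any(const char *host, const char *port)
{
	struct addrinfo hints, *res, *ai;
	int fd = -1;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* accept IPv6 and IPv4 results */
	hints.ai_socktype = SOCK_STREAM;

	if (getaddrinfo(host, port, &hints, &res) != 0)
		return (-1);

	for (ai = res; ai != NULL; ai = ai->ai_next) {
		fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
		if (fd == -1)
			continue;	/* e.g. address family not supported */
		if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
			break;		/* connected, keep this socket */
		close(fd);		/* e.g. no route: try the next address */
		fd = -1;
	}
	freeaddrinfo(res);
	return (fd);
}
```

With a loop like this, a "No route to host" on the v6 address only
costs one failed connect(2) before the v4 address is tried.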

-- 
Allan Jude

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 20:29:08 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 049C6B7531E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Jul 2016 20:29:08 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net
 [70.169.168.7])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id CB91D1AC3
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 20:29:06 +0000 (UTC)
 (envelope-from karl@denninger.net)
Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.denninger.net (Postfix) with ESMTPSA id 8D47F22379F
 for <freebsd-hackers@freebsd.org>; Wed,  6 Jul 2016 15:29:05 -0500 (CDT)
Subject: Re: Huh?
To: freebsd-hackers@freebsd.org
References: <fc54f394-c90d-00c8-4214-ceeb43de97ac@denninger.net>
 <5b1f6b13-e3b9-d4ad-6909-5fd728b94482@freebsd.org>
From: Karl Denninger <karl@denninger.net>
Message-ID: <beceab7f-16d7-ca8a-965c-93958ea8ac65@denninger.net>
Date: Wed, 6 Jul 2016 15:28:46 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <5b1f6b13-e3b9-d4ad-6909-5fd728b94482@freebsd.org>
Content-Type: multipart/signed; protocol="application/pkcs7-signature";
 micalg=sha-512; boundary="------------ms020404030302000605070109"
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 20:29:08 -0000

This is a cryptographically signed message in MIME format.

--------------ms020404030302000605070109
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


On 7/6/2016 15:24, Allan Jude wrote:
> On 2016-07-06 14:49, Karl Denninger wrote:
>> Ok, what did I break...
>>
>> On my development box with 11-Alpha6:
>>
>> root@Dbms2:/usr/src # svn update .
>> Updating '.':
>> svn: E170013: Unable to connect to a repository at URL
>> 'https://svn.freebsd.org/base/head'
>> svn: E000065: Error running context: No route to host
>>
>> svnlite works..... so yeah, that path is good (obviously)
>>
>>
> Do you have broken ipv6?
>
> svn will try v6 first, and you maybe don't have a route.
>

Actually it tries ipv6 first and never tries ipv4!

I have no IPv6 service here so the "no route" is correct.  However, the
resolver DID return an IPv4 address as well (I snooped the line with
tcpdump) but svn never attempts the v4 connection.

That sure looks broken to me.

--=20
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms020404030302000605070109
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"
Content-Description: S/MIME Cryptographic Signature

MIAGCSqGSIb3DQEHAqCAMIACAQExDzANBglghkgBZQMEAgMFADCABgkqhkiG9w0BBwEAAKCC
Bl8wggZbMIIEQ6ADAgECAgEpMA0GCSqGSIb3DQEBCwUAMIGQMQswCQYDVQQGEwJVUzEQMA4G
A1UECBMHRmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3Rl
bXMgTExDMRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhND
dWRhIFN5c3RlbXMgTExDIENBMB4XDTE1MDQyMTAyMjE1OVoXDTIwMDQxOTAyMjE1OVowWjEL
MAkGA1UEBhMCVVMxEDAOBgNVBAgTB0Zsb3JpZGExGTAXBgNVBAoTEEN1ZGEgU3lzdGVtcyBM
TEMxHjAcBgNVBAMTFUthcmwgRGVubmluZ2VyIChPQ1NQKTCCAiIwDQYJKoZIhvcNAQEBBQAD
ggIPADCCAgoCggIBALmEWPhAdphrWd4K5VTvE5pxL3blRQPyGF3ApjUjgtavqU1Y8pbI3Byg
XDj2/Uz9Si8XVj/kNbKEjkRh5SsNvx3Fc0oQ1uVjyCq7zC/kctF7yLzQbvWnU4grAPZ3IuAp
3/fFxIVaXpxEdKmyZAVDhk9az+IgHH43rdJRIMzxJ5vqQMb+n2EjadVqiGPbtG9aZEImlq7f
IYDTnKyToi23PAnkPwwT+q1IkI2DTvf2jzWrhLR5DTX0fUYC0nxlHWbjgpiapyJWtR7K2YQO
aevQb/3vN9gSojT2h+cBem7QIj6U69rEYcEDvPyCMXEV9VcXdcmW42LSRsPvZcBHFkWAJqMZ
Myiz4kumaP+s+cIDaXitR/szoqDKGSHM4CPAZV9Yh8asvxQL5uDxz5wvLPgS5yS8K/o7zDR5
vNkMCyfYQuR6PAJxVOk5Arqvj9lfP3JSVapwbr01CoWDBkpuJlKfpQIEeC/pcCBKknllbMYq
yHBO2TipLyO5Ocd1nhN/nOsO+C+j31lQHfOMRZaPQykXVPWG5BbhWT7ttX4vy5hOW6yJgeT/
o3apynlp1cEavkQRS8uJHoQszF6KIrQMID/JfySWvVQ4ksnfzwB2lRomrdrwnQ4eG/HBS+0l
eozwOJNDIBlAP+hLe8A5oWZgooIIK/SulUAsfI6Sgd8dTZTTYmlhAgMBAAGjgfQwgfEwNwYI
KwYBBQUHAQEEKzApMCcGCCsGAQUFBzABhhtodHRwOi8vY3VkYXN5c3RlbXMubmV0Ojg4ODgw
CQYDVR0TBAIwADARBglghkgBhvhCAQEEBAMCBaAwCwYDVR0PBAQDAgXgMCwGCWCGSAGG+EIB
DQQfFh1PcGVuU1NMIEdlbmVyYXRlZCBDZXJ0aWZpY2F0ZTAdBgNVHQ4EFgQUxRyULenJaFwX
RtT79aNmIB/u5VkwHwYDVR0jBBgwFoAUJHGbnYV9/N3dvbDKkpQDofrTbTUwHQYDVR0RBBYw
FIESa2FybEBkZW5uaW5nZXIubmV0MA0GCSqGSIb3DQEBCwUAA4ICAQBPf3cYtmKowmGIYsm6
eBinJu7QVWvxi1vqnBz3KE+HapqoIZS8/PolB/hwiY0UAE1RsjBJ7yEjihVRwummSBvkoOyf
G30uPn4yg4vbJkR9lTz8d21fPshWETa6DBh2jx2Qf13LZpr3Pj2fTtlu6xMYKzg7cSDgd2bO
sJGH/rcvva9Spkx5Vfq0RyOrYph9boshRN3D4tbWgBAcX9POdXCVfJONDxhfBuPHsJ6vEmPb
An+XL5Yl26XYFPiODQ+Qbk44Ot1kt9s7oS3dVUrh92Qv0G3J3DF+Vt6C15nED+f+bk4gScu+
JHT7RjEmfa18GT8DcT//D1zEke1Ymhb41JH+GyZchDRWtjxsS5OBFMzrju7d264zJUFtX7iJ
3xvpKN7VcZKNtB6dLShj3v/XDsQVQWXmR/1YKWZ93C3LpRs2Y5nYdn6gEOpL/WfQFThtfnat
HNc7fNs5vjotaYpBl5H8+VCautKbGOs219uQbhGZLYTv6okuKcY8W+4EJEtK0xB08vqr9Jd0
FS9MGjQE++GWo+5eQxFt6nUENHbVYnsr6bYPQsZH0CRNycgTG9MwY/UIXOf4W034UpR82TBG
1LiMsYfb8ahQJhs3wdf1nzipIjRwoZKT1vGXh/cj3gwSr64GfenURBxaFZA5O1acOZUjPrRT
n3ci4McYW/0WVVA3lDGCBRMwggUPAgEBMIGWMIGQMQswCQYDVQQGEwJVUzEQMA4GA1UECBMH
RmxvcmlkYTESMBAGA1UEBxMJTmljZXZpbGxlMRkwFwYDVQQKExBDdWRhIFN5c3RlbXMgTExD
MRwwGgYDVQQDExNDdWRhIFN5c3RlbXMgTExDIENBMSIwIAYJKoZIhvcNAQkBFhNDdWRhIFN5
c3RlbXMgTExDIENBAgEpMA0GCWCGSAFlAwQCAwUAoIICTTAYBgkqhkiG9w0BCQMxCwYJKoZI
hvcNAQcBMBwGCSqGSIb3DQEJBTEPFw0xNjA3MDYyMDI4NDZaME8GCSqGSIb3DQEJBDFCBEDX
0CyfbJDxIMbalCj8CkYKBKUi60cO1xiIaDL086rEmQsjkrvr/s70mf8ZZ5g+FpVoEvR8Jx+t
tyCt+ghD4sO/MGwGCSqGSIb3DQEJDzFfMF0wCwYJYIZIAWUDBAEqMAsGCWCGSAFlAwQBAjAK
BggqhkiG9w0DBzAOBggqhkiG9w0DAgICAIAwDQYIKoZIhvcNAwICAUAwBwYFKw4DAgcwDQYI
KoZIhvcNAwICASgwgacGCSsGAQQBgjcQBDGBmTCBljCBkDELMAkGA1UEBhMCVVMxEDAOBgNV
BAgTB0Zsb3JpZGExEjAQBgNVBAcTCU5pY2V2aWxsZTEZMBcGA1UEChMQQ3VkYSBTeXN0ZW1z
IExMQzEcMBoGA1UEAxMTQ3VkYSBTeXN0ZW1zIExMQyBDQTEiMCAGCSqGSIb3DQEJARYTQ3Vk
YSBTeXN0ZW1zIExMQyBDQQIBKTCBqQYLKoZIhvcNAQkQAgsxgZmggZYwgZAxCzAJBgNVBAYT
AlVTMRAwDgYDVQQIEwdGbG9yaWRhMRIwEAYDVQQHEwlOaWNldmlsbGUxGTAXBgNVBAoTEEN1
ZGEgU3lzdGVtcyBMTEMxHDAaBgNVBAMTE0N1ZGEgU3lzdGVtcyBMTEMgQ0ExIjAgBgkqhkiG
9w0BCQEWE0N1ZGEgU3lzdGVtcyBMTEMgQ0ECASkwDQYJKoZIhvcNAQEBBQAEggIAYiMch17L
7HrMNUByEE9sKGg0J4Savu6OeIv7u9Sr5GPmXdqZDgWOJmJTUuvfpKtkFzpIZiD1pToAojtl
sLKuYlnQ9viUWdrXOlDA5ra3tvG7B1B5EoS+EEba/3wR5xwWYjQq+1bmKyoG6TtOuhbsDZwJ
DAZKnPfh+fFVzIj7EzD/tzFQOlOnaEu+E+VxEp0IBEjtarE7ghbt4arHe8AUm4+qYD/8Yd//
GSshmoAaZspwk0qBWShra2D7C1fwg+rdRIGWP1DUhLWQTTNmdjc4ZGZmiivS0Fk5cYjDqtVP
MBBjgM+rw/MCIu8G8NiByMf/nDIF8b1QDXdhT0nTLHDLmBWs8i0WC0noQGeOjvMNWId/QHWL
OGq4ITVj6Agu7kYpBjDdTGlLg9NTv6RsF1YD73BeEhFfm00pi/FI7MnnfgAIKH4DiyhsPyF3
1iCr255KWCSrjzmmOxQHo7Y1CqMfhotbUmabLltcVHQ4QI/zAOL+1sioy0Htp3LhFp1+b7Q0
vedEuAaPtlCP/dpaVV+qPEQOPhXZyDR9QmwjAB5t7MxSuvFa9dbi46Qodyg/C0OsU4B7815j
itnV06BWCYM41DIBCkx7pZSAGwLousVn9GOyVaxrRILdZmMA9VxxfP7s8HBO61u1iWzNGBoX
DhdHayUmo2doWi2ogWzBABQ4lF8AAAAAAAA=
--------------ms020404030302000605070109--

From owner-freebsd-hackers@freebsd.org  Wed Jul  6 22:13:07 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6CA42B75D3C
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed,  6 Jul 2016 22:13:07 +0000 (UTC) (envelope-from yuri@rawbw.com)
Received: from shell1.rawbw.com (shell1.rawbw.com [198.144.192.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 5E655160A
 for <freebsd-hackers@FreeBSD.org>; Wed,  6 Jul 2016 22:13:07 +0000 (UTC)
 (envelope-from yuri@rawbw.com)
Received: from yuri.doctorlan.com (c-24-5-143-190.hsd1.ca.comcast.net
 [24.5.143.190]) (authenticated bits=0)
 by shell1.rawbw.com (8.15.1/8.15.1) with ESMTPSA id u66MD0oo015679
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO)
 for <freebsd-hackers@FreeBSD.org>; Wed, 6 Jul 2016 15:13:01 -0700 (PDT)
 (envelope-from yuri@rawbw.com)
X-Authentication-Warning: shell1.rawbw.com: Host
 c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190] claimed to be
 yuri.doctorlan.com
From: Yuri <yuri@rawbw.com>
Subject: Why kinfo_getvmmap is sometimes so expensive?
To: Freebsd hackers list <freebsd-hackers@FreeBSD.org>
Message-ID: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
Date: Wed, 6 Jul 2016 15:12:59 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 06 Jul 2016 22:13:07 -0000

The function getProcessSizeBytes, which calculates the total size of the 
process, runs once per second. I have two processes of the same kind, 
but with different run histories.

Process #1 didn't do much work. Its total size is 1.5 GB, and the google 
perftools library says that it currently has 1.2 GB allocated.

Process #2 did a lot of work. Its total size is 6.9 GB, but most of the 
used memory was freed, and the google perftools library also says that it 
currently has only 1.2 GB allocated.

Both processes have about 140 lines in /proc/<pid>/map.


What bothers me is that running getProcessSizeBytes once per second makes 
process #1 consume ~0.5% CPU and process #2 consume ~14% CPU. 
When I stop running getProcessSizeBytes, the CPU usage of both processes 
drops to zero.


Obviously, google perftools doesn't unmap the memory, and the total of 
block sizes in /proc/<pid>/map is much higher for process #2 with about 
the same block count. But why does this cause 14% CPU consumption? 
And why does another, similar process that goes through about the same 
number of blocks consume only 0.5% CPU?


uint64_t getProcessSizeBytes() {
   int i, cnt = 0;
   struct kinfo_vmentry *kvm0, *kvm;
   uint64_t memSz = 0;

   // Returns a malloc'ed array of cnt entries, or NULL on failure
   kvm0 = ::kinfo_getvmmap(::getpid(), &cnt);

   // Sum the virtual size of every map entry
   for (i = 0, kvm = kvm0; i<cnt; i++, kvm++)
     memSz += (kvm->kve_end-kvm->kve_start);

   free(kvm0);
   return (memSz);
}


FreeBSD 10.3


Yuri

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 00:12:29 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4B63B758D1;
 Thu,  7 Jul 2016 00:12:29 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 857D5126D;
 Thu,  7 Jul 2016 00:12:29 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u670CLT1011655
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Thu, 7 Jul 2016 03:12:21 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u670CLT1011655
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u670CIZE011654;
 Thu, 7 Jul 2016 03:12:18 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Thu, 7 Jul 2016 03:12:18 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: David Cross <dcrosstech@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
Message-ID: <20160707001218.GI38613@kib.kiev.ua>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
 <20160706151822.GC38613@kib.kiev.ua>
 <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
 <CAM9edePfMxm26yYC=o10CGhRSDUHXTTNosFc_T89v4Pxt0JM0g@mail.gmail.com>
 <20160706173758.GF38613@kib.kiev.ua>
 <CAM9edeOb0yUqaXbTMGBJVFqgJ++yaDr4tGV1TQ_UPOYmv4p2fw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAM9edeOb0yUqaXbTMGBJVFqgJ++yaDr4tGV1TQ_UPOYmv4p2fw@mail.gmail.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 00:12:30 -0000

On Wed, Jul 06, 2016 at 02:21:20PM -0400, David Cross wrote:
> (kgdb) up 5
> #5  0xffffffff804aafa1 in brelse (bp=0xfffffe00f77457d0) at buf.h:428
> 428                     (*bioops.io_deallocate)(bp);
> Current language:  auto; currently minimal
> (kgdb) p/x *(struct buf *)0xfffffe00f77457d0
> $1 = {b_bufobj = 0xfffff80002e88480, b_bcount = 0x4000, b_caller1 = 0x0,
>   b_data = 0xfffffe00f857b000, b_error = 0x0, b_iocmd = 0x0, b_ioflags =
> 0x0,
>   b_iooffset = 0x0, b_resid = 0x0, b_iodone = 0x0, b_blkno = 0x115d6400,
>   b_offset = 0x0, b_bobufs = {tqe_next = 0x0, tqe_prev =
> 0xfffff80002e884d0},
>   b_vflags = 0x0, b_freelist = {tqe_next = 0xfffffe00f7745a28,
>     tqe_prev = 0xffffffff80c2afc0}, b_qindex = 0x0, b_flags = 0x20402800,
>   b_xflags = 0x2, b_lock = {lock_object = {lo_name = 0xffffffff8075030b,
>       lo_flags = 0x6730000, lo_data = 0x0, lo_witness =
> 0xfffffe0000602f00},
>     lk_lock = 0xfffff800022e8000, lk_exslpfail = 0x0, lk_timo = 0x0,
>     lk_pri = 0x60}, b_bufsize = 0x4000, b_runningbufspace = 0x0,
>   b_kvabase = 0xfffffe00f857b000, b_kvaalloc = 0x0, b_kvasize = 0x4000,
>   b_lblkno = 0x0, b_vp = 0xfffff80002e883b0, b_dirtyoff = 0x0,
>   b_dirtyend = 0x0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0x0, b_pager
> = {
>     pg_reqpage = 0x0}, b_cluster = {cluster_head = {tqh_first = 0x0,
>       tqh_last = 0x0}, cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}},
>   b_pages = {0xfffff800b99b30b0, 0xfffff800b99b3118, 0xfffff800b99b3180,
>     0xfffff800b99b31e8, 0x0 <repeats 28 times>}, b_npages = 0x4, b_dep = {
>     lh_first = 0xfffff800023d8c00}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
>   b_fsprivate3 = 0x0, b_pin_count = 0x0}
> 
> 
> This is the freshly allocated buf that causes the panic; is this what is
> needed?  I "know" which vnode will cause the panic on vnlru cleanup, but I
> don't know how to walk the memory list without a 'hook'.. as in, i can
> setup the kernel in a state that I know will panic when the vnode is
> cleaned up, I can force a panic 'early' (kill -9 1), and then I could get
> that vnode.. if I could get the vnode list to walk.

Was the state printed after the panic occurred?  What is strange is that
the buffer was not even tried for i/o, AFAIS.  Apart from the empty
b_error/b_iocmd, the b_lblkno is zero, which means that the buffer was never
allocated on the disk.

The b_blkno looks strangely high.  Can you print *(bp->b_vp)?  If it is
a UFS vnode, do p *(struct inode *)(<vnode>->v_data).  I am esp. interested
in the vnode size.

Can you reproduce the problem on HEAD?

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 00:19:19 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0D8E5B75B38
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 00:19:19 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 834E615A5
 for <freebsd-hackers@FreeBSD.org>; Thu,  7 Jul 2016 00:19:18 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u670JEoX012886
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Thu, 7 Jul 2016 03:19:14 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u670JEoX012886
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u670JDvO012885;
 Thu, 7 Jul 2016 03:19:13 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Thu, 7 Jul 2016 03:19:13 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Yuri <yuri@rawbw.com>
Cc: Freebsd hackers list <freebsd-hackers@FreeBSD.org>
Subject: Re: Why kinfo_getvmmap is sometimes so expensive?
Message-ID: <20160707001913.GJ38613@kib.kiev.ua>
References: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 00:19:19 -0000

On Wed, Jul 06, 2016 at 03:12:59PM -0700, Yuri wrote:
> The function getProcessSizeBytes, calculating the total size of the 
> process, runs once per second. I have two processes of the same kind, 
> but with the different run history.
> 
> Process #1 didn't do much work, its total size is 1.5 GB, google 
> perftools library says that it currently has 1.2GB allocated.
> 
> Process #2 did a lot of work, its total size is 6.9 GB, but most of the 
> used memory was freed, and google perftools library also says that it 
> currently has only 1.2GB allocated.
> 
> Both processes have about 140 lines in /proc/<pid>/map.
> 
> 
> What bothers me is that getProcessSizeBytes run once per second makes 
> process #1 to consume ~0.5% CPU, and process #2 to consume ~14% CPU. 
> When I stop running getProcessSizeBytes, CPU times of both processes go 
> to zero.
> 
> 
> Obviously, google perftools doesn't unmap the memory, and the totals of 
> block sizes in /proc/<pid>/map is much higher for process #2 with about 
> the same block count. But why does this cause 14% of CPU consumption? 
> And why another, similar process that goes through about the same number 
> of blocks only has 0.5% CPU consumption?
To calculate the residency count for the process map entries, the kernel
has to iterate over all pages.  This operation was somewhat optimized in 10.3
and HEAD, particularly for large sparse mappings.  But for large populated
mappings there is no other way than to check each page.

You can confirm my hypothesis by setting the sysctl
kern.proc_vmmap_skip_resident_count to 1 and seeing whether the CPU
consumption changes.  Of course, you will not get the resident count
in the returned structure once the knob is set.
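
As a rough userland illustration of why the cost follows the populated
size rather than the number of map entries, the sketch below counts the
resident pages of one anonymous mapping with mincore(2), visiting one
status byte per page. It is only an analogy, not the kernel code behind
kern.proc.vmmap; count_resident() and its parameters are invented for
the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* exposes mincore(2) and MAP_ANON on glibc */
#endif
#include <sys/types.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Userland analogy only: map npages of anonymous memory, touch the first
 * `touched` pages, then count resident pages with mincore(2).  The counting
 * loop looks at one vector byte per page, so its cost grows with the size
 * of the mapping.  Returns the resident page count, or -1 on error.
 */
long
count_resident(size_t npages, size_t touched)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	size_t i, len = npages * pgsz;
	long resident = 0;
	unsigned char *vec;
	char *p;

	if (npages == 0 || touched > npages)
		return (-1);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE,
	    -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	vec = malloc(npages);
	if (vec == NULL) {
		munmap(p, len);
		return (-1);
	}
	for (i = 0; i < touched; i++)
		p[i * pgsz] = 1;	/* fault the page in */

	/* The vector type differs between systems, hence the void cast. */
	if (mincore(p, len, (void *)vec) == -1) {
		resident = -1;
	} else {
		for (i = 0; i < npages; i++)
			if (vec[i] & 1)	/* low bit set: page is in core */
				resident++;
	}
	free(vec);
	munmap(p, len);
	return (resident);
}
```

Doubling npages doubles the work of the counting loop even when the
number of mappings stays at one, which is the same shape as the cost
described above.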

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 01:29:59 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 19032B21ACA
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 01:29:59 +0000 (UTC)
 (envelope-from freebsd-hackers@m.gmane.org)
Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B88A2118A
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 01:29:58 +0000 (UTC)
 (envelope-from freebsd-hackers@m.gmane.org)
Received: from list by plane.gmane.org with local (Exim 4.69)
 (envelope-from <freebsd-hackers@m.gmane.org>) id 1bKy8S-00021F-GS
 for freebsd-hackers@freebsd.org; Thu, 07 Jul 2016 03:29:48 +0200
Received: from ip184-189-249-34.sb.sd.cox.net ([184.189.249.34])
 by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
 id 1AlnuQ-0007hv-00
 for <freebsd-hackers@freebsd.org>; Thu, 07 Jul 2016 03:29:48 +0200
Received: from julian by ip184-189-249-34.sb.sd.cox.net with local (Gmexim 0.1
 (Debian)) id 1AlnuQ-0007hv-00
 for <freebsd-hackers@freebsd.org>; Thu, 07 Jul 2016 03:29:48 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-hackers@freebsd.org
From: Julian Hsiao <julian@hsiao.email>
Subject: Re: ggatel(8) extension for binding multiple files
Date: Thu, 7 Jul 2016 01:29:34 +0000 (UTC)
Lines: 913
Message-ID: <loom.20160707T032107-768@post.gmane.org>
References: <nl2eii$ukl$1@ger.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: sea.gmane.org
User-Agent: Loom/3.14 (http://gmane.org/)
X-Loom-IP: 184.189.249.34 (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11;
 rv:48.0) Gecko/20100101 Firefox/48.0)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 01:29:59 -0000

Here's an updated patch with some minor refactoring; it also addresses
some known issues:

> - I use alloca(3) instead of malloc(3) in map_bundle() because using the
>   latter causes incorrect behavior somehow. It's probably buffer
>   overruns and / or UBs somewhere in my code.

Fixed: I was using realloc(3) incorrectly elsewhere.
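
The note above doesn't say what the misuse was. One common realloc(3)
pitfall, shown here purely as an assumption about the kind of bug
involved, is assigning the result straight back to the only pointer to
the block, which loses the block on failure. A minimal sketch of the
safer idiom (grow_ints() is a hypothetical helper, not part of the
patch):

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Generic illustration, not code from the patch: resize *bufp to hold
 * new_count ints.  The result of realloc(3) goes into a temporary first,
 * so the original block is neither leaked nor lost if the call fails.
 * Returns 0 on success, -1 on failure (*bufp is then left untouched).
 */
int
grow_ints(int **bufp, size_t new_count)
{
	int *tmp;

	/* Refuse zero-sized requests and sizes that would overflow. */
	if (new_count == 0 || new_count > SIZE_MAX / sizeof(**bufp))
		return (-1);
	tmp = realloc(*bufp, new_count * sizeof(**bufp));
	if (tmp == NULL)
		return (-1);
	*bufp = tmp;
	return (0);
}
```

Callers keep using *bufp after a failure and free it as usual.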

> - Both ggatel(8) and md(4) implement BIO_DELETE by zeroing the requested
>   range [...] I didn't implement it.

This is now implemented. By default BIO_DELETE will write zeros, and this
can be disabled with the -n option.
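
For illustration, zeroing a deleted range on a file-backed provider
could look roughly like the sketch below. It is not the code from the
patch: zero_range() is a hypothetical helper, and the 4096-byte chunk
size is an arbitrary choice.

```c
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch: emulate BIO_DELETE on a
 * file-backed provider by overwriting [offset, offset + length) with
 * zeros.  Returns 0 on success, -1 on failure.
 */
int
zero_range(int fd, off_t offset, size_t length)
{
	char zeros[4096];
	size_t chunk;
	ssize_t n;

	memset(zeros, 0, sizeof(zeros));
	while (length > 0) {
		chunk = length < sizeof(zeros) ? length : sizeof(zeros);
		n = pwrite(fd, zeros, chunk, offset);
		if (n == -1 && errno == EINTR)
			continue;	/* interrupted: retry the same chunk */
		if (n <= 0)
			return (-1);
		offset += n;		/* pwrite(2) may write less than asked */
		length -= (size_t)n;
	}
	return (0);
}
```

A caller honouring -n would skip this and fail the request with
EOPNOTSUPP, as the BIO_DELETE hunk in the patch does.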

Incidentally, while testing, I found out that ggatel(8)'s BIO_DELETE code
is actually broken. It works in my extension, but this patch doesn't fix
the original code. I've filed a bug report[0] outlining the issue.

Lastly, I've added a license block so the few people who might find this
feature useful are free to use it.

Julian Hsiao

[0] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210864

Index: sbin/ggate/ggatel/Makefile
===================================================================
diff --git a/stable/10/sbin/ggate/ggatel/Makefile b/stable/10/sbin/ggate/ggatel/Makefile
--- a/stable/10/sbin/ggate/ggatel/Makefile	(revision 302332)
+++ b/stable/10/sbin/ggate/ggatel/Makefile	(working copy)
@@ -4,7 +4,7 @@
 
 PROG=	ggatel
 MAN=	ggatel.8
-SRCS=	ggatel.c ggate.c
+SRCS=	ggatel.c ggatel2.c ggate.c
 
 CFLAGS+= -DLIBGEOM
 CFLAGS+= -I${.CURDIR}/../shared
Index: sbin/ggate/ggatel/ggatel.c
===================================================================
diff --git a/stable/10/sbin/ggate/ggatel/ggatel.c b/stable/10/sbin/ggate/ggatel/ggatel.c
--- a/stable/10/sbin/ggate/ggatel/ggatel.c	(revision 302332)
+++ b/stable/10/sbin/ggate/ggatel/ggatel.c	(working copy)
@@ -46,6 +46,10 @@
 #include <geom/gate/g_gate.h>
 #include "ggate.h"
 
+int check_divs(const char *const, unsigned int *const, size_t *const,
+    size_t *const);
+void g_gatel_serve_bundle(const int, const unsigned int, const size_t,
+    const size_t, const int, const int, const unsigned int);
 
 static enum { UNSET, CREATE, DESTROY, LIST, RESCUE } action = UNSET;
 
@@ -55,12 +59,13 @@
 static int force = 0;
 static unsigned sectorsize = 0;
 static unsigned timeout = G_GATE_TIMEOUT;
+static unsigned delete_zero = 1;
 
 static void
 usage(void)
 {
 
-	fprintf(stderr, "usage: %s create [-v] [-o <ro|wo|rw>] "
+	fprintf(stderr, "usage: %s create [-v] [-n] [-o <ro|wo|rw>] "
 	    "[-s sectorsize] [-t timeout] [-u unit] <path>\n", getprogname());
 	fprintf(stderr, "       %s rescue [-v] [-o <ro|wo|rw>] <-u unit> "
 	    "<path>\n", getprogname());
@@ -149,6 +154,11 @@
 			}
 			break;
 		case BIO_DELETE:
+			if (!delete_zero) {
+				error = EOPNOTSUPP;
+				break;
+			}
+			// FIXME: Bug 210864
 		case BIO_WRITE:
 			if (pwrite(fd, ggio.gctl_data, ggio.gctl_length,
 			    ggio.gctl_offset) == -1) {
@@ -168,17 +178,39 @@
 g_gatel_create(void)
 {
 	struct g_gate_ctl_create ggioc;
-	int fd;
+	int fd, isdir = -1;
+	size_t div_size, num_divs;
 
 	fd = open(path, g_gate_openflags(flags) | O_DIRECT | O_FSYNC);
-	if (fd == -1)
-		err(EXIT_FAILURE, "Cannot open %s", path);
+	if (fd == -1) {
+		if (errno == EISDIR) {
+			isdir = 1;
+		} else {
+			err(EXIT_FAILURE, "Cannot open %s", path);
+		}
+	} else {
+		struct stat sb;
+		if (fstat(fd, &sb) == -1) {
+			err(EXIT_FAILURE, "fstat(%s) failed", path);
+		}
+		isdir = S_ISDIR(sb.st_mode);
+	}
+	assert(isdir != -1);
+
 	memset(&ggioc, 0, sizeof(ggioc));
 	ggioc.gctl_version = G_GATE_VERSION;
 	ggioc.gctl_unit = unit;
-	ggioc.gctl_mediasize = g_gate_mediasize(fd);
-	if (sectorsize == 0)
-		sectorsize = g_gate_sectorsize(fd);
+	if (isdir) {
+		if (fd != -1 && close(fd) == -1) {
+			err(EXIT_FAILURE, "close(%s) failed", path);
+		}	
+		fd = check_divs(path, &sectorsize, &div_size, &num_divs);
+		ggioc.gctl_mediasize = (off_t) div_size * num_divs;
+	} else {
+		ggioc.gctl_mediasize = g_gate_mediasize(fd);
+		if (sectorsize == 0)
+			sectorsize = g_gate_sectorsize(fd);
+	}
 	ggioc.gctl_sectorsize = sectorsize;
 	ggioc.gctl_timeout = timeout;
 	ggioc.gctl_flags = flags;
@@ -188,7 +220,12 @@
 	if (unit == -1)
 		printf("%s%u\n", G_GATE_PROVIDER_NAME, ggioc.gctl_unit);
 	unit = ggioc.gctl_unit;
-	g_gatel_serve(fd);
+	if (isdir) {
+		g_gatel_serve_bundle(fd, sectorsize, div_size, num_divs, unit,
+			g_gate_openflags(flags), delete_zero);
+	} else {
+		g_gatel_serve(fd);
+	}
 }
 
 static void
@@ -230,7 +267,7 @@
 	for (;;) {
 		int ch;
 
-		ch = getopt(argc, argv, "fo:s:t:u:v");
+		ch = getopt(argc, argv, "fo:s:t:u:vn");
 		if (ch == -1)
 			break;
 		switch (ch) {
@@ -280,6 +317,11 @@
 				usage();
 			g_gate_verbose++;
 			break;
+		case 'n':
+			if (action != CREATE)
+				usage();
+			delete_zero = 0;
+			break;
 		default:
 			usage();
 		}
Index: sbin/ggate/ggatel/ggatel2.c
===================================================================
diff --git a/stable/10/sbin/ggate/ggatel/ggatel2.c
b/stable/10/sbin/ggate/ggatel/ggatel2.c
new file mode 100644
--- /dev/null	(nonexistent)
+++ b/stable/10/sbin/ggate/ggatel/ggatel2.c	(working copy)
@@ -0,0 +1,724 @@
+/* 
+Copyright (c) 2016, Julian Hsiao <julian@hsiao.email>
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#include <math.h>
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <setjmp.h>
+#include <signal.h>
+#include <assert.h>
+#include <stdint.h>
+#include <limits.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <inttypes.h>
+
+#include <err.h>
+#include <time.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/bio.h>
+#include <sys/disk.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/syslog.h>
+
+#include <geom/gate/g_gate.h>
+#include "ggate.h"
+
+/* ======== */
+
+/*
+  Doesn't work with 3.4.1; clang-devel is currently 3.9.0, and I couldn't be
+  bothered to find out which version support was first added.
+*/
+#if defined(__clang__) && \
+    (__clang_major__ > 3 || \
+    (__clang_major__ == 3 && __clang_minor__ >= 9))
+#pragma clang diagnostic push
+#pragma clang diagnostic error "-Weverything"
+#endif
+
+/* ======== */
+
+int check_divs(const char *const, unsigned int *const, size_t *const,
+    size_t *const);
+void g_gatel_serve_bundle(const int , const unsigned int, const size_t,
+    const size_t, const int, const int, const unsigned int);
+
+/* ======== */
+
+static void
+g_gate_verbose_log(const int v, const int p, const char *const m, ...)
+{
+    if (g_gate_verbose >= v) {
+        va_list ap;
+        va_start(ap, m);
+        g_gate_vlog(p, m, ap);
+        va_end(ap);
+    }
+}
+
+#ifdef NDEBUG
+__attribute__((unused))
+#endif
+static inline bool
+mul_overflow(const size_t a, const size_t b)
+{
+    return(a != 0 && (a * b) / a != b);
+}
+
+static unsigned int
+MINDIV_SIZE(void)
+{
+    static unsigned int ps;
+    if (ps == 0) {
+        const long ps2 = sysconf(_SC_PAGESIZE);
+        static_assert(sizeof(size_t) >= sizeof(long), "");
+        assert(ps2 > 0);
+        assert(ps2 <= UINT_MAX);
+        ps = (unsigned int) ps2;
+    }
+    return(ps);
+}
+
+static size_t
+DIV_NAME_BUFSIZE(void)
+{
+    static size_t dnbs;
+    if (dnbs == 0) {
+        dnbs = (size_t) (ceil(log(SIZE_MAX) / log(16)));
+        ++dnbs;
+        dnbs *= sizeof(char);
+
+    }
+    return(dnbs);
+}
+
+/* ======== */
+
+static void
+numtohexstr(const size_t num, char *const buf, const size_t buflen)
+{
+#ifdef NDEBUG
+    __attribute__((unused))
+#endif
+    const int r = snprintf(buf, buflen, "%zx", num);
+    assert(r > 0);
+    assert((unsigned int) r < buflen);
+}
+
+int
+check_divs(const char *const bundle, unsigned int *const blk_size,
+    size_t *const div_size, size_t *const num_divs)
+{
+    assert(blk_size != NULL);
+    assert(div_size != NULL);
+    assert(num_divs != NULL);
+
+    int dfd;
+    if ((dfd = open(bundle, O_RDONLY | O_DIRECTORY | O_CLOEXEC)) == -1) {
+        err(5, "open(%s) failed", bundle);
+    }
+    char *buf = malloc(DIV_NAME_BUFSIZE());
+    if (buf == NULL) {
+        err(5, "malloc(DIV_NAME_BUFSIZE) failed");
+    }
+
+    for (*num_divs = 0; *num_divs < SIZE_MAX; ++(*num_divs)) {
+        int fd;
+        struct stat sb = { .st_dev = 0 };
+
+        numtohexstr(*num_divs, buf, DIV_NAME_BUFSIZE());
+        if ((fd = openat(dfd, buf, O_RDONLY | O_CLOEXEC)) == -1) {
+            if (errno == ENOENT) {
+                break;
+            }
+            err(5, "open(%s/%s) failed", bundle, buf);
+        }
+        if (fstat(fd, &sb) == -1) {
+            err(5, "fstat(%s/%s) failed", bundle, buf);
+        }
+
+        if (S_ISCHR(sb.st_mode)) {
+            if (ioctl(fd, DIOCGMEDIASIZE, &sb.st_size) == -1) {
+                err(5, "ioctl(%s/%s, DIOCGMEDIASIZE) failed", bundle, buf);
+            }
+
+            unsigned int bs;
+            if (ioctl(fd, DIOCGSECTORSIZE, &bs) == -1) {
+                err(5, "ioctl(%s/%s, DIOCGSECTORSIZE) failed", bundle, buf);
+            }
+            if (*blk_size == 0) {
+                *blk_size = bs;
+            } else if (*blk_size != bs) {
+                errx(5, "sector size of %s/%s (%u bytes) is not the same as "
+                        "requested size or that of other divs (%u bytes).",
+                    bundle, buf, bs, *blk_size);
+            }
+        } else if (!S_ISREG(sb.st_mode)) {
+            errx(5, "%s/%s must be a file or character device.", bundle, buf);
+        }
+
+        if (close(fd) == -1) {
+            err(5, "close(%s/%s) failed", bundle, buf);
+        }
+
+        assert(sb.st_size > 0);
+        static_assert(sizeof(size_t) >= sizeof(sb.st_size), "");
+        const size_t st_size = (size_t) sb.st_size;
+
+        if (st_size < MINDIV_SIZE()) {
+            errx(5, "size of %s/%s is less than %u bytes.",
+                bundle, buf, MINDIV_SIZE());
+        }
+        if (st_size % MINDIV_SIZE() != 0) {
+            errx(5, "size of %s/%s is not a multiple of %u.",
+                bundle, buf, MINDIV_SIZE());
+        }
+
+        if (*num_divs == 0) {
+            *div_size = st_size;
+        } else if (st_size != *div_size) {
+            errx(5, "%s/%s is not the same size as other divs (%zu bytes).",
+                bundle, buf, *div_size);
+        }
+    }
+
+    if (*num_divs == 0) {
+        errx(5, "No divs found in %s.", bundle);
+    }
+
+    *blk_size = (*blk_size == 0) ? MINDIV_SIZE() : *blk_size;
+    if (*blk_size < MINDIV_SIZE()) {
+        errx(5, "sector size must be at least %u bytes.", MINDIV_SIZE());
+    }
+    if (*blk_size % MINDIV_SIZE() != 0) {
+        errx(5, "sector size must be a multiple of %u bytes.", MINDIV_SIZE());
+    }
+    if (*blk_size > *div_size) {
+        errx(5, "sector size cannot be greater than div size (%zu bytes).",
+            *div_size);
+    }
+
+    struct g_gate_ctl_create ggcc = { .gctl_mediasize = 0 };
+    static_assert(sizeof(long) == sizeof(ggcc.gctl_mediasize), "");
+    assert(!mul_overflow(*div_size, *num_divs));
+    assert(*div_size * *num_divs <= LONG_MAX);
+    static_assert(sizeof(unsigned int) == sizeof(ggcc.gctl_sectorsize), "");
+    g_gate_verbose_log(1, LOG_DEBUG, "blk_size = %u", *blk_size);
+    g_gate_verbose_log(1, LOG_DEBUG, "div_size = %zu", *div_size);
+    g_gate_verbose_log(1, LOG_DEBUG, "num_divs = %zu", *num_divs);
+
+    free(buf);
+    return(dfd);
+}
+
+static void
+map_fd(void *const addr, const int fd, const int prot,
+#ifdef NDEBUG
+    __attribute__((unused))
+#endif
+    const size_t div_size)
+{
+    assert(div_size != 0);
+
+    struct stat sb;
+    if (fstat(fd, &sb) == -1) {
+        err(1, "fstat() failed");
+    }
+
+    assert(sb.st_size > 0);
+    static_assert(sizeof(size_t) >= sizeof(sb.st_size), "");
+    const size_t st_size = (size_t) sb.st_size; 
+    assert(st_size == div_size);
+    assert(st_size % MINDIV_SIZE() == 0);
+
+    void *m;
+    const int flags = MAP_SHARED | MAP_FIXED | MAP_NOCORE/* | MAP_NOSYNC*/;
+    if ((m = mmap(addr, st_size, prot, flags, fd, 0)) == MAP_FAILED) {
+        err(1, "mmap() failed");
+    }
+}
+
+static void
+map_bundle(const uintptr_t base, const uintptr_t addr, const size_t div_size,
+    const int bundlefd, const int open_flags)
+{
+    assert(base % MINDIV_SIZE() == 0);
+    assert(addr % MINDIV_SIZE() == 0);
+    assert(addr >= base);
+    assert((addr - base) % MINDIV_SIZE() == 0);
+
+    int divfd;
+
+    static char *div;
+    if (div == NULL &&
+        (div = malloc(DIV_NAME_BUFSIZE())) == NULL) {
+        err(6, "malloc(DIV_NAME_BUFSIZE) failed");
+    }
+    numtohexstr((addr - base) / div_size, div, DIV_NAME_BUFSIZE());
+
+    g_gate_verbose_log(3, LOG_DEBUG,
+        "->   [ 0x%09" PRIxPTR ", 0x%09" PRIxPTR " ): 0x%zx; %s",
+        addr, addr + div_size, div_size, div);
+
+    if ((divfd = openat(bundlefd, div, open_flags | O_CLOEXEC)) == -1) {
+        err(6, "open(%s) failed", div);
+    }
+
+    int prot;
+    switch (open_flags & O_ACCMODE) {
+    case O_RDWR:
+        prot = PROT_READ | PROT_WRITE;
+        break;
+    case O_RDONLY:
+        prot = PROT_READ;
+        break;
+    case O_WRONLY:
+        prot = PROT_WRITE;
+        break;
+    default:
+        errx(6, "unknown open() flags: %d", open_flags);
+        break;
+    }
+    map_fd((void *) addr, divfd, prot, div_size);
+    if (close(divfd) == -1) {
+        err(6, "close(%s) failed", div);
+    }
+}
+
+static void *
+resv_vaddr(void *const addr, const size_t len)
+{
+    void *p;
+    const int prot = PROT_NONE;
+    const int flags = MAP_PRIVATE | MAP_ANON |
+        ((addr == NULL) ? 0 : MAP_FIXED);
+    if ((p = mmap(addr, len, prot, flags, -1, 0)) == MAP_FAILED) {
+        err(2, "mmap() failed");
+    }
+
+    return(p);
+}
+
+/* ======== */
+
+static sigjmp_buf memfault_env;
+static siginfo_t memfault_info;
+
+__attribute__((noreturn)) static void
+memfault_hdl(int sig, siginfo_t *info, __attribute__((unused)) void *uap)
+{
+    memfault_info = *info;
+    siglongjmp(memfault_env, sig);
+}
+
+static void
+install_memfault_hdl(void)
+{
+    struct sigaction a = {
+        .sa_sigaction = memfault_hdl,
+        .sa_flags = SA_SIGINFO
+    };
+    if (sigaction(SIGSEGV, &a, NULL) == -1) {
+        err(3, "sigaction(SIGSEGV) failed");
+    }
+    struct sigaction b = {
+        .sa_sigaction = memfault_hdl,
+        .sa_flags = SA_SIGINFO
+    };
+    if (sigaction(SIGBUS, &b, NULL) == -1) {
+        err(3, "sigaction(SIGBUS) failed");
+    }
+
+    // Use uncatchable signal as memfault_hdl installed flag
+    memfault_info.si_signo = SIGKILL;
+}
+
+static void
+check_expected_memfault(const void *const as, const void *const ae)
+{
+    if (memfault_info.si_signo == SIGKILL) {
+        errx(4, "memfault_info not initialized!");
+    } else if (memfault_info.si_signo != SIGSEGV &&
+               memfault_info.si_signo != SIGBUS) {
+        errx(4, "unexpected %s", strsignal(memfault_info.si_signo));
+    } else if (
+        !((as <= memfault_info.si_addr) && (memfault_info.si_addr < ae))
+    ) {
+        errx(4, "unexpected address %p", memfault_info.si_addr);
+    }
+}
+
+/* ======== */
+
+static void
+msync2(const void *const a, const size_t n, const int f)
+{
+    const uintptr_t b = (uintptr_t) a;
+    const size_t    m = b % MINDIV_SIZE();
+    const uintptr_t c = b - m;
+    const size_t   nn = n + m;
+
+    g_gate_verbose_log(3, LOG_DEBUG,
+        " msync(0x%09" PRIxPTR ",      0x%08zx)", c, nn);
+
+    if (nn > 0 && msync((void *) c, nn, f) == -1) {
+        err(8, "msync() failed");
+    }
+}
+
+__attribute__((unused)) static void *
+memcpy_msync(void *const d, const void *const s, const size_t n)
+{
+    const uintptr_t dd = (uintptr_t) d;
+    g_gate_verbose_log(3, LOG_DEBUG,
+        "memcpy(0x%09" PRIxPTR ", ..., 0x%08zx)", dd, n);
+
+    memcpy(d, s, n);
+    msync2(d, n, MS_SYNC);
+    return(d);
+}
+
+/* ======== */
+
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wpadded"
+struct bundle_spec {
+    size_t resv;
+    uintptr_t as;
+    uintptr_t ae;
+
+    int bundlefd;
+    size_t div_size;
+    size_t num_divs;
+    unsigned int blk_size;
+
+    int open_flags;
+    bool bio_delete;
+};
+#pragma clang diagnostic pop
+
+/* Too lazy to do LRU */
+#define mapped_addrs_size 100
+static void *mapped_addrs[mapped_addrs_size];
+
+static void
+update_mapped_addrs(const size_t div_size, void *const a)
+{
+    assert(mapped_addrs_size <= UINT_MAX);
+
+    const size_t i = arc4random_uniform((unsigned int) mapped_addrs_size);
+    if (mapped_addrs[i] != NULL) {
+        resv_vaddr(mapped_addrs[i], div_size);
+
+        const uintptr_t mai = (uintptr_t) mapped_addrs[i];
+        g_gate_verbose_log(3, LOG_DEBUG,
+            "<-   [ 0x%09" PRIxPTR ", 0x%09" PRIxPTR " ): 0x%zx",
+            mai, mai + div_size, div_size);
+    }
+    mapped_addrs[i] = a;
+}
+
+static void
+do_read(const struct bundle_spec *const bspec,
+    struct g_gate_ctl_io *const ggio)
+{
+    static_assert(sizeof(ggio->gctl_length) <= sizeof(size_t), "");
+    static_assert(sizeof(ggio->gctl_offset) <= sizeof(uintptr_t), "");
+    assert(ggio->gctl_length >= 0);
+    assert(ggio->gctl_offset >= 0);
+
+    assert(memfault_info.si_signo == SIGKILL);
+
+    assert(!mul_overflow(bspec->div_size, mapped_addrs_size));
+    const size_t max_len = bspec->div_size * mapped_addrs_size;
+    if ((size_t) ggio->gctl_length > max_len) {
+        ggio->gctl_error = ENOMEM;
+        return;
+    }
+
+    const uintptr_t a = bspec->as + (uintptr_t) ggio->gctl_offset;
+    assert(a >= bspec->as);
+    assert(a < bspec->ae);
+    const uintptr_t b = a + (uintptr_t) ggio->gctl_length;
+    assert(b <= bspec->ae);
+    const size_t m = ((size_t) ggio->gctl_offset) % bspec->div_size;
+
+    for (;;) {
+        volatile uintptr_t c = (volatile uintptr_t) NULL;
+        if (sigsetjmp(memfault_env, 1) == 0) {
+            for (c = a - m; c < b; c += bspec->div_size) {
+                static_assert(CHAR_BIT == 8, "");
+                volatile unsigned char *d1 = (volatile unsigned char *) c;
+                __attribute__((unused)) volatile unsigned char d2 = *d1;
+            }
+            break;
+        } else {
+            check_expected_memfault((void *) bspec->as, (void *) bspec->ae);
+            const uintptr_t si_addr = (uintptr_t) memfault_info.si_addr;
+            memfault_info.si_signo = SIGKILL;
+
+            assert((void *) c != NULL);
+            assert(si_addr == c);
+
+            // UB??
+            assert(bspec->as <= si_addr);
+            const size_t n = (si_addr - bspec->as) % bspec->div_size;
+            const uintptr_t d = si_addr - n;
+            assert(d <= si_addr);
+            assert(si_addr < d + 2 * bspec->div_size);
+
+            assert(d >= bspec->as);
+            assert(d < bspec->ae);
+            assert((d - bspec->as) % bspec->div_size == 0);
+
+            update_mapped_addrs(bspec->div_size, (void *) d);
+
+            assert(d % bspec->blk_size == 0);
+            map_bundle(bspec->as, d, bspec->div_size, bspec->bundlefd,
+                bspec->open_flags);
+        }
+    }
+
+    g_gate_verbose_log(2, LOG_DEBUG, "do_read(0x%jx, 0x%jx)",
+        (uintmax_t) ggio->gctl_offset, (uintmax_t) ggio->gctl_length);
+    ggio->gctl_data = (void *) a;
+    ggio->gctl_error = 0;
+}
+
+static void
+do_write(const struct bundle_spec *const bspec,
+    struct g_gate_ctl_io *const ggio, const bool zero)
+{
+    static void *zs_;
+    if (zero && zs_ == NULL &&
+        (zs_ = calloc(1, bspec->blk_size)) == NULL) {
+        err(EXIT_FAILURE, "calloc() failed");
+    }
+    const void *const zeros = zs_;
+
+    static_assert(sizeof(ggio->gctl_length) <= sizeof(size_t), "");
+    static_assert(sizeof(ggio->gctl_offset) <= sizeof(uintptr_t), "");
+    assert(ggio->gctl_length >= 0);
+    assert(ggio->gctl_offset >= 0);
+
+    assert(memfault_info.si_signo == SIGKILL);
+    assert(((size_t) ggio->gctl_length) % bspec->blk_size == 0);
+
+    uintptr_t d = bspec->as + (uintptr_t) ggio->gctl_offset;
+    assert(d >= bspec->as);
+    assert(d < bspec->ae);
+    uintptr_t s = (uintptr_t) ggio->gctl_data;
+    size_t len = (size_t) ggio->gctl_length;
+
+    for (;;) {
+        if (sigsetjmp(memfault_env, 1) == 0) {
+            g_gate_verbose_log(3, LOG_DEBUG,
+                "memcpy(0x%09" PRIxPTR ", ..., 0x%08zx)", d, len);
+            if (zero) {
+                assert(zeros != NULL);
+                for (size_t i = 0; i < len; i += bspec->blk_size) {
+                    memcpy((void *) (d + i), zeros, bspec->blk_size);
+                }            
+            } else {
+                memcpy((void *) d, (void *) s, len);            
+            }
+            break;
+        } else {
+            check_expected_memfault((void *) bspec->as, (void *) bspec->ae);
+            const uintptr_t si_addr = (uintptr_t) memfault_info.si_addr;
+            memfault_info.si_signo = SIGKILL;
+
+            // UB??
+            assert(bspec->as <= si_addr);
+            const size_t m = (si_addr - bspec->as) % bspec->div_size;
+            const uintptr_t a = si_addr - m;
+            assert(a <= si_addr);
+            assert(si_addr < a + 2 * bspec->div_size);
+
+            update_mapped_addrs(bspec->div_size, (void *) a);
+
+            assert(a % bspec->blk_size == 0);
+            map_bundle(bspec->as, a, bspec->div_size, bspec->bundlefd,
+                bspec->open_flags);
+
+            // More UB??
+            assert(d <= si_addr);
+            assert(len >= si_addr - d);
+            len = len - (si_addr - d);
+            s = s + (si_addr - d);
+            d = si_addr;
+        }
+    }
+
+    g_gate_verbose_log(2, LOG_DEBUG, "do_write(0x%jx, 0x%jx)",
+        (uintmax_t) ggio->gctl_offset, (uintmax_t) ggio->gctl_length);
+    ggio->gctl_error = 0;
+}
+
+static void
+do_delete(const struct bundle_spec *const bspec,
+    struct g_gate_ctl_io *const ggio)
+{
+    if (bspec->bio_delete) {
+        g_gate_verbose_log(2, LOG_DEBUG, "do_delete() => do_write()");
+        do_write(bspec, ggio, true);
+    } else {
+        g_gate_verbose_log(2, LOG_DEBUG, "do_delete() => EOPNOTSUPP");
+        ggio->gctl_error = EOPNOTSUPP;
+    }
+}
+
+static void
+do_flush(const struct bundle_spec *const bspec,
+    __attribute__((unused)) const struct g_gate_ctl_io *const ggio)
+{
+    if (g_gate_verbose >= 4) {
+        for (size_t i = 0; i < mapped_addrs_size; ++i) {
+            const void *ma = mapped_addrs[i];
+            if (ma != NULL) {
+                g_gate_verbose_log(4, LOG_DEBUG,
+                    "0x%09" PRIxPTR, (uintptr_t) ma);
+            }
+        }
+    }
+
+    g_gate_verbose_log(2, LOG_DEBUG, "do_flush()");
+    msync2((void *) bspec->as, bspec->resv, MS_SYNC);
+}
+
+__attribute__((noreturn)) void
+g_gatel_serve_bundle(const int dfd_, const unsigned int ss_, const size_t ds_,
+    const size_t nd_, const int unit, const int fs_, const unsigned int dz_)
+{
+    freopen("/dev/null", "r", stdin);
+    if (g_gate_verbose == 0) {
+        if (daemon(0, 0) == -1) {
+            g_gate_destroy(unit, 1);
+            err(EXIT_FAILURE, "Cannot daemonize");
+        }
+        freopen("/dev/null", "w", stdout);
+        freopen("/dev/null", "w", stderr);
+    }
+    g_gate_verbose_log(1, LOG_DEBUG, "Worker created: %d.", (int) getpid());
+
+    assert(dfd_ != -1);
+    assert(!mul_overflow(ds_, nd_));
+    struct bundle_spec bspec = {
+        .resv = ds_ * nd_,
+        .bundlefd = dfd_,
+        .div_size = ds_,
+        .num_divs = nd_,
+        .blk_size = ss_,
+        .open_flags = fs_,
+        .bio_delete = (bool) dz_
+    };
+    bspec.as = (uintptr_t) (resv_vaddr(NULL, bspec.resv));
+    bspec.ae = bspec.as + bspec.resv;
+
+    g_gate_verbose_log(1, LOG_DEBUG,
+        "[ 0x%09" PRIxPTR ", 0x%09" PRIxPTR " ): 0x%zx",
+        bspec.as, bspec.ae, bspec.resv);
+
+    assert(bspec.blk_size > 0);
+    static_assert(sizeof(bspec.blk_size) <= sizeof(size_t), "");
+    void *ggio_data_buf;
+    size_t ggio_data_buf_len = (size_t) bspec.blk_size;
+    if ((ggio_data_buf = malloc(ggio_data_buf_len)) == NULL) {
+        err(EXIT_FAILURE, "malloc() failed");
+    }
+
+    install_memfault_hdl();
+
+    for (;;) {
+        struct g_gate_ctl_io ggio = {
+            .gctl_version = G_GATE_VERSION,
+            .gctl_unit = unit,
+            .gctl_data = (void *) ggio_data_buf,
+            .gctl_length = (off_t) ggio_data_buf_len
+        };
+
+        g_gate_ioctl(G_GATE_CMD_START, &ggio);
+
+        switch (ggio.gctl_error) {
+        case 0:
+            break;
+        case ECANCELED:
+            g_gate_close_device();
+            if (close(bspec.bundlefd) == -1) {
+                err(EXIT_FAILURE, "close() failed");
+            }
+            do_flush(&bspec, &ggio);
+            free(ggio_data_buf);
+            g_gate_verbose_log(1, LOG_DEBUG, "Finished.");
+            exit(EXIT_SUCCESS);
+        case ENOMEM:
+            assert(ggio.gctl_cmd == BIO_WRITE
+                || (ggio.gctl_cmd == BIO_DELETE && bspec.bio_delete));
+            assert(ggio.gctl_length > 0);
+            static_assert(sizeof(ggio.gctl_length) <=
+                          sizeof(ggio_data_buf_len), "");
+            ggio_data_buf_len = (size_t) ggio.gctl_length;
+            ggio_data_buf = realloc(ggio_data_buf, ggio_data_buf_len);
+            if (ggio_data_buf == NULL) {
+                err(EXIT_FAILURE, "realloc() failed");
+            }
+            continue;
+        case ENXIO:
+        default:
+            g_gate_xlog("ioctl(/dev/%s): %s.", G_GATE_CTL_NAME,
+                strerror(ggio.gctl_error));
+        }
+
+        switch(ggio.gctl_cmd) {
+        case BIO_READ:
+            do_read(&bspec, &ggio);
+            break;
+        case BIO_WRITE:
+            do_write(&bspec, &ggio, false);
+            break;
+        case BIO_DELETE:
+            do_delete(&bspec, &ggio);
+            break;
+        case BIO_FLUSH:
+            do_flush(&bspec, &ggio);
+            break;
+        default:
+            g_gate_verbose_log(1, LOG_DEBUG, "unsupported: %d", ggio.gctl_cmd);
+            ggio.gctl_error = EOPNOTSUPP;
+            break;
+        }
+
+        g_gate_ioctl(G_GATE_CMD_DONE, &ggio);
+        g_gate_verbose_log(3, LOG_DEBUG, "========");
+    }
+}

Property changes on: stable/10/sbin/ggate/ggatel/ggatel2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 01:55:58 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C1388B7561F
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 01:55:58 +0000 (UTC) (envelope-from yuri@rawbw.com)
Received: from shell1.rawbw.com (shell1.rawbw.com [198.144.192.42])
 by mx1.freebsd.org (Postfix) with ESMTP id B11AE15F7
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 01:55:58 +0000 (UTC)
 (envelope-from yuri@rawbw.com)
Received: from yuri.doctorlan.com (c-24-5-143-190.hsd1.ca.comcast.net
 [24.5.143.190]) (authenticated bits=0)
 by shell1.rawbw.com (8.15.1/8.15.1) with ESMTPSA id u671tvPj042010
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO);
 Wed, 6 Jul 2016 18:55:57 -0700 (PDT) (envelope-from yuri@rawbw.com)
X-Authentication-Warning: shell1.rawbw.com: Host
 c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190] claimed to be
 yuri.doctorlan.com
Subject: Re: Why kinfo_getvmmap is sometimes so expensive?
To: Konstantin Belousov <kostikbel@gmail.com>
References: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
 <20160707001913.GJ38613@kib.kiev.ua>
Cc: Freebsd hackers list <freebsd-hackers@freebsd.org>
From: Yuri <yuri@rawbw.com>
Message-ID: <0b5c9018-2b12-e993-a6df-06ecad6a7b07@rawbw.com>
Date: Wed, 6 Jul 2016 18:55:56 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <20160707001913.GJ38613@kib.kiev.ua>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 01:55:58 -0000

On 07/06/2016 17:19, Konstantin Belousov wrote:
> To calculate residency count for the process map entries, kernel has to
> iterate over all pages.  This operation was somewhat optimized in 10.3
> and HEAD, particularly for the large sparse mappings.  But for large populated
> mappings there is no other way than to check each page.
>
> You may confirm my hypothesis by setting sysctl
> kern.proc_vmmap_skip_resident_count to 1 and see whether the CPU
> consumption changed.  Of course, you will not get the resident count
> in the returned structure, after the knob is tweaked.


Yes, this explains it. Setting kern.proc_vmmap_skip_resident_count=1 made
CPU consumption go down.

So, it is better to parse /proc/<pid>/map to get the process size.
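
A minimal sketch of that parsing, assuming each line of the map file begins
with the start and end addresses of a mapping in hexadecimal and that the
remaining fields can be ignored. The function name sum_mapped_bytes() is
hypothetical, and the caller is assumed to have already read the file into a
NUL-terminated buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum (end - start) over every line of a map-style text buffer.
 * Lines that do not begin with two hex addresses are skipped.
 */
static uint64_t
sum_mapped_bytes(const char *text)
{
    uint64_t total = 0;

    while (*text != '\0') {
        uint64_t start, end;

        /* %x conversions accept an optional leading "0x". */
        if (sscanf(text, "%" SCNx64 " %" SCNx64, &start, &end) == 2 &&
            end >= start) {
            total += end - start;
        }

        const char *nl = strchr(text, '\n');
        if (nl == NULL)
            break;
        text = nl + 1;
    }
    return (total);
}
```

This gives the total mapped size only; it does not report how much of it is
resident.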


Thank you,

Yuri


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 04:34:08 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3B84CB75CD4;
 Thu,  7 Jul 2016 04:34:08 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 214D21CB0;
 Thu,  7 Jul 2016 04:34:07 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467866046079240.9597104517618;
 Wed, 6 Jul 2016 21:34:06 -0700 (PDT)
Date: Wed, 06 Jul 2016 21:34:06 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Message-ID: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
Subject: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 04:34:08 -0000

As a first step towards managing Linux user space in a chrooted /compat/linux (initially for i915 testing with intel gpu tools, later to get Widevine and Steam to work), I'm trying to get apt to work. I've fixed a number of issues to date in pseudofs/linprocfs, but now I'm running into a bug caused by differences in SIGCHLD handling between Linux and FreeBSD. The situation is that apt spawns dpkg and waits on a pipe read. On Linux, when dpkg exits, the SIGCHLD delivered to apt causes a short read on the pipe, which lets apt continue. On FreeBSD the SIGCHLD is silently ignored. I've even experimented with doing a kill -20 <apt pid>, to no effect.
 
It would be easy enough to check sysvec against linux in pipe_read and break out of the loop when it's awakened from msleep (assuming there aren't deeper issues with signal propagation for anything other than SIGINT/SIGKILL) and then do a short read. However, I'm assuming that anyone who has worked in this area probably has a cleaner solution.

Thanks in advance.

-M


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 04:44:04 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2605DB7502B;
 Thu,  7 Jul 2016 04:44:04 +0000 (UTC)
 (envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0727D1360;
 Thu,  7 Jul 2016 04:44:04 +0000 (UTC)
 (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
 by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u674hsgK007808;
 Wed, 6 Jul 2016 21:43:58 -0700 (PDT)
 (envelope-from truckman@FreeBSD.org)
Message-Id: <201607070443.u674hsgK007808@gw.catspoiler.org>
Date: Wed, 6 Jul 2016 21:43:54 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD
 breaks apt
To: mmacy@nextbsd.org
cc: freebsd-current@freebsd.org, freebsd-hackers@freebsd.org
In-Reply-To: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 04:44:04 -0000

On  6 Jul, Matthew Macy wrote:
> As a first step towards managing linux user space in a chrooted
> /compat/linux, initially for i915 testing with intel gpu tools, later
> on to get widevine and steam to work I'm trying to get apt to work.
> I've fixed a number of issues to date in pseudofs/linprocfs but now
> I'm running in to a bug caused by differences in SIGCHLD handling
> between Linux and FreeBSD. The situation is that apt will spawn dpkg
> and wait on a pipe read. On Linux when dpkg exits the  SIGCHLD to apt
> causes a short read on the pipe which lets apt then continue. On
> FreeBSD a SIGCHLD is silently ignored. I've even experimented with
> doing a kill -20 <apt pid> to no effect.
>  
> It would be easy enough to check sysvec against linux in pipe_read and
> break out of the loop when it's awakened from msleep (assuming there
> aren't deeper issues with signal propagation for anything other than
> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that
> anyone who has worked in this area probably has a cleaner solution.

It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
be in this case.
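
To make the SA_RESTART point concrete, here is a minimal sketch. It is not
taken from apt or dpkg; the function name read_across_sigchld and the
100 ms / 500 ms timings are made up for illustration. A parent blocks in
read() on a pipe while one child exits, which raises SIGCHLD, and a second
child writes a byte later. Without SA_RESTART the read() should fail with
EINTR when the first child exits; with SA_RESTART it should be restarted and
return only once the byte arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: merely installing it makes SIGCHLD able to interrupt read(). */
static void on_sigchld(int sig) { (void)sig; }

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Wait for one child, retrying if the wait itself is interrupted. */
static void reap(pid_t pid)
{
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
}

/*
 * Hypothetical helper: block in read() on a pipe while one child exits after
 * 100 ms and a second child writes one byte after 500 ms.  Returns what
 * read() returned (or -2 on setup failure) and stores its errno in *err.
 */
ssize_t read_across_sigchld(int sa_flags, int *err)
{
    struct sigaction sa;
    int fds[2];
    pid_t exiter, writer;
    ssize_t n;
    char byte;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;          /* 0 or SA_RESTART */
    if (sigaction(SIGCHLD, &sa, NULL) == -1 || pipe(fds) == -1)
        return -2;

    exiter = fork();
    if (exiter == -1)
        return -2;
    if (exiter == 0) {               /* stands in for dpkg: it just exits */
        sleep_ms(100);
        _exit(0);
    }

    writer = fork();
    if (writer == -1) {
        reap(exiter);
        return -2;
    }
    if (writer == 0) {               /* eventually unblocks a restarted read() */
        sleep_ms(500);
        _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
    }

    /* The parent keeps the write end open, so read() cannot return EOF. */
    errno = 0;
    n = read(fds[0], &byte, 1);
    *err = errno;

    kill(writer, SIGKILL);           /* harmless if the writer already exited */
    reap(exiter);
    reap(writer);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Calling read_across_sigchld(0, &err) is expected to return -1 with err set
to EINTR, while read_across_sigchld(SA_RESTART, &err) is expected to return
1. With no handler installed at all, the default disposition discards
SIGCHLD, so the read() is not interrupted either.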


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 04:52:25 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A8A85B7546D;
 Thu,  7 Jul 2016 04:52:25 +0000 (UTC)
 (envelope-from kmacybsd@gmail.com)
Received: from mail-io0-x22e.google.com (mail-io0-x22e.google.com
 [IPv6:2607:f8b0:4001:c06::22e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 6DF22196D;
 Thu,  7 Jul 2016 04:52:25 +0000 (UTC)
 (envelope-from kmacybsd@gmail.com)
Received: by mail-io0-x22e.google.com with SMTP id i186so11787215iof.1;
 Wed, 06 Jul 2016 21:52:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc; bh=Qr7zrWnkMf4gX8LnCeEGEl5SSsZupuh3hdIGcY8xfso=;
 b=szWSVLGT1dtnjhZy5L0sQ669CCPbdpz24VTJg1Sj17NekpzwUxfk4jgMKGPIX6aJ7T
 JuTX8SWqYrGkqVxdS9cYP5Z3rxr1fnWgoCU+mkis3a7HyWwMZZF8uH9IniIVZuohLuHm
 nrcEIU8Dy3fuaFjwUK73Jqxr/msX+3x+KHQkkgZ606dsDEwj5mm787Xs8eOFqxkAtZ0y
 8tU9NEnOXTxyFfxlVBU77FuGpPVumFdwKHTYDelcX7cqeLfABP/c2W78pMfUumR3jewo
 zJlqPsBCbt7/OucV57DdK0Wehymuc+EJTtTUcOe73p3L8Rvy7GjwOe8dAjFA+vzRnko4
 eq2A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:sender:in-reply-to:references:date
 :message-id:subject:from:to:cc;
 bh=Qr7zrWnkMf4gX8LnCeEGEl5SSsZupuh3hdIGcY8xfso=;
 b=JeoNsZwJjDdHP6O9tgEkQp7t2aq+xpMHEztytsZ13XAV6Bxgni3Cvn2cRXlz+XRhG4
 ox9GKj780BiKUJ9XucWRVPMPws9mfL7cdayCWEALzC//bJ1Y/Nsl0z9jo6dFILzCKeuA
 azLXLf61eTK8koat7+gotRl+e6QAjnqqZcdLAbjinWZS6+9o02upwcBGjrY3Ob/5fNpL
 Vy8Mq4QAGIdk0iRjDy3uelYM0fE8Nzt8Ttp2SdVKlAi0M/YiHbtWVFb1cgWwMNfAejFO
 Ua34RlaQ5ynnEnnasAGV6YZQ7Oo8Pkg3WSjFrko1zjpsIc05Nu6WslXZxrAF0iicKBmb
 4YAQ==
X-Gm-Message-State: ALyK8tKzx8zTAYzHFnRqyUZYSu7NjXuneyI9tqFOIrSr3kbtjDuQBlb3kBKKx8caM2nKhNAUIzSXxQ9YaCsLdw==
MIME-Version: 1.0
X-Received: by 10.107.162.65 with SMTP id l62mr701724ioe.138.1467867144731;
 Wed, 06 Jul 2016 21:52:24 -0700 (PDT)
Sender: kmacybsd@gmail.com
Received: by 10.107.134.218 with HTTP; Wed, 6 Jul 2016 21:52:24 -0700 (PDT)
In-Reply-To: <201607070443.u674hsgK007808@gw.catspoiler.org>
References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
 <201607070443.u674hsgK007808@gw.catspoiler.org>
Date: Wed, 6 Jul 2016 21:52:24 -0700
X-Google-Sender-Auth: XwvOobuQKCFHqshmFbpAnOyFxzo
Message-ID: <CAHM0Q_O3Apy86DZqDBXnJJ6axMhVPu0xHngoJoHQqKe1QN4+gA@mail.gmail.com>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
From: "K. Macy" <kmacy@freebsd.org>
To: Don Lewis <truckman@freebsd.org>
Cc: "mmacy@nextbsd.org" <mmacy@nextbsd.org>, 
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 04:52:25 -0000

On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org> wrote:

> On  6 Jul, Matthew Macy wrote:
> > As a first step towards managing linux user space in a chrooted
> > /compat/linux, initially for i915 testing with intel gpu tools, later
> > on to get widevine and steam to work I'm trying to get apt to work.
> > I've fixed a number of issues to date in pseudofs/linprocfs but now
> > I'm running in to a bug caused by differences in SIGCHLD handling
> > between Linux and FreeBSD. The situation is that apt will spawn dpkg
> > and wait on a pipe read. On Linux when dpkg exits the  SIGCHLD to apt
> > causes a short read on the pipe which lets apt then continue. On
> > FreeBSD a SIGCHLD is silently ignored. I've even experimented with
> > doing a kill -20 <apt pid> to no effect.
> >
> > It would be easy enough to check sysvec against linux in pipe_read and
> > break out of the loop when it's awakened from msleep (assuming there
> > aren't deeper issues with signal propagation for anything other than
> > SIGINT/SIGKILL) and then do a short read. However, I'm assuming that
> > anyone who has worked in this area probably has a cleaner solution.
>
> It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
> be in this case.


Good point.

Thinking more about it, this seems like a bug in FreeBSD, not a valid
behavioral difference.

-M


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 05:15:14 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26C7DB75CE1;
 Thu,  7 Jul 2016 05:15:14 +0000 (UTC) (envelope-from kaduk@mit.edu)
Received: from dmz-mailsec-scanner-6.mit.edu (dmz-mailsec-scanner-6.mit.edu
 [18.7.68.35])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B4E1F1CE5;
 Thu,  7 Jul 2016 05:15:13 +0000 (UTC) (envelope-from kaduk@mit.edu)
X-AuditID: 12074423-cafff70000006b63-be-577de42b8390
Received: from mailhub-auth-1.mit.edu ( [18.9.21.35])
 (using TLS with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by  (Symantec Messaging Gateway) with SMTP id F4.B1.27491.B24ED775;
 Thu,  7 Jul 2016 01:10:03 -0400 (EDT)
Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])
 by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id u675A2RY001487;
 Thu, 7 Jul 2016 01:10:03 -0400
Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37])
 (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU)
 by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id u6759xsK021350
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
 Thu, 7 Jul 2016 01:10:02 -0400
Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308)
 id u6759wPw002618; Thu, 7 Jul 2016 01:09:58 -0400 (EDT)
Date: Thu, 7 Jul 2016 01:09:58 -0400 (EDT)
From: Benjamin Kaduk <bjk@FreeBSD.org>
X-X-Sender: kaduk@multics.mit.edu
To: freebsd-hackers@FreeBSD.org
cc: freebsd-current@FreeBSD.org
Subject: Last call for 2016Q2 quarterly status reports
In-Reply-To: <alpine.GSO.1.10.1606202359140.18480@multics.mit.edu>
Message-ID: <alpine.GSO.1.10.1607070103000.5272@multics.mit.edu>
References: <alpine.GSO.1.10.1606202359140.18480@multics.mit.edu>
User-Agent: Alpine 1.10 (GSO 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrGIsWRmVeSWpSXmKPExsUixCmqrKv9pDbcYO0ha4s5bz4wWWzf/I/R
 gcljxqf5LAGMUVw2Kak5mWWpRfp2CVwZZz6cYS3o5q34vKKRrYHxAlcXIyeHhICJxIKJG5i7
 GLk4hATamCRenPrBAuFsYJT4fusxVOYgk8SE9VeBHA4gp17i2LMQkG4WAS2JH79mMYHYbAJq
 EutXXGOGmKoosfnUJDBbREBeYl/Te3YQmxnI3rJ6MhvIGGEBM4nbR1lAwpwCThKPpv0GK+EV
 cJA48eE7mC0k4CjxoeMiK4gtKqAjsXr/FBaIGkGJkzOfsECM1JJYPn0bywRGwVlIUrOQpBYw
 Mq1ilE3JrdLNTczMKU5N1i1OTszLSy3SNdPLzSzRS00p3cQICk52F+UdjC/7vA8xCnAwKvHw
 /sirDRdiTSwrrsw9xCjJwaQkyrvnLlCILyk/pTIjsTgjvqg0J7X4EKMEB7OSCO/eR0A53pTE
 yqrUonyYlDQHi5I4LyMDA4OQQHpiSWp2ampBahFMVoaDQ0mCl/UxUKNgUWp6akVaZk4JQpqJ
 gxNkOA/QcDaQGt7igsTc4sx0iPwpRkUpcV4vkIQASCKjNA+uF5w8djOpvmIUB3pFmHc3yG08
 wMQD1/0KaDAT0OCfLtUgg0sSEVJSDYwTYuTnlUfGLD4Rv6ve6NOBw90b25a+kngUvLHvfYNa
 WFH1wph/jKz5Ild0y4o4q3yOc4dqvrn1frnX/CdcsYnSwatXL1f95NYUlizcVydlq8f2+r9O
 xtR5CaZ/vrJ29D2PWef/wGSWc9XavKCOdR8OGq16WpZy8uAVxQTVs5NDUkTi39ws8lViKc5I
 NNRiLipOBADNoPxK+QIAAA==
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 05:15:14 -0000

Reminder: we're still looking for more submissions for the 2016Q2 status
report!  Please let us know if you wish to write an entry, even if it will
not be finished by today.

Thanks,

Ben (for the monthly@ team)

On Tue, 21 Jun 2016, Benjamin Kaduk wrote:

> Dear FreeBSD Community,
>
> The deadline for the next FreeBSD Quarterly Status update is July 7,
> 2016, for work done in April through June.
>
> Status report submissions do not have to be very long.  They may be about
> anything happening in the FreeBSD project and community, and provide a
> great way to inform FreeBSD users and developers about what you're working
> on.  Submission of reports is not restricted to committers.  Anyone doing
> anything interesting and FreeBSD-related can -- and should -- write one!
>
> The preferred and easiest submission method is to use the XML generator
> [1] with the results emailed to the status report team at monthly at
> FreeBSD.org .  There is also an XML template [2] which can be filled out
> manually and attached if preferred.  For the expected content and style,
> please study our guidelines on how to write a good status report [3].
> You can also review previous issues [4][5] for ideas on the style and
> format.
>
> We are looking forward to all of your 2016Q2 reports!
>
> Thanks,
>
> Ben (on behalf of monthly@)
>
> [1] http://www.freebsd.org/cgi/monthly.cgi
> [2] http://www.freebsd.org/news/status/report-sample.xml
> [3] http://www.freebsd.org/news/status/howto.html
> [4] http://www.freebsd.org/news/status/report-2015-10-2015-12.html
> [5] http://www.freebsd.org/news/status/report-2016-01-2016-03.html
>
>

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 06:28:50 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7D0CBB76D6D
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 06:28:50 +0000 (UTC)
 (envelope-from mailing-machine@vniz.net)
Received: from mail-lf0-f52.google.com (mail-lf0-f52.google.com
 [209.85.215.52])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0B5B91ECE
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 06:28:49 +0000 (UTC)
 (envelope-from mailing-machine@vniz.net)
Received: by mail-lf0-f52.google.com with SMTP id h129so4893520lfh.1
 for <freebsd-hackers@freebsd.org>; Wed, 06 Jul 2016 23:28:49 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:subject:to:references:cc:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-transfer-encoding;
 bh=EdmFlZ1aF8fhmSd8eppdAdvgKXWYd1kULvgaG+xbvvg=;
 b=XnvbpfAM65c/PFerBXshQgjP1lIazTEYWCmPP59fxDonCnLxiXlhDefLsfUP1SmYdK
 91BZ63g6t5aQkC4x9lUdmXPXqxFMKy0OCuCLOKxo5DwJBEFWSw7+EqodE57GYfVq9jZs
 E25aWc3B3ui38LRzJ2D0P23Uphp7gMmvNgnOKT9HepEE+jV6Rk+R2q/SqPFTMTwjm+zv
 d3Jf6x65szJtP8f8c2n1fg1wvHLmY2VfMe68H39wZHgfbHxifQOco7H7BJOhIDcrfbs0
 kQLGcFwnYBY3Dwh04vbCjDp7dXiQ/3gUFqSyVy/2FpHfk/i2t8R5gY0UVSpKMUgnaAVk
 CnMw==
X-Gm-Message-State: ALyK8tJDWkdSoE0TUfwJSWqhjq2B4UrdBBQtduX2kralrl7jjY6A2XrTMrP8T4sXScWPvg==
X-Received: by 10.25.21.16 with SMTP id l16mr5460245lfi.99.1467872922271;
 Wed, 06 Jul 2016 23:28:42 -0700 (PDT)
Received: from [192.168.1.2] ([89.169.173.68])
 by smtp.gmail.com with ESMTPSA id a199sm705946lfe.35.2016.07.06.23.28.41
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 06 Jul 2016 23:28:41 -0700 (PDT)
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
To: "K. Macy" <kmacy@freebsd.org>, Don Lewis <truckman@freebsd.org>
References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
 <201607070443.u674hsgK007808@gw.catspoiler.org>
 <CAHM0Q_O3Apy86DZqDBXnJJ6axMhVPu0xHngoJoHQqKe1QN4+gA@mail.gmail.com>
Cc: "mmacy@nextbsd.org" <mmacy@nextbsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
From: Andrey Chernov <ache@freebsd.org>
Message-ID: <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org>
Date: Thu, 7 Jul 2016 09:28:40 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <CAHM0Q_O3Apy86DZqDBXnJJ6axMhVPu0xHngoJoHQqKe1QN4+gA@mail.gmail.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 06:28:50 -0000

On 07.07.2016 7:52, K. Macy wrote:
> On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org> wrote:
> 
>> On  6 Jul, Matthew Macy wrote:
>>> As a first step towards managing linux user space in a chrooted
>>> /compat/linux, initially for i915 testing with intel gpu tools, later
>>> on to get widevine and steam to work I'm trying to get apt to work.
>>> I've fixed a number of issues to date in pseudofs/linprocfs but now
>>> I'm running in to a bug caused by differences in SIGCHLD handling
>>> between Linux and FreeBSD. The situation is that apt will spawn dpkg
>>> and wait on a pipe read. On Linux when dpkg exits the  SIGCHLD to apt
>>> causes a short read on the pipe which lets apt then continue. On
>>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with
>>> doing a kill -20 <apt pid> to no effect.
>>>
>>> It would be easy enough to check sysvec against linux in pipe_read and
>>> break out of the loop when it's awakened from msleep (assuming there
>>> aren't deeper issues with signal propagation for anything other than
>>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that
>>> anyone who has worked in this area probably has a cleaner solution.
>>
>> It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
>> be in this case.
> 
> 
> 
> Good point.
> 
> Thinking more about it, this seems like a bug in FreeBSD. Not a valid
> behavioral difference.

You had better consult POSIX before blindly changing our native code
toward any Linuxisms. I don't have time right now to check whether it
really is a bug according to POSIX, but please read, or at least search
for every mention of SIGCHLD, here:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
It explains the SIGCHLD actions in great detail.
See this one too:
http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 06:40:59 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5CB49B75645;
 Thu,  7 Jul 2016 06:40:59 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 3DA14179B;
 Thu,  7 Jul 2016 06:40:58 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467873657345708.1038340425864;
 Wed, 6 Jul 2016 23:40:57 -0700 (PDT)
Date: Wed, 06 Jul 2016 23:40:57 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Andrey Chernov" <ache@freebsd.org>
Cc: "K. Macy" <kmacy@freebsd.org>, "Don Lewis" <truckman@freebsd.org>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, 
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
Message-ID: <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org>
In-Reply-To: <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org>
References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
 <201607070443.u674hsgK007808@gw.catspoiler.org>
 <CAHM0Q_O3Apy86DZqDBXnJJ6axMhVPu0xHngoJoHQqKe1QN4+gA@mail.gmail.com>
 <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 06:40:59 -0000


 ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov <ache@freebsd.org> wrote ---- 
 > On 07.07.2016 7:52, K. Macy wrote: 
 > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org> wrote: 
 > >  
 > >> On  6 Jul, Matthew Macy wrote: 
 > >>> As a first step towards managing linux user space in a chrooted 
 > >>> /compat/linux, initially for i915 testing with intel gpu tools, later 
 > >>> on to get widevine and steam to work I'm trying to get apt to work. 
 > >>> I've fixed a number of issues to date in pseudofs/linprocfs but now 
 > >>> I'm running in to a bug caused by differences in SIGCHLD handling 
 > >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg 
 > >>> and wait on a pipe read. On Linux when dpkg exits the  SIGCHLD to apt 
 > >>> causes a short read on the pipe which lets apt then continue. On 
 > >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with 
 > >>> doing a kill -20 <apt pid> to no effect. 
 > >>> 
 > >>> It would be easy enough to check sysvec against linux in pipe_read and 
 > >>> break out of the loop when it's awakened from msleep (assuming there 
 > >>> aren't deeper issues with signal propagation for anything other than 
 > >>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that 
 > >>> anyone who has worked in this area probably has a cleaner solution. 
 > >> 
 > >> It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
 > >> be in this case. 
 > >  
 > >  
 > >  
 > > Good point. 
 > >  
 > > Thinking more about it, this seems like a bug in FreeBSD. Not a valid 
 > > behavioral difference. 
 >  
 > You better need consult with POSIX before fixing things toward any 
 > Linuxisms blindly in our native code. I don't have a time now to see, is 
 > it really a bug according to POSIX, but please read or just find all 
 > SIGCHLD there: 
 > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html 
 > it explain SIGCHLD actions in deep details. 
 > And that one too: 
 > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html 


I was pretty clear in my initial email that I'm only interested in changing behavior for Linux programs. And I was asking for help with that, not a link to SUSv3 or POSIX. 

On closer reading of the man pages, it looks like Linux's clone is supposed to change the disposition for the exit signal. I'll have to write a test program to reproduce this behavior in isolation.
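
One piece such a test program could start from, sketched here under stated assumptions: on Linux, the low byte of the flags passed to clone(2) selects the signal sent to the parent when the child terminates. The helper name signals_on_child_exit and the SEEN_* bits are hypothetical, the sketch is Linux-only (glibc clone() wrapper, __WALL), and it does not try to model what apt or dpkg actually do.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SEEN_CHLD 1
#define SEEN_USR1 2
#define STACK_SIZE (64 * 1024)

static volatile sig_atomic_t seen;

/* Record which of the two signals of interest reached the parent. */
static void note_signal(int sig)
{
    if (sig == SIGCHLD)
        seen |= SEEN_CHLD;
    else if (sig == SIGUSR1)
        seen |= SEEN_USR1;
}

/* The child does nothing; its return value becomes its exit status. */
static int child_main(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: create a child with clone() using exit_signal as the
 * termination signal, wait for it, and return a mask of the signals the
 * parent received (or -1 on setup failure).
 */
int signals_on_child_exit(int exit_signal)
{
    struct sigaction sa;
    char *stack;
    pid_t pid;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1 ||
        sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    seen = 0;
    /* No CLONE_* flags: fork-like child; the low byte is the exit signal. */
    pid = clone(child_main, stack + STACK_SIZE, exit_signal, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* __WALL is needed to wait for children whose exit signal is not SIGCHLD. */
    while (waitpid(pid, NULL, __WALL) == -1 && errno == EINTR)
        ;
    free(stack);
    return seen;
}
```

On a Linux host, signals_on_child_exit(SIGCHLD) is expected to return SEEN_CHLD and signals_on_child_exit(SIGUSR1) to return SEEN_USR1; running the same binary under /compat/linux would show whether the emulation layer honors the requested exit signal.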

-M




From owner-freebsd-hackers@freebsd.org  Thu Jul  7 06:48:57 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88366B75A32
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 06:48:57 +0000 (UTC)
 (envelope-from mailing-machine@vniz.net)
Received: from mail-lf0-f52.google.com (mail-lf0-f52.google.com
 [209.85.215.52])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 161D21CE3
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 06:48:56 +0000 (UTC)
 (envelope-from mailing-machine@vniz.net)
Received: by mail-lf0-f52.google.com with SMTP id q132so5109104lfe.3
 for <freebsd-hackers@freebsd.org>; Wed, 06 Jul 2016 23:48:56 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:subject:to:references:cc:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-transfer-encoding;
 bh=EtdIF6uNxMw8QIZlVINtfPUXzm2xUBy+KZBCTastsVQ=;
 b=MNN8TkpohFDgUr8xmB28SWmxt4X5D7imdouGOMWbMpOS4ZinC8PckWxbKxEn9j71R+
 gwxfQL+N4iwTI17oaZdaTl5jbIGxkB2cICr+5f5//Tf5TRqS10yI9XiapIYblIth8ET3
 Fc4x+ficDH1dXcygwglUn7YToZYBU2EOWeSmilduFiaew9azmXmWGKbR2UxQM0euYZ77
 mzs3G2mZLnZiJNG1MwZ/d5WY/pr5f3wDeKWjOvhGn7k+Ll5ym85fsOVQx+Y1EiCLU2nE
 LUL2dv3mH4ko/7p/36cNirswGwggCG3qj4XbMV1ysejFvTsawXHyGBhHsdJvxkOHFMf9
 VgxQ==
X-Gm-Message-State: ALyK8tJGXf9ZaF8r8UPKj6YkXnsAsS/uvvu0UPWRpNMefDgEdK1xRTbsedyTxZQT84+lhQ==
X-Received: by 10.25.133.87 with SMTP id h84mr5758688lfd.210.1467874134926;
 Wed, 06 Jul 2016 23:48:54 -0700 (PDT)
Received: from [192.168.1.2] ([89.169.173.68])
 by smtp.gmail.com with ESMTPSA id n7sm7077524lfb.31.2016.07.06.23.48.53
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 06 Jul 2016 23:48:54 -0700 (PDT)
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
To: Matthew Macy <mmacy@nextbsd.org>
References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
 <201607070443.u674hsgK007808@gw.catspoiler.org>
 <CAHM0Q_O3Apy86DZqDBXnJJ6axMhVPu0xHngoJoHQqKe1QN4+gA@mail.gmail.com>
 <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org>
 <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org>
Cc: "K. Macy" <kmacy@freebsd.org>, Don Lewis <truckman@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>
From: Andrey Chernov <ache@freebsd.org>
Message-ID: <2249b671-765a-13e5-3b19-862416f6f73d@freebsd.org>
Date: Thu, 7 Jul 2016 09:48:53 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 06:48:57 -0000

On 07.07.2016 9:40, Matthew Macy wrote:
> 
> 
> 
>  ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov <ache@freebsd.org> wrote ---- 
>  > On 07.07.2016 7:52, K. Macy wrote: 
>  > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org> wrote: 
>  > >  
>  > >> On  6 Jul, Matthew Macy wrote: 
>  > >>> As a first step towards managing linux user space in a chrooted 
>  > >>> /compat/linux, initially for i915 testing with intel gpu tools, later 
>  > >>> on to get widevine and steam to work I'm trying to get apt to work. 
>  > >>> I've fixed a number of issues to date in pseudofs/linprocfs but now 
>  > >>> I'm running in to a bug caused by differences in SIGCHLD handling 
>  > >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg 
>  > >>> and wait on a pipe read. On Linux when dpkg exits the  SIGCHLD to apt 
>  > >>> causes a short read on the pipe which lets apt then continue. On 
>  > >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with 
>  > >>> doing a kill -20 <apt pid> to no effect. 
>  > >>> 
>  > >>> It would be easy enough to check sysvec against linux in pipe_read and 
>  > >>> break out of the loop when it's awakened from msleep (assuming there 
>  > >>> aren't deeper issues with signal propagation for anything other than 
>  > >>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that 
>  > >>> anyone who has worked in this area probably has a cleaner solution. 
>  > >> 
>  > >> It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
>  > >> be in this case. 
>  > >  
>  > >  
>  > >  
>  > > Good point. 
>  > >  
>  > > Thinking more about it, this seems like a bug in FreeBSD. Not a valid 
>  > > behavioral difference. 
>  >  
>  > You better need consult with POSIX before fixing things toward any 
>  > Linuxisms blindly in our native code. I don't have a time now to see, is 
>  > it really a bug according to POSIX, but please read or just find all 
>  > SIGCHLD there: 
>  > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html 
>  > it explain SIGCHLD actions in deep details. 
>  > And that one too: 
>  > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html 
> 
> 
> 
> I was pretty clear in my initial email that I'm only interested in changing behavior for Linux programs.

Of course, but if it is a FreeBSD bug, it should be fixed in our
native code first, before making any changes in the Linuxulator.

> And I was asking for help with that, not a link to SUSv3 or POSIX. 

If I was not helpful, sorry for that. Before you try to change
something in the Linuxulator, you need to be sure whether FreeBSD does it
right (or wrong, in which case fix the FreeBSD native code first). I am
just insisting on the proper order of steps for fixing it.


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 06:59:46 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 93A79B75E33;
 Thu,  7 Jul 2016 06:59:46 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 73EF015F1;
 Thu,  7 Jul 2016 06:59:46 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467874783786638.9918064901652;
 Wed, 6 Jul 2016 23:59:43 -0700 (PDT)
Date: Wed, 06 Jul 2016 23:59:43 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Andrey Chernov" <ache@freebsd.org>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, 
 "Don Lewis" <truckman@freebsd.org>, 
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, 
 "K. Macy" <kmacy@freebsd.org>
Message-ID: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org>
In-Reply-To: <2249b671-765a-13e5-3b19-862416f6f73d@freebsd.org>
References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org>
 <201607070443.u674hsgK007808@gw.catspoiler.org>
 <CAHM0Q_O3Apy86DZqDBXnJJ6axMhVPu0xHngoJoHQqKe1QN4+gA@mail.gmail.com>
 <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org>
 <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org>
 <2249b671-765a-13e5-3b19-862416f6f73d@freebsd.org>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 06:59:46 -0000


 ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov <ache@freebsd.org> wrote ---- 
 > On 07.07.2016 9:40, Matthew Macy wrote: 
 > >  
 > >  
 > >  
 > >  ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov <ache@freebsd.org> wrote ----  
 > >  > On 07.07.2016 7:52, K. Macy wrote:  
 > >  > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org> wrote:  
 > >  > >   
 > >  > >> On  6 Jul, Matthew Macy wrote:  
 > >  > >>> As a first step towards managing linux user space in a chrooted  
 > >  > >>> /compat/linux, initially for i915 testing with intel gpu tools, later  
 > >  > >>> on to get widevine and steam to work I'm trying to get apt to work.  
 > >  > >>> I've fixed a number of issues to date in pseudofs/linprocfs but now  
 > >  > >>> I'm running in to a bug caused by differences in SIGCHLD handling  
 > >  > >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg  
 > >  > >>> and wait on a pipe read. On Linux when dpkg exits the  SIGCHLD to apt  
 > >  > >>> causes a short read on the pipe which lets apt then continue. On  
 > >  > >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with  
 > >  > >>> doing a kill -20 <apt pid> to no effect.  
 > >  > >>>  
 > >  > >>> It would be easy enough to check sysvec against linux in pipe_read and  
 > >  > >>> break out of the loop when it's awakened from msleep (assuming there  
 > >  > >>> aren't deeper issues with signal propagation for anything other than  
 > >  > >>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that  
 > >  > >>> anyone who has worked in this area probably has a cleaner solution.  
 > >  > >>  
 > >  > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't  
 > >  > >> be in this case.  
 > >  > >   
 > >  > >   
 > >  > >   
 > >  > > Good point.  
 > >  > >   
 > >  > > Thinking more about it, this seems like a bug in FreeBSD. Not a valid  
 > >  > > behavioral difference.  
 > >  >   
 > >  > You better need consult with POSIX before fixing things toward any  
 > >  > Linuxisms blindly in our native code. I don't have a time now to see, is  
 > >  > it really a bug according to POSIX, but please read or just find all  
 > >  > SIGCHLD there:  
 > >  > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html  
 > >  > it explain SIGCHLD actions in deep details.  
 > >  > And that one too:  
 > >  > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html  
 > >  
 > >  
 > >  
 > > I was pretty clear in my initial email that I'm only interested in changing behavior for Linux programs. 
 >  
 > Of course, but in case it is FreeBSD bug, it should be fixed in our 
 > native code first before making any changes in Linuxator. 
 >  
 > > And I was asking for help with that, not a link to SUSv3 or POSIX.  
 >  
 > In case I was not helpful, sorry for that. Before you try to change 
 > something in Linuxator you need to be sure that FreeBSD does it right 
 > (or wrong, then fix FreeBSD native code first). I am just insisting on 
 > proper steps of fixing it. 
 >  


I'm sorry for snapping. I misunderstood your intent. Using a SIGCHLD to deliberately interrupt a pipe read is such a weird idiom. I'll test fork vs. clone on Linux and see how OS X responds to a SIGCHLD during a pipe read.

Thanks.
-M




From owner-freebsd-hackers@freebsd.org  Thu Jul  7 07:15:03 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6DB66B212D4;
 Thu,  7 Jul 2016 07:15:03 +0000 (UTC)
 (envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 47D0F1DC5;
 Thu,  7 Jul 2016 07:15:03 +0000 (UTC)
 (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
 by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u677EqVx008159;
 Thu, 7 Jul 2016 00:14:56 -0700 (PDT)
 (envelope-from truckman@FreeBSD.org)
Message-Id: <201607070714.u677EqVx008159@gw.catspoiler.org>
Date: Thu, 7 Jul 2016 00:14:52 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD
 breaks apt
To: mmacy@nextbsd.org
cc: ache@freebsd.org, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
 kmacy@freebsd.org
In-Reply-To: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 07:15:03 -0000

On  6 Jul, Matthew Macy wrote:
> 
> 
> 
>  ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov
>  <ache@freebsd.org> wrote ----
>  > On 07.07.2016 9:40, Matthew Macy wrote: 
>  > >  
>  > >  
>  > >  
>  > >  ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov
>  > >  <ache@freebsd.org> wrote ----
>  > >  > On 07.07.2016 7:52, K. Macy wrote:  
>  > >  > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org>
>  > >  > > wrote:
>  > >  > >   
>  > >  > >> On  6 Jul, Matthew Macy wrote:  
>  > >  > >>> As a first step towards managing linux user space in a
>  > >  > >>> chrooted
>  > >  > >>> /compat/linux, initially for i915 testing with intel gpu
>  > >  > >>> tools, later on to get widevine and steam to work I'm
>  > >  > >>> trying to get apt to work. I've fixed a number of issues
>  > >  > >>> to date in pseudofs/linprocfs but now I'm running in to
>  > >  > >>> a bug caused by differences in SIGCHLD handling between
>  > >  > >>> Linux and FreeBSD. The situation is that apt will spawn
>  > >  > >>> dpkg and wait on a pipe read. On Linux when dpkg exits
>  > >  > >>> the  SIGCHLD to apt causes a short read on the pipe
>  > >  > >>> which lets apt then continue. On FreeBSD a SIGCHLD is
>  > >  > >>> silently ignored. I've even experimented with doing a
>  > >  > >>> kill -20 <apt pid> to no effect.
>  > >  > >>>  
>  > >  > >>> It would be easy enough to check sysvec against linux in
>  > >  > >>> pipe_read and break out of the loop when it's awakened
>  > >  > >>> from msleep (assuming there aren't deeper issues with
>  > >  > >>> signal propagation for anything other than 
>  > >  > >>> SIGINT/SIGKILL) and then do a short read. However, I'm
>  > >  > >>> assuming that anyone who has worked in this area
>  > >  > >>> probably has a cleaner solution.
>  > >  > >>  
>  > >  > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD
>  > >  > >> but shouldn't be in this case.
>  > >  > >   
>  > >  > >   
>  > >  > >   
>  > >  > > Good point.  
>  > >  > >   
>  > >  > > Thinking more about it, this seems like a bug in FreeBSD.
>  > >  > > Not a valid behavioral difference.
>  > >  >   
>  > >  > You better need consult with POSIX before fixing things toward
>  > >  > any Linuxisms blindly in our native code. I don't have a
>  > >  > time now to see, is it really a bug according to POSIX, but
>  > >  > please read or just find all SIGCHLD there:
>  > >  > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html  
>  > >  > it explain SIGCHLD actions in deep details.  
>  > >  > And that one too:  
>  > >  > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html  
>  > >  
>  > >  
>  > >  
>  > > I was pretty clear in my initial email that I'm only interested
>  > > in changing behavior for Linux programs.
>  >  
>  > Of course, but in case it is FreeBSD bug, it should be fixed in our 
>  > native code first before making any changes in Linuxator. 
>  >  
>  > > And I was asking for help with that, not a link to SUSv3 or POSIX.  
>  >  
>  > In case I was not helpful, sorry for that. Before you try to change 
>  > something in Linuxator you need to be sure that FreeBSD does it
>  > right (or wrong, then fix FreeBSD native code first). I am just
>  > insisting on proper steps of fixing it.
>  >  
> 
> 
> I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD
> to deliberately interrupt a pipe read is such a weird idiom. I'll test
> fork vs clone on Linux and see how OS X responds to a SIGCHLD during a
> pipe read.

It really depends on how signal handling has been set up.  From my
understanding of the FreeBSD man pages and the Open Group documents, the
default handling for SIGCHLD is to just ignore it, in which case it
shouldn't interrupt the pipe read.  If the process has set up a SIGCHLD
signal handler, then what happens with the read should depend on whether
or not SA_RESTART was passed to sigaction().  I would expect that Linux
would be the same as FreeBSD and the Open Group specs.
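
A minimal sketch of the behaviour described above, assuming a POSIX
system. The helper names (read_interrupted_by_sigchld, sleep_ms, reap)
and the 100 ms / 400 ms delays are made up for illustration; this is not
apt's code. One child exits early to raise SIGCHLD while the parent is
blocked in read(); a second child writes a byte later so that a
restarted read() eventually returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: it only makes SIGCHLD "caught" instead of ignored. */
static void on_sigchld(int sig) { (void)sig; }

/* Sleep for ms milliseconds, resuming if a signal cuts the sleep short. */
static void sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Wait for one child, retrying when waitpid() itself is interrupted. */
static void reap(pid_t pid)
{
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
}

/*
 * Hypothetical helper: install a SIGCHLD handler with the given sa_flags,
 * then block in read() on an empty pipe while a child exits.
 * Returns 1 if read() failed with EINTR, 0 if read() was restarted and
 * later returned the byte written by the second child, -1 on any error.
 */
static int read_interrupted_by_sigchld(int sa_flags)
{
    struct sigaction sa;
    int fds[2], saved_errno;
    pid_t exiter, writer;
    ssize_t n;
    char c;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sa.sa_flags = sa_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1 || pipe(fds) == -1)
        return -1;

    exiter = fork();
    if (exiter == 0) {          /* first child: exit early, raising SIGCHLD */
        sleep_ms(100);
        _exit(0);
    }
    writer = fork();
    if (writer == 0) {          /* second child: supply data later */
        sleep_ms(400);
        _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
    }

    /* The parent keeps fds[1] open, so read() cannot return EOF here. */
    n = (exiter > 0 && writer > 0) ? read(fds[0], &c, 1) : -2;
    saved_errno = errno;

    if (exiter > 0)
        reap(exiter);
    if (writer > 0)
        reap(writer);
    close(fds[0]);
    close(fds[1]);

    if (n == -1 && saved_errno == EINTR)
        return 1;
    return n == 1 ? 0 : -1;
}
```

Called from main(), read_interrupted_by_sigchld(0) is expected to
report an interrupted read and read_interrupted_by_sigchld(SA_RESTART)
a restarted one. With the default disposition (no handler installed)
the read would not be interrupted at all.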

How does apt set up its handling of SIGCHLD?


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 07:32:09 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97185B2196E
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 07:32:09 +0000 (UTC)
 (envelope-from mailing-machine@vniz.net)
Received: from mail-lf0-f41.google.com (mail-lf0-f41.google.com
 [209.85.215.41])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3E00F1A56
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 07:32:09 +0000 (UTC)
 (envelope-from mailing-machine@vniz.net)
Received: by mail-lf0-f41.google.com with SMTP id l188so5727152lfe.2
 for <freebsd-hackers@freebsd.org>; Thu, 07 Jul 2016 00:32:09 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:subject:to:references:cc:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-transfer-encoding;
 bh=rp2PWzViNTn2dxaIhWKfuSPG+jg9eEQjyzxatq+deAs=;
 b=my/nEhFNk8hdGFBskhXRYw3s6q9dN5zZGUaLbrzkcOpntJnUPqTE7CoQNCqE6fjoAP
 3NdJfCw4pmwN7ZW1h8Tol22x31qmjRaKc/hpi5U6vV61Hddu3U4GnOPPfSgMohLsSinZ
 7ps15qiAdwdLQdVfASC59VlcAbgmtas4Y55n3N/OueWjLmbSXfMUJgJlV4HgUCywauxk
 e2QSLCuZLa+RMj7Ow/gNA2Z1GjPYdfhoapbcc12AVwcMLZf75JBO+umJjJx2DhYE9ap4
 FZr2Tdrw4unhWblOaAG0vNV5wMeSHFMVrp/dvq5UBjw2vB2hAYoInIRF2JuMRKzix+yb
 eAnw==
X-Gm-Message-State: ALyK8tKqpaD8pTkb7HBRjTL6R4RFRdWAHXXzT+91niGyHLpH18763ha11hToRBqQdir8BQ==
X-Received: by 10.25.21.106 with SMTP id l103mr6391802lfi.27.1467876726567;
 Thu, 07 Jul 2016 00:32:06 -0700 (PDT)
Received: from [192.168.1.2] ([89.169.173.68])
 by smtp.gmail.com with ESMTPSA id a199sm750772lfe.35.2016.07.07.00.32.05
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 07 Jul 2016 00:32:06 -0700 (PDT)
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
To: Don Lewis <truckman@FreeBSD.org>, mmacy@nextbsd.org
References: <201607070714.u677EqVx008159@gw.catspoiler.org>
Cc: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org, kmacy@freebsd.org
From: Andrey Chernov <ache@freebsd.org>
Message-ID: <325f545e-a32d-59d8-86d3-079ecdf21df2@freebsd.org>
Date: Thu, 7 Jul 2016 10:32:05 +0300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <201607070714.u677EqVx008159@gw.catspoiler.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 07:32:09 -0000

On 07.07.2016 10:14, Don Lewis wrote:
> On  6 Jul, Matthew Macy wrote:
>>
>>
>>
>>  ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov
>>  <ache@freebsd.org> wrote ----
>>  > On 07.07.2016 9:40, Matthew Macy wrote: 
>>  > >  
>>  > >  
>>  > >  
>>  > >  ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov
>>  > >  <ache@freebsd.org> wrote ----
>>  > >  > On 07.07.2016 7:52, K. Macy wrote:  
>>  > >  > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org>
>>  > >  > > wrote:
>>  > >  > >   
>>  > >  > >> On  6 Jul, Matthew Macy wrote:  
>>  > >  > >>> As a first step towards managing linux user space in a
>>  > >  > >>> chrooted
>>  > >  > >>> /compat/linux, initially for i915 testing with intel gpu
>>  > >  > >>> tools, later on to get widevine and steam to work I'm
>>  > >  > >>> trying to get apt to work. I've fixed a number of issues
>>  > >  > >>> to date in pseudofs/linprocfs but now I'm running in to
>>  > >  > >>> a bug caused by differences in SIGCHLD handling between
>>  > >  > >>> Linux and FreeBSD. The situation is that apt will spawn
>>  > >  > >>> dpkg and wait on a pipe read. On Linux when dpkg exits
>>  > >  > >>> the  SIGCHLD to apt causes a short read on the pipe
>>  > >  > >>> which lets apt then continue. On FreeBSD a SIGCHLD is
>>  > >  > >>> silently ignored. I've even experimented with doing a
>>  > >  > >>> kill -20 <apt pid> to no effect.
>>  > >  > >>>  
>>  > >  > >>> It would be easy enough to check sysvec against linux in
>>  > >  > >>> pipe_read and break out of the loop when it's awakened
>>  > >  > >>> from msleep (assuming there aren't deeper issues with
>>  > >  > >>> signal propagation for anything other than 
>>  > >  > >>> SIGINT/SIGKILL) and then do a short read. However, I'm
>>  > >  > >>> assuming that anyone who has worked in this area
>>  > >  > >>> probably has a cleaner solution.
>>  > >  > >>  
>>  > >  > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD
>>  > >  > >> but shouldn't be in this case.
>>  > >  > >   
>>  > >  > >   
>>  > >  > >   
>>  > >  > > Good point.  
>>  > >  > >   
>>  > >  > > Thinking more about it, this seems like a bug in FreeBSD.
>>  > >  > > Not a valid behavioral difference.
>>  > >  >   
>>  > >  > You better need consult with POSIX before fixing things toward
>>  > >  > any Linuxisms blindly in our native code. I don't have a
>>  > >  > time now to see, is it really a bug according to POSIX, but
>>  > >  > please read or just find all SIGCHLD there:
>>  > >  > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html  
>>  > >  > it explain SIGCHLD actions in deep details.  
>>  > >  > And that one too:  
>>  > >  > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html  
>>  > >  
>>  > >  
>>  > >  
>>  > > I was pretty clear in my initial email that I'm only interested
>>  > > in changing behavior for Linux programs.
>>  >  
>>  > Of course, but in case it is FreeBSD bug, it should be fixed in our 
>>  > native code first before making any changes in Linuxator. 
>>  >  
>>  > > And I was asking for help with that, not a link to SUSv3 or POSIX.  
>>  >  
>>  > In case I was not helpful, sorry for that. Before you try to change 
>>  > something in Linuxator you need to be sure that FreeBSD does it
>>  > right (or wrong, then fix FreeBSD native code first). I am just
>>  > insisting on proper steps of fixing it.
>>  >  
>>
>>
>> I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD
>> to deliberately interrupt a pipe read is such a weird idiom. I'll test
>> fork vs clone on Linux and see how OS X responds to a SIGCHLD during a
>> pipe read.
> 
> It really depends on how signal handling has been set up.  From my
> understanding of the FreeBSD man pages and the Open Group documents, the
> default handling for SIGCHLD is to just ignore it, in which case it
> shouldn't interrupt the pipe read.  If the process has set up a SIGCHLD
> signal handler, then what happens with the read should depend on whether
> or not SA_RESTART was passed to sigaction().  I would expect that Linux
> would be the same as FreeBSD and the Open Group specs.

Linux, as a SysV derivative, has always differed with regard to
SA_RESTART and the other SA_* flags for signal(); see the differences
described at the end of:
http://linux.die.net/man/2/signal


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 08:31:05 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92755B7686D
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 08:31:05 +0000 (UTC)
 (envelope-from julian@freebsd.org)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "vps1.elischer.org",
 Issuer "CA Cert Signing Authority" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 744C216F3
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 08:31:05 +0000 (UTC)
 (envelope-from julian@freebsd.org)
Received: from Julian-MBP3.local
 (ppp121-45-236-103.lns20.per1.internode.on.net [121.45.236.103])
 (authenticated bits=0)
 by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id u678UxYA066256
 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO)
 for <freebsd-hackers@freebsd.org>; Thu, 7 Jul 2016 01:31:02 -0700 (PDT)
 (envelope-from julian@freebsd.org)
Subject: Re: A faulty program corrupts some its data preventing correct core
 generation (Failed to write core file for process postgres (error 14))
To: freebsd-hackers@freebsd.org
References: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>
 <20160705114808.GN38613@kib.kiev.ua>
 <CAH7qZfvKt7b__M_tM9eBD7VjxbaAQPj5kgurrkFkY36eR3qrAg@mail.gmail.com>
From: Julian Elischer <julian@freebsd.org>
Message-ID: <39cd0468-8301-06eb-4363-a57b18c60dbb@freebsd.org>
Date: Thu, 7 Jul 2016 16:30:54 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
 Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <CAH7qZfvKt7b__M_tM9eBD7VjxbaAQPj5kgurrkFkY36eR3qrAg@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 08:31:05 -0000

On 5/07/2016 10:43 PM, Maxim Sobolev wrote:
> Seems like candidate for the MFC into releng/10.3 and appropriate errata
> entry?
>
> -Max
Quite possibly. It sounds like a problem that needs to be fixed.
>
> On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov <kostikbel@gmail.com>
> wrote:
>
>> On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
>>> Hi all, investigating some random postgresql-9.1.21 server crashes on
>>> FreeBSD 10.3, we've started seeing those after upgrading from postgres
>>> 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very
>>> unlikely. I suspect that postgres is at fault, however I am also curious
>>> how could it be that kernel is not capable of generating core file when
>>> application does something silly? Is it that some ELF-related data
>>> structures got corrupted or something else? Are we protecting the page
>>> where ELF header is mapped with R/O flag? I am looking at possibly
>>> recreating this by poking around elf header(s), seeing if I can corrupt
>> it
>>> in a similar manner reliably, any pointers or suggestions are
>> appreciated.
>>> Jun 27 04:10:18 dal12 kernel: Failed to write core file for process
>>> postgres (error 14)
>>> Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on
>>> signal 11
>>> Jul  1 05:21:46 dal12 kernel: Failed to write core file for process
>>> postgres (error 14)
>>> Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on
>> signal
>>> 11
>>>
>>> #define EFAULT          14              /* Bad address */
>>>
>>> The resulting files are truncated and is not really usable for anything.
>>> We've seen the same issue
>>>
>>> -rw-------    1 pgsql     wheel     1310720 Jun 27 04:10
>> postgres.41361.core
>>> -rw-------    1 pgsql     wheel     1310720 Jul  1 05:21
>> postgres.1722.core
>>> [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
>>> GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
>>> Copyright (C) 2016 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>> http://gnu.org/licenses/gpl.html
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-portbld-freebsd10.3".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>> <http://www.gnu.org/software/gdb/documentation/>.
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from postgres...(no debugging symbols found)...done.
>>> BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core
>> file
>>> size >= 517120000, found: 1310720.
>>> [New LWP 100261]
>>> Core was generated by `postgres'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
>>> (gdb) where
>>> #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
>>> Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
>>> (gdb) q
>>>
>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html
>>
>>


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 10:18:12 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B5E1EB753F1
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 10:18:12 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 1406C1E7A
 for <freebsd-hackers@FreeBSD.org>; Thu,  7 Jul 2016 10:18:11 +0000 (UTC)
 (envelope-from avg@FreeBSD.org)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA00458;
 Thu, 07 Jul 2016 13:18:09 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1bL6Nl-000D2Q-OC; Thu, 07 Jul 2016 13:18:09 +0300
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: Paul Koch <paul.koch137@gmail.com>, Andrew Bates <andrewbates09@gmail.com>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CAPi5Lmm6RtXQ6UxzcfoRKtGC-LfBLJAW0qOy6=F5fh3mg-OB5w@mail.gmail.com>
 <20160701113243.307739cc@splash.akips.com>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@FreeBSD.org>
From: Andriy Gapon <avg@FreeBSD.org>
Message-ID: <5ccbe625-1df6-74cb-a3ba-e35182f53a77@FreeBSD.org>
Date: Thu, 7 Jul 2016 13:17:12 +0300
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.1
MIME-Version: 1.0
In-Reply-To: <20160701113243.307739cc@splash.akips.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 10:18:12 -0000

On 01/07/2016 04:32, Paul Koch wrote:
> akips  recordsize            128K                   default

I wonder whether setting this to 4K would help (or to the logical
block / page size of your application, if that is larger than 4K).
The setting takes effect only for new files.

-- 
Andriy Gapon

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 14:04:35 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0DFB7B74828;
 Thu,  7 Jul 2016 14:04:35 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id ABCE11A5A;
 Thu,  7 Jul 2016 14:04:34 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
 by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u67E4OcV014345
 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Thu, 7 Jul 2016 17:04:24 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u67E4OcV014345
Received: (from kostik@localhost)
 by tom.home (8.15.2/8.15.2/Submit) id u67E4OAn014344;
 Thu, 7 Jul 2016 17:04:24 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Thu, 7 Jul 2016 17:04:24 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Don Lewis <truckman@FreeBSD.org>
Cc: mmacy@nextbsd.org, ache@freebsd.org, freebsd-hackers@freebsd.org,
 freebsd-current@freebsd.org, kmacy@freebsd.org
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
Message-ID: <20160707140424.GM38613@kib.kiev.ua>
References: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org>
 <201607070714.u677EqVx008159@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201607070714.u677EqVx008159@gw.catspoiler.org>
User-Agent: Mutt/1.6.1 (2016-04-27)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 14:04:35 -0000

On Thu, Jul 07, 2016 at 12:14:52AM -0700, Don Lewis wrote:
> On  6 Jul, Matthew Macy wrote:
> > 
> > 
> > 
> >  ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov
> >  <ache@freebsd.org> wrote ----
> >  > On 07.07.2016 9:40, Matthew Macy wrote: 
> >  > >  
> >  > >  
> >  > >  
> >  > >  ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov
> >  > >  <ache@freebsd.org> wrote ----
> >  > >  > On 07.07.2016 7:52, K. Macy wrote:  
> >  > >  > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org>
> >  > >  > > wrote:
> >  > >  > >   
> >  > >  > >> On  6 Jul, Matthew Macy wrote:  
> >  > >  > >>> As a first step towards managing linux user space in a
> >  > >  > >>> chrooted
> >  > >  > >>> /compat/linux, initially for i915 testing with intel gpu
> >  > >  > >>> tools, later on to get widevine and steam to work I'm
> >  > >  > >>> trying to get apt to work. I've fixed a number of issues
> >  > >  > >>> to date in pseudofs/linprocfs but now I'm running in to
> >  > >  > >>> a bug caused by differences in SIGCHLD handling between
> >  > >  > >>> Linux and FreeBSD. The situation is that apt will spawn
> >  > >  > >>> dpkg and wait on a pipe read. On Linux when dpkg exits
> >  > >  > >>> the  SIGCHLD to apt causes a short read on the pipe
> >  > >  > >>> which lets apt then continue. On FreeBSD a SIGCHLD is
> >  > >  > >>> silently ignored. I've even experimented with doing a
> >  > >  > >>> kill -20 <apt pid> to no effect.
> >  > >  > >>>  
> >  > >  > >>> It would be easy enough to check sysvec against linux in
> >  > >  > >>> pipe_read and break out of the loop when it's awakened
> >  > >  > >>> from msleep (assuming there aren't deeper issues with
> >  > >  > >>> signal propagation for anything other than 
> >  > >  > >>> SIGINT/SIGKILL) and then do a short read. However, I'm
> >  > >  > >>> assuming that anyone who has worked in this area
> >  > >  > >>> probably has a cleaner solution.
> >  > >  > >>  
> >  > >  > >> It sounds like SA_RESTART is set in sa_flags for SIGCHLD
> >  > >  > >> but shouldn't be in this case.
> >  > >  > >   
> >  > >  > >   
> >  > >  > >   
> >  > >  > > Good point.  
> >  > >  > >   
> >  > >  > > Thinking more about it, this seems like a bug in FreeBSD.
> >  > >  > > Not a valid behavioral difference.
> >  > >  >   
> >  > >  > You better need consult with POSIX before fixing things toward
> >  > >  > any Linuxisms blindly in our native code. I don't have a
> >  > >  > time now to see, is it really a bug according to POSIX, but
> >  > >  > please read or just find all SIGCHLD there:
> >  > >  > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html  
> >  > >  > it explain SIGCHLD actions in deep details.  
> >  > >  > And that one too:  
> >  > >  > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html  
> >  > >  
> >  > >  
> >  > >  
> >  > > I was pretty clear in my initial email that I'm only interested
> >  > > in changing behavior for Linux programs.
> >  >  
> >  > Of course, but in case it is FreeBSD bug, it should be fixed in our 
> >  > native code first before making any changes in Linuxator. 
> >  >  
> >  > > And I was asking for help with that, not a link to SUSv3 or POSIX.  
> >  >  
> >  > In case I was not helpful, sorry for that. Before you try to change 
> >  > something in Linuxator you need to be sure that FreeBSD does it
> >  > right (or wrong, then fix FreeBSD native code first). I am just
> >  > insisting on proper steps of fixing it.
> >  >  
> > 
> > 
> > I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD
> > to deliberately interrupt a pipe read is such a weird idiom. I'll test
> > fork vs clone on Linux and see how OS X responds to a SIGCHLD during a
> > pipe read.
> 
> It really depends on how signal handling has been set up.  From my
> understanding of the FreeBSD man pages and the Open Group documents, the
> default handling for SIGCHLD is to just ignore it, in which case it
> shouldn't interrupt the pipe read.  If the process has set up a SIGCHLD
> signal handler, then what happens with the read should depend on whether
> or not SA_RESTART was passed to sigaction().  I would expect that Linux
> would be the same as FreeBSD and the Open Group specs.
> 
> How does apt set up its handling of SIGCHLD?

BSD traditional and allowed handling of the signals with SIG_IGN
disposition is to discard such signal at the time of generation. Then,
such signal cannot interrupt a syscall regardless of SA_RESTART.  For
the interruption to work, some signal handler must be installed.

AFAIR both SysV and Linux do not discard ignored signals, but process
them up to the delivery point.

Sure the test demonstrating the difference is required to actually
diagnose and make conclusions.

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 14:26:13 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id ACE4CB74F6D;
 Thu,  7 Jul 2016 14:26:13 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x22b.google.com (mail-yw0-x22b.google.com
 [IPv6:2607:f8b0:4002:c05::22b])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 716B61809;
 Thu,  7 Jul 2016 14:26:13 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x22b.google.com with SMTP id l125so15266370ywb.2;
 Thu, 07 Jul 2016 07:26:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=WNSjvTdBSDtMESYgVMuHn1mA6cAnDXoYPjRyL8goSd4=;
 b=GL+VREkdaiKLGsvPl8zpVaZaXmUaKFXxYpCn8E4fdIkr3JlL5zLb8TAR/pUxqTDZXH
 Notas4vUkj/dB295/9pjV/WERr/L0uvMbpybND3kYgsRA5sxokWwnp6Y7gy2qeHT32Ss
 DVajPHTjTO5wZ7mpZgBcbDlmO8M9mMK4KRFZXdJHghz8K+Pfsw37bz349qbif5Nuu0Do
 iaXeLmzrwjXQCHjeJiuDmTONWyphzxSp8bmcrAR5W4JLcD5sBbA7PJUBOgj9ce1ANSk1
 cc4rFfPgfmHEUjoiF8HdgPeoC6n25CzTeFEWGq/TdEeQnmOilnIIC98bsSbbD7CpgLaR
 +nDg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=WNSjvTdBSDtMESYgVMuHn1mA6cAnDXoYPjRyL8goSd4=;
 b=T7pxCx+NDTnizlaGUjVTOLysqIz+gCZwH2lxj285vu5dWy5E8j2jyvoiO9MRuEOzz5
 7CZSV6i8xRUvvg4wq8Kc8Rw2Y8S7ehQ/WTnSI7skkvB5gWgjGySG2eLVna7RxjLEpz/V
 5EpVl9JxPySytasUwZjSU3ikrjBCSq11Ixmx6VuaJaTBXC+dClXqzkQi5ygWYgjPZmXu
 pSmNmE2VDOO8LJjD5CKV6pl/OQR7zNaafIXUOF2vI64MSgUt/HIFLW2lVQob4s/Hse5H
 xx43hEp8yK1LdrNP8N+rE3VSP9ucqHX4sndMB7H5wJPAnUx0QRYXLl0DZe+KVYxYkIIc
 7xYQ==
X-Gm-Message-State: ALyK8tK0GxE+KvzNG6H4aHsewnYkA34HvMo2wofG60dSuY/bBsFIdfSTDrCE4Sn03hpKYBhMxJ9Gvg1H6OdozA==
X-Received: by 10.37.205.130 with SMTP id d124mr408294ybf.181.1467901572542;
 Thu, 07 Jul 2016 07:26:12 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Thu, 7 Jul 2016 07:26:10 -0700 (PDT)
In-Reply-To: <20160707001218.GI38613@kib.kiev.ua>
References: <CAM9edeOek_zqRPt-0vDMNMK9CH31yAeVPAirWVvcuUWy5xsm4A@mail.gmail.com>
 <CAM9edeN1Npc=cNth2gAk1XFLvar-jZqzxWX50pLQVxDusMrOVg@mail.gmail.com>
 <20160706151822.GC38613@kib.kiev.ua>
 <CAM9edeMDdjO6C2BRXBxDV-trUG5A0NEua+K0H_wERq7H4AR72g@mail.gmail.com>
 <CAM9edePfMxm26yYC=o10CGhRSDUHXTTNosFc_T89v4Pxt0JM0g@mail.gmail.com>
 <20160706173758.GF38613@kib.kiev.ua>
 <CAM9edeOb0yUqaXbTMGBJVFqgJ++yaDr4tGV1TQ_UPOYmv4p2fw@mail.gmail.com>
 <20160707001218.GI38613@kib.kiev.ua>
From: David Cross <dcrosstech@gmail.com>
Date: Thu, 7 Jul 2016 10:26:10 -0400
Message-ID: <CAM9edePjo+UnWSzHLrcbsw0-5Z6y7xcbWB5eg1fak+zqbZWndQ@mail.gmail.com>
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Thu, 07 Jul 2016 15:42:28 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 14:26:13 -0000

The state was printed after the panic, yes.

If I understand the idea of softupdates correctly, I don't think it's odd
that no write was even attempted for this buffer: it has b_dep set, which
means those blocks should be written first, right?

Also, I was just able to reproduce this on 11.0-ALPHA6. I did a fresh fsck
on the filesystem to ensure it was clean (I typically don't fsck between
reproduction runs, since it takes so long, and when I do need a 'clean'
slate I just restore the snapshot; it's faster than fsck).

The panic from 11.0-ALPHA6 is:

root@bhyve103:~ # panic: softdep_deallocate_dependencies: dangling deps
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfffffe011b3861b0
vpanic() at vpanic+0x182/frame 0xfffffe011b386230
panic() at panic+0x43/frame 0xfffffe011b386290
softdep_deallocate_dependencies() at
softdep_deallocate_dependencies+0x71/frame 0xfffffe011b3862b0
brelse() at brelse+0x162/frame 0xfffffe011b386310
bufwrite() at bufwrite+0x206/frame 0xfffffe011b386360
ffs_write() at ffs_write+0x3ed/frame 0xfffffe011b386410
VOP_WRITE_APV() at VOP_WRITE_APV+0x16f/frame 0xfffffe011b386520
vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x2d5/frame
0xfffffe011b3865f0
VOP_PUTPAGES_APV() at VOP_PUTPAGES_APV+0xda/frame 0xfffffe011b386620
vnode_pager_putpages() at vnode_pager_putpages+0x89/frame 0xfffffe011b386690
vm_pageout_flush() at vm_pageout_flush+0x12d/frame 0xfffffe011b386720
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x23a/frame
0xfffffe011b386820
vm_object_page_clean() at vm_object_page_clean+0x1be/frame
0xfffffe011b3868a0
vm_object_terminate() at vm_object_terminate+0xa5/frame 0xfffffe011b3868e0
vnode_destroy_vobject() at vnode_destroy_vobject+0x63/frame
0xfffffe011b386910
ufs_reclaim() at ufs_reclaim+0x1f/frame 0xfffffe011b386940
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xda/frame 0xfffffe011b386970
vgonel() at vgonel+0x204/frame 0xfffffe011b3869e0
vnlru_proc() at vnlru_proc+0x577/frame 0xfffffe011b386a70
fork_exit() at fork_exit+0x84/frame 0xfffffe011b386ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe011b386ab0


Pardon the machine name, I have a setup script for bhyve VMs, and I didn't
tweak the name, just the install location:
root@bhyve103:~ # uname -a
FreeBSD bhyve103.priv.dcrosstech.com 11.0-ALPHA6 FreeBSD 11.0-ALPHA6 #0
r302303:
 Fri Jul  1 03:32:49 UTC 2016
root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC
amd64


On the 10.3 kernel I was also able to walk the mnt_nvnodes list before the
FS panic and I have the vnode * saved from before the vnlru attempted
reclaim.

print *((struct vnode *)0xfffff80002dc2760)
$6 = {v_tag = 0xffffffff8072b891 "ufs", v_op = 0xffffffff80a13c40,
  v_data = 0xfffff8006a20b160, v_mount = 0xfffff800024e9cc0, v_nmntvnodes =
{
    tqe_next = 0xfffff80002dc2588, tqe_prev = 0xfffff80002dc2958}, v_un = {
    vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0},
  v_hashlist = {le_next = 0x0, le_prev = 0xfffffe0000932ef8}, v_cache_src =
{
    lh_first = 0x0}, v_cache_dst = {tqh_first = 0xfffff8006a18ce00,
    tqh_last = 0xfffff8006a18ce20}, v_cache_dd = 0x0, v_lock = {lock_object
= {
      lo_name = 0xffffffff8072b891 "ufs", lo_flags = 117112832, lo_data =
0,
      lo_witness = 0xfffffe0000607280}, lk_lock = 1, lk_exslpfail = 0,
    lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {
      lo_name = 0xffffffff8074a89f "vnode interlock", lo_flags = 16973824,
      lo_data = 0, lo_witness = 0xfffffe00005fd680}, mtx_lock = 4},
  v_vnlock = 0xfffff80002dc27c8, v_actfreelist = {
    tqe_next = 0xfffff80002dc2938, tqe_prev = 0xfffff80002dc2648}, v_bufobj
= {
    bo_lock = {lock_object = {lo_name = 0xffffffff80754d34 "bufobj
interlock",
        lo_flags = 86179840, lo_data = 0, lo_witness = 0xfffffe0000605700},
      rw_lock = 1}, bo_ops = 0xffffffff809e97c0,
    bo_object = 0xfffff80002c1b400, bo_synclist = {
      le_next = 0xfffff80002dc2a08, le_prev = 0xfffff80002dc2688},
    bo_private = 0xfffff80002dc2760, __bo_vnode = 0xfffff80002dc2760,
    bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80002dc2880},
      bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {
        tqh_first = 0xfffffe00f7ae8658, tqh_last = 0xfffffe00f7ae86a8},
      bv_root = {pt_root = 18446741878841706297}, bv_cnt = 1},
    bo_numoutput = 0, bo_flag = 1, bo_bsize = 16384}, v_pollinfo = 0x0,
  v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0,
      tqh_last = 0xfffff80002dc28e8}, rl_currdep = 0x0}, v_cstart = 0,
  v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 2, v_usecount = 0,
  v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 18236560,
  v_type = VREG}

I think what is wanted is the buffers and their dependency lists. I am not
sure where those are under all of this; somewhere in the bo_* fields?

On Wed, Jul 6, 2016 at 8:12 PM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Wed, Jul 06, 2016 at 02:21:20PM -0400, David Cross wrote:
> > (kgdb) up 5
> > #5  0xffffffff804aafa1 in brelse (bp=0xfffffe00f77457d0) at buf.h:428
> > 428                     (*bioops.io_deallocate)(bp);
> > Current language:  auto; currently minimal
> > (kgdb) p/x *(struct buf *)0xfffffe00f77457d0
> > $1 = {b_bufobj = 0xfffff80002e88480, b_bcount = 0x4000, b_caller1 = 0x0,
> >   b_data = 0xfffffe00f857b000, b_error = 0x0, b_iocmd = 0x0, b_ioflags =
> > 0x0,
> >   b_iooffset = 0x0, b_resid = 0x0, b_iodone = 0x0, b_blkno = 0x115d6400,
> >   b_offset = 0x0, b_bobufs = {tqe_next = 0x0, tqe_prev =
> > 0xfffff80002e884d0},
> >   b_vflags = 0x0, b_freelist = {tqe_next = 0xfffffe00f7745a28,
> >     tqe_prev = 0xffffffff80c2afc0}, b_qindex = 0x0, b_flags = 0x20402800,
> >   b_xflags = 0x2, b_lock = {lock_object = {lo_name = 0xffffffff8075030b,
> >       lo_flags = 0x6730000, lo_data = 0x0, lo_witness =
> > 0xfffffe0000602f00},
> >     lk_lock = 0xfffff800022e8000, lk_exslpfail = 0x0, lk_timo = 0x0,
> >     lk_pri = 0x60}, b_bufsize = 0x4000, b_runningbufspace = 0x0,
> >   b_kvabase = 0xfffffe00f857b000, b_kvaalloc = 0x0, b_kvasize = 0x4000,
> >   b_lblkno = 0x0, b_vp = 0xfffff80002e883b0, b_dirtyoff = 0x0,
> >   b_dirtyend = 0x0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0x0,
> >   b_pager = {
> >     pg_reqpage = 0x0}, b_cluster = {cluster_head = {tqh_first = 0x0,
> >       tqh_last = 0x0}, cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}},
> >   b_pages = {0xfffff800b99b30b0, 0xfffff800b99b3118, 0xfffff800b99b3180,
> >     0xfffff800b99b31e8, 0x0 <repeats 28 times>}, b_npages = 0x4,
> >   b_dep = {lh_first = 0xfffff800023d8c00}, b_fsprivate1 = 0x0,
> >   b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, b_pin_count = 0x0}
> >
> >
> > This is the freshly allocated buf that causes the panic; is this what is
> > needed?  I "know" which vnode will cause the panic on vnlru cleanup, but I
> > don't know how to walk the memory list without a 'hook'.. as in, i can
> > setup the kernel in a state that I know will panic when the vnode is
> > cleaned up, I can force a panic 'early' (kill -9 1), and then I could get
> > that vnode.. if I could get the vnode list to walk.
>
> Was the state printed after the panic occurred?  What is strange is that
> buffer was not even tried for i/o, AFAIS.  Apart from empty
> b_error/b_iocmd,
> the b_lblkno is zero, which means that the buffer was never allocated on
> the disk.
>
> The b_blkno looks strangely high.  Can you print *(bp->b_vp) ?  If it is
> UFS vnode, do p *(struct inode *)(<vnode>->v_data).  I am esp. interested
> in the vnode size.
>
> Can you reproduce the problem on HEAD ?
>

From owner-freebsd-hackers@freebsd.org  Thu Jul  7 18:05:12 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E26AB75016;
 Thu,  7 Jul 2016 18:05:12 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 7F42A128F;
 Thu,  7 Jul 2016 18:05:12 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467914710797328.50271387470593;
 Thu, 7 Jul 2016 11:05:10 -0700 (PDT)
Date: Thu, 07 Jul 2016 11:05:10 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Konstantin Belousov" <kostikbel@gmail.com>
Cc: "Don Lewis" <truckman@FreeBSD.org>, "" <ache@freebsd.org>, 
 "" <freebsd-hackers@freebsd.org>, "" <freebsd-current@freebsd.org>
Message-ID: <155c688eecf.fe750982120278.6541123167784850321@nextbsd.org>
In-Reply-To: <20160707140424.GM38613@kib.kiev.ua>
References: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org>
 <201607070714.u677EqVx008159@gw.catspoiler.org>
 <20160707140424.GM38613@kib.kiev.ua>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 18:05:12 -0000


 ---- On Thu, 07 Jul 2016 07:04:24 -0700 Konstantin Belousov <kostikbel@gmail.com> wrote ---- 
 > On Thu, Jul 07, 2016 at 12:14:52AM -0700, Don Lewis wrote: 
 > > On  6 Jul, Matthew Macy wrote: 
 > > >  
 > > >  
 > > >  
 > > >  ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov 
 > > >  <ache@freebsd.org> wrote ---- 
 > > >  > On 07.07.2016 9:40, Matthew Macy wrote:  
 > > >  > >   
 > > >  > >   
 > > >  > >   
 > > >  > >  ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov 
 > > >  > >  <ache@freebsd.org> wrote ---- 
 > > >  > >  > On 07.07.2016 7:52, K. Macy wrote:   
 > > >  > >  > > On Wednesday, July 6, 2016, Don Lewis <truckman@freebsd.org> 
 > > >  > >  > > wrote: 
 > > >  > >  > >    
 > > >  > >  > >> On  6 Jul, Matthew Macy wrote:   
 > > >  > >  > >>> As a first step towards managing linux user space in a 
 > > >  > >  > >>> chrooted 
 > > >  > >  > >>> /compat/linux, initially for i915 testing with intel gpu 
 > > >  > >  > >>> tools, later on to get widevine and steam to work I'm 
 > > >  > >  > >>> trying to get apt to work. I've fixed a number of issues 
 > > >  > >  > >>> to date in pseudofs/linprocfs but now I'm running in to 
 > > >  > >  > >>> a bug caused by differences in SIGCHLD handling between 
 > > >  > >  > >>> Linux and FreeBSD. The situation is that apt will spawn 
 > > >  > >  > >>> dpkg and wait on a pipe read. On Linux when dpkg exits 
 > > >  > >  > >>> the  SIGCHLD to apt causes a short read on the pipe 
 > > >  > >  > >>> which lets apt then continue. On FreeBSD a SIGCHLD is 
 > > >  > >  > >>> silently ignored. I've even experimented with doing a 
 > > >  > >  > >>> kill -20 <apt pid> to no effect. 
 > > >  > >  > >>>   
 > > >  > >  > >>> It would be easy enough to check sysvec against linux in 
 > > >  > >  > >>> pipe_read and break out of the loop when it's awakened 
 > > >  > >  > >>> from msleep (assuming there aren't deeper issues with 
 > > >  > >  > >>> signal propagation for anything other than  
 > > >  > >  > >>> SIGINT/SIGKILL) and then do a short read. However, I'm 
 > > >  > >  > >>> assuming that anyone who has worked in this area 
 > > >  > >  > >>> probably has a cleaner solution. 
 > > >  > >  > >>   
 > > >  > >  > >> It sounds like SA_RESTART is set in sa_flags for SIGCHLD 
 > > >  > >  > >> but shouldn't be in this case. 
 > > >  > >  > >    
 > > >  > >  > >    
 > > >  > >  > >    
 > > >  > >  > > Good point.   
 > > >  > >  > >    
 > > >  > >  > > Thinking more about it, this seems like a bug in FreeBSD. 
 > > >  > >  > > Not a valid behavioral difference. 
 > > >  > >  >    
 > > >  > >  > You better need consult with POSIX before fixing things toward 
 > > >  > >  > any Linuxisms blindly in our native code. I don't have a 
 > > >  > >  > time now to see, is it really a bug according to POSIX, but 
 > > >  > >  > please read or just find all SIGCHLD there: 
 > > >  > >  > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html   
 > > >  > >  > it explain SIGCHLD actions in deep details.   
 > > >  > >  > And that one too:   
 > > >  > >  > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html   
 > > >  > >   
 > > >  > >   
 > > >  > >   
 > > >  > > I was pretty clear in my initial email that I'm only interested 
 > > >  > > in changing behavior for Linux programs. 
 > > >  >   
 > > >  > Of course, but in case it is FreeBSD bug, it should be fixed in our  
 > > >  > native code first before making any changes in Linuxator.  
 > > >  >   
 > > >  > > And I was asking for help with that, not a link to SUSv3 or POSIX.   
 > > >  >   
 > > >  > In case I was not helpful, sorry for that. Before you try to change  
 > > >  > something in Linuxator you need to be sure that FreeBSD does it 
 > > >  > right (or wrong, then fix FreeBSD native code first). I am just 
 > > >  > insisting on proper steps of fixing it. 
 > > >  >   
 > > >  
 > > >  
 > > > I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD 
 > > > to deliberately interrupt a pipe read is such a weird idiom. I'll test 
 > > > fork vs clone on Linux and see how OS X responds to a SIGCHLD during a 
 > > > pipe read. 
 > >  
 > > It really depends on how signal handling has been set up.  From my 
 > > understanding of the FreeBSD man pages and the Open Group documents, the 
 > > default handling for SIGCHLD is to just ignore it, in which case it 
 > > shouldn't interrupt the pipe read.  If the process has set up a SIGCHLD 
 > > signal handler, then what happens with the read should depend on whether 
 > > or not SA_RESTART was passed to sigaction().  I would expect that Linux 
 > > would be the same as FreeBSD and the Open Group specs. 
 > >  
 > > How does apt set up its handling of SIGCHLD? 
 >  
 > BSD traditional and allowed handling of the signals with SIG_IGN 
 > disposition is to discard such signal at the time of generation. Then, 
 > such signal cannot interrupt a syscall regardless of SA_RESTART.  For 
 > the interruption to work, some signal handler must be installed. 
 >  
 > AFAIR both SysV and Linux do not discard ignored signals, but process 
 > them up to the delivery point. 
 >  
 > Sure the test demonstrating the difference is required to actually 
 > diagnose and make conclusions. 


Unsurprisingly I may have misinterpreted the trace.

John observes:
Alternatively, if apt is creating a pipe() that it passes to dpkg via fork(), and apt
only keeps the read end open and dpkg only keeps the write end open, then the
pipe_read should return EOF when dpkg exits (that is normally the way pipes
are used to detect child exit, rather than EINTR from SIGCHLD).

The SIGCHLD may be a red herring as strace will report it even if it is ignored. What John describes is borne out by the traces.

FreeBSD from pipe creation to dpkg exit and apt hang
http://pastebin.com/TGRrMniD

Linux from pipe creation to dpkg exit and apt continue
http://pastebin.com/wPfd31Pf


-M


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 22:32:35 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9E8DCB82A69
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 22:32:35 +0000 (UTC) (envelope-from yuri@rawbw.com)
Received: from shell1.rawbw.com (shell1.rawbw.com [198.144.192.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 8E7731CAF
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 22:32:35 +0000 (UTC)
 (envelope-from yuri@rawbw.com)
Received: from yuri.doctorlan.com (c-24-5-143-190.hsd1.ca.comcast.net
 [24.5.143.190]) (authenticated bits=0)
 by shell1.rawbw.com (8.15.1/8.15.1) with ESMTPSA id u67MWTFY074892
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO)
 for <freebsd-hackers@freebsd.org>; Thu, 7 Jul 2016 15:32:29 -0700 (PDT)
 (envelope-from yuri@rawbw.com)
X-Authentication-Warning: shell1.rawbw.com: Host
 c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190] claimed to be
 yuri.doctorlan.com
Subject: Re: Why kinfo_getvmmap is sometimes so expensive?
References: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
 <20160707001913.GJ38613@kib.kiev.ua>
To: Freebsd hackers list <freebsd-hackers@freebsd.org>
From: Yuri <yuri@rawbw.com>
Message-ID: <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com>
Date: Thu, 7 Jul 2016 15:32:28 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101
 Thunderbird/45.1.0
MIME-Version: 1.0
In-Reply-To: <20160707001913.GJ38613@kib.kiev.ua>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 22:32:35 -0000

On 07/06/2016 17:19, Konstantin Belousov wrote:
> To calculate residency count for the process map entries, kernel has to
> iterate over all pages.  This operation was somewhat optimized in 10.3
> and HEAD, particularly for the large sparse mappings.  But for large populated
> mappings there is no other way than to check each page.
>
> You may confirm my hypothesis by setting sysctl
> kern.proc_vmmap_skip_resident_count to 1 and see whether the CPU
> consumption changed.  Of course, you will not get the resident count
> in the returned structure, after the knob is tweaked.


When people raise the question of why the malloc library doesn't unmap
memory, the developers usually say that they call madvise(MADV_FREE)
and that this is as good as unmapping. But this example shows that this
isn't quite the case on FreeBSD, and unmapping is better.


Yuri


From owner-freebsd-hackers@freebsd.org  Thu Jul  7 23:20:15 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B7F10B8249A
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Thu,  7 Jul 2016 23:20:15 +0000 (UTC)
 (envelope-from cedric.blancher@gmail.com)
Received: from mail-pa0-x22a.google.com (mail-pa0-x22a.google.com
 [IPv6:2607:f8b0:400e:c03::22a])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 87D6C17F6
 for <freebsd-hackers@freebsd.org>; Thu,  7 Jul 2016 23:20:15 +0000 (UTC)
 (envelope-from cedric.blancher@gmail.com)
Received: by mail-pa0-x22a.google.com with SMTP id uj8so9857139pab.3
 for <freebsd-hackers@freebsd.org>; Thu, 07 Jul 2016 16:20:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-transfer-encoding;
 bh=n9qqVM9YdRvf66WYZCPZ2tOqYFzahK2zysWrJZsTtKA=;
 b=Wi50pxXx4K0f4dbVKFX2mE3BfNIwCK105o/twiShCyDoA0BQNNNtDPT+x41BSVQBlM
 vxB2/PPP0W/BMOev0bIPHKYQHsh9zDmtEV+CAylI3SlKi56Qmzr12U2ZHjVyG93F1Cpm
 iItpmxtJ3k1CZaV6SBVnPGIYvpAfuz1LATAUUQy0e3XYrws/7gNwulrU7wKlpoY9kDdN
 JZcmxxDdlLNUn3Jm72uc1HFpDEjw4wtvNaXeGUwwZ4hL0ZeHC3qkXkO5UYC8+lVGXL3q
 IRacRtAQZuSxlp2+0yF38TFGURuujfRg7DSyzsONzufWWwVEQL/W0Cd1sFDbIOauMu+A
 vl8A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=n9qqVM9YdRvf66WYZCPZ2tOqYFzahK2zysWrJZsTtKA=;
 b=M8uu+NAXGdO4iFIlzZopJFiR5KvyrNjpXUpdlq/Q2Cd8v0VG+0F+qzbINMkoiy79Gp
 AkeNsDxK1bo3x8GxKSsbfDUgfyaBdJpNpfirc20TLtsZ9SWQkamBMndYVtrcMZ7pQs6k
 vcOeAwDwT/pMq66EeWw2wn+KyUKr+XNvydikdEq8WpqxrXzOUSFXcRJIzGvOzzXuLBtF
 t/8j0olD+NhOeFhSVZu3VGFurwKw2FXiMNWVSSf1lWf4Z/cn1fZXKM1MxZ1CyLM3lalp
 l+Sp5MXaKIAc100VYtqyObkoeO8pdegZ1pjMP8ZMlI1ZidQxkucX2VVf5xoyekPUaXIR
 PAxw==
X-Gm-Message-State: ALyK8tJl4VTdNj7xn/HyomnzYutGKWaaY0Ry3bErkmRM7IVXZ6WuYOZtm/ESpALj9NwmgMD5o/JIULyktR+BmQ==
X-Received: by 10.66.76.10 with SMTP id g10mr4627821paw.110.1467933615034;
 Thu, 07 Jul 2016 16:20:15 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.66.173.8 with HTTP; Thu, 7 Jul 2016 16:20:14 -0700 (PDT)
In-Reply-To: <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
References: <20160630140625.3b4aece3@splash.akips.com>
 <CALXu0UfxRMnaamh+po5zp=iXdNUNuyj+7e_N1z8j46MtJmvyVA@mail.gmail.com>
 <20160703123004.74a7385a@splash.akips.com>
 <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org>
 <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net>
From: Cedric Blancher <cedric.blancher@gmail.com>
Date: Fri, 8 Jul 2016 01:20:14 +0200
Message-ID: <CALXu0UexG1G6ozZ+-QOpO168fT5n=L+yfKLJTzyRMWbCu6BjEg@mail.gmail.com>
Subject: Re: ZFS ARC and mmap/page cache coherency question
To: Karl Denninger <karl@denninger.net>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>,
 illumos-dev <developer@lists.illumos.org>, 
 "Garrett D'Amore" <garrett@damore.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Jul 2016 23:20:15 -0000

I think Garrett D'Amore <garrett@damore.org> had some ideas about the
VM<---->ZFS communication and double/multicaching issues too.

Ced

On 3 July 2016 at 17:43, Karl Denninger <karl@denninger.net> wrote:
>
> On 7/3/2016 02:45, Matthew Macy wrote:
>>
>> Cedric greatly overstates the intractability of resolving it.
>> Nonetheless, since the initial import very little has been done to
>> improve integration, and I don't know of anyone who is up to the task
>> taking an interest in it. Consequently, mmap() performance is likely
>> "doomed" for the foreseeable future.
>> -M
>
> Wellllll....
>
> I've done a fair bit of work here (see
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
> political issues are at least as bad as the coding ones.
>
> In short what Cedric says about the root of the issue is real.  VM is
> really-well implemented for what it handles, but the root of the issue
> is that while the UFS data cache is part of VM and thus it "knows" about
> it, ZFS is not because it is a "bolt-on."  UMA leads to further (severe)
> complications for certain workloads.
>
> Finally the underlying ZFS dmu_tx sizing code is just plain wrong and in
> fact this is one of the biggest issues as when the system runs into
> trouble it can take a bad situation and make it a *lot* worse.  There is
> only one write-back cache maintained instead of one per zvol, and that's
> flat-out broken.  Being able to re-order async writes to disk (where
> fsync() has not been called) and minimizing seek latency is excellent.
> Sadly rotating media these days sabotages much of this due to opacity
> introduced at the drive level (e.g. varying sector counts per track,
> etc) but it can still help.  But where things go dramatically wrong is
> on a system where a large write-back cache is allocated relative to the
> underlying zvol I/O performance (this occurs on moderately-large and
> bigger RAM systems) with moderate numbers of modest-performance rotating
> media; in this case it is entirely possible for a flush of the write
> buffers to require upwards of a *minute* to complete, during which all
> other writes block.  If this happens during periods of high RAM demand
> and you manage to trigger a page-out at the same time system performance
> will go straight into the toilet.  I have seen instances where simply
> trying to edit a text file with vi (or a "select" against a database
> table) will hang for upwards of a minute leading you to believe the
> system has crashed, when in fact it has not.
>
> The interaction of VM with the above can lead to severe pathological
> behavior because the VM system has no way to tell the ZFS subsystem to
> pare back ARC (and at least as important, perhaps more-so -- unused but
> allocated UMA) when memory pressure exists *before* it pages.  ZFS tries
> to detect memory pressure and do this itself but it winds up competing
> with the VM system.  This leads to demonstrably wrong behavior because
> you never want to hold disk cache in preference to RSS; if you have a
> block of data from the disk the best case is you avoid one I/O (to
> re-read it); if you page you are *guaranteed* to take one I/O (to write
> the paged-out RSS to disk) and *might* take two (if you then must read
> it back in.)
>
> In short trading the avoidance of one *possible* I/O for a *guaranteed*
> I/O and a second possible one is *always* a net loss.
>
> To "fix" all of this "correctly" (for all cases, instead of certain
> cases) VM would have to "know" about ARC and its use of UMA, along with
> being able to police both.  ZFS also must have the dmu_tx writeback
> cache sized per-zvol with its size chosen by the actual I/O performance
> characteristics of the disks in the zvol itself.  I've looked into doing
> both and it's fairly complex, and what's worse is that it would
> effectively "marry" VM and ZFS, removing the "bolt-on" aspect of
> things.  This then leads to a lot of maintenance work over time because
> any time ZFS code changes (and it does, quite a bit) you then have to go
> back through that process in order to become coherent with Illumos.
>
> The PR above resolved (completely) the issues I was having along with a
> number of other people on 10.x and before (I've not yet rolled it
> forward to 11.) but it's quite clearly a hack of sorts, in that it
> detects and treats symptoms (e.g. dynamic TX cache size modification,
> etc) rather than integrating VM and ZFS cache management.
>
> --
> Karl Denninger
> karl@denninger.net <mailto:karl@denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/


-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

From owner-freebsd-hackers@freebsd.org  Fri Jul  8 07:52:41 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2CCF9B7539B;
 Fri,  8 Jul 2016 07:52:41 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from sender163-mail.zoho.com (sender163-mail.zoho.com
 [74.201.84.163])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id F2FBC1C66;
 Fri,  8 Jul 2016 07:52:40 +0000 (UTC)
 (envelope-from mmacy@nextbsd.org)
Received: from mail.zoho.com by mx.zohomail.com
 with SMTP id 1467964358050811.143266695883;
 Fri, 8 Jul 2016 00:52:38 -0700 (PDT)
Date: Fri, 08 Jul 2016 00:52:38 -0700
From: Matthew Macy <mmacy@nextbsd.org>
To: "Matthew Macy" <mmacy@nextbsd.org>
Cc: "Konstantin Belousov" <kostikbel@gmail.com>, 
 "" <freebsd-hackers@freebsd.org>, "Don Lewis" <truckman@FreeBSD.org>, 
 "" <freebsd-current@freebsd.org>, "" <ache@freebsd.org>
Message-ID: <155c97e7d70.126966a3c142756.8632532805949896728@nextbsd.org>
In-Reply-To: <155c688eecf.fe750982120278.6541123167784850321@nextbsd.org>
References: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org>
 <201607070714.u677EqVx008159@gw.catspoiler.org>
 <20160707140424.GM38613@kib.kiev.ua>
 <155c688eecf.fe750982120278.6541123167784850321@nextbsd.org>
Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks
 apt
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Priority: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-ZohoMail: Z_57973067 SPT_1 Z_57973066 SPT_1 SLF_D
X-Zoho-Virus-Status: 2
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 07:52:41 -0000


 > Unsurprisingly I may have misinterpreted the trace. 
 >  
 > John observes: 
 > Alternatively, if apt is creating a pipe() that it passes to dpkg via fork(), and apt
 > only keeps the read end open and dpkg only keeps the write end open, then the
 > pipe_read should return EOF when dpkg exits (that is normally the way pipes
 > are used to detect child exit, rather than EINTR from SIGCHLD).
 >  
 > The SIGCHLD may be a red herring as strace will report it even if it is ignored. What John describes is borne out by the traces. 
 >  
 > FreeBSD from pipe creation to dpkg exit and apt hang 
 > http://pastebin.com/TGRrMniD 
 >  
 > Linux from pipe creation to dpkg exit and apt continue 
 > http://pastebin.com/wPfd31Pf 
 
It turns out that this was footshooting. In my changes to linprocfs the <pid>/fd directory was holding additional references to the struct file pointers which prevented apt from getting an EOF when dpkg exited.
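
The failure mode can be reproduced in userland. The following is a minimal sketch, not the linprocfs code: the function name child_exit_gives_eof is hypothetical, and the dup() in the parent is only an assumed stand-in for the extra struct file reference. As long as any reference to the write end stays open, the reader never sees EOF, even though the child has exited.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately, then report whether the parent
 * sees EOF on the read end of a pipe within timeout_ms milliseconds.
 * If leak_write_end is nonzero, the parent keeps a duplicate of the
 * write end open (hypothetical stand-in for a stray file reference).
 * Returns 1 if EOF was seen, 0 if poll() timed out, -1 on error.
 */
int
child_exit_gives_eof(int leak_write_end, int timeout_ms)
{
	int fds[2];
	int leaked = -1;
	int result;

	assert(timeout_ms >= 0);
	if (pipe(fds) != 0)
		return (-1);

	pid_t pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: keeps only the write end; exiting closes it. */
		close(fds[0]);
		_exit(0);
	}

	/* Parent: optionally hold an extra reference to the write end. */
	if (leak_write_end)
		leaked = dup(fds[1]);
	close(fds[1]);

	struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
	int ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0) {
		result = -1;
	} else if (ready == 0) {
		result = 0;		/* no EOF: a writer is still open */
	} else {
		char c;
		ssize_t n = read(fds[0], &c, 1);
		result = (n == 0) ? 1 : -1;	/* read() == 0 means EOF */
	}

	if (leaked >= 0)
		close(leaked);
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return (result);
}
```

Calling child_exit_gives_eof(0, 2000) should report EOF, while child_exit_gives_eof(1, 200) should time out, which is the userland equivalent of the hang apt was seeing.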

Thanks to all who commented. 

FWIW, after fixing the previous issue and then fixing linux_mremap so that it can grow a mapping, apt works now:

root@planecrash:/home/mmacy # chroot /compat/linux/ apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://security.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Hit:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease                     
Fetched 94.5 kB in 1s (56.3 kB/s)                                                   
Reading package lists... Done

I don't think this is all that useful until I update or implement the system calls needed to get Steam, Widevine, or whatever working, but in case anyone cares, this is all going on in the drm-next-4.6 branch alongside the graphics work.

-M


From owner-freebsd-hackers@freebsd.org  Fri Jul  8 10:55:10 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B3DEB82A18
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri,  8 Jul 2016 10:55:10 +0000 (UTC)
 (envelope-from wojtek@puchar.net)
Received: from puchar.net (puchar.net [194.1.144.90])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "puchar.net", Issuer "puchar.net" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 94FFD11BB
 for <freebsd-hackers@freebsd.org>; Fri,  8 Jul 2016 10:55:09 +0000 (UTC)
 (envelope-from wojtek@puchar.net)
Received: Received: from 127.0.0.1 (localhost [127.0.0.1])
 by puchar.net (8.15.2/8.14.9) with ESMTPS id u68Asxn9002790
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-hackers@freebsd.org>; Fri, 8 Jul 2016 12:54:59 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Received: from laptop.wojtek.intra (localhost [127.0.0.1])
 by laptop.wojtek.intra (8.14.9/8.14.9) with ESMTP id u68At2kB000865
 for <freebsd-hackers@freebsd.org>; Fri, 8 Jul 2016 12:55:02 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Received: from localhost (wojtek@localhost)
 by laptop.wojtek.intra (8.14.9/8.14.9/Submit) with ESMTP id u68AsvCs000862
 for <freebsd-hackers@freebsd.org>; Fri, 8 Jul 2016 12:54:57 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
X-Authentication-Warning: laptop.wojtek.intra: wojtek owned process doing -bs
Date: Fri, 8 Jul 2016 12:54:57 +0200 (CEST)
From: Wojciech Puchar <wojtek@puchar.net>
X-X-Sender: wojtek@laptop.wojtek.intra
To: freebsd-hackers@freebsd.org
Subject: help with onboard LAN
Message-ID: <alpine.BSF.2.20.1607081251001.854@laptop.wojtek.intra>
User-Agent: Alpine 2.20 (BSF 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (puchar.net [10.0.1.1]); Fri, 08 Jul 2016 12:54:59 +0200 (CEST)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 10:55:10 -0000

My Supermicro-rebranded server is specified as having two 1Gb/s Ethernet
ports onboard.

What it actually has:

ix0@pci0:3:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
     class      = network
     subclass   = ethernet
ix1@pci0:3:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
     class      = network
     subclass   = ethernet


which is strange.

The card is autodetected by the ixgbe driver under FreeBSD 10, and I put

device miibus
device ixgbe

in my custom kernel.

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> 
port 0xe020-0xe03f mem 0xfbc00000-0xfbdfffff,0xfbe04000-0xfbe07fff irq 42 
at device 0.0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: 0c:c4:7a:6e:7e:9e
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> 
port 0xe000-0xe01f mem 0xfba00000-0xfbbfffff,0xfbe00000-0xfbe03fff irq 45 
at device 0.1 on pci3
ix1: Using MSIX interrupts with 9 vectors
ix1: Ethernet address: 0c:c4:7a:6e:7e:9f
ix1: PCI Express Bus: Speed 5.0GT/s Width x8

And it works.

But it seems I have an autonegotiation problem with a gigabit switch:
it connects at 100Mb/s

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
         ether 0c:c4:7a:6e:7e:9e
         inet 194.1.144.90 netmask 0xfffffff8 broadcast 194.1.144.95
         inet 194.1.144.91 netmask 0xfffffff8 broadcast 194.1.144.95
         media: Ethernet autoselect (100baseTX <full-duplex>)
         status: active


I tried

ifconfig ix0 media 1000baseT

but it shows an error.


Any idea what I really have in my server, and how to manually set it to
1Gb/s?

It seems like no PHY is detected.

From owner-freebsd-hackers@freebsd.org  Fri Jul  8 12:03:03 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id EE8D3B8250A
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri,  8 Jul 2016 12:03:03 +0000 (UTC)
 (envelope-from rwmaillists@googlemail.com)
Received: from mail-qk0-x244.google.com (mail-qk0-x244.google.com
 [IPv6:2607:f8b0:400d:c09::244])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A71E41AC1
 for <freebsd-hackers@freebsd.org>; Fri,  8 Jul 2016 12:03:03 +0000 (UTC)
 (envelope-from rwmaillists@googlemail.com)
Received: by mail-qk0-x244.google.com with SMTP id r68so8132632qka.3
 for <freebsd-hackers@freebsd.org>; Fri, 08 Jul 2016 05:03:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=googlemail.com; s=20120113;
 h=date:from:to:subject:message-id:in-reply-to:references:mime-version
 :content-transfer-encoding;
 bh=+8WPtPHos/Yoz950VJ4JznavMv6epPfdv3QnK2qd3pQ=;
 b=IIP3I39P3FTF6HCmQx9rhUfrB/vRVrXTMIEoEEKRFj4zQpfONI2VckolpdVJlhmLNv
 Yr09Qr/jYjmKoIqsldUkvsJhnR0GD8eIyKFBkWf3TxJnoV0ChZmoT/Znkth6Kd48tSOv
 jpgCR/S3rS23v4HWgxQFMF9yLoJr3jsZUHrRZo1BUx4333O+veDaemFVacALEEwWcUXb
 urcCo7UbgwN82VW3agOdza7bIMtdlwBD5DxPGlqcqLRkLWIKDf0FsxNdee5YfaE3mO47
 Z8tIdeYxXKV+fQCH31v+WnR14UOFjy94fZ4ivLgPccFAxcNT+RdIQuCWFHKrpJUtlA2M
 Z4MA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:subject:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=+8WPtPHos/Yoz950VJ4JznavMv6epPfdv3QnK2qd3pQ=;
 b=kxUWgP3/3GC6rUqVmoIEpk1W0gClKVs9VSbBT5JtDQMzPmbl83xNvCDOu/ankU+euc
 btmOIC6OVGzwqR6z8YRbKFlN7oclsd9BFx1JA1rWiE22wDIOBj0dvnRSJgWlcqGWGqsD
 svXH9yG71wupmwhAsUN2MMC1euB60jni80X4IgFE7aJJ7RnjVpaak+V+ATq29fsFJyMc
 fD9IODOLgGxqa4DugOkwsyk8MxrkEOn0j7HycEhEFPq51Hik3/grn+mQr6+Ht9Qn2s8S
 TdKjafHcO5eKxpLilRpiLJawHmkwxFQTwV79QZGxjssDG4NP06i1U0N1NlbBnqsLf0/u
 Kj+A==
X-Gm-Message-State: ALyK8tL+QRlxMBZudMNzJQeZF0TN9McvcE7LOVVAK9+ehPBSg+gDIIuWiFGEicOGSl1Eww==
X-Received: by 10.194.200.100 with SMTP id jr4mr5025403wjc.176.1467979382321; 
 Fri, 08 Jul 2016 05:03:02 -0700 (PDT)
Received: from gumby.homeunix.com ([81.171.97.59])
 by smtp.gmail.com with ESMTPSA id x83sm2718546wmx.9.2016.07.08.05.03.00
 for <freebsd-hackers@freebsd.org>
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Fri, 08 Jul 2016 05:03:01 -0700 (PDT)
Date: Fri, 8 Jul 2016 13:02:58 +0100
From: RW <rwmaillists@googlemail.com>
To: freebsd-hackers@freebsd.org
Subject: Re: Why kinfo_getvmmap is sometimes so expensive?
Message-ID: <20160708130258.7b772558@gumby.homeunix.com>
In-Reply-To: <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com>
References: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
 <20160707001913.GJ38613@kib.kiev.ua>
 <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com>
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd10.2)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 12:03:04 -0000

On Thu, 7 Jul 2016 15:32:28 -0700
Yuri wrote:

> On 07/06/2016 17:19, Konstantin Belousov wrote:
> > To calculate residency count for the process map entries, kernel
> > has to iterate over all pages.  This operation was somewhat
> > optimized in 10.3 and HEAD, particularly for the large sparse
> > mappings.  But for large populated mappings there is no other way
> > than to check each page.
> >
> > You may confirm my hypothesis by setting sysctl
> > kern.proc_vmmap_skip_resident_count to 1 and see whether the CPU
> > consumption changed.  Of course, you will not get the resident count
> > in the returned structure, after the knob is tweaked.  
> 
> 
> When people raise the question of why malloc library doesn't unmap
> the memory, developers there usually say that they call
> madvise(MADV_FREE) and this is as good as unmap. 

It's better than unmapping because freed memory is commonly re-malloced
shortly after it's freed. 


> But this example
> shows that this isn't quite the case on FreeBSD, and unmapping is
> better.

That doesn't mean it's better in general.

From owner-freebsd-hackers@freebsd.org  Fri Jul  8 16:20:01 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D0A1B83395
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri,  8 Jul 2016 16:20:01 +0000 (UTC)
 (envelope-from cse.cem@gmail.com)
Received: from mail-it0-f47.google.com (mail-it0-f47.google.com
 [209.85.214.47])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1BA0B1226
 for <freebsd-hackers@freebsd.org>; Fri,  8 Jul 2016 16:20:00 +0000 (UTC)
 (envelope-from cse.cem@gmail.com)
Received: by mail-it0-f47.google.com with SMTP id h190so13109794ith.1
 for <freebsd-hackers@freebsd.org>; Fri, 08 Jul 2016 09:20:00 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:reply-to:in-reply-to:references
 :from:date:message-id:subject:to:cc;
 bh=UNL5m55Wd9TBiws/ufa44O9Q7Of0TVuTF3jTudwKjdw=;
 b=UI5rRechDOR1LwHBwc4dnlbnD9oACJkcTSRldnt7JEmE1uaZkvZOXe4lzrSSvtDJ1v
 Wue9lNg+BBktFnnc1XiXfW0Gr3PuA/6zd+f7pGmHswOzRD2bGr8I4Ixcmt3BjF+PcMHX
 1YfMp2gihmye5WGLN5xE7fN52omJQJ/JYPeHN9RTr2lFF8oA9R6IQvQGwwRbqn6bgJXV
 a+aA2SAIw+LGxog9kMwSwLGUtm7ogUdICJunz5yCE7UfSTcP4QKDxt6X+zAJc7cqIRFS
 zqLaUc9rPfK9qln3W9/Koubony42HFMKw24ZcQzuxHTCKBqCxahZselRh5QzkcqKs7/4
 fVSQ==
X-Gm-Message-State: ALyK8tKkZh12+cY+mX8SJOlARoUSv8lqStCK7WxOLf0daTsGIKgiUDBcL4l2frhuwKtqNA==
X-Received: by 10.36.58.13 with SMTP id m13mr4076460itm.81.1467993744677;
 Fri, 08 Jul 2016 09:02:24 -0700 (PDT)
Received: from mail-io0-f176.google.com (mail-io0-f176.google.com.
 [209.85.223.176])
 by smtp.gmail.com with ESMTPSA id o139sm1490481ito.4.2016.07.08.09.02.24
 for <freebsd-hackers@freebsd.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 08 Jul 2016 09:02:24 -0700 (PDT)
Received: by mail-io0-f176.google.com with SMTP id s93so4922600ioi.3
 for <freebsd-hackers@freebsd.org>; Fri, 08 Jul 2016 09:02:24 -0700 (PDT)
X-Received: by 10.107.46.162 with SMTP id u34mr9443035iou.162.1467993744071;
 Fri, 08 Jul 2016 09:02:24 -0700 (PDT)
MIME-Version: 1.0
Reply-To: cem@freebsd.org
Received: by 10.36.206.2 with HTTP; Fri, 8 Jul 2016 09:02:23 -0700 (PDT)
In-Reply-To: <20160708130258.7b772558@gumby.homeunix.com>
References: <e6dc27c0-0454-0666-b3e1-887bd116a847@rawbw.com>
 <20160707001913.GJ38613@kib.kiev.ua>
 <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com>
 <20160708130258.7b772558@gumby.homeunix.com>
From: Conrad Meyer <cem@freebsd.org>
Date: Fri, 8 Jul 2016 09:02:23 -0700
X-Gmail-Original-Message-ID: <CAG6CVpXQE=ox4D4WB1Z+AhV79QzStiVfFwm0V3_qOB2N5-KzzA@mail.gmail.com>
Message-ID: <CAG6CVpXQE=ox4D4WB1Z+AhV79QzStiVfFwm0V3_qOB2N5-KzzA@mail.gmail.com>
Subject: Re: Why kinfo_getvmmap is sometimes so expensive?
To: RW <rwmaillists@googlemail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 16:20:01 -0000

On Fri, Jul 8, 2016 at 5:02 AM, RW via freebsd-hackers
<freebsd-hackers@freebsd.org> wrote:
> On Thu, 7 Jul 2016 15:32:28 -0700
> Yuri wrote:
>
>> When people raise the question of why malloc library doesn't unmap
>> the memory, developers there usually say that they call
>> madvise(MADV_FREE) and this is as good as unmap.
>
> It's better than unmapping because freed memory is commonly re-malloced
> shortly after it's freed.
>
>> But this example
>> shows that this isn't quite the case on FreeBSD, and unmapping is
>> better.
>
> That doesn't mean it's better in general.

Additionally, it would not be difficult to make
"getProcessSizeBytes()" cheaper without changing malloc.  Fetching the
entire VM map from the kernel when you only care about an integer RSS
count is obviously inefficient.
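
As a rough sketch of what I mean (process_rss_bytes is a hypothetical name, not the application's function): on FreeBSD the kern.proc.pid sysctl returns a single struct kinfo_proc, whose ki_rssize field is the resident set size in pages. The non-FreeBSD branch is only an assumed stand-in so the sketch builds elsewhere; it reports peak rather than current RSS.

```c
#include <assert.h>	/* for the caller-side check described after the block */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#endif

/*
 * Return the resident set size of the calling process in bytes,
 * or -1 on error, without walking the VM map.
 */
long
process_rss_bytes(void)
{
#ifdef __FreeBSD__
	struct kinfo_proc kp;
	size_t len = sizeof(kp);
	int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PID, (int)getpid() };

	/* One fixed-size structure instead of one entry per mapping. */
	if (sysctl(mib, 4, &kp, &len, NULL, 0) != 0)
		return (-1);
	return ((long)kp.ki_rssize * (long)getpagesize());
#else
	/*
	 * Stand-in for other systems: ru_maxrss is the *peak* RSS,
	 * reported in kilobytes on Linux.
	 */
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return (-1);
	return (ru.ru_maxrss * 1024L);
#endif
}
```

A caller would only need to check that process_rss_bytes() is greater than zero; the cost no longer depends on how many pages the process has mapped.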

Best,
Conrad

From owner-freebsd-hackers@freebsd.org  Fri Jul  8 16:34:38 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4BA52B839E7
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Fri,  8 Jul 2016 16:34:38 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2FF211F3E
 for <freebsd-hackers@freebsd.org>; Fri,  8 Jul 2016 16:34:37 +0000 (UTC)
 (envelope-from allanjude@freebsd.org)
Received: from [10.1.1.2] (unknown [10.1.1.2])
 (Authenticated sender: allanjude.freebsd@scaleengine.com)
 by mx1.scaleengine.net (Postfix) with ESMTPSA id D28421622
 for <freebsd-hackers@freebsd.org>; Fri,  8 Jul 2016 16:34:30 +0000 (UTC)
Subject: Re: help with onboard LAN
To: freebsd-hackers@freebsd.org
References: <alpine.BSF.2.20.1607081251001.854@laptop.wojtek.intra>
From: Allan Jude <allanjude@freebsd.org>
Message-ID: <042e6e78-13cb-7d48-68b1-495a0a341129@freebsd.org>
Date: Fri, 8 Jul 2016 12:34:30 -0400
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <alpine.BSF.2.20.1607081251001.854@laptop.wojtek.intra>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 16:34:38 -0000

On 2016-07-08 06:54, Wojciech Puchar wrote:
> my supermicro-rebranded server is specified as having 2 1Gb/s ethernet
> ports onboard
> 
> what actually is:
> 
> ix0@pci0:3:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01
> hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
>     class      = network
>     subclass   = ethernet
> ix1@pci0:3:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01
> hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
>     class      = network
>     subclass   = ethernet
> 
> 
> 
> which is strange
> 
> card is autodetected with ixgbe driver under FreeBSD 10 and i put
> 
> device miibus
> device ixgbe
> 
> in my custom kernel.
> 
> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
> port 0xe020-0xe03f mem 0xfbc00000-0xfbdfffff,0xfbe04000-0xfbe07fff irq
> 42 at device 0.0 on pci3
> ix0: Using MSIX interrupts with 9 vectors
> ix0: Ethernet address: 0c:c4:7a:6e:7e:9e
> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
> port 0xe000-0xe01f mem 0xfba00000-0xfbbfffff,0xfbe00000-0xfbe03fff irq
> 45 at device 0.1 on pci3
> ix1: Using MSIX interrupts with 9 vectors
> ix1: Ethernet address: 0c:c4:7a:6e:7e:9f
> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
> 
> And it works.
> 
> But seems i have autonegotiation problem with gigabit switch - it
> connects at 100Mb/s
> 
> ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 
> options=8407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
> 
>         ether 0c:c4:7a:6e:7e:9e
>         inet 194.1.144.90 netmask 0xfffffff8 broadcast 194.1.144.95
>         inet 194.1.144.91 netmask 0xfffffff8 broadcast 194.1.144.95
>         media: Ethernet autoselect (100baseTX <full-duplex>)
>         status: active
> 
> 
> i tried
> 
> ifconfig ix0 media 1000baseT
> 
> but it shows error.
> 
> 
> Any idea what i really have in my server and how to manually set it up
> to 1Gb/s
> 
> Seems like no phy is detected.

Are you sure they are 1 Gigabit ports? They look like 10 Gigabit ports.

I have not had trouble getting any of my 10 Gigabit ports to link to a 1
Gigabit switch.

Install and run 'dmidecode', and in the first page or two, get the model
number of the supermicro motherboard. It will help shed light on the
situation.

It will look something like this:

Base Board Information
        Manufacturer: Supermicro
        Product Name: X10DRi-LN4+


-- 
Allan Jude

From owner-freebsd-hackers@freebsd.org  Sat Jul  9 01:43:14 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A645FB83E1D;
 Sat,  9 Jul 2016 01:43:14 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x22a.google.com (mail-yw0-x22a.google.com
 [IPv6:2607:f8b0:4002:c05::22a])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 73B8C18B9;
 Sat,  9 Jul 2016 01:43:14 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x22a.google.com with SMTP id l125so50839438ywb.2;
 Fri, 08 Jul 2016 18:43:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:from:date:message-id:subject:to:cc;
 bh=s9wFMRb5nSXLWOxXSbco6K4+d90kJ7McKN32ouFL5Cw=;
 b=Jre5cmm+10apMmStyyjiCkew3eGvH5AppI+CpYf74E0Ei2/x+WZJN2nR2L/hviMbCd
 315YEYB0ufUlbEuCHxuEf1DWrmyKYhcBP5KpuoFw9G8dhdsYmowY6A7FPsKv1vep/PW3
 P76DYlb4eCa9xzllyxrLb6YaO11jW69DlUmOWQ/xe1qjGWlLHLpZzFoqv0JgXhM3i9Rc
 jI/rT54BQQdglWE5Nz1Uhu1QjKe+maTV6zH0j4N8OnTKK+f7TwrFz0InVV0FZMjahC0S
 85L3dxqYjuCeG2p6Cf6ao/Vu57Qo7YvoZKilE2sXYuDnnsn3OK3FJPA9bZWYAX5o6O+K
 h3iQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc;
 bh=s9wFMRb5nSXLWOxXSbco6K4+d90kJ7McKN32ouFL5Cw=;
 b=On6E2EAJdOGwkgVsZLqZAs5wREV41ReW5Fqv6pINIHNRkv1APBiiaxgMx9r9xHTVrF
 qePcn2qPcLaSnxu0QPuyB2dpb/ZOupFX9ndLN81zXr5sYjjvKzlmdYsi8zXtCUsaSUaK
 NxtyCZ0HIeiW9K+P7Zq+5ASve9CzovfPch1IrcJNmBSS6PR3492JjzTV8YIc5qzcxm+X
 jyyOzh61tKhZ9V2X6XumPz/eG8kjO74wZQ9gJ4GsSLanOT0MHJ6RZjx4y8/kxnQSf0YN
 JT7UIOGwPwF6/j7YgU2yCiqKc8Bjn5lvkvSgjcsqikNdOCsGNrkYrClvfA2HwnN4/H65
 Mfpw==
X-Gm-Message-State: ALyK8tLjYGaFn4fg4A4RzLheSqw5T/aKTklr34TdDzp2cZjrkSWuVm6NlujkdTbmLsiaalVEAAhxcTCtWgHwag==
X-Received: by 10.13.217.20 with SMTP id b20mr115656ywe.44.1468028592896; Fri,
 08 Jul 2016 18:43:12 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Fri, 8 Jul 2016 18:43:12 -0700 (PDT)
From: David Cross <dcrosstech@gmail.com>
Date: Fri, 8 Jul 2016 21:43:12 -0400
Message-ID: <CAM9edeNYDPhq3E3zxm8zmoiRcfPp=87xF2veUVP_jPGj4a67fA@mail.gmail.com>
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST) FOUND IT, including reproduction steps
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Sat, 09 Jul 2016 01:52:10 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Jul 2016 01:43:14 -0000

Ok... I found it.

All of the writes go through ffs_write (including VOP_RECLAIM, so my
statement that VOP_RECLAIM couldn't handle things that vinvalbuf left
behind is obviously incorrect).  Sometimes it worked, sometimes it panicked.
I started putting more debugging into it and noticed the following: the
problem file would balloc twice, as follows:

attempting to balloc inode 18237205
softdep_setup_allocdirect(18237205, 1, 72834400, 0, 8192, 0,
0xfffffe00f76a6d88)
balloc at 291337600, flags: 50000


attempting to balloc inode 18237205
softdep_setup_allocdirect(18237205, 0, 72834448, 0, 16384, 0,
0xfffffe00f7749970)
balloc at 291337792, flags: 7f040080
panic: softdep_deallocate_dependencies: dangling deps

Further reading of ffs_write, to figure out why it worked sometimes and not
others, pointed me at the IO_SYNC flag: if it is passed in, ffs_write
dispatches to bwrite, which gives the panic; otherwise it goes to bawrite,
which does not.  However, the problem is in ufs_balloc, around line 778
(which I saw in the earlier newbuf dump): there is NO call to any write
method for that buffer.
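
The IO_SYNC branch described above can be pictured with a minimal userland
sketch.  It is not the kernel code: SK_IO_SYNC, sk_write_op and sk_dispatch
are hypothetical names made up for illustration, and the real ffs_write may
well have further branches that this message does not go into.

```c
#include <assert.h>

/* Hypothetical flag value for this sketch; not the kernel's IO_SYNC. */
#define SK_IO_SYNC 0x01

/* Which write primitive the sketch would pick for a buffer. */
enum sk_write_op { SK_BWRITE, SK_BAWRITE };

/*
 * Mirrors only the branch described in the message: a synchronous
 * request takes the bwrite-like path, anything else takes the
 * bawrite-like path.
 */
enum sk_write_op
sk_dispatch(int ioflag)
{
	if (ioflag & SK_IO_SYNC)
		return (SK_BWRITE);
	return (SK_BAWRITE);
}

/* Self-check for the two cases discussed above. */
void
sk_selfcheck(void)
{
	assert(sk_dispatch(SK_IO_SYNC) == SK_BWRITE);
	assert(sk_dispatch(0) == SK_BAWRITE);
}
```

The point of the sketch is only that the same buffer can reach two different
write paths depending on one caller-supplied flag, which would fit a panic
that shows up on some runs and not others.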

If we compare this to the other calls to softdep_setup_allocdirect in that
function (lines 148, 264, 708, 828), we see that each of them is followed by
some call to bwrite, bdwrite, or bawrite.  (A number of the other calls do
not make any direct calls to b*write either; I do not know nearly enough
to say whether those are correct or incorrect.)  I tried adding bwrite around
those lines, conditional on IO_SYNC, and I only made it panic
earlier.  I just don't know the semantics of this well enough.

That being said, I was finally able to isolate a set of reproduction steps
that anyone can run.  As they stand, they rely on a set of filesystem options
that are no longer standard (but were, not that long ago), but I definitely
believe they could be trivially modified to work on *any* UFS1/UFS2
filesystem.  For that reason I am NOT including them here; I will reply
individually with the exploit code and instructions to reproduce.  If you
want them, and you have an appropriate commit history or other credentials,
I will forward them on.

Thanks, and I eagerly look forward to the patch, or assisting where I can
in development.

From owner-freebsd-hackers@freebsd.org  Sat Jul  9 05:14:37 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 46BBCB8467D
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Jul 2016 05:14:37 +0000 (UTC)
 (envelope-from wojtek@puchar.net)
Received: from puchar.net (puchar.net [194.1.144.90])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "puchar.net", Issuer "puchar.net" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id D054918A8;
 Sat,  9 Jul 2016 05:14:36 +0000 (UTC)
 (envelope-from wojtek@puchar.net)
Received: Received: from 127.0.0.1 (localhost [127.0.0.1])
 by puchar.net (8.15.2/8.14.9) with ESMTPS id u695EXWS070743
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 9 Jul 2016 07:14:34 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Received: from laptop.wojtek.intra (localhost [127.0.0.1])
 by laptop.wojtek.intra (8.14.9/8.14.9) with ESMTP id u695Eb2C008603;
 Sat, 9 Jul 2016 07:14:37 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Received: from localhost (wojtek@localhost)
 by laptop.wojtek.intra (8.14.9/8.14.9/Submit) with ESMTP id u695EVhO008600;
 Sat, 9 Jul 2016 07:14:32 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
X-Authentication-Warning: laptop.wojtek.intra: wojtek owned process doing -bs
Date: Sat, 9 Jul 2016 07:14:31 +0200 (CEST)
From: Wojciech Puchar <wojtek@puchar.net>
X-X-Sender: wojtek@laptop.wojtek.intra
To: Allan Jude <allanjude@freebsd.org>
cc: freebsd-hackers@freebsd.org
Subject: Re: help with onboard LAN - fixed
In-Reply-To: <042e6e78-13cb-7d48-68b1-495a0a341129@freebsd.org>
Message-ID: <alpine.BSF.2.20.1607090713590.8599@laptop.wojtek.intra>
References: <alpine.BSF.2.20.1607081251001.854@laptop.wojtek.intra>
 <042e6e78-13cb-7d48-68b1-495a0a341129@freebsd.org>
User-Agent: Alpine 2.20 (BSF 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (puchar.net [10.0.1.1]); Sat, 09 Jul 2016 07:14:34 +0200 (CEST)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Jul 2016 05:14:37 -0000

Fixed by using the driver from the latest FreeBSD-10.
Everything now works fine. Thanks for the help.


From owner-freebsd-hackers@freebsd.org  Sat Jul  9 10:47:17 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 80BB3B831FE
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Jul 2016 10:47:17 +0000 (UTC)
 (envelope-from wojtek@puchar.net)
Received: from puchar.net (puchar.net [194.1.144.90])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "puchar.net", Issuer "puchar.net" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 1596719B7
 for <freebsd-hackers@freebsd.org>; Sat,  9 Jul 2016 10:47:16 +0000 (UTC)
 (envelope-from wojtek@puchar.net)
Received: Received: from 127.0.0.1 (localhost [127.0.0.1])
 by puchar.net (8.15.2/8.14.9) with ESMTPS id u69AlExk008206
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
 for <freebsd-hackers@freebsd.org>; Sat, 9 Jul 2016 12:47:14 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Received: from laptop.wojtek.intra (localhost [127.0.0.1])
 by laptop.wojtek.intra (8.14.9/8.14.9) with ESMTP id u69AlI9w001211
 for <freebsd-hackers@freebsd.org>; Sat, 9 Jul 2016 12:47:18 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
Received: from localhost (wojtek@localhost)
 by laptop.wojtek.intra (8.14.9/8.14.9/Submit) with ESMTP id u69AlD0K001208
 for <freebsd-hackers@freebsd.org>; Sat, 9 Jul 2016 12:47:13 +0200 (CEST)
 (envelope-from wojtek@puchar.net)
X-Authentication-Warning: laptop.wojtek.intra: wojtek owned process doing -bs
Date: Sat, 9 Jul 2016 12:47:13 +0200 (CEST)
From: Wojciech Puchar <wojtek@puchar.net>
X-X-Sender: wojtek@laptop.wojtek.intra
To: freebsd-hackers@freebsd.org
Subject: apache&EnableSendfile on = 100% CPU
Message-ID: <alpine.BSF.2.20.1607091245360.1205@laptop.wojtek.intra>
User-Agent: Alpine 2.20 (BSF 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset=US-ASCII
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (puchar.net [10.0.1.1]); Sat, 09 Jul 2016 12:47:15 +0200 (CEST)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Jul 2016 10:47:17 -0000

Is this an Apache bug or possibly a kernel bug (FreeBSD 10)?

When I turn on the EnableSendfile setting in the Apache config, the process
handling the HTTP connection uses 100% CPU, no matter whether I transfer
1 kB/s or 1 GB/s.

Turning it off fixes the problem.

Where can I look for the source of the problem?


From owner-freebsd-hackers@freebsd.org  Sat Jul  9 06:47:47 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88FA0B85B87
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Sat,  9 Jul 2016 06:47:47 +0000 (UTC)
 (envelope-from rupavath@juniper.net)
Received: from NAM03-CO1-obe.outbound.protection.outlook.com
 (mail-co1nam03on0125.outbound.protection.outlook.com [104.47.40.125])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (Client CN "mail.protection.outlook.com",
 Issuer "Microsoft IT SSL SHA2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 487A619A0
 for <freebsd-hackers@freebsd.org>; Sat,  9 Jul 2016 06:47:46 +0000 (UTC)
 (envelope-from rupavath@juniper.net)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=junipernetworks.onmicrosoft.com; s=selector1-juniper-net;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
 bh=nGKVTKXnpC07ePQLOdDN2ZKDLmgub5HJMNiLv/rOhBA=;
 b=GPovcPY+cJvZSXGpu6H4qBUOJ9AWvU7ej6xSyoKbOxhfP/N+8hLEJrVthnAuMS0ymRDMdj9x8TCNNouUZoGesJhPtHIgVbtigwZkZAE7V9gHMq/w6cdbQorNBjuMzR2xz1kjttN9YHCvHNg3Cd44554czVnzu8Xyh4QNH0uapaI=
Received: from CY4PR05MB2824.namprd05.prod.outlook.com (10.169.182.146) by
 CY4PR05MB2823.namprd05.prod.outlook.com (10.169.182.145) with Microsoft SMTP
 Server (TLS) id 15.1.523.12; Sat, 9 Jul 2016 00:13:35 +0000
Received: from CY4PR05MB2824.namprd05.prod.outlook.com ([10.169.182.146]) by
 CY4PR05MB2824.namprd05.prod.outlook.com ([10.169.182.146]) with mapi id
 15.01.0523.028; Sat, 9 Jul 2016 00:13:35 +0000
From: Sreekanth Rupavatharam <rupavath@juniper.net>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: mbuf leak in kern_sendit?
Thread-Topic: mbuf leak in kern_sendit?
Thread-Index: AQHR2Xa75spuXAJjqkGxwSdNjN01Rg==
Date: Sat, 9 Jul 2016 00:13:35 +0000
Message-ID: <1286BFDE-9238-4967-913F-26E0E28D0F74@juniper.net>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
user-agent: Microsoft-MacOutlook/f.17.0.160611
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=rupavath@juniper.net; 
x-ms-exchange-messagesentrepresentingtype: 1
x-originating-ip: [2601:646:8200:65cc:1432:f7ab:4189:ef7]
x-ms-office365-filtering-correlation-id: 145b30de-07d7-44c5-4326-08d3a78dde62
x-microsoft-exchange-diagnostics: 1; CY4PR05MB2823;
 6:oUpc0Z/rwOeWd17lyTOHAY74K4dXuWeSZCyQr3YDiIpjk1Z1Bh+QXWOIoHcjFWDHIwIMAs5foq4Pm8/tRHv26iuhldzbkAN0c7+PCZZy4P98j9jgU9XLESEQh9aa5rVKVJx66vIxhGKzryQI/lodPyO+j8Ee6mOCSkc+YinUPiTMQvxB2VZoh1TQIXL+2iTVcJVA2/ClCtbuQoL8BWc0fFn65KrBIZaLtOmh5Y1SmLxl5HZZVKuPS9FlC4CrE+H2ceVUf17U4tsLR/dzkX1+xEoLPbgM/uYrn4Q87nsP5gCmYqwWIlfbIl2qmsqz9FKmgAlZJwiLIRiQtjoSTyS7Yw==;
 5:WPegGxBcEKMAyV7glij6y8DOyNsDzLORoS+hs6BE1jDNAatllfQxPObh2Sh/gMHVZ/HokjVNxELOXihvRKqjr7H8qeE3T5qi4PGCDYKm991CZLGyIkT1ZoteW+HMHCPecy6KGEF5u3yAliLV5C+FxA==;
 24:2mf0igjC4+5/BKefiv8pE2n1+Dj/BSVDdDrqDsfpXb3YD0g9aqRHOkVP2jSFx6nhcTguQERJDPS+wEgNYdeVBRenohXFrGZ9yBgE/XKooWA=;
 7:45iyb0CEGuRDxa6I3kjVeIdKRQBQbbWcW85aPKGgVloax8Ov8eaouUZipgQ2i5Gen2ym1+TYo9nXCDaa0SifB/OPoS1TK0YR7vCBePI9Kp8qWVXSn3Thi9yetNO1PexbrzsxK77A0SqwEwhQwoNOgNfxmTAwVFC7AuKNTdBcjqKXfetPAxRi/B/S6DHn21cumvT2qs0do7DHfHNggwNh6SX+xv5dIq3prt2f9iBzrjUtl0XXfpVJ0E3ZYrpFYZ4W
x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:CY4PR05MB2823;
x-microsoft-antispam-prvs: <CY4PR05MB2823A7F8E6678B891789897CC93D0@CY4PR05MB2823.namprd05.prod.outlook.com>
x-exchange-antispam-report-test: UriScan:;
x-exchange-antispam-report-cfa-test: BCL:0; PCL:0;
 RULEID:(601004)(2401047)(5005006)(8121501046)(3002001)(10201501046)(6055026);
 SRVR:CY4PR05MB2823; BCL:0; PCL:0; RULEID:; SRVR:CY4PR05MB2823; 
x-forefront-prvs: 0998671D02
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10019020)(6009001)(7916002)(189002)(199003)(3280700002)(97736004)(83506001)(229853001)(450100001)(82746002)(110136002)(107886002)(101416001)(2351001)(7846002)(77096005)(33656002)(81166006)(189998001)(106356001)(81156014)(54356999)(105586002)(8936002)(2900100001)(106116001)(83716003)(87936001)(50986999)(99286002)(122556002)(68736007)(5002640100001)(6116002)(8676002)(4001350100001)(586003)(11100500001)(92566002)(2501003)(10400500002)(36756003)(86362001)(305945005)(7736002)(102836003)(3660700001)(5640700001)(2906002)(3826002)(104396002);
 DIR:OUT; SFP:1102; SCL:1; SRVR:CY4PR05MB2823;
 H:CY4PR05MB2824.namprd05.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords;
 MX:1; A:1; LANG:en; 
received-spf: None (protection.outlook.com: juniper.net does not designate
 permitted sender hosts)
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="utf-8"
Content-ID: <CFC68D510902624AB4F7F4808FA693C7@namprd05.prod.outlook.com>
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
X-OriginatorOrg: juniper.net
X-MS-Exchange-CrossTenant-originalarrivaltime: 09 Jul 2016 00:13:35.1349 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: bea78b3c-4cdb-4130-854a-1d193232e5f4
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR05MB2823
X-Mailman-Approved-At: Sat, 09 Jul 2016 11:02:06 +0000
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Jul 2016 06:47:47 -0000

I see that in the kern_sendit() function (stable/10), the control mbuf
doesn’t get freed on error. E.g., at lines 914-919:

        if (mp->msg_name != NULL) {
                error = mac_socket_check_connect(td->td_ucred, so,
                    mp->msg_name);
                if (error != 0)
                        goto bad; ⇐ Here
        }

or at lines 933-938:

        for (i = 0; i < mp->msg_iovlen; i++, iov++) {
                if ((auio.uio_resid += iov->iov_len) < 0) {
                        error = EINVAL;
                        goto bad; ⇐ Here
                }
        }

and at lines 965-967:

bad:
        fdrop(fp, td);
        return (error);

No free of the control mbuf here either.

Actually, the only place where the mbuf gets freed is inside pru_sosend,
when kern_sendit() calls it. Am I missing something here? E.g., tracking the
call trace from sendit:

sendit()
       sockargs() -> control mbuf is allocated here
       kern_sendit() -> it’s freed only on pru_sosend()
       control not freed on error.

Thanks,

-Sreekanth
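
The question in this message is whether the control mbuf is leaked when
kern_sendit() leaves through "goto bad" before it reaches pru_sosend().  The
shape of that concern can be modelled with a small userland sketch.  It is
not kernel code and does not settle whether the real path leaks: every name
in it (sk_alloc, sk_free, sk_send, sk_sendit_leaky, sk_sendit_fixed,
sk_live_bufs) is hypothetical, and malloc stands in for mbuf allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Count of buffers allocated and not yet freed (sketch-only bookkeeping). */
int sk_live_bufs;

void *
sk_alloc(void)
{
	void *p = malloc(16);

	if (p != NULL)
		sk_live_bufs++;
	return (p);
}

void
sk_free(void *p)
{
	if (p != NULL) {
		free(p);
		sk_live_bufs--;
	}
}

/* Stand-in for the consumer: it takes ownership of control and frees it. */
int
sk_send(void *control)
{
	sk_free(control);
	return (0);
}

/* Leaky shape: an early error skips the only place control is freed. */
int
sk_sendit_leaky(void *control, int fail_early)
{
	int error;

	if (fail_early) {
		error = EINVAL;
		goto bad;
	}
	error = sk_send(control);
bad:
	return (error);
}

/* One possible fix: free at the common exit unless ownership was handed off. */
int
sk_sendit_fixed(void *control, int fail_early)
{
	int error;

	if (fail_early) {
		error = EINVAL;
		goto bad;
	}
	error = sk_send(control);
	control = NULL;		/* consumer now owns (and has freed) it */
bad:
	sk_free(control);	/* no-op when control is NULL */
	return (error);
}
```

In this model, calling sk_sendit_leaky() with fail_early set leaves
sk_live_bufs at 1, while sk_sendit_fixed() brings it back to 0 on both the
error and the success path.  Clearing the pointer after the hand-off is one
way to keep a single cleanup label without risking a double free.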