From owner-freebsd-hackers@freebsd.org Sun Jul 3 00:09:07 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DEA76B81222 for ; Sun, 3 Jul 2016 00:09:07 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from d.mail.sonic.net (d.mail.sonic.net [64.142.111.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CC37D20C3 for ; Sun, 3 Jul 2016 00:09:07 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from zeppelin.tachypleus.net ([75.104.66.200]) (authenticated bits=0) by d.mail.sonic.net (8.15.1/8.15.1) with ESMTPSA id u6308oKv001358 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Sat, 2 Jul 2016 17:08:56 -0700 Subject: Re: Review request: sparse CPU ID maps To: outro pessoa References: <57761101.3030101@freebsd.org> Cc: "freebsd-hackers@freebsd.org" From: Nathan Whitehorn Message-ID: <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> Date: Sat, 2 Jul 2016 17:08:54 -0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: X-Sonic-CAuth: UmFuZG9tSVYKmi8BVaO5vdwkFcy3d3W2aVNoXDMOt/jA28xtNafKqHjwt8UBhPWQ+fp0+rdq8bMkuG8QZshgnKtV8sUFzvflfsO0Q/6f0hY= X-Sonic-ID: C;TiMlUrJA5hGo5pNwxPCmMQ== M;dKzDVLJA5hGo5pNwxPCmMQ== X-Spam-Flag: No X-Sonic-Spam-Details: 0.0/5.0 by cerberusd Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jul 2016 00:09:08 -0000 A reasonable first pass at checking for this kind of bug 
is doing grep -lR '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows the following files: arm/mv/armadaxp/armadaxp_mp.c arm/include/counter.h arm/broadcom/bcm2835/bcm2836.c arm/broadcom/bcm2835/bcm2836_mp.c arm/freescale/imx/imx6_mp.c arm/allwinner/aw_mp.c arm/rockchip/rk30xx_mp.c arm/amlogic/aml8726/aml8726_mp.c arm/samsung/exynos/exynos5_mp.c arm/arm/mp_machdep.c arm/nvidia/tegra124/tegra124_mp.c arm64/include/counter.h arm64/arm64/gic_v3.c arm64/arm64/gic_v3_its.c arm64/arm64/gicv3_its.c All of them should, in some sense, be CPU_FOREACH(), but it may not matter. For example, it may not be possible to have sparse CPU IDs on some or all of those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think. -Nathan On 07/02/16 10:31, outro pessoa wrote: > Nathan, > What type of ARM hardware do you need? > > On Fri, Jul 1, 2016 at 2:43 AM, Nathan Whitehorn > > wrote: > > I have been working on fixing PR 210106 > (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210106) and > have run into the fact that several pieces of the kernel, notably > parts of subr_taskqueue.c, require that CPU IDs be dense in the > range [0, mp_ncpus), which the kernel does not guarantee, for > example in the case of CPUs with hyperthreading in which the > threading is disabled. This is leading to hangs in late boot in > -CURRENT. > > I've prepared the following patch, which fixes PR 210106, but I > would like a few more eyeballs on it before committing it. It > fixes most of the bogus uses of mp_ncpus in the kernel, but not > all: doing grep -R '< mp_ncpus' /sys | wc -l gives 52 remaining > instances of loops in [0, mp_ncpus) or [1, mp_ncpus), most or all > of which should instead be CPU_FOREACH(), but none of which I feel > comfortable changing (36 are in ARM code for hardware I don't have > access to). 
> > The patch lives here: > http://people.freebsd.org/~nwhitehorn/sparse_cpu_ids.diff > > -Nathan > _______________________________________________ > freebsd-hackers@freebsd.org > mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org > " > > From owner-freebsd-hackers@freebsd.org Sun Jul 3 02:30:19 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3160BB8FD8E for ; Sun, 3 Jul 2016 02:30:19 +0000 (UTC) (envelope-from paul.koch137@gmail.com) Received: from mail-pa0-x22b.google.com (mail-pa0-x22b.google.com [IPv6:2607:f8b0:400e:c03::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0888228AC for ; Sun, 3 Jul 2016 02:30:19 +0000 (UTC) (envelope-from paul.koch137@gmail.com) Received: by mail-pa0-x22b.google.com with SMTP id zl15so8648703pab.3 for ; Sat, 02 Jul 2016 19:30:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PuTrRoDEsNQYFAUu5pKW9A9MTimqKttEYHuTr+cldgY=; b=kFzUvI8nHwSI+GJ0/geizmv7pFpNtoDzYxiO965F/zvJLoWIFZ0mqNzjkQNOdK1QlO AeevwnC5vhojS1ghzv4OuGzSWsHU9tKiJXLhbzBODjehDrLSnAMiZQwsoGGC6KiI0nli 21U8qN2EoqoFX1yCtpCBBJoRH1XA0QR/ipq2ksqnnGMgiiFFqCeSedkuM5BnX3qljPUZ jepSkOAYnDB4ylmwIFvH+HMvrue9nnnnnaxq8E7wMJFaIEwzUaGjTSAhRKRWGStrt8V8 LLpGc/XZacY/IvD2reoKNTUbWPjeOSvQmfr7U6gTs6993DZSBKT8N/GkEnXUjADHD7bP lLpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=PuTrRoDEsNQYFAUu5pKW9A9MTimqKttEYHuTr+cldgY=; b=QYhEG2O3X9qSfIhDn6nhuTQUZguJ7KntyrjejbAdZoJ8TRmF5xmnI8Pq1cPnhMoXx7 W6dR9Sq6dUaK0rL1fGtr1Xxa2UebsM12yZAjlabLtuitNYeG/LXFcHQJoO/kWtGdBBsh 2bUql57/KADXsr5MpUMZyl+LfudM+ZDCHhgeJ62q8lkDfE6MKxTDXy7CAX70dLGtoWL8 N7Qtwiic1VSqUXfBJzxg+53RDwGCpsijnufoa3ghdKEIPx99bTzJvbITcM9D8F1KieS2 4mkjAhemDI3XzYBuC54m/Gqx1J/b5r7T9h+jb0RZLYQ6y+OS94EpPV7jInTdUyndGm9K y89Q== X-Gm-Message-State: ALyK8tL6asMcrCUtgocJ11GuYyKKAdUMti2wUf4j0CS7/W+jBbKw0mDr6/y3j6Z+vs4d6w== X-Received: by 10.66.193.231 with SMTP id hr7mr10761945pac.28.1467513018439; Sat, 02 Jul 2016 19:30:18 -0700 (PDT) Received: from splash.akips.com (CPE-120-146-191-2.static.qld.bigpond.net.au. [120.146.191.2]) by smtp.gmail.com with ESMTPSA id o68sm698740pfb.18.2016.07.02.19.30.16 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 02 Jul 2016 19:30:17 -0700 (PDT) Date: Sun, 3 Jul 2016 12:30:04 +1000 From: Paul Koch To: Cedric Blancher Cc: "freebsd-hackers@freebsd.org" Subject: Re: ZFS ARC and mmap/page cache coherency question Message-ID: <20160703123004.74a7385a@splash.akips.com> In-Reply-To: References: <20160630140625.3b4aece3@splash.akips.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd10.2) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jul 2016 02:30:19 -0000 Is there a "long story", or is mmap() performance on ZFS doomed for the foreseeable future ? Paul. > Short story: ZFS was tacked on the kernel and was never properly > integrated into the VM page management, which leads to DRAMATIC poor > performance for anything which uses mmap() for write IO. 
This was > solved in Oracle Solaris with the great VM allocator rewrite which > landed after Opensolaris was made closed source again. > > Without a complete rewrite of the VM system this problem is unsolvable. > > Ced > > On 30 June 2016 at 06:06, Paul Koch wrote: > > > > Posted this to -stable on the 15th June, but no feedback... > > > > We are trying to understand a performance issue when syncing large mmap'ed > > files on ZFS. > > > > Example test box setup: > > FreeBSD 10.3-p5 > > Intel i7-5820K 3.30GHz with 64G RAM > > 6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe > > > > Read performance of a sequentially written large file on the pool is > > typically around 950Mbytes/sec using dd. > > > > Our software mmap's some large database files using MAP_NOSYNC, and we > > call fsync() every 10 minutes when we know the file system is mostly > > idle. In our test setup, the database files are 1.1G, 2G, 1.4G, 12G, > > 4.7G and ~20 small files (under 10M). All of the memory pages in the > > mmap'ed files are updated every minute with new values, so the entire > > mmap'ed file needs to be synced to disk, not just fragments. > > > > When the 10 minute fsync() occurs, gstat typically shows very little disk > > reads and very high write speeds, which is what we expect. But, every 80 > > minutes we process the data in the large mmap'ed files and store it in > > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not > > mmap'ed). After that, the performance of the next fsync() of the mmap'ed > > files falls off a cliff. We are assuming it is because the ARC has > > thrown away the cached data of the mmap'ed files. gstat shows lots of > > read/write contention and lots of things tend to stall waiting for disk. > > > > Is this just a lack of ZFS ARC and page cache coherency ?? > > > > Is there a way to prime the ARC with the mmap'ed files again before we > > call fsync() ? 
> > > > We've tried cat and read() on the mmap'ed files but doesn't seem to touch > > the disk at all and the fsync() performance is still poor, so it looks > > like the ARC is not being filled. msync() doesn't seem to be much > > different. mincore() stats show the mmap'ed data is entirely incore and > > referenced. > > > > Paul. > > _______________________________________________ > > freebsd-hackers@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > > To unsubscribe, send any mail to > > "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Sun Jul 3 07:45:28 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62A4AB902AE for ; Sun, 3 Jul 2016 07:45:28 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 52B3C2D32 for ; Sun, 3 Jul 2016 07:45:27 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467531924887811.5045019290758; Sun, 3 Jul 2016 00:45:24 -0700 (PDT) Date: Sun, 03 Jul 2016 00:45:24 -0700 From: Matthew Macy To: "Paul Koch" Cc: "Cedric Blancher" , "freebsd-hackers" Message-ID: <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> In-Reply-To: <20160703123004.74a7385a@splash.akips.com> References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> Subject: Re: ZFS ARC and mmap/page cache coherency question MIME-Version: 1.0 User-Agent: Zoho Mail X-Mailer: Zoho Mail Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jul 2016 07:45:28 -0000 Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future. -M ---- On Sat, 02 Jul 2016 19:30:04 -0700 Paul Koch wrote ---- Is there a "long story", or is mmap() performance on ZFS doomed for the foreseeable future ? Paul. > Short story: ZFS was tacked on the kernel and was never properly > integrated into the VM page management, which leads to DRAMATIC poor > performance for anything which uses mmap() for write IO. This was > solved in Oracle Solaris with the great VM allocator rewrite which > landed after Opensolaris was made closed source again. > > Without a complete rewrite of the VM system this problem is unsolvable. > > Ced > > On 30 June 2016 at 06:06, Paul Koch wrote: > > > > Posted this to -stable on the 15th June, but no feedback... > > > > We are trying to understand a performance issue when syncing large mmap'ed > > files on ZFS. > > > > Example test box setup: > > FreeBSD 10.3-p5 > > Intel i7-5820K 3.30GHz with 64G RAM > > 6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe > > > > Read performance of a sequentially written large file on the pool is > > typically around 950Mbytes/sec using dd. > > > > Our software mmap's some large database files using MAP_NOSYNC, and we > > call fsync() every 10 minutes when we know the file system is mostly > > idle. In our test setup, the database files are 1.1G, 2G, 1.4G, 12G, > > 4.7G and ~20 small files (under 10M). All of the memory pages in the > > mmap'ed files are updated every minute with new values, so the entire > > mmap'ed file needs to be synced to disk, not just fragments. > > > > When the 10 minute fsync() occurs, gstat typically shows very little disk > > reads and very high write speeds, which is what we expect. But, every 80 > > minutes we process the data in the large mmap'ed files and store it in > > highly compressed blocks of a ~300G file using pread/pwrite (i.e. not > > mmap'ed). After that, the performance of the next fsync() of the mmap'ed > > files falls off a cliff. We are assuming it is because the ARC has > > thrown away the cached data of the mmap'ed files. gstat shows lots of > > read/write contention and lots of things tend to stall waiting for disk. > > > > Is this just a lack of ZFS ARC and page cache coherency ?? > > > > Is there a way to prime the ARC with the mmap'ed files again before we > > call fsync() ? > > > > We've tried cat and read() on the mmap'ed files but doesn't seem to touch > > the disk at all and the fsync() performance is still poor, so it looks > > like the ARC is not being filled. msync() doesn't seem to be much > > different. mincore() stats show the mmap'ed data is entirely incore and > > referenced. > > > > Paul. > > _______________________________________________ > > freebsd-hackers@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > > To unsubscribe, send any mail to > > "freebsd-hackers-unsubscribe@freebsd.org" _______________________________________________ freebsd-hackers@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
From owner-freebsd-hackers@freebsd.org Sun Jul 3 15:50:58 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 96842B8FCEF for ; Sun, 3 Jul 2016 15:50:58 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3D14F28D7 for ; Sun, 3 Jul 2016 15:50:57 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 9D8A121E527 for ; Sun, 3 Jul 2016 10:43:34 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> From: Karl Denninger Message-ID: <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> Date: Sun, 3 Jul 2016 10:43:19 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: 
<155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms010805070601040608020308" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jul 2016 15:50:58 -0000 This is a cryptographically signed message in MIME format. --------------ms010805070601040608020308 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/3/2016 02:45, Matthew Macy wrote: > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future. -M ---- Wellllll.... I've done a fair bit of work here (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the political issues are at least as bad as the coding ones. In short what Cedric says about the root of the issue is real. VM is really-well implemented for what it handles, but the root of the issue is that while the UFS data cache is part of VM and thus it "knows" about it, ZFS is not because it is a "bolt-on." UMA leads to further (severe) complications for certain workloads. Finally the underlying ZFS dmu_tx sizing code is just plain wrong and in fact this is one of the biggest issues as when the system runs into trouble it can take a bad situation and make it a *lot* worse. There is only one write-back cache maintained instead of one per zvol, and that's flat-out broken. 
Being able to re-order async writes to disk (where fsync() has not been called) and minimizing seek latency is excellent. Sadly rotating media these days sabotages much of this due to opacity introduced at the drive level (e.g. varying sector counts per track, etc) but it can still help. But where things go dramatically wrong is on a system where a large write-back cache is allocated relative to the underlying zvol I/O performance (this occurs on moderately-large and bigger RAM systems) with moderate numbers of modest-performance rotating media; in this case it is entirely possible for a flush of the write buffers to require upwards of a *minute* to complete, during which all other writes block. If this happens during periods of high RAM demand and you manage to trigger a page-out at the same time, system performance will go straight into the toilet. I have seen instances where simply trying to edit a text file with vi (or a "select" against a database table) will hang for upwards of a minute leading you to believe the system has crashed, when in fact it has not. The interaction of VM with the above can lead to severe pathological behavior because the VM system has no way to tell the ZFS subsystem to pare back ARC (and at least as important, perhaps more-so -- unused but allocated UMA) when memory pressure exists *before* it pages. ZFS tries to detect memory pressure and do this itself but it winds up competing with the VM system. This leads to demonstrably wrong behavior because you never want to hold disk cache in preference to RSS; if you have a block of data from the disk the best case is you avoid one I/O (to re-read it); if you page you are *guaranteed* to take one I/O (to write the paged-out RSS to disk) and *might* take two (if you then must read it back in). In short, trading the avoidance of one *possible* I/O for a *guaranteed* I/O and a second possible one is *always* a net loss. 
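
The trade-off in the paragraph above can be written down as a small expected-cost calculation. What follows is only an illustrative sketch, not kernel code: the function names are hypothetical (they are not FreeBSD or ZFS interfaces), every I/O is assumed to cost the same, and the page being evicted is assumed to be dirty anonymous memory that has to be written to swap before it can be reused.

```c
#include <assert.h>

/*
 * Expected number of disk I/Os caused by dropping a clean cached disk
 * block: nothing has to be written, and a read happens only if the block
 * is wanted again (probability p_reread).  Hypothetical helper.
 */
double
expected_io_drop_cache(double p_reread)
{
	assert(p_reread >= 0.0 && p_reread <= 1.0);
	return (p_reread);
}

/*
 * Expected number of disk I/Os caused by paging out a dirty page: one
 * write is guaranteed, and a read follows if the page is touched again
 * (probability p_refault).  Hypothetical helper.
 */
double
expected_io_page_out(double p_refault)
{
	assert(p_refault >= 0.0 && p_refault <= 1.0);
	return (1.0 + p_refault);
}
```

For any probabilities between 0 and 1 the first function returns at most 1 and the second at least 1, which is the argument made above for giving up disk cache before paging.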
To "fix" all of this "correctly" (for all cases, instead of certain cases) VM would have to "know" about ARC and its use of UMA, along with being able to police both. ZFS also must have the dmu_tx writeback cache sized per-zvol with its size chosen by the actual I/O performance characteristics of the disks in the zvol itself. I've looked into doing both and it's fairly complex, and what's worse is that it would effectively "marry" VM and ZFS, removing the "bolt-on" aspect of things. This then leads to a lot of maintenance work over time because any time ZFS code changes (and it does, quite a bit) you then have to go back through that process in order to become coherent with Illumos. The PR above resolved (completely) the issues I was having along with a number of other people on 10.x and before (I've not yet rolled it forward to 11.) but it's quite clearly a hack of sorts, in that it detects and treats symptoms (e.g. dynamic TX cache size modification, etc) rather than integrating VM and ZFS cache management. 
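
As a rough illustration of the per-zvol sizing idea described above, the sketch below derives a dirty-data limit from a measured write throughput and a target flush time, and shows why a large limit on slow disks leads to minute-long flushes. All names and numbers here are hypothetical; this is not the dmu_tx code and not the patch attached to the PR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds needed to flush dirty_bytes at a sustained bytes_per_sec,
 * rounded up.  Hypothetical helper.
 */
uint64_t
flush_seconds(uint64_t dirty_bytes, uint64_t bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return ((dirty_bytes + bytes_per_sec - 1) / bytes_per_sec);
}

/*
 * Dirty-data limit for one device group: whatever it can write back in
 * target_seconds, never more than hard_cap.  Hypothetical helper.
 */
uint64_t
writeback_limit(uint64_t bytes_per_sec, uint64_t target_seconds,
    uint64_t hard_cap)
{
	/* Avoid overflow in the multiplication below. */
	if (target_seconds != 0 && bytes_per_sec > hard_cap / target_seconds)
		return (hard_cap);

	uint64_t limit = bytes_per_sec * target_seconds;
	return (limit < hard_cap ? limit : hard_cap);
}
```

With these assumptions, 4 GiB of dirty data on media that sustains 64 MiB/s gives flush_seconds(4ULL << 30, 64ULL << 20) == 64, i.e. about a minute of blocked writes, while writeback_limit(64ULL << 20, 5, 4ULL << 30) caps the same device at 320 MiB for a five-second flush.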
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms010805070601040608020308-- From owner-freebsd-hackers@freebsd.org Sun Jul 3 19:37:28 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5939B90B8E for ; Sun, 3 Jul 2016 19:37:28 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 936D521BC; Sun, 3 Jul 2016 19:37:28 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by mail-io0-x230.google.com with SMTP id g13so137407693ioj.1; Sun, 03 Jul 2016 12:37:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=BDvzoTm8PMvSDcU2jfKDHIPOKMsfMHdc1yat19v2TpQ=; b=0WHFKMVgw+UY27TyyQt2SySIU6TN6gRN2azg8VppIH3Yb75GZraA15/S0j4ewRGVTe dnXRd0QmW1a2jUWRngAmgzJPKraF1DEqmcmqr2GNnjKmM6v85csTuFWgwaPNAdF1dorW Xf396ee6Ogy6T3oIFV88zAgsIRQ+CpoM0O0nfnRlPOqiROsDJdHGJdDl44Ixzmijzu1l EUnaXeXSEzQ9XTdsbsqgqElxJqxpz8vak7UUCCKwXDMbDFZNceNN2l8XO5CaX84rBAsC gtgGg+U4GHxuzjfH6Kmhd49LWfzCIQa527GAu9SI9Lw08LwIKlQb661eBOvmDTY71XvH Z2IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=BDvzoTm8PMvSDcU2jfKDHIPOKMsfMHdc1yat19v2TpQ=; b=KbLZlwB8PoEgvteTd42ej0f67uLArmoLLrmcepScHr17nqJR3mdKmg611TYpo6yiM2 swGJp4NReoGp51rzn77IMdBuGQGXiF8wenoOylVWJDHe3lCWTaGZ3XjLT+suxYRImd3j KXxuch/IXObZP8udMgMa7MEhSbbiZmQpPe5dEC1L9FCSXub36X3P4EhhPVDxPmYASP/c 6Eio4RG5mmrmwXCsUZZmLvakYkMnu8mdf70HZVoZ4jI3PYpBBWT1109IEEyi2BewTRtR D/z/Fqj5QUySN7x5hV4R1WKQvwxvIcVPWM4NSv9VXA/0uMVMzR/bu4vEqRluJYbdoEiL sLUA== X-Gm-Message-State: ALyK8tLIFAWUXJ7d3EK1bNeVEMstpb/gkvyxs1DE2MWwWHH1uLc0/0jJDKsvgFY68UW+bv+3+5fievJsUzbVGw== X-Received: by 10.107.144.86 with SMTP id s83mr5793822iod.165.1467574647939; Sun, 03 Jul 2016 12:37:27 -0700 (PDT) MIME-Version: 1.0 Sender: adrian.chadd@gmail.com Received: by 10.36.210.212 with HTTP; Sun, 3 Jul 2016 12:37:27 -0700 (PDT) In-Reply-To: <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> References: <57761101.3030101@freebsd.org> <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> From: Adrian Chadd Date: Sun, 3 Jul 2016 12:37:27 -0700 X-Google-Sender-Auth: hmGAQPKUi-aXkQZh1CznYJTojpg Message-ID: Subject: Re: Review request: sparse CPU ID maps To: Nathan Whitehorn Cc: outro pessoa , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jul 2016 19:37:28 -0000 On 2 July 2016 at 17:08, Nathan Whitehorn wrote: > A reasonable first pass at checking for this kind of bug is doing grep -lR > '< mp_ncpus'. 
Running that on sys/arm and sys/arm64 shows the following > files: > arm/mv/armadaxp/armadaxp_mp.c > arm/include/counter.h > arm/broadcom/bcm2835/bcm2836.c > arm/broadcom/bcm2835/bcm2836_mp.c > arm/freescale/imx/imx6_mp.c > arm/allwinner/aw_mp.c > arm/rockchip/rk30xx_mp.c > arm/amlogic/aml8726/aml8726_mp.c > arm/samsung/exynos/exynos5_mp.c > arm/arm/mp_machdep.c > arm/nvidia/tegra124/tegra124_mp.c > arm64/include/counter.h > arm64/arm64/gic_v3.c > arm64/arm64/gic_v3_its.c > arm64/arm64/gicv3_its.c > > All of them should, in some sense, be CPU_FOREACH(), but it may not matter. > For example, it may not be possible to have sparse CPU IDs on some or all of > those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are > there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think. > -Nathan I think converting all the users over to the CPU_FOREACH thing is the right way to go, even if the SOC doesn't require it. People do bring up new systems by copy/pasta'ing an existing similar system, so we're best served by having all the consumers migrated. But, I'd do it in head/12. Early in head/12. 
:-P -adrian From owner-freebsd-hackers@freebsd.org Sun Jul 3 20:11:01 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4F26B9055E for ; Sun, 3 Jul 2016 20:11:01 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-it0-x229.google.com (mail-it0-x229.google.com [IPv6:2607:f8b0:4001:c0b::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6C8FC20BC for ; Sun, 3 Jul 2016 20:11:01 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-it0-x229.google.com with SMTP id f6so14268444ith.0 for ; Sun, 03 Jul 2016 13:11:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=/qpGdDeYnkbKhUu/2jW0D/26XASzkXIAXyt6OyfN9yc=; b=KqshAoCtzHR4UgEwmVQU7WNVDhIp4DiB/LOzsM3e2mL4Vgbn9AEAMF0i6/TBpJ8v1N kTibrhgut9PhReo641/VDtyHIQ7NoqupHlK0EVAkNUXevPJIO3Z3yYDblA/d29y9oS3C eEoYwkY4jWb2BIiXyVjJ9/E8gMjJgLrTwF6PdLhKKN77Yn7zPkY7IqIVdINn+4FtCjG8 PFtRbGUUWmVvP+8v3sG95THfg/TqvFD2TgEDQYGQoCk+y+URT/JUZwq4kGZFR3+fenTu geu+i1UG9venGzwW3ZQsneK2AjI5dYLDz0S4vHBXzYsG+F/EOx9YKuQjVjY3Dhs6Pzsy FGHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=/qpGdDeYnkbKhUu/2jW0D/26XASzkXIAXyt6OyfN9yc=; b=fwmk7cRNNpGzSEeQWRnoZlV7XdgpN4Qf6F5mZvwbfKVk4zRIxjK4Ms8YJ4npI+7kIt 0+/QiB7qmJk2kj8oovaHIYYzPZshfSGR/YLinJG366HGD5B/78cLIm54a4rXgsne7s47 GXUySfqUgz0oGBI7Si3Go3jQuVTScNVcD97aOgwaAnRb9QAcHmcfq5FPavUUoCllCgPM mdrUZflvdetREw6GOjjAyw0jVjT0hhoSwFmVVzs8t6/Chp7wGX6VA7dUL+S3e4YFSxpk 
O3rhywqfs4TVSwDqh+DHEjhoOPE3EBzfveoYMznWLQEHI16tJT6hJQRTt05N83fhsdIx Uarw== X-Gm-Message-State: ALyK8tIBj97vpcdC/5CFnStUU24/dfh53uWj9+wyu/xmv7RMnt2PBv64GlFq/rHWY67GwZZMJQRwxQZbNYAnxw== X-Received: by 10.36.41.16 with SMTP id p16mr7184342itp.60.1467576660661; Sun, 03 Jul 2016 13:11:00 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.137.131 with HTTP; Sun, 3 Jul 2016 13:11:00 -0700 (PDT) X-Originating-IP: [69.53.245.200] In-Reply-To: References: <57761101.3030101@freebsd.org> <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> From: Warner Losh Date: Sun, 3 Jul 2016 14:11:00 -0600 X-Google-Sender-Auth: 6dKLTeRARZHGUyjQWLKVe2aJG7k Message-ID: Subject: Re: Review request: sparse CPU ID maps To: Adrian Chadd Cc: Nathan Whitehorn , "freebsd-hackers@freebsd.org" , outro pessoa Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Jul 2016 20:11:01 -0000 On Sun, Jul 3, 2016 at 1:37 PM, Adrian Chadd wrote: > On 2 July 2016 at 17:08, Nathan Whitehorn wrote: >> A reasonable first pass at checking for this kind of bug is doing grep -lR >> '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows the following >> files: >> arm/mv/armadaxp/armadaxp_mp.c >> arm/include/counter.h >> arm/broadcom/bcm2835/bcm2836.c >> arm/broadcom/bcm2835/bcm2836_mp.c >> arm/freescale/imx/imx6_mp.c >> arm/allwinner/aw_mp.c >> arm/rockchip/rk30xx_mp.c >> arm/amlogic/aml8726/aml8726_mp.c >> arm/samsung/exynos/exynos5_mp.c >> arm/arm/mp_machdep.c >> arm/nvidia/tegra124/tegra124_mp.c >> arm64/include/counter.h >> arm64/arm64/gic_v3.c >> arm64/arm64/gic_v3_its.c >> arm64/arm64/gicv3_its.c >> >> All of them should, in some sense, be CPU_FOREACH(), but it may not matter. 
>> For example, it may not be possible to have sparse CPU IDs on some or all of >> those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are >> there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think. >> -Nathan > > I think converting all the users over to the CPU_FOREACH thing is the > right way to go, even if the SOC doesn't require it. People do bring > up new systems by copy/pasta'ing an existing similar system, so we're > best served by having all the consumers migrated. > > But, I'd do it in head/12. Early in head/12. :-P It is a mergeable change too, since it wouldn't change any APIs. At least the conversion to CPU_FOREACH. We don't want too many sweeping changes that can't be merged too early (that way leads to lots of maintenance issues), but we can do something like this. Merging would be optional, but possible, for those bits of the tree that need it. Though, for something like this, there's little against doing a full merge and a lot for it... Warner From owner-freebsd-hackers@freebsd.org Mon Jul 4 23:45:54 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8C369B9135A for ; Mon, 4 Jul 2016 23:45:54 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7DFF72E23 for ; Mon, 4 Jul 2016 23:45:54 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467675943133327.7328959933959; Mon, 4 Jul 2016 16:45:43 -0700 (PDT) Date: Mon, 04 Jul 2016 16:45:43 -0700 From: Matthew Macy To: "Karl Denninger" Cc: "" Message-ID: <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> In-Reply-To: 
<45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> Subject: Re: ZFS ARC and mmap/page cache coherency question MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Jul 2016 23:45:54 -0000 ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- > > On 7/3/2016 02:45, Matthew Macy wrote: > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- > > Wellllll.... > > I've done a fair bit of work here (see > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the > political issues are at least as bad as the coding ones. > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. 
-M From owner-freebsd-hackers@freebsd.org Tue Jul 5 02:26:32 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4733FB92403 for ; Tue, 5 Jul 2016 02:26:32 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 027F129DF for ; Tue, 5 Jul 2016 02:26:31 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 49DE7220E9A for ; Mon, 4 Jul 2016 21:26:22 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> To: freebsd-hackers@freebsd.org From: Karl Denninger Message-ID: <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> Date: Mon, 4 Jul 2016 21:26:06 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms050507020202060509060304" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 02:26:32 -0000 This is a cryptographically signed message in MIME format. --------------ms050507020202060509060304 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/4/2016 18:45, Matthew Macy wrote: > > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- > > > > On 7/3/2016 02:45, Matthew Macy wrote: > > > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- > > > > Wellllll.... > > > > I've done a fair bit of work here (see > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the > > political issues are at least as bad as the coding ones. > > > > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. > > -M > The ARC is very useful when it gets a hit as it avoids an I/O that would otherwise take place. Where it sucks is when the system evicts working set to preserve ARC. That's always wrong in that you're trading a speculative I/O (if the cache is hit later) for a *guaranteed* one (to page out) and maybe *two* (to page back in.)
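To put rough numbers on that trade-off, here is a toy expected-I/O model in C. It is only a sketch under stated assumptions: every I/O is taken to cost the same, the evicted working-set page is assumed to need a write to swap, and the function names are made up for illustration (none of this is FreeBSD or ZFS code).

```c
#include <assert.h>

/*
 * Toy cost model, NOT FreeBSD code. All names are hypothetical.
 * Assumptions: each I/O costs one unit, and the evicted working-set
 * page is dirty or anonymous, so evicting it always means a write.
 */

/* Expected I/Os avoided by keeping one ARC buffer: at most one read. */
double
arc_keep_benefit(double p_arc_hit)
{
	return (p_arc_hit);
}

/* Expected I/Os caused by paging out one working-set page instead. */
double
pageout_cost(double p_page_reused)
{
	/* One write now, plus one read if the page is touched again. */
	return (1.0 + p_page_reused);
}

/*
 * Net expected extra I/Os from evicting working set to keep the ARC
 * buffer. A positive result means the eviction was a net loss.
 */
double
evict_working_set_penalty(double p_arc_hit, double p_page_reused)
{
	return (pageout_cost(p_page_reused) - arc_keep_benefit(p_arc_hit));
}
```

Because p_arc_hit can never exceed 1, the penalty is never negative in this model. It reaches zero only when the ARC hit is certain and the paged-out page is never touched again.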
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms050507020202060509060304--
From owner-freebsd-hackers@freebsd.org Tue Jul 5 02:28:29 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 266ABB924BA for ; Tue, 5 Jul 2016 02:28:29 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6]) by mx1.freebsd.org (Postfix) with ESMTP id 0993B2B18 for ; Tue, 5 Jul 2016 02:28:28 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from [10.1.1.2] (unknown [10.1.1.2]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id D92A1DBF5 for ; Tue, 5 Jul 2016 02:28:27 +0000 (UTC) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> From: Allan Jude Message-ID: <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> Date: Mon, 4 Jul 2016 22:28:27 -0400 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net>
Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 02:28:29 -0000 On 2016-07-04 22:26, Karl Denninger wrote: > > > On 7/4/2016 18:45, Matthew Macy wrote: >> >> >> ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- >> > >> > On 7/3/2016 02:45, Matthew Macy wrote: >> > > >> > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- >> > >> > Wellllll.... >> > >> > I've done a fair bit of work here (see >> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the >> > political issues are at least as bad as the coding ones. >> > >> >> >> Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. >> >> -M >> > > The ARC is very useful when it gets a hit as it avoid an I/O that would > otherwise take place. > > Where it sucks is when the system evicts working set to preserve ARC. > That's always wrong in that you're trading a speculative I/O (if the > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* > (to page back in.) > ZFS is better behaved in 11.x, there is a sysctl vfs.zfs.arc_free_target that makes sure the ARC is reined in when there is memory pressure, by ensuring a minimum amount of actually free pages. 
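As a rough illustration of what a free-page target does, the sketch below models the decision in plain C. It is not the code in the ZFS ARC; the function name and the idea of stopping at a minimum ARC size are assumptions made for the example, and all quantities are in pages.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a free-page target, NOT the actual ARC logic.
 * Returns how many pages the ARC should give back so that the number
 * of free pages climbs toward free_target, without shrinking the ARC
 * below arc_min_pages (an assumed floor for this example).
 */
uint64_t
arc_pages_to_release(uint64_t free_pages, uint64_t free_target,
    uint64_t arc_pages, uint64_t arc_min_pages)
{
	uint64_t shortfall, reclaimable;

	if (free_pages >= free_target)
		return (0);	/* no memory pressure, leave the ARC alone */

	shortfall = free_target - free_pages;
	reclaimable = arc_pages > arc_min_pages ?
	    arc_pages - arc_min_pages : 0;

	/* Release the shortfall, or as much as the floor allows. */
	return (shortfall < reclaimable ? shortfall : reclaimable);
}
```

On a running system the current value of the real tunable can be read with "sysctl vfs.zfs.arc_free_target"; the model above only shows the shape of the decision it drives.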
-- Allan Jude From owner-freebsd-hackers@freebsd.org Tue Jul 5 02:33:07 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5D6EB9261C for ; Tue, 5 Jul 2016 02:33:07 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 99EBA2E8D for ; Tue, 5 Jul 2016 02:33:07 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 62742220F1C for ; Mon, 4 Jul 2016 21:33:05 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> From: Karl Denninger Message-ID: <768b6169-70d9-5500-c455-563d8340972e@denninger.net> Date: Mon, 4 Jul 2016 21:32:49 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms010208080905010008010306" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 
Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 02:33:07 -0000 This is a cryptographically signed message in MIME format. --------------ms010208080905010008010306 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 7/4/2016 21:28, Allan Jude wrote: > On 2016-07-04 22:26, Karl Denninger wrote: >> >> On 7/4/2016 18:45, Matthew Macy wrote: >>> >>> ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- >>> > >>> > On 7/3/2016 02:45, Matthew Macy wrote: >>> > > >>> > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- >>> > >>> > Wellllll.... >>> > >>> > I've done a fair bit of work here (see >>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the >>> > political issues are at least as bad as the coding ones. >>> > >>> >>> >>> Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. >>> >>> -M >>> >> The ARC is very useful when it gets a hit as it avoid an I/O that would >> otherwise take place. >> >> Where it sucks is when the system evicts working set to preserve ARC. >> That's always wrong in that you're trading a speculative I/O (if the >> cache is hit later) for a *guaranteed* one (to page out) and maybe *two* >> (to page back in.)
>> > ZFS is better behaved in 11.x, there is a sysctl vfs.zfs.arc_free_target > that makes sure the ARC is reined in when there is memory pressure, by > ensuring a minimum amount of actually free pages. > Oh, but..... Again, go read the PR I linked (and the current version of the patch against 10-STABLE.) The issues are far more intertwined than that. Specifically, the dmu_tx cache decision (size of the write-back cache) is flat-out broken and inappropriate in essentially all cases, and the interaction of UMA and ARC is very destructive under a wide variety of workloads. The patch has a hack-around for the dmu_tx problem and a reasonably-effective fix for the UMA issues. Actually fixing dmu_tx, however, is nowhere near that easy since it really needs to be computed per-zvol on an actual bytes moved per-unit-of-time basis. Note that one of the patches in the set I developed is indeed arc_free_target (indeed it was the first approach I took) -- but without addressing the other two issues it doesn't solve the problem.
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms010208080905010008010306--
From owner-freebsd-hackers@freebsd.org Tue Jul 5 02:36:35 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B7357B926E8 for ; Tue, 5 Jul 2016 02:36:35 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6]) by mx1.freebsd.org (Postfix) with ESMTP id 7F469105D for ; Tue, 5 Jul 2016 02:36:35 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from [192.168.1.10] (unknown [192.168.1.10]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id 9633DDC0A for ; Tue, 5 Jul 2016 02:36:34 +0000 (UTC) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> <768b6169-70d9-5500-c455-563d8340972e@denninger.net> From: Allan Jude Message-ID: Date: Mon, 4 Jul 2016 22:36:31 -0400 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0
In-Reply-To: <768b6169-70d9-5500-c455-563d8340972e@denninger.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 02:36:35 -0000 On 2016-07-04 22:32, Karl Denninger wrote: > On 7/4/2016 21:28, Allan Jude wrote: >> On 2016-07-04 22:26, Karl Denninger wrote: >>> >>> On 7/4/2016 18:45, Matthew Macy wrote: >>>> >>>> ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- >>>> > >>>> > On 7/3/2016 02:45, Matthew Macy wrote: >>>> > > >>>> > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- >>>> > >>>> > Wellllll.... >>>> > >>>> > I've done a fair bit of work here (see >>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the >>>> > political issues are at least as bad as the coding ones. >>>> > >>>> >>>> >>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. >>>> >>>> -M >>>> >>> The ARC is very useful when it gets a hit as it avoid an I/O that would >>> otherwise take place. >>> >>> Where it sucks is when the system evicts working set to preserve ARC. >>> That's always wrong in that you're trading a speculative I/O (if the >>> cache is hit later) for a *guaranteed* one (to page out) and maybe *two* >>> (to page back in.) 
>>> >> ZFS is better behaved in 11.x, there is a sysctl vfs.zfs.arc_free_target >> that makes sure the ARC is reined in when there is memory pressure, by >> ensuring a minimum amount of actually free pages. >> > Oh, but..... > > Again, go read the PR I linked (and the current version of the patch > against 10-STABLE.) The issues are far more intertwined than that. > Specifically, the dmu_tx cache decision (size of the write-back cache) > is flat-out broken and inappropriate in essentially all cases, and the > interaction of UMA and ARC is very destructive under a wide variety of > workloads. The patch has hack-around for the dmu_tx problem and a > reasonably-effective fix for the UMA issues. Actually fixing dmu_tx, > however, is nowhere near that easy since it really needs to be computed > per-zvol on an actual bytes moved per-unit-of-time basis. > > Note that one of the patches in the set I developed is indeed > arc_free_target (indeed it was the first approach I took) -- but without > addressing the other two issues it doesn't solve the problem. > You keep saying per zvol. Do you mean per vdev? I am under the impression that no zvol's are involved in the use case this thread is about. Improving the way ZFS frees memory, specifically UMA and the 'kmem caches' will help a lot as well. In addition, another patch just went in to allow you to change the arc_max and arc_min on a running system. 
-- Allan Jude From owner-freebsd-hackers@freebsd.org Tue Jul 5 02:46:47 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52EEFB92892 for ; Tue, 5 Jul 2016 02:46:47 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 017CD151C for ; Tue, 5 Jul 2016 02:46:46 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 19B8B220FDA for ; Mon, 4 Jul 2016 21:46:44 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> <768b6169-70d9-5500-c455-563d8340972e@denninger.net> From: Karl Denninger Message-ID: Date: Mon, 4 Jul 2016 21:46:29 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms010409020104080906010506" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to 
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 02:46:47 -0000 This is a cryptographically signed message in MIME format. --------------ms010409020104080906010506 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 7/4/2016 21:36, Allan Jude wrote: > On 2016-07-04 22:32, Karl Denninger wrote: >> On 7/4/2016 21:28, Allan Jude wrote: >>> On 2016-07-04 22:26, Karl Denninger wrote: >>>> >>>> On 7/4/2016 18:45, Matthew Macy wrote: >>>>> >>>>> ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger >>>>> wrote ---- >>>>> > >>>>> > On 7/3/2016 02:45, Matthew Macy wrote: >>>>> > > >>>>> > > Cedric greatly overstates the intractability of >>>>> resolving it. Nonetheless, since the initial import very little >>>>> has been done to improve integration, and I don't know of anyone >>>>> who is up to the task taking an interest in it. Consequently, >>>>> mmap() performance is likely "doomed" for the foreseeable >>>>> future.-M---- >>>>> > >>>>> > Wellllll.... >>>>> > >>>>> > I've done a fair bit of work here (see >>>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the >>>>> > political issues are at least as bad as the coding ones. >>>>> > >>>>> >>>>> >>>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per >>>>> se. Have you ever tried disabling MFU caching to see how much >>>>> worse LRU only is? I'm not really convinced the ARC's benefits >>>>> justify its cost. >>>>> >>>>> -M >>>>> >>>> The ARC is very useful when it gets a hit as it avoid an I/O that >>>> would >>>> otherwise take place. >>>> >>>> Where it sucks is when the system evicts working set to preserve ARC. >>>> That's always wrong in that you're trading a speculative I/O (if the >>>> cache is hit later) for a *guaranteed* one (to page out) and maybe >>>> *two* >>>> (to page back in.)
>>>> >>> ZFS is better behaved in 11.x, there is a sysctl >>> vfs.zfs.arc_free_target >>> that makes sure the ARC is reined in when there is memory pressure, by >>> ensuring a minimum amount of actually free pages. >>> >> Oh, but..... >> >> Again, go read the PR I linked (and the current version of the patch >> against 10-STABLE.) The issues are far more intertwined than that. >> Specifically, the dmu_tx cache decision (size of the write-back cache) >> is flat-out broken and inappropriate in essentially all cases, and the >> interaction of UMA and ARC is very destructive under a wide variety of >> workloads. The patch has hack-around for the dmu_tx problem and a >> reasonably-effective fix for the UMA issues. Actually fixing dmu_tx, >> however, is nowhere near that easy since it really needs to be computed >> per-zvol on an actual bytes moved per-unit-of-time basis. >> >> Note that one of the patches in the set I developed is indeed >> arc_free_target (indeed it was the first approach I took) -- but without >> addressing the other two issues it doesn't solve the problem. >> > > You keep saying per zvol. Do you mean per vdev? I am under the > impression that no zvol's are involved in the use case this thread is > about. Sorry, per-vdev. The problem with dmu_tx is that it's system-wide. This is wildly inappropriate for several reasons -- first, it is computed on size-of-RAM with a hard cap (which is stupid on its face) and it entirely insensitive to the performance of the vdev's in question. Specifically, it is very common for a system to have very fast (e.g. SSD) disks, perhaps in a mirror configuration, and then spinning rust in a RaidZ2 config for bulk storage. Those are very, very different performance wise and they should have wildly different write-back cache sizes. At present there is exactly one such write-back cache and it's both system-wide and pays exactly zero attention to the throughput of the underlying vdevs it is talking to.
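A toy calculation may make the sizing argument above concrete. It is only a sketch and not ZFS code: the 10%-of-RAM fraction, the 4 GiB hard cap, the 5-second drain target and both throughput figures are illustrative assumptions, and every function name is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def global_dirty_limit(ram_bytes, percent=10, hard_cap=4 * GIB):
    """One system-wide write-back limit derived only from RAM size.

    The percentage and the cap are assumptions made for this sketch.
    """
    return min(ram_bytes * percent // 100, hard_cap)


def drain_seconds(dirty_bytes, vdev_bytes_per_sec):
    """Time a vdev needs to flush a full write-back cache at a steady rate."""
    return dirty_bytes / vdev_bytes_per_sec


def per_vdev_limit(vdev_bytes_per_sec, target_seconds=5):
    """Hypothetical alternative: size the cache from measured throughput."""
    return vdev_bytes_per_sec * target_seconds


limit = global_dirty_limit(32 * GIB)

# Assumed sustained write rates, chosen only to show the spread.
vdevs = {"ssd-mirror": 500 * MIB, "raidz2-hdd": 100 * MIB}

for name, rate in vdevs.items():
    print(
        f"{name}: global limit drains in {drain_seconds(limit, rate):.1f} s, "
        f"throughput-based limit would be {per_vdev_limit(rate) / MIB:.0f} MiB"
    )
```

With these made-up numbers the same global limit takes roughly five times longer to drain on the slow vdev than on the fast one, which is the mismatch the paragraph above describes. A limit derived from throughput keeps the worst-case drain time the same for both.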
This is why you can provoke minute-long stalls on a system with moderate (e.g. 32GB) amounts of RAM if there are spinning rust devices in the configuration. > > Improving the way ZFS frees memory, specifically UMA and the 'kmem > caches' will help a lot as well. > Well, yeah. But that means you have to police up the size of the UMA .vs. how much is actually in use in the UMA. What the PR does is get pretty aggressive with that whenever RAM is tight, and before the pager can start playing hell with system performance. > In addition, another patch just went in to allow you to change the > arc_max and arc_min on a running system. > Yes, the PR I did a long time ago made that "active" on a running system.... so I've had that for quite some time. Not that you really ought to need to play with that (if you feel a need to then you're still at step 1 or 2 of what I went through with analyzing and working on this in the 10.x code.....) -- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms010409020104080906010506 Content-Type: application/pkcs7-signature; name="smime.p7s" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="smime.p7s" Content-Description: S/MIME Cryptographic Signature
--------------ms010409020104080906010506-- From owner-freebsd-hackers@freebsd.org Tue Jul 5 03:01:20 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2ABD5B92ADF for ; Tue, 5 Jul 2016 03:01:20 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6]) by mx1.freebsd.org (Postfix) with ESMTP id 0B39E20FF for ; Tue, 5 Jul 2016 03:01:19 +0000 (UTC)
(envelope-from allanjude@freebsd.org) Received: from [192.168.1.10] (unknown [192.168.1.10]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id E363BDC4E for ; Tue, 5 Jul 2016 03:01:18 +0000 (UTC) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> <768b6169-70d9-5500-c455-563d8340972e@denninger.net> From: Allan Jude Message-ID: <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org> Date: Mon, 4 Jul 2016 23:01:16 -0400 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 03:01:20 -0000 On 2016-07-04 22:46, Karl Denninger wrote: > On 7/4/2016 21:36, Allan Jude wrote: >> On 2016-07-04 22:32, Karl Denninger wrote: >>> On 7/4/2016 21:28, Allan Jude wrote: >>>> On 2016-07-04 22:26, Karl Denninger wrote: >>>>> >>>>> On 7/4/2016 18:45, Matthew Macy wrote: >>>>>> >>>>>> ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger >>>>>> wrote ---- >>>>>> > >>>>>> > On 7/3/2016 02:45, Matthew Macy wrote: >>>>>> > > >>>>>> > > Cedric greatly overstates the intractability of >>>>>> resolving it. 
Nonetheless, since the initial import very little >>>>>> has been done to improve integration, and I don't know of anyone >>>>>> who is up to the task taking an interest in it. Consequently, >>>>>> mmap() performance is likely "doomed" for the foreseeable >>>>>> future.-M---- >>>>>> > >>>>>> > Wellllll.... >>>>>> > >>>>>> > I've done a fair bit of work here (see >>>>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the >>>>>> > political issues are at least as bad as the coding ones. >>>>>> > >>>>>> >>>>>> >>>>>> Strictly speaking, the root of the problem is the ARC. Not ZFS per >>>>>> se. Have you ever tried disabling MFU caching to see how much >>>>>> worse LRU only is? I'm not really convinced the ARC's benefits >>>>>> justify its cost. >>>>>> >>>>>> -M >>>>>> >>>>> The ARC is very useful when it gets a hit as it avoid an I/O that >>>>> would >>>>> otherwise take place. >>>>> >>>>> Where it sucks is when the system evicts working set to preserve ARC. >>>>> That's always wrong in that you're trading a speculative I/O (if the >>>>> cache is hit later) for a *guaranteed* one (to page out) and maybe >>>>> *two* >>>>> (to page back in.) >>>>> >>>> ZFS is better behaved in 11.x, there is a sysctl >>>> vfs.zfs.arc_free_target >>>> that makes sure the ARC is reined in when there is memory pressure, by >>>> ensuring a minimum amount of actually free pages. >>>> >>> Oh, but..... >>> >>> Again, go read the PR I linked (and the current version of the patch >>> against 10-STABLE.) The issues are far more intertwined than that. >>> Specifically, the dmu_tx cache decision (size of the write-back cache) >>> is flat-out broken and inappropriate in essentially all cases, and the >>> interaction of UMA and ARC is very destructive under a wide variety of >>> workloads. The patch has hack-around for the dmu_tx problem and a >>> reasonably-effective fix for the UMA issues. 
Actually fixing dmu_tx, >>> however, is nowhere near that easy since it really needs to be computed >>> per-zvol on an actual bytes moved per-unit-of-time basis. >>> >>> Note that one of the patches in the set I developed is indeed >>> arc_free_target (indeed it was the first approach I took) -- but without >>> addressing the other two issues it doesn't solve the problem. >>> >> >> You keep saying per zvol. Do you mean per vdev? I am under the >> impression that no zvol's are involved in the use case this thread is >> about. > Sorry, per-vdev. The problem with dmu_tx is that it's system-wide. > This is wildly inappropriate for several reasons -- first, it is > computed on size-of-RAM with a hard cap (which is stupid on its face) > and it entirely insensitive to the performance of the vdev's in > question. Specifically, it is very common for a system to have very > fast (e.g. SSD) disks, perhaps in a mirror configuration, and then > spinning rust in a RaidZ2 config for bulk storage. Those are very, very > different performance wise and they should have wildly different > write-back cache sizes. At present there is exactly one such write-back > cache and it's both system-wide and pays exactly zero attention to the > throughput of the underlying vdevs it is talking to. > > This is why you can provoke minute-long stalls on a system with moderate > (e.g. 32GB) amounts of RAM if there are spinning rust devices in the > configuration. > >> >> Improving the way ZFS frees memory, specifically UMA and the 'kmem >> caches' will help a lot as well. >> > Well, yeah. But that means you have to police up the size of the UMA > .vs. how much is actually in use in the UMA. What the PR does is get > pretty aggressive with that whenever RAM is tight, and before the pager > can start playing hell with system performance. > >> In addition, another patch just went in to allow you to change the >> arc_max and arc_min on a running system. 
>> > Yes, the PR I did a long time ago made that "active" on a running > system.... so I've had that for quite some time. Not that you really > ought to need to play with that (if you feel a need to then you're still > at step 1 or 2 of what I went through with analyzing and working on this > in the 10.x code.....) > Have you looked into the the ZFS 'Write Throttle', it seems like it was meant to solve the writeback problem you are describing. It starts sending back pressure up to the application by introducing larger and larger delays in the write() call until your disks can keep up with your applications. http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/ http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/ -- Allan Jude From owner-freebsd-hackers@freebsd.org Tue Jul 5 05:26:27 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9A52DB855D3 for ; Tue, 5 Jul 2016 05:26:27 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 79E1218E4 for ; Tue, 5 Jul 2016 05:26:27 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mailman.ysv.freebsd.org (Postfix) id 7274FB855D1; Tue, 5 Jul 2016 05:26:27 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 721FBB855D0 for ; Tue, 5 Jul 2016 05:26:27 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mail-it0-x22e.google.com (mail-it0-x22e.google.com [IPv6:2607:f8b0:4001:c0b::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with 
ESMTPS id 45BE718E1 for ; Tue, 5 Jul 2016 05:26:27 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mail-it0-x22e.google.com with SMTP id j185so7707757ith.1 for ; Mon, 04 Jul 2016 22:26:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sippysoft-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:from:date:message-id:subject:to; bh=cEuLcSck0eUC3ETcsKybUeZuj1h1cE82SGaPpeXsozw=; b=OBUujVrFLBLTktJQpJGrjIQMCw7UscFvYmwt7KbLsJT4p4x11E4lHJWuEQ1Dp7QxC+ oya5gvOmKhwp0EPyuWAMRo/mjSz5Sg8SutyhwpNlLkeUfOrpINIuB8etQgUtY/WhngJF wt6HQWusvyDAtmJh1DTT5RxZSBUHlxo5hSQAdaPvb9n4r9JSZaEtVbdPFjDaSbrpcdOk DPWFPO440lZfqIzOO3sNzT0Vd136u9YmML6E21WvUzUbjgZFu9WebUbUru5trpfCWK9u VeotiqfZW5OtZ8NqSojSf+Z6q/KagZACAxbVorZejJZ2g0HoymUMrjce1FuD1F8kNH6q GQBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:from:date:message-id:subject :to; bh=cEuLcSck0eUC3ETcsKybUeZuj1h1cE82SGaPpeXsozw=; b=hNfcxSYFCRBwR1PyZfnOSZ5Y8ADW75hle0K1m6sOIpSSUkeu0T4CFqx065insmI14T 8mCBK5/k1macOPFeByZHV+yrc9F/xGraV21oyig3WHa9MRvGE0eXsXHdl/l+0KZWEaH0 14koeUqV32GnW2gk12boXFYRd0EvSz5Em8MwSYCl7KbhVxTz3C4Fju4Xp5ik+gK5CWJf 5dwD9RUfP5dW3b13J6MS75sSUG2hdcFQN173rf6GYmYVYXP3lNuUtmaSUqlGDh4AHYh2 gYcD5WmZm2XZmaJWqYniLPypoqgxLtjSVPjMDUo5ZiCdT4gM78N6+iDlUYJIzv+ZAK6L mvHw== X-Gm-Message-State: ALyK8tJLSDKS0xyr7b3alrR2X9sI1Fkf+EpqkZEf0P9inGtB1vtTQTihmnhKaSTSFbVH2z9QxlpgyFQn+THcKRQG X-Received: by 10.36.91.66 with SMTP id g63mr11055580itb.16.1467696386364; Mon, 04 Jul 2016 22:26:26 -0700 (PDT) MIME-Version: 1.0 Sender: sobomax@sippysoft.com Received: by 10.36.59.193 with HTTP; Mon, 4 Jul 2016 22:26:25 -0700 (PDT) From: Maxim Sobolev Date: Mon, 4 Jul 2016 22:26:25 -0700 X-Google-Sender-Auth: DIV2CM4kakl33DY5WwTrB3JuOKg Message-ID: Subject: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14)) To: stable@freebsd.org, hackers@freebsd.org 
Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 05:26:27 -0000 Hi all, investigating some random postgresql-9.1.21 server crashes on FreeBSD 10.3, we've started seeing those after upgrading from postgres 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very unlikely. I suspect that postgres is at fault, however I am also curious how could it be that kernel is not capable of generating core file when application does something silly? Is it that some ELF-related data structures got corrupted or something else? Are we protecting the page where ELF header is mapped with R/O flag? I am looking at possibly recreating this by poking around elf header(s), seeing if I can corrupt it in a similar manner reliably, any pointers or suggestions are appreciated. Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14) Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11 Jul 1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14) Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11 #define EFAULT 14 /* Bad address */ The resulting files are truncated and is not really usable for anything. We've seen the same issue -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 postgres.1722.core [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD] Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. 
Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd10.3". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from postgres...(no debugging symbols found)...done. BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720. [New LWP 100261] Core was generated by `postgres'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3 (gdb) where #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3 Backtrace stopped: Cannot access memory at address 0x7fffffffdd08 (gdb) q -Max From owner-freebsd-hackers@freebsd.org Tue Jul 5 11:14:22 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B1ACB21827 for ; Tue, 5 Jul 2016 11:14:22 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0823F1F5A for ; Tue, 5 Jul 2016 11:14:21 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 8ED22153413; Tue, 5 Jul 2016 13:14:18 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.com Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Vdgp9rAVpD23; Tue, 5 Jul 2016 13:13:49 +0200 (CEST) Received: from [192.168.10.67] (opteron [192.168.10.67]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.digiware.nl (Postfix) with ESMTPSA id 3475D15340A for ; Tue, 5 Jul 2016 13:13:49 +0200 (CEST) To: FreeBSD Hackers From: Willem Jan Withagen Subject: Problem during dlopen() Message-ID: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> Date: Tue, 5 Jul 2016 13:13:42 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 11:14:22 -0000 Hi, I'm banging my head agains the wall because I cannot seem to get this working. The problem is due to changing from automake to cmake building. But all my dlopens start failing with something like: load failed dlopen(build/lib/compressor/libceph_snappy.so) or dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so: Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl" If do a lookup for the name: nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl if give me: U _ZN4ceph6buffer4list8iterator7advanceEl The parent/calling executable however has: 0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl Clearly dlopen is not able to match these 2 and succeed. Question: So on which part of the building is what switch missing. 
Thanx, --WjW From owner-freebsd-hackers@freebsd.org Tue Jul 5 11:45:25 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AB1FB21E74; Tue, 5 Jul 2016 11:45:25 +0000 (UTC) (envelope-from gahr@FreeBSD.org) Received: from mail.ptrcrt.ch (gahr.cloud.tilaa.com [84.22.109.158]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5EF211C9F; Tue, 5 Jul 2016 11:45:22 +0000 (UTC) (envelope-from gahr@FreeBSD.org) Received: from webmail.ptrcrt.ch (www.gahr.ch [192.168.1.2]) by mail.ptrcrt.ch (OpenSMTPD) with ESMTPSA id c381a7a4 TLS version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO; Tue, 5 Jul 2016 11:45:14 +0000 (UTC) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Tue, 05 Jul 2016 13:45:13 +0200 From: Pietro Cerutti To: Willem Jan Withagen Cc: FreeBSD Hackers , owner-freebsd-hackers@freebsd.org Subject: Re: Problem during dlopen() Organization: The FreeBSD Project In-Reply-To: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> Message-ID: <416028b6b2a1dffe4e010b5792c56100@gahr.ch> X-Sender: gahr@FreeBSD.org User-Agent: Roundcube Webmail/1.2.0 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 11:45:25 -0000 On 2016-07-05 13:13, Willem Jan Withagen wrote: > Hi, > > I'm banging my head agains the wall because I cannot seem to get this > working. > > The problem is due to changing from automake to cmake building. 
> > But all my dlopens start failing with something like: > load failed dlopen(build/lib/compressor/libceph_snappy.so) or > dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so: > Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl" > > If do a lookup for the name: > nm build/lib/libceph_snappy.so |grep > ceph6buffer4list8iterator7advanceEl > > if give me: > U _ZN4ceph6buffer4list8iterator7advanceEl > > The parent/calling executable however has: > 0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl > > Clearly dlopen is not able to match these 2 and succeed. > > Question: > So on which part of the building is what switch missing. Wild guess: -Wl,-E linking the executable. -- Pietro Cerutti gahr@FreeBSD.org PGP Public Key: http://gahr.ch/pgp From owner-freebsd-hackers@freebsd.org Tue Jul 5 11:48:16 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6F91DB21F9C for ; Tue, 5 Jul 2016 11:48:16 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 57DF01DF9 for ; Tue, 5 Jul 2016 11:48:16 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 53318B21F98; Tue, 5 Jul 2016 11:48:16 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52C25B21F97; Tue, 5 Jul 2016 11:48:16 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F11141DF8; Tue, 5 Jul 2016 11:48:15 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: 
from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u65Bm9b6022894 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 5 Jul 2016 14:48:09 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u65Bm9b6022894 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u65Bm8AJ022893; Tue, 5 Jul 2016 14:48:08 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 5 Jul 2016 14:48:08 +0300 From: Konstantin Belousov To: Maxim Sobolev Cc: stable@freebsd.org, hackers@freebsd.org Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14)) Message-ID: <20160705114808.GN38613@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 11:48:16 -0000 On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote: > Hi all, investigating some random postgresql-9.1.21 server crashes on > FreeBSD 10.3, we've started seeing those after upgrading from postgres > 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very > unlikely. I suspect that postgres is at fault, however I am also curious > how could it be that kernel is not capable of generating core file when > application does something silly? 
Is it that some ELF-related data > structures got corrupted or something else? Are we protecting the page > where ELF header is mapped with R/O flag? I am looking at possibly > recreating this by poking around elf header(s), seeing if I can corrupt it > in a similar manner reliably, any pointers or suggestions are appreciated. > > Jun 27 04:10:18 dal12 kernel: Failed to write core file for process > postgres (error 14) > Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on > signal 11 > Jul 1 05:21:46 dal12 kernel: Failed to write core file for process > postgres (error 14) > Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal > 11 > > #define EFAULT 14 /* Bad address */ > > The resulting files are truncated and is not really usable for anything. > We've seen the same issue > > -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core > -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 postgres.1722.core > > [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core > GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD] > Copyright (C) 2016 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-portbld-freebsd10.3". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from postgres...(no debugging symbols found)...done. > BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file > size >= 517120000, found: 1310720. > [New LWP 100261] > Core was generated by `postgres'. > Program terminated with signal SIGSEGV, Segmentation fault. 
> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3 > (gdb) where > #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3 > Backtrace stopped: Cannot access memory at address 0x7fffffffdd08 > (gdb) q > https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html From owner-freebsd-hackers@freebsd.org Tue Jul 5 11:51:08 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DEBE1B21180 for ; Tue, 5 Jul 2016 11:51:08 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5F39F122E for ; Tue, 5 Jul 2016 11:51:08 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u65BowoR023945 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 5 Jul 2016 14:50:58 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u65BowoR023945 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u65BovOp023944; Tue, 5 Jul 2016 14:50:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 5 Jul 2016 14:50:57 +0300 From: Konstantin Belousov To: Willem Jan Withagen Cc: FreeBSD Hackers Subject: Re: Problem during dlopen() Message-ID: <20160705115057.GO38613@kib.kiev.ua> References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 
tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 11:51:09 -0000

On Tue, Jul 05, 2016 at 01:13:42PM +0200, Willem Jan Withagen wrote:
> Hi,
>
> I'm banging my head against the wall because I cannot seem to get this
> working.
>
> The problem is due to changing from automake to cmake building.
>
> But all my dlopens start failing with something like:
> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>
> If I do a lookup for the name:
> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>
> it gives me:
> U _ZN4ceph6buffer4list8iterator7advanceEl
>
> The parent/calling executable however has:
> 0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
Are you sure? In which symbol table (dynamic or debug) does the referenced
symbol appear? Check with nm -D. If it is in the debug table, you need the
--export-dynamic linker switch when creating the binary defining the
symbols.

>
> Clearly dlopen is not able to match these 2 and succeed.
Why are you sure that the dynamic linker is at fault?

>
> Question:
> So which switch is missing, on which part of the build?
> > Thanx, > --WjW > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Tue Jul 5 11:53:29 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0768BB21348; Tue, 5 Jul 2016 11:53:29 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C3B2115B1; Tue, 5 Jul 2016 11:53:28 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 128CE1534C7; Tue, 5 Jul 2016 13:53:26 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.com Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5ToSpg2XROY4; Tue, 5 Jul 2016 13:53:16 +0200 (CEST) Received: from [192.168.10.67] (opteron [192.168.10.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.digiware.nl (Postfix) with ESMTPSA id 3A9B3153413; Tue, 5 Jul 2016 13:53:16 +0200 (CEST) Subject: Re: Problem during dlopen() To: Pietro Cerutti References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> <416028b6b2a1dffe4e010b5792c56100@gahr.ch> Cc: FreeBSD Hackers , owner-freebsd-hackers@freebsd.org From: Willem Jan Withagen Message-ID: <0aa90be4-4a10-9477-e550-a0e399d97216@digiware.nl> Date: Tue, 5 Jul 2016 13:53:09 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 
MIME-Version: 1.0 In-Reply-To: <416028b6b2a1dffe4e010b5792c56100@gahr.ch> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 11:53:29 -0000

On 5-7-2016 13:45, Pietro Cerutti wrote:
> On 2016-07-05 13:13, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm banging my head against the wall because I cannot seem to get this
>> working.
>>
>> The problem is due to changing from automake to cmake building.
>>
>> But all my dlopens start failing with something like:
>> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
>> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
>> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>>
>> If I do a lookup for the name:
>> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>>
>> it gives me:
>> U _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> The parent/calling executable however has:
>> 0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> Clearly dlopen is not able to match these 2 and succeed.
>>
>> Question:
>> So which switch is missing, on which part of the build?
>
> Wild guess: -Wl,-E linking the executable.
>

Any guess is a good guess to try. :)
Will give it a shot.
--WjW From owner-freebsd-hackers@freebsd.org Tue Jul 5 12:00:36 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5761DB21960; Tue, 5 Jul 2016 12:00:36 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1BD9F19DC; Tue, 5 Jul 2016 12:00:35 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 7E98A1534C7; Tue, 5 Jul 2016 14:00:33 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.com Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id o8hmxt6eZzFK; Tue, 5 Jul 2016 14:00:06 +0200 (CEST) Received: from [192.168.10.67] (opteron [192.168.10.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.digiware.nl (Postfix) with ESMTPSA id 6868E153413; Tue, 5 Jul 2016 14:00:06 +0200 (CEST) Subject: Re: Problem during dlopen() To: Pietro Cerutti References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> <416028b6b2a1dffe4e010b5792c56100@gahr.ch> Cc: FreeBSD Hackers , owner-freebsd-hackers@freebsd.org From: Willem Jan Withagen Message-ID: <206facb4-cb1c-e52c-b387-3344c41c12e7@digiware.nl> Date: Tue, 5 Jul 2016 13:59:59 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <416028b6b2a1dffe4e010b5792c56100@gahr.ch> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: 
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 12:00:36 -0000

On 5-7-2016 13:45, Pietro Cerutti wrote:
> On 2016-07-05 13:13, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm banging my head against the wall because I cannot seem to get this
>> working.
>>
>> The problem is due to changing from automake to cmake building.
>>
>> But all my dlopens start failing with something like:
>> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
>> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
>> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>>
>> If I do a lookup for the name:
>> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>>
>> it gives me:
>> U _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> The parent/calling executable however has:
>> 0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> Clearly dlopen is not able to match these 2 and succeed.
>>
>> Question:
>> So which switch is missing, on which part of the build?
>
> Wild guess: -Wl,-E linking the executable.
>

And bonus point for Pietro.
I did have that switch, but I had it on the lib that needed to be loaded.

So: Too many switches/options/flags in CMake for my taste.
--WjW From owner-freebsd-hackers@freebsd.org Tue Jul 5 12:23:27 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7EAD9B715A6 for ; Tue, 5 Jul 2016 12:23:27 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4A44B17CA for ; Tue, 5 Jul 2016 12:23:27 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id ED757153402; Tue, 5 Jul 2016 14:23:15 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.com Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id S7BUwGztjqfR; Tue, 5 Jul 2016 14:22:48 +0200 (CEST) Received: from [192.168.10.67] (opteron [192.168.10.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.digiware.nl (Postfix) with ESMTPSA id D9AA315340A; Tue, 5 Jul 2016 14:22:48 +0200 (CEST) Subject: Re: Problem during dlopen() To: Konstantin Belousov References: <5e29e535-f91f-35fb-2a7e-324bb19b658f@digiware.nl> <20160705115057.GO38613@kib.kiev.ua> Cc: FreeBSD Hackers From: Willem Jan Withagen Message-ID: <3570efcb-f106-95ba-52da-972a55d2fc33@digiware.nl> Date: Tue, 5 Jul 2016 14:22:42 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <20160705115057.GO38613@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to 
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 12:23:27 -0000

On 5-7-2016 13:50, Konstantin Belousov wrote:
> On Tue, Jul 05, 2016 at 01:13:42PM +0200, Willem Jan Withagen wrote:
>> Hi,
>>
>> I'm banging my head against the wall because I cannot seem to get this
>> working.
>>
>> The problem is due to changing from automake to cmake building.
>>
>> But all my dlopens start failing with something like:
>> load failed dlopen(build/lib/compressor/libceph_snappy.so) or
>> dlopen(build/lib/libceph_snappy.so): build/lib/libceph_snappy.so:
>> Undefined symbol "_ZN4ceph6buffer4list8iterator7advanceEl"
>>
>> If I do a lookup for the name:
>> nm build/lib/libceph_snappy.so |grep ceph6buffer4list8iterator7advanceEl
>>
>> it gives me:
>> U _ZN4ceph6buffer4list8iterator7advanceEl
>>
>> The parent/calling executable however has:
>> 0000000000513de0 T _ZN4ceph6buffer4list8iterator7advanceEl
> Are you sure? In which symbol table (dynamic or debug) does the referenced
> symbol appear? Check with nm -D. If it is in the debug table, you need the
> --export-dynamic linker switch when creating the binary defining the
> symbols.

You are getting the other half of the point that Pietro got.
I did a blunt nm, not excluding any tables...
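For the archive, a minimal sketch of the check suggested above: compare the symbols a plugin leaves undefined against what the executable actually exports in its dynamic table, i.e. the output of nm -D rather than plain nm. The helper names and the sample nm -D lines for the executable are hypothetical; only the mangled symbol and its address come from this thread.

```python
def parse_nm(text):
    """Split nm-style output into (defined, undefined) symbol-name sets."""
    defined, undefined = set(), set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:                         # "address type name"
            defined.add(fields[2])
        elif len(fields) == 2 and fields[0] == "U":  # "U name"
            undefined.add(fields[1])
    return defined, undefined


def missing_from_host(plugin_nm_d, host_nm_d):
    """Plugin symbols marked U that the host's dynamic table does not define."""
    _, wanted = parse_nm(plugin_nm_d)
    exported, _ = parse_nm(host_nm_d)
    return sorted(wanted - exported)


SYMBOL = "_ZN4ceph6buffer4list8iterator7advanceEl"
plugin = "                 U " + SYMBOL + "\n"
# Hypothetical `nm -D` output of the executable before and after relinking
# it with -Wl,-E (--export-dynamic).
host_without_export = "0000000000400d20 T _init\n"
host_with_export = host_without_export + "0000000000513de0 T " + SYMBOL + "\n"

print(missing_from_host(plugin, host_without_export))
print(missing_from_host(plugin, host_with_export))
```

Undefined symbols in a plugin are normally also satisfied by other shared libraries (libc, the C++ runtime, and so on), so the result is a list of candidates to look at, not a list of certain failures.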
Thanx, --WjW From owner-freebsd-hackers@freebsd.org Tue Jul 5 14:31:12 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D26F8B7382F for ; Tue, 5 Jul 2016 14:31:12 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 850161967 for ; Tue, 5 Jul 2016 14:31:11 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id A758122073F for ; Tue, 5 Jul 2016 09:31:08 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <34cf2d30-8884-95b6-f852-457d55710daf@freebsd.org> <768b6169-70d9-5500-c455-563d8340972e@denninger.net> <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org> From: Karl Denninger Message-ID: Date: Tue, 5 Jul 2016 09:30:51 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <272d657a-52ae-4f45-008c-3de6fb1b0c48@freebsd.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms050604050603000003050505" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 14:31:12 -0000

This is a cryptographically signed message in MIME format.

--------------ms050604050603000003050505
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 7/4/2016 22:01, Allan Jude wrote:
> On 2016-07-04 22:46, Karl Denninger wrote:
>>
>>> You keep saying per zvol. Do you mean per vdev? I am under the
>>> impression that no zvols are involved in the use case this thread is
>>> about.
>> Sorry, per-vdev. The problem with dmu_tx is that it's system-wide.
>> This is wildly inappropriate for several reasons -- first, it is
>> computed on size-of-RAM with a hard cap (which is stupid on its face)
>> and it is entirely insensitive to the performance of the vdevs in
>> question. Specifically, it is very common for a system to have very
>> fast (e.g. SSD) disks, perhaps in a mirror configuration, and then
>> spinning rust in a RaidZ2 config for bulk storage. Those are very, very
>> different performance-wise and they should have wildly different
>> write-back cache sizes. At present there is exactly one such write-back
>> cache and it's both system-wide and pays exactly zero attention to the
>> throughput of the underlying vdevs it is talking to.
>>
>> This is why you can provoke minute-long stalls on a system with moderate
>> (e.g. 32GB) amounts of RAM if there are spinning rust devices in the
>> configuration.
>>
>>>
>>> Improving the way ZFS frees memory, specifically UMA and the 'kmem
>>> caches' will help a lot as well.
>>>
>> Well, yeah. But that means you have to police up the size of the UMA
>> vs. how much is actually in use in the UMA. What the PR does is get
>> pretty aggressive with that whenever RAM is tight, and before the pager
>> can start playing hell with system performance.
>>
>>> In addition, another patch just went in to allow you to change the
>>> arc_max and arc_min on a running system.
>>>
>> Yes, the PR I did a long time ago made that "active" on a running
>> system.... so I've had that for quite some time. Not that you really
>> ought to need to play with that (if you feel a need to then you're still
>> at step 1 or 2 of what I went through with analyzing and working on this
>> in the 10.x code.....)
>>
>
> Have you looked into the ZFS 'Write Throttle'? It seems like it
> was meant to solve the writeback problem you are describing. It starts
> sending back pressure up to the application by introducing larger and
> larger delays in the write() call until your disks can keep up with
> your applications.
>
> http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/
>
> http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
>
I believe this has been brought into FreeBSD's implementation; I recall
going through it.
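To put rough numbers on the stall scenario quoted above, here is a back-of-the-envelope sketch. It assumes a single write-back limit sized at 10% of RAM with a 4 GiB hard cap; both figures are assumptions standing in for "computed on size-of-RAM with a hard cap" and are not taken from this thread. The two drain rates are hypothetical as well.

```python
GIB = 1024 ** 3
MB = 10 ** 6


def dirty_data_cap(ram_bytes, percent=10, hard_cap=4 * GIB):
    """One RAM-derived write-back limit (percent and hard_cap are assumptions)."""
    return min(ram_bytes * percent // 100, hard_cap)


def drain_seconds(dirty_bytes, bytes_per_second):
    """Time to flush a full write-back cache at a sustained device rate."""
    return dirty_bytes / bytes_per_second


cap = dirty_data_cap(32 * GIB)
# Hypothetical sustained write rates for the two kinds of pool in the example.
for name, rate in (("SSD mirror", 1000 * MB), ("RaidZ2 on spinning disks", 50 * MB)):
    print(f"{name}: {drain_seconds(cap, rate):.1f} s to drain {cap / GIB:.1f} GiB")
```

Under these assumptions the same limit drains in a few seconds on the fast pool and takes on the order of a minute on the slow one, which is the mismatch the quoted paragraph complains about.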
-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms050604050603000003050505--

From owner-freebsd-hackers@freebsd.org Tue Jul 5 14:43:56 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16DDEB73F28 for ; Tue, 5 Jul 2016 14:43:56 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id E733E16B6 for ; Tue, 5 Jul 2016 14:43:55 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mailman.ysv.freebsd.org (Postfix) id E328DB73F23; Tue, 5 Jul 2016 14:43:55 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E2D94B73F22 for ; Tue, 5 Jul 2016 14:43:55 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com [IPv6:2a00:1450:400c:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6EE4416B4 for ; Tue, 5 Jul 2016 14:43:55 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mail-wm0-x22e.google.com with SMTP id a66so155961516wme.0 for ; Tue, 05 Jul 2016 07:43:55 -0700 (PDT) DKIM-Signature: v=1;
a=rsa-sha256; c=relaxed/relaxed; d=sippysoft-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=2UxvES/Nt7XSNT8IovIwmSfj/iLLw0k+Fi0kBHPpE6I=; b=IFLFUwDx8zKlMBfILlOHHuCZXbEzUwnoVgM6s/xCae0Zn6Cs8RctxH8G56cOU3Yl1Q JW4KXOS6chooPaGgE7MMdk9+CfEAzIBIBDn7SVxnLApXv7v1Mje6kXVI4q4FU+4roQjI 2tiJyp2BLf+iRaZ5W+EJYuyY8dGpSjVb8z23HqxR+8QI380vxS62n8oNKH35JC+KtpXO y7pyo9L/86xb/ASgSG6q0ANQRUA5fprNfN4zvu1zWaYT3ZI6AmMzkCEBoUQ9QtPn8kCs vsEBVGGrSBHqGI8/TmgumYS0nn3q/wU4QcArcSM5Uo1OtOEbojuXeuORV33j2YxcBfnb kiaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=2UxvES/Nt7XSNT8IovIwmSfj/iLLw0k+Fi0kBHPpE6I=; b=Jr607GVVhqjXPop7ThIuWcnZ+9nuuX4JxDz+Zvfpsgng9CVO+BDce8Xx6RpewJJbXa h4/pYdcOIjk20mW5tDtc3cW634Wym8P24LClDAhv5uvmYmjNuaLNyZ6BW/wJ7QJpioed bAgZ9t4rGHuywafEOHTnNrQ5khHYuuqI6Qlf3rKKqimbSVkyNwmyi4mt9qlP8R8+it3c iw1PMKDvSEwnIP1217312KNUGHjSfdoIl5PTNlEl0yHUMqKn+c3x8/y6AmSai+KHbtXx qGzaOABXIhLCvOG7ZSuHTu1/Wt63G9VJS8yNet7sVt1S46z08e+XA8ZfpljaY3fc2XED dk3Q== X-Gm-Message-State: ALyK8tJ+XCb3KwTX6qedYY/A3/2H8ZhlUDgH2BpZflTvqhjYYnNZCGAYc9RcLSVr4mA/4BaS2pL1/ZtQ+3bGaJRI X-Received: by 10.194.150.167 with SMTP id uj7mr16004821wjb.168.1467729833520; Tue, 05 Jul 2016 07:43:53 -0700 (PDT) MIME-Version: 1.0 Sender: sobomax@sippysoft.com Received: by 10.194.96.173 with HTTP; Tue, 5 Jul 2016 07:43:52 -0700 (PDT) In-Reply-To: <20160705114808.GN38613@kib.kiev.ua> References: <20160705114808.GN38613@kib.kiev.ua> From: Maxim Sobolev Date: Tue, 5 Jul 2016 07:43:52 -0700 X-Google-Sender-Auth: E-lV_8x0x9_v8XbroxQzrseI7GE Message-ID: Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14)) To: Konstantin Belousov Cc: stable@freebsd.org, hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 
X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 14:43:56 -0000 Seems like candidate for the MFC into releng/10.3 and appropriate errata entry? -Max On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov wrote: > On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote: > > Hi all, investigating some random postgresql-9.1.21 server crashes on > > FreeBSD 10.3, we've started seeing those after upgrading from postgres > > 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very > > unlikely. I suspect that postgres is at fault, however I am also curious > > how could it be that kernel is not capable of generating core file when > > application does something silly? Is it that some ELF-related data > > structures got corrupted or something else? Are we protecting the page > > where ELF header is mapped with R/O flag? I am looking at possibly > > recreating this by poking around elf header(s), seeing if I can corrupt > it > > in a similar manner reliably, any pointers or suggestions are > appreciated. > > > > Jun 27 04:10:18 dal12 kernel: Failed to write core file for process > > postgres (error 14) > > Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on > > signal 11 > > Jul 1 05:21:46 dal12 kernel: Failed to write core file for process > > postgres (error 14) > > Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on > signal > > 11 > > > > #define EFAULT 14 /* Bad address */ > > > > The resulting files are truncated and is not really usable for anything. 
> > We've seen the same issue > > > > -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 > postgres.41361.core > > -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 > postgres.1722.core > > > > [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core > > GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD] > > Copyright (C) 2016 Free Software Foundation, Inc. > > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html > > > > > This is free software: you are free to change and redistribute it. > > There is NO WARRANTY, to the extent permitted by law. Type "show > copying" > > and "show warranty" for details. > > This GDB was configured as "x86_64-portbld-freebsd10.3". > > Type "show configuration" for configuration details. > > For bug reporting instructions, please see: > > . > > Find the GDB manual and other documentation resources online at: > > . > > For help, type "help". > > Type "apropos word" to search for commands related to "word"... > > Reading symbols from postgres...(no debugging symbols found)...done. > > BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core > file > > size >= 517120000, found: 1310720. > > [New LWP 100261] > > Core was generated by `postgres'. > > Program terminated with signal SIGSEGV, Segmentation fault. > > #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3 > > (gdb) where > > #0 0x0000000800cfba67 in ?? 
() from /lib/libthr.so.3 > > Backtrace stopped: Cannot access memory at address 0x7fffffffdd08 > > (gdb) q > > > https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html > > From owner-freebsd-hackers@freebsd.org Tue Jul 5 17:19:45 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 655BDB72D29 for ; Tue, 5 Jul 2016 17:19:45 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 55E061DDD for ; Tue, 5 Jul 2016 17:19:44 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467739169055520.7792599844186; Tue, 5 Jul 2016 10:19:29 -0700 (PDT) Date: Tue, 05 Jul 2016 10:19:28 -0700 From: Matthew Macy To: "Karl Denninger" Cc: "" Message-ID: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> In-Reply-To: <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> Subject: Re: ZFS ARC and mmap/page cache coherency question MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 
17:19:45 -0000

---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger wrote ----
>
>
> On 7/4/2016 18:45, Matthew Macy wrote:
> >
> >
> > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ----
> > >
> > > On 7/3/2016 02:45, Matthew Macy wrote:
> > > >
> > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M----
> > >
> > > Wellllll....
> > >
> > > I've done a fair bit of work here (see
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
> > > political issues are at least as bad as the coding ones.
> > >
> >
> >
> > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost.
> >
> > -M
> >
>
> The ARC is very useful when it gets a hit as it avoids an I/O that would
> otherwise take place.
>
> Where it sucks is when the system evicts working set to preserve ARC.
> That's always wrong in that you're trading a speculative I/O (if the
> cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
> (to page back in.)

The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. If one could resolve that and have the page cache manage these blocks, life would be much, much better. However, you'd lose MFU. Hence my question.
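As a toy illustration of what is at stake in the LRU-only vs LRU + MFU question, the sketch below replays one trace through a plain LRU cache and through a two-list cache that protects entries touched more than once. It is a simplified segmented LRU, not the ARC (no ghost lists, no adaptive split), and the trace, capacity and function names are all made up for the example.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for a plain LRU cache."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)        # evict least recently used
    return hits


def two_list_hits(trace, capacity, protected_cap):
    """Count hits when entries seen twice move to a protected 'frequent' list."""
    recent, frequent, hits = OrderedDict(), OrderedDict(), 0
    for key in trace:
        if key in frequent:
            hits += 1
            frequent.move_to_end(key)
        elif key in recent:
            hits += 1
            del recent[key]
            frequent[key] = True                 # promote on second access
            if len(frequent) > protected_cap:
                demoted, _ = frequent.popitem(last=False)
                recent[demoted] = True           # demote the coldest entry
        else:
            recent[key] = True
            if len(recent) + len(frequent) > capacity:
                recent.popitem(last=False)       # evict from the recent list only
    return hits


hot = ["h%d" % i for i in range(4)]
scan = ["s%d" % i for i in range(100)]           # one-off sequential read
trace = hot * 3 + scan + hot * 3

print("LRU only:", lru_hits(trace, 8))
print("two lists:", two_list_hits(trace, 8, 4))
```

In this trace the one-off scan flushes the hot blocks out of the plain LRU, while the two-list cache keeps them. Whether that benefit justifies the ARC's cost on real workloads is exactly the open question, and a toy like this cannot settle it.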
-M From owner-freebsd-hackers@freebsd.org Tue Jul 5 17:35:14 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 264E9B73113 for ; Tue, 5 Jul 2016 17:35:14 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qt0-x22e.google.com (mail-qt0-x22e.google.com [IPv6:2607:f8b0:400d:c0d::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D131617E2 for ; Tue, 5 Jul 2016 17:35:13 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: by mail-qt0-x22e.google.com with SMTP id c34so104445492qte.0 for ; Tue, 05 Jul 2016 10:35:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=jnvYUDvcX2tBjz/VzmuPZ861AVn6KjURI8n5MId/F6I=; b=RmLU+AsQD2KGtVn22pW5mYwDHbUvoap4cVNPkQRXsCmEghE2Z2n/Pcv6suEJZNjgqE gWwXIVlQVAictZYBGKa/Vbyt/CRlcSFet6XpEvS2vn/vmoKXuovj/q0ff6dmlhGFMPIq Q5icKwweiJE8N1IHi4Gvnz2gT/bXgJkh/3BFvvy7UtmFh1R8bF6Gv5Uf6X3ouBlWzzZR Uhygu4p1G5kfzozHLOaAZMf33k1lniOaZLTnmY0BJl1DTerJHTv8uPdfCqqi6tZr60fI SS6PKmgkJtKghdyIHkc4m8oIrfreWu/hHdV87DlBsFnLniwSJnbPa3oeiAguuaYPuGj/ bGWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=jnvYUDvcX2tBjz/VzmuPZ861AVn6KjURI8n5MId/F6I=; b=MyzLwKNccu/2bmMT54hAP5wMKrhPUIZH2m5ztG58FbvCCSI9x/E1ACuJg4k6SqfFhy levqIBF1Qc3mVyNtxPFfOSQFh8j+aC18p+YbYPel94StVaB/bUqg0NgozHixxxPUveRx qL8+e8xIf3h2maXJxRb1IxSNhYlXRsHCEmtp72VnK7g1yPHKnqr+IkmZRT2HaNsHxmv7 uYey2C1I5eGWTaefcHMJMv8VWMLnDGaSC5xJUJvTJL6e7FtQPZ060KKc90Mbq436GFBb xmr9MeRWoRXhmiX831DaI3m9JaV8Za1QImDiv3Zfo6H6PMajB5CGkDEdyRql3sJ35Y6A 8wHg== 
X-Gm-Message-State: ALyK8tLEJ91rC1DNAhGpIkZgv3OERrk/Y0sxu5y6/rZjSgcbk25gX7O2QA5u2pSPMqlkXNrvYKGGeym0gX2S0A== X-Received: by 10.200.52.197 with SMTP id x5mr28376658qtb.41.1467740112904; Tue, 05 Jul 2016 10:35:12 -0700 (PDT) MIME-Version: 1.0 Received: by 10.200.56.93 with HTTP; Tue, 5 Jul 2016 10:35:12 -0700 (PDT) In-Reply-To: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> From: Freddie Cash Date: Tue, 5 Jul 2016 10:35:12 -0700 Message-ID: Subject: Re: ZFS ARC and mmap/page cache coherency question To: Matthew Macy Cc: Karl Denninger , FreeBSD Hackers Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 17:35:14 -0000 On Tue, Jul 5, 2016 at 10:19 AM, Matthew Macy wrote: > ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger < > karl@denninger.net> wrote ---- > > On 7/4/2016 18:45, Matthew Macy wrote: > > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger < > karl@denninger.net> wrote ---- > > > > > > > > On 7/3/2016 02:45, Matthew Macy wrote: > > > > > > > > > > Cedric greatly overstates the intractability of > resolving it. Nonetheless, since the initial import very little has been > done to improve integration, and I don't know of anyone who is up to the > task taking an interest in it. 
Consequently, mmap() performance is likely > "doomed" for the foreseeable future.-M---- > > > > > > > > Wellllll.... > > > > > > > > I've done a fair bit of work here (see > > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the > > > > political issues are at least as bad as the coding ones. > > > > > > Strictly speaking, the root of the problem is the ARC. Not ZFS per > se. Have you ever tried disabling MFU caching to see how much worse LRU > only is? I'm not really convinced the ARC's benefits justify its cost. > > > > The ARC is very useful when it gets a hit as it avoid an I/O that would > > otherwise take place. > > > > Where it sucks is when the system evicts working set to preserve ARC. > > That's always wrong in that you're trading a speculative I/O (if the > > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* > > (to page back in.) > > The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. > There are a lot of issues stemming from the fact that ZFS is a > transactional object store with a POSIX FS on top. One is that it caches > disk blocks as opposed to file blocks. However, if one could resolve that > and have the page cache manage these blocks life would be much much better. > However, you'd lose MFU. Hence my question. >

Are you confusing terms here?

Pretty sure the ARC uses MRU (Most Recently Used) and MFU (Most Frequently Used) caches. Not LRU (Least Recently Used).

Or am I misunderstanding what you're trying to say?
-- Freddie Cash fjwcash@gmail.com From owner-freebsd-hackers@freebsd.org Tue Jul 5 17:43:06 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 96298B7330E for ; Tue, 5 Jul 2016 17:43:06 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6FCCC1C16 for ; Tue, 5 Jul 2016 17:43:06 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467740580472931.4090951466359; Tue, 5 Jul 2016 10:43:00 -0700 (PDT) Date: Tue, 05 Jul 2016 10:43:00 -0700 From: Matthew Macy To: "Freddie Cash" Cc: "FreeBSD Hackers" , "Karl Denninger" Message-ID: <155bc27ea44.c75d1029200540.4499688981397092064@nextbsd.org> In-Reply-To: References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> Subject: Re: ZFS ARC and mmap/page cache coherency question MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 17:43:06 -0000 ---- On Tue, 05 Jul 2016 10:35:12 -0700 Freddie Cash wrote ---- > On Tue,
Jul 5, 2016 at 10:19 AM, Matthew Macy wrote: > > > ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger < > > karl@denninger.net> wrote ---- > > > On 7/4/2016 18:45, Matthew Macy wrote: > > > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger < > > karl@denninger.net> wrote ---- > > > > > > > > > > On 7/3/2016 02:45, Matthew Macy wrote: > > > > > > > > > > > > Cedric greatly overstates the intractability of > > resolving it. Nonetheless, since the initial import very little has been > > done to improve integration, and I don't know of anyone who is up to the > > task taking an interest in it. Consequently, mmap() performance is likely > > "doomed" for the foreseeable future.-M---- > > > > > > > > > > Wellllll.... > > > > > > > > > > I've done a fair bit of work here (see > > > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the > > > > > political issues are at least as bad as the coding ones. > > > > > > > > Strictly speaking, the root of the problem is the ARC. Not ZFS per > > se. Have you ever tried disabling MFU caching to see how much worse LRU > > only is? I'm not really convinced the ARC's benefits justify its cost. > > > > > > The ARC is very useful when it gets a hit as it avoid an I/O that would > > > otherwise take place. > > > > > > Where it sucks is when the system evicts working set to preserve ARC. > > > That's always wrong in that you're trading a speculative I/O (if the > > > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* > > > (to page back in.) > > > > The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. > > There are a lot of issues stemming from the fact that ZFS is a > > transactional object store with a POSIX FS on top. One is that it caches > > disk blocks as opposed to file blocks. However, if one could resolve that > > and have the page cache manage these blocks life would be much much better. > > However, you'd lose MFU. Hence my question.
> > > > Are you confusing terms here? > > Pretty sure the ARC uses MRU (Most Recently Used) and MFU (Most Frequently > Used) caches. Not LRU (Least Recently Used). > > Or am I misunderstanding what you're trying to say?

If it caches based on MRU, by definition it evicts LRU. I did mix caching policy with eviction policy in the same sentence which is obviously not correct. Nonetheless, it should be obvious that I meant MFU+MRU caching vs MRU caching only. Thanks.

-M
From owner-freebsd-hackers@freebsd.org Tue Jul 5 17:50:37 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 36FF7B73515 for ; Tue, 5 Jul 2016 17:50:37 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D2CD61EEF for ; Tue, 5 Jul 2016 17:50:36 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 013CB220426 for ; Tue, 5 Jul 2016 12:50:33 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> From: Karl Denninger Message-ID:
<31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net> Date: Tue, 5 Jul 2016 12:50:16 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms040109070705040203000606" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 17:50:37 -0000 This is a cryptographically signed message in MIME format. --------------ms040109070705040203000606 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/5/2016 12:19, Matthew Macy wrote: > > > ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger wrote ---- > > > > > > On 7/4/2016 18:45, Matthew Macy wrote: > > > > > > > > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- > > > > > > > > On 7/3/2016 02:45, Matthew Macy wrote: > > > > > > > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- > > > > > > > > Wellllll.... > > > > > > > > I've done a fair bit of work here (see > > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the > > > > political issues are at least as bad as the coding ones. > > > > > > > > > > > > > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se.
Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. > > > > > > -M > > > > > > > The ARC is very useful when it gets a hit as it avoid an I/O that would > > otherwise take place. > > > > Where it sucks is when the system evicts working set to preserve ARC. > > That's always wrong in that you're trading a speculative I/O (if the > > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* > > (to page back in.) > > The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. However, if one could resolve that and have the page cache manage these blocks life would be much much better. However, you'd lose MFU. Hence my question. > > -M >

I suspect there's an argument to be made there, but the present problems make determining its impact difficult or impossible, as those effects are swamped by the other issues.

I can fairly easily create workloads on the base code where simply typing "vi ", making a change and hitting ":w" will result in a stall of tens of seconds or more while the requested cache flush is run down. I've resolved a good part (but not all instances) of this through my work.
My understanding is that 11- has had additional work done to the base code, but three underlying issues are not, from what I can see in the commit logs and discussions, addressed: The VM system will page out working set while leaving ARC alone, UMA reserved-but-not-in-use space is not policed adequately when memory pressure exists *before* the pager starts considering evicting working set and the write-back cache is for many machine configurations grossly inappropriate and cannot be tuned adequately by hand (particularly being true on a system with vdevs that have materially-varying performance levels.) I have more-or-less stopped work on the tree on a forward basis since I got to a place with 10.2 that (1) works for my production requirements, resolving the problems and (2) ran into what I deemed to be intractable political issues within core on progress toward eradicating the root of the problem. I will probably revisit the situation with 11- at some point, as I'll want to roll my production systems forward. However, I don't know when that will be -- right now 11- is stable enough for some of my embedded work (e.g. on the Raspberry Pi2) but is not on my server and client-class machines. Indeed just yesterday I got a lock-order reversal panic while doing a shutdown after a kernel update on one of my lab boxes running a just-updated 11- codebase. 
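The first two of the three issues listed above come down to the order in which memory is reclaimed under pressure. Below is a minimal toy model of the ordering being argued for: reserved-but-unused UMA space first, then the ARC down to its minimum, and working set only as a last resort. `MemoryState` and `reclaim` are hypothetical names for illustration; they do not correspond to any FreeBSD or ZFS kernel symbol, and the sizes are invented.

```python
from dataclasses import dataclass


@dataclass
class MemoryState:
    """All sizes in MiB. A toy model, not kernel state."""
    uma_free_reserved: int  # UMA space that is reserved but not in use
    arc_size: int           # current ARC size
    arc_min: int            # floor below which the ARC is not shrunk
    working_set: int        # resident pages of running processes


def reclaim(state, needed):
    """Free `needed` MiB, trying the cheapest source first.

    Order: (1) reserved-but-unused UMA space, (2) ARC down to arc_min,
    (3) only then page out working set. Returns a list of (source, MiB).
    """
    steps = []

    take = min(needed, state.uma_free_reserved)
    if take:
        state.uma_free_reserved -= take
        needed -= take
        steps.append(("uma", take))

    take = min(needed, max(state.arc_size - state.arc_min, 0))
    if take:
        state.arc_size -= take
        needed -= take
        steps.append(("arc", take))

    take = min(needed, state.working_set)
    if take:
        state.working_set -= take
        needed -= take
        steps.append(("pageout", take))

    return steps


if __name__ == "__main__":
    state = MemoryState(uma_free_reserved=512, arc_size=8192,
                        arc_min=1024, working_set=4096)
    print(reclaim(state, 2048))
```

The ordering follows the cost argument made earlier in the thread: shrinking the ARC risks a speculative I/O later, whereas paging out working set costs a guaranteed write and probably a read to bring it back.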
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms040109070705040203000606-- From owner-freebsd-hackers@freebsd.org Tue Jul 5 18:26:53 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CF911B201E9 for ; Tue, 5 Jul 2016 18:26:53 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from c.mail.sonic.net (c.mail.sonic.net [64.142.111.80]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BB1E11ABB; Tue, 5 Jul 2016 18:26:53 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from zeppelin.tachypleus.net (c-50-139-166-237.hsd1.ma.comcast.net [50.139.166.237]) (authenticated bits=0) by c.mail.sonic.net (8.15.1/8.15.1) with ESMTPSA id u65IQif4011024 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Tue, 5 Jul 2016 11:26:45 -0700 Subject: Re: Review request: sparse CPU ID maps To: Warner Losh , Adrian Chadd References: <57761101.3030101@freebsd.org> <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> Cc: "freebsd-hackers@freebsd.org" , outro pessoa From: Nathan Whitehorn Message-ID: <59222776-45b4-640c-b5e4-5f8b8d6c45e5@freebsd.org> Date: Tue, 5 Jul 2016 11:26:43 -0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit X-Sonic-CAuth: UmFuZG9tSVYJgx+TKnWdwLYYgGpUKMH24lQEzyNrm2WuvLTkkvMD93V4HNeLJsFBsFz+JFeBv5pVObXO6FwdyMW4zdnjoesyEkATbPUrXbI= X-Sonic-ID: C;1EfABt5C5hG6rZtMTlz00w== M;tjdIB95C5hG6rZtMTlz00w== X-Spam-Flag: No X-Sonic-Spam-Details: 0.0/5.0 by cerberusd X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 18:26:53 -0000 On 07/03/16 13:11, Warner Losh wrote: > On Sun, Jul 3, 2016 at 1:37 PM, Adrian Chadd wrote: >> On 2 July 2016 at 17:08, Nathan Whitehorn wrote: >>> A reasonable first pass at checking for this kind of bug is doing grep -lR >>> '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows the following >>> files: >>> arm/mv/armadaxp/armadaxp_mp.c >>> arm/include/counter.h >>> arm/broadcom/bcm2835/bcm2836.c >>> arm/broadcom/bcm2835/bcm2836_mp.c >>> arm/freescale/imx/imx6_mp.c >>> arm/allwinner/aw_mp.c >>> arm/rockchip/rk30xx_mp.c >>> arm/amlogic/aml8726/aml8726_mp.c >>> arm/samsung/exynos/exynos5_mp.c >>> arm/arm/mp_machdep.c >>> arm/nvidia/tegra124/tegra124_mp.c >>> arm64/include/counter.h >>> arm64/arm64/gic_v3.c >>> arm64/arm64/gic_v3_its.c >>> arm64/arm64/gicv3_its.c >>> >>> All of them should, in some sense, be CPU_FOREACH(), but it may not matter. >>> For example, it may not be possible to have sparse CPU IDs on some or all of >>> those SOCs. At least the generic ones (counter, mp_machdep.c, gic (why are >>> there both gic_v3_its.c and gicv3_its.c?)) should be changed, I think. >>> -Nathan >> I think converting all the users over to the CPU_FOREACH thing is the >> right way to go, even if the SOC doesn't require it. People do bring >> up new systems by copy/pasta'ing an existing similar system, so we're >> best served by having all the consumers migrated. >> >> But, I'd do it in head/12. Early in head/12. 
:-P > It is a mergeable change too, since it wouldn't change any APIs. > At least the conversion to CPU_FOREACH. We don't want too many > sweeping changes that can't be merged too early (that way leads to > lots of maintenance issues), but we can do something like this. Merging > would be optional, but possible, for those bits of the tree that need it. > Though, for something like this, there's little against doing a full merge > and a lot for it... > > Warner > That sounds like the right approach. Since the original patch fixes bugs in 11, rather than niceties, I will send it to re@ tomorrow. After the branch, I'll do a sweep of other obviously wrong code for 12 with an MFC timer. -Nathan From owner-freebsd-hackers@freebsd.org Tue Jul 5 18:40:32 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E9DD6B2047C for ; Tue, 5 Jul 2016 18:40:32 +0000 (UTC) (envelope-from lionelcons1972@gmail.com) Received: from mail-yw0-x230.google.com (mail-yw0-x230.google.com [IPv6:2607:f8b0:4002:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A52791226 for ; Tue, 5 Jul 2016 18:40:32 +0000 (UTC) (envelope-from lionelcons1972@gmail.com) Received: by mail-yw0-x230.google.com with SMTP id i12so68519410ywa.1 for ; Tue, 05 Jul 2016 11:40:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=bPHVmSWMxMML7KaxWY4/HrDQ6m4E8ABHFTh6rWXUyYM=; b=vBgOQy5Ic2H1eLXr9vQeHdHKc0D3MaSbb2wBkoS34hHFVbySfDlysBU+wU/LRBNIcG z3DbsKcJR7l1nTWsdrZebd0vVT3+RsLvJ5KM34hiZ+sE48RQnHfwuUcC3gJdGptLyk3t idWPbroBB6VWMbP+S6hEgzXnjW7hmR5+84ix0nowTCc1JUiuW+w0yJsWpoS5aPfOhJd6 
DGybS7JVV/NeyKwAZUdrWpgJC/g7H37sLpURdA14XXYcBdogccNLksxBMdZHRQAqLOmY LCGw2ohbP982gLLvhRk+C/0gvOBBXQjwXhtVOi2utVPpUdExlAOa2Pf25cuuBUNPP6zZ pSsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=bPHVmSWMxMML7KaxWY4/HrDQ6m4E8ABHFTh6rWXUyYM=; b=GuQ/1vvDZxiLpzxEhiym7ArkS9LGSx0f5g3s7e2yu6ay/wXhRSPCXK8ezTl1KH2VI2 1u6kpvCn8vckoeKmNiPtvSCh8JuIIjiQY5OB8pTWkXKyg+SwA3e1gdAHKWawTdxzWbrD D8i8fiRTUnKImWKklGnQhWOUnbTquKVi1dHRA4YCGVRdrgZUzc541ECsJexmQM0xKWQn rX4bNjHvL1AfMJRAARMNJynwQNTmE2L9sBbh/75d9lsVlghuk5wIgDfbUSubc0cJVRUc o7lPf42PYbyPwEr2tjwINMl81RIvpcDlcVs9m/rj9qOmJpNJlPFAkvW8qXMzCph6HlNI 9d2g== X-Gm-Message-State: ALyK8tIpkSB5wRy0WwfVPm7zk7XBIVFsxMn1GNaYgN/Djr/vGNc35akJCJCPa16PyfqHyFmVAgnYDRuUyREnsQ== X-Received: by 10.129.50.83 with SMTP id y80mr12096377ywy.305.1467744031740; Tue, 05 Jul 2016 11:40:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.193.194 with HTTP; Tue, 5 Jul 2016 11:40:30 -0700 (PDT) In-Reply-To: <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net> References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net> From: Lionel Cons Date: Tue, 5 Jul 2016 20:40:30 +0200 Message-ID: Subject: Re: ZFS ARC and mmap/page cache coherency question To: Karl Denninger Cc: Freebsd hackers list Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 18:40:33 -0000

So what Oracle did (based on work done by Sun for OpenSolaris) was to:
1. Modify ZFS to prevent *ANY* double/multi caching [this is considered a design defect]
2. Introduce a new VM subsystem which scales a lot better and provides hooks for [1] so there are never two or more copies of the same data in the system

Given that this was a huge, paid, multiyear effort, it is unlikely that the design defects in open-source ZFS will ever go away.

Lionel

On 5 July 2016 at 19:50, Karl Denninger wrote: > > On 7/5/2016 12:19, Matthew Macy wrote: >> >> >> ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger wrote ---- >> > >> > >> > On 7/4/2016 18:45, Matthew Macy wrote: >> > > >> > > >> > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ---- >> > > > >> > > > On 7/3/2016 02:45, Matthew Macy wrote: >> > > > > >> > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- >> > > > >> > > > Wellllll.... >> > > > >> > > > I've done a fair bit of work here (see >> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the >> > > > political issues are at least as bad as the coding ones. >> > > > >> > > >> > > >> > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost. >> > > >> > > -M >> > > >> > >> > The ARC is very useful when it gets a hit as it avoid an I/O that would >> > otherwise take place. >> > >> > Where it sucks is when the system evicts working set to preserve ARC.
>> > That's always wrong in that you're trading a speculative I/O (if the >> > cache is hit later) for a *guaranteed* one (to page out) and maybe *two* >> > (to page back in.) >> >> The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. However, if one could resolve that and have the page cache manage these blocks life would be much much better. However, you'd lose MFU. Hence my question. >> >> -M >> > I suspect there's an argument to be made there but the present problems > make determining the impact of that difficult or impossible as those > effects are swamped by the other issues. > > I can fairly-easily create workloads on the base code where simply > typing "vi ", making a change and hitting ":w" will result in > a stall of tens of seconds or more while the cache flush that gets > requested is run down. I've resolved a good part (but not all > instances) of this through my work. > > My understanding is that 11- has had additional work done to the base > code, but three underlying issues are not, from what I can see in the > commit logs and discussions, addressed: The VM system will page out > working set while leaving ARC alone, UMA reserved-but-not-in-use space > is not policed adequately when memory pressure exists *before* the pager > starts considering evicting working set and the write-back cache is for > many machine configurations grossly inappropriate and cannot be tuned > adequately by hand (particularly being true on a system with vdevs that > have materially-varying performance levels.)
> > I have more-or-less stopped work on the tree on a forward basis since I > got to a place with 10.2 that (1) works for my production requirements, > resolving the problems and (2) ran into what I deemed to be intractable > political issues within core on progress toward eradicating the root of > the problem. > > I will probably revisit the situation with 11- at some point, as I'll > want to roll my production systems forward. However, I don't know when > that will be -- right now 11- is stable enough for some of my embedded > work (e.g. on the Raspberry Pi2) but is not on my server and > client-class machines. Indeed just yesterday I got a lock-order > reversal panic while doing a shutdown after a kernel update on one of my > lab boxes running a just-updated 11- codebase. > > -- > Karl Denninger > karl@denninger.net > /The Market Ticker/ > /[S/MIME encrypted email preferred]/ -- Lionel From owner-freebsd-hackers@freebsd.org Tue Jul 5 19:09:14 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3218BB20EDE for ; Tue, 5 Jul 2016 19:09:14 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E58741AFF for ; Tue, 5 Jul 2016 19:09:13 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 7C3CC2209A0 for ; Tue, 5 Jul 2016 14:09:12 -0500 (CDT) Subject: Re: ZFS ARC and mmap/page cache coherency question To: freebsd-hackers@freebsd.org References:
<20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> <155b84da0aa.ad3af0e6139335.8627172617037605875@nextbsd.org> <7e00af5a-86cd-25f8-a4c6-2d946b507409@denninger.net> <155bc1260e6.12001bf18198857.6272515207330027022@nextbsd.org> <31f4d30f-4170-0d04-bd23-1b998474a92e@denninger.net> From: Karl Denninger Message-ID: <2be70811-add4-d630-7f5a-a5a53ee2a5d4@denninger.net> Date: Tue, 5 Jul 2016 14:08:55 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms070700000603020002030909" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Jul 2016 19:09:14 -0000 This is a cryptographically signed message in MIME format. --------------ms070700000603020002030909 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable You'd get most of the way to what Oracle did, I suspect, if the system: 1. Dynamically resized the write cache on a per-vdev basis so as to prevent a flush from stalling all write I/O for a material amount of time (which can and *does* happen now) 2. Made VM aware of UMA "committed-but-free" on an ongoing basis and policed it on a sliding basis (that is, as RAM pressure rises VM considers it more-important to reap UMA so as to prevent marked-used-but-in-fact-free RAM from accumulating when RAM is under pressure.) 3. Bi-directionally hooked VM so that it initiates and cooperates with ZFS on ARC size management. 
Specifically, if ZFS decides ARC is to be reaped then it must notify VM so that (1) UMA can be reaped first, if necessary, and (2) if ARC *still* needs to be reaped, that happens *before* VM pages anything out. If and only if ARC is at minimum should the VM system evict working set to the pagefile.

#1 is entirely within ZFS but is fairly hard to do well, and neither the Illumos nor the FreeBSD team has taken a serious crack at it.

#2 I've taken a fairly decent look at, but I have not implemented code on the VM side to do it. What I *have* done is implement code on the ZFS side to do it within the ZFS paradigm, which is technically in the wrong place but works pretty well -- so long as the UMA fragmentation is coming from ZFS.

#3 is a bear, especially if you don't move that code into VM (which intimately "marries" the ZFS and VM code; that's very bad from a maintainability perspective.) What I've implemented is somewhat of a hack in that regard: it has ZFS triggering before VM does, and it gets aggressive with reaping its own UMA areas and the writeback cache when there is RAM pressure, and thus *most* of the time avoids the paging pathology while allowing the ARC to use the truly-free RAM. It ought to be in the VM code, however, because the pressure sometimes does not come from ZFS.

This is what one of my production machines looks like right now with the patch in -- this system runs a quite-active Postgres database along with a material number of other things at the same time; this doesn't look bad at all in terms of efficiency.
[karl@NewFS ~]$ zfs-stats -A

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Jul  5 14:05:06 2016
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                29.11m
        Recycle Misses:                         0
        Mutex Misses:                           67.14k
        Evict Skips:                            72.84m

ARC Size:                               72.10%  16.10   GiB
        Target Size: (Adaptive)         83.00%  18.53   GiB
        Min Size (Hard Limit):          12.50%  2.79    GiB
        Max Size (High Water):          8:1     22.33   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       81.84%  15.17   GiB
        Frequently Used Cache Size:     18.16%  3.37    GiB

ARC Hash Breakdown:
        Elements Max:                           1.84m
        Elements Current:               33.47%  614.39k
        Collisions:                             41.78m
        Chain Max:                              6
        Chains:                                 39.45k

------------------------------------------------------------------------

ARC Efficiency:                                 1.88b
        Cache Hit Ratio:                78.45%  1.48b
        Cache Miss Ratio:               21.55%  405.88m
        Actual Hit Ratio:               77.46%  1.46b

        Data Demand Efficiency:         77.97%  1.45b
        Data Prefetch Efficiency:       24.82%  9.07m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.52%   7.62m
          Most Recently Used:           8.38%   123.87m
          Most Frequently Used:         90.36%  1.34b
          Most Recently Used Ghost:     0.18%   2.65m
          Most Frequently Used Ghost:   0.56%   8.30m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  76.71%  1.13b
          Prefetch Data:                0.15%   2.25m
          Demand Metadata:              21.82%  322.33m
          Prefetch Metadata:            1.33%   19.58m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  78.91%  320.29m
          Prefetch Data:                1.68%   6.82m
          Demand Metadata:              16.70%  67.79m
          Prefetch Metadata:            2.70%   10.97m

------------------------------------------------------------------------

The system currently has 20 GB wired, ~3 GB free and ~1 GB inactive with a tiny amount in the cache bucket (~46 MB).

On 7/5/2016 13:40, Lionel Cons wrote:
> So what Oracle did (based on work done by SUN for Opensolaris) was to:
> 1. Modify ZFS to prevent *ANY* double/multi caching [this is
> considered a design defect]
> 2. Introduce a new VM subsystem which scales a lot better and provides
> hooks for [1] so there are never two or more copies of the same data
> in the system
>
> Given that this was a huge, paid, multiyear effort its not likely
> going to happen that the design defects in opensource ZFS will ever go
> away.
>
> Lionel
>
> On 5 July 2016 at 19:50, Karl Denninger wrote:
>> On 7/5/2016 12:19, Matthew Macy wrote:
>>>
>>> ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger wrote ----
>>> >
>>> >
>>> > On 7/4/2016 18:45, Matthew Macy wrote:
>>> > >
>>> > >
>>> > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger wrote ----
>>> > > >
>>> > > > On 7/3/2016 02:45, Matthew Macy wrote:
>>> > > > >
>>> > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M----
>>> > > >
>>> > > > Wellllll....
>>> > > >
>>> > > > I've done a fair bit of work here (see
>>> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
>>> > > > political issues are at least as bad as the coding ones.
>>> > > >
>>> > >
>>> > >
>>> > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost.
>>> > >
>>> > > -M
>>> > >
>>> >
>>> > The ARC is very useful when it gets a hit as it avoid an I/O that would
>>> > otherwise take place.
>>> >
>>> > Where it sucks is when the system evicts working set to preserve ARC.
>>> > That's always wrong in that you're trading a speculative I/O (if the
>>> > cache is hit later) for a *guaranteed* one (to page out) and maybe *two*
>>> > (to page back in.)
>>>
>>> The question wasn't ARC vs. no-caching. It was LRU only vs LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. However, if one could resolve that and have the page cache manage these blocks life would be much much better. However, you'd lose MFU. Hence my question.
>>>
>>> -M
>>>
>> I suspect there's an argument to be made there but the present problems
>> make determining the impact of that difficult or impossible as those
>> effects are swamped by the other issues.
>>
>> I can fairly-easily create workloads on the base code where simply
>> typing "vi ", making a change and hitting ":w" will result in
>> a stall of tens of seconds or more while the cache flush that gets
>> requested is run down. I've resolved a good part (but not all
>> instances) of this through my work.
>>
>> My understanding is that 11- has had additional work done to the base
>> code, but three underlying issues are not, from what I can see in the
>> commit logs and discussions, addressed: The VM system will page out
>> working set while leaving ARC alone, UMA reserved-but-not-in-use space
>> is not policed adequately when memory pressure exists *before* the pager
>> starts considering evicting working set and the write-back cache is for
>> many machine configurations grossly inappropriate and cannot be tuned
>> adequately by hand (particularly being true on a system with vdevs that
>> have materially-varying performance levels.)
>>
>> I have more-or-less stopped work on the tree on a forward basis since I
>> got to a place with 10.2 that (1) works for my production requirements,
>> resolving the problems and (2) ran into what I deemed to be intractable
>> political issues within core on progress toward eradicating the root of
>> the problem.
>>
>> I will probably revisit the situation with 11- at some point, as I'll
>> want to roll my production systems forward.
However, I don't know when
>> that will be -- right now 11- is stable enough for some of my embedded
>> work (e.g. on the Raspberry Pi2) but is not on my server and
>> client-class machines. Indeed just yesterday I got a lock-order
>> reversal panic while doing a shutdown after a kernel update on one of my
>> lab boxes running a just-updated 11- codebase.
>>
>> --
>> Karl Denninger
>> karl@denninger.net
>> /The Market Ticker/
>> /[S/MIME encrypted email preferred]/
>
>

--
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

--------------ms070700000603020002030909--

From owner-freebsd-hackers@freebsd.org Wed Jul 6 15:18:36 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1C89FB75E29; Wed, 6 Jul 2016 15:18:36 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A13B21865; Wed, 6 Jul 2016 15:18:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u66FIMY6079547 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 6 Jul 2016 18:18:23 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u66FIMY6079547 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id
u66FIM1Y079519; Wed, 6 Jul 2016 18:18:22 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 6 Jul 2016 18:18:22 +0300 From: Konstantin Belousov To: David Cross Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) Message-ID: <20160706151822.GC38613@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 15:18:36 -0000 On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote: > Ok.. to reply to my own message, I using ktr and debugging printfs I have > found the culprit.. but I am still at a loss to 'why', or what the > appropriate fix is. 
> > Lets go back to the panic (simplified) > > #0 0xffffffff8043f160 at kdb_backtrace+0x60 > #1 0xffffffff80401454 at vpanic+0x124 > #2 0xffffffff804014e3 at panic+0x43 > #3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a > #4 0xffffffff80499cc1 at brelse+0x151 > #5 0xffffffff804979b1 at bufwrite+0x81 > #6 0xffffffff80623c80 at ffs_write+0x4b0 > #7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4 > #8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293 > #9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142 > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91 > #11 0xffffffff806588e6 at vm_pageout_flush+0x116 > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2 > #13 0xffffffff80651519 at vm_object_page_clean+0x179 > #14 0xffffffff80651102 at vm_object_terminate+0xa2 > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85 > #16 0xffffffff8062a52f at ufs_reclaim+0x1f > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142 > > Via KTR logging I determined that the dangling dependedency was on a > freshly allocated buf, *after* vinvalbuf in the vgonel() (so in VOP_RECLAIM > itself), called by the vnode lru cleanup process; I further noticed that it > was in a newbuf that recycled a bp (unimportant, except it let me narrow > down my logging to something managable), from there I get this stacktrace > (simplified) > > #0 0xffffffff8043f160 at kdb_backtrace+0x60 > #1 0xffffffff8049c98e at getnewbuf+0x4be > #2 0xffffffff804996a0 at getblk+0x830 > #3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327 > #4 0xffffffff80623b0b at ffs_write+0x33b > #5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4 > #6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293 > #7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142 > #8 0xffffffff80661cc1 at vnode_pager_putpages+0x91 > #9 0xffffffff806588e6 at vm_pageout_flush+0x116 > #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2 > #11 0xffffffff80651519 at vm_object_page_clean+0x179 > #12 0xffffffff80651102 at vm_object_terminate+0xa2 > #13 
0xffffffff806621a5 at vnode_destroy_vobject+0x85 > #14 0xffffffff8062a52f at ufs_reclaim+0x1f > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142 > #16 0xffffffff804b6c6e at vgonel+0x2ee > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5 > > addr2line on the ffs_balloc_ufs2 gives: > /usr/src/sys/ufs/ffs/ffs_balloc.c:778: > > bp = getblk(vp, lbn, nsize, 0, 0, gbflags); > bp->b_blkno = fsbtodb(fs, newb); > if (flags & BA_CLRBUF) > vfs_bio_clrbuf(bp); > if (DOINGSOFTDEP(vp)) > softdep_setup_allocdirect(ip, lbn, newb, 0, > nsize, 0, bp); > > > Boom, freshly allocated buffer with a dependecy; nothing in VOP_RECLAIM > handles this, this is after vinvalbuf is called, it expects that everything > is flushed to disk and its just about releasing structures (is my read of > the code). > > Now, perhaps this is a good assumption? the question then is how is this > buffer hanging out there surviving a a vinvalbuf. I will note that my > test-case that finds this runs and terminates *minutes* before... its not > just hanging out there in a race, its surviving background sync, fsync, > etc... wtf? Also, I *can* unmount the FS without an error, so that > codepath is either ignoring this buffer, or its forcing a sync in a way > that doesn't panic? Most typical cause for the buffer dependencies not flushed is a buffer write error. At least you could provide the printout of the buffer to confirm or reject this assumption. Were there any kernel messages right before the panic ? Just in case, did you fsck the volume before using it, after the previous panic ? > > Anyone have next steps? I am making progress here, but its really slow > going, this is probably the most complex portion of the kernel and some > pointers would be helpful. > > On Sat, Jul 2, 2016 at 2:31 PM, David Cross wrote: > > > Ok, I have been trying to trace this down for awhile..I know quite a bit > > about it.. but there's a lot I don't know, or I would have a patch. 
I have > > been trying to solve this on my own, but bringing in some outside > > assistance will let me move on with my life. > > > > First up: The stacktrace (from a debugging kernel, with coredump > > > > #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 > > #1 0xffffffff8071018a in kern_reboot (howto=260) > > at /usr/src/sys/kern/kern_shutdown.c:486 > > #2 0xffffffff80710afc in vpanic ( > > fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d", > > ap=0xfffffe023ae5cf40) > > at /usr/src/sys/kern/kern_shutdown.c:889 > > #3 0xffffffff807108c0 in panic ( > > fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d") > > at /usr/src/sys/kern/kern_shutdown.c:818 > > #4 0xffffffff80a7c841 in softdep_deallocate_dependencies ( > > bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099 > > #5 0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at > > buf.h:428 > > #6 0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148) > > at /usr/src/sys/kern/vfs_bio.c:1599 > > #7 0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148) > > at /usr/src/sys/kern/vfs_bio.c:1180 > > #8 0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395 > > #9 0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8) > > at /usr/src/sys/ufs/ffs/ffs_vnops.c:800 > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480, > > a=0xfffffe023ae5d2b8) at vnode_if.c:999 > > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000, > > uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000) > > at vnode_if.h:413 > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages > > (vp=0xfffff80077e7a000, > > ma=0xfffffe023ae5d660, bytecount=16384, flags=1, > > rtvals=0xfffffe023ae5d580) > > at /usr/src/sys/vm/vnode_pager.c:1138 > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478) > > at 
/usr/src/sys/kern/vfs_default.c:760 > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218, > > a=0xfffffe023ae5d478) at vnode_if.c:2861 > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000, > > m=0xfffffe023ae5d660, count=16384, sync=1, rtvals=0xfffffe023ae5d580, > > offset=0) at vnode_if.h:1189 > > #16 0xffffffff80b196f3 in vnode_pager_putpages (object=0xfffff8014a1fce00, > > m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580) > > at /usr/src/sys/vm/vnode_pager.c:1016 > > #17 0xffffffff80b0a605 in vm_pager_put_pages (object=0xfffff8014a1fce00, > > m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580) > > at vm_pager.h:144 > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660, > > count=4, > > flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8, eio=0xfffffe023ae5d77c) > > at /usr/src/sys/vm/vm_pageout.c:533 > > #19 0xffffffff80afec76 in vm_object_page_collect_flush ( > > object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1, > > flags=1, > > clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c) > > at /usr/src/sys/vm/vm_object.c:971 > > #20 0xffffffff80afe91e in vm_object_page_clean (object=0xfffff8014a1fce00, > > start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897 > > #21 0xffffffff80afe1fa in vm_object_terminate (object=0xfffff8014a1fce00) > > at /usr/src/sys/vm/vm_object.c:735 > > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000) > > at /usr/src/sys/vm/vnode_pager.c:164 > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000) > > at /usr/src/sys/ufs/ufs/ufs_inode.c:190 > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968) > > at /usr/src/sys/ufs/ufs/ufs_inode.c:219 > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0, > > a=0xfffffe023ae5d968) at vnode_if.c:2019 > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000, > > td=0xfffff80008931960) at vnode_if.h:830 > > #27 0xffffffff808219a9 in vgonel 
(vp=0xfffff80077e7a000) > > at /usr/src/sys/kern/vfs_subr.c:2943 > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000) > > at /usr/src/sys/kern/vfs_subr.c:882 > > #29 0xffffffff80828ea9 in vnlru_proc () at > > /usr/src/sys/kern/vfs_subr.c:1000 > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50 > > , > > arg=0x0, frame=0xfffffe023ae5dc00) at > > /usr/src/sys/kern/kern_fork.c:1027 > > #31 0xffffffff80b21dce in fork_trampoline () > > at /usr/src/sys/amd64/amd64/exception.S:611 > > #32 0x0000000000000000 in ?? () > > > > This is a kernel compiled -O -g, its "almost" GENERIC; the only difference > > is some removed drivers, I have reproduced this on a few different kernels, > > including a BHYVE one so I can poke at it and not take out the main > > machine. The reproduction as it currently stands needs to have jails > > running, but I don't believe this is a jail interaction, I think its just > > that the process that sets up the problem happens to be running in a jail. > > The step is "start jail; run "find /mountpoint -xdev >/dev/null" on the > > filesystem, when the vnlru forces the problem vnode out the system panics. > > > > I made a few modifications to the kernel to spit out information about the > > buf that causes the issue, but that is it. 
> > > > Information about the buf in question; it has a single softdependency > > worklist for direct allocation: > > (kgdb) print *bp->b_dep->lh_first > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378}, > > wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841} > > > > The file that maps to that buffer: > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002 > > -rw------- 1 cyrus cyrus 24K Jul 1 20:32 > > MOUNTPOINT/jails/mail/var/imap/db/__db.002 > > > > Any help is appreciated, until then I will keep banging my head against > > the proverbial wall on this :) > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Wed Jul 6 15:49:29 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 726DDB75447 for ; Wed, 6 Jul 2016 15:49:29 +0000 (UTC) (envelope-from andrew@fubar.geek.nz) Received: from kif.fubar.geek.nz (kif.fubar.geek.nz [178.62.119.249]) by mx1.freebsd.org (Postfix) with ESMTP id 31C321812; Wed, 6 Jul 2016 15:49:28 +0000 (UTC) (envelope-from andrew@fubar.geek.nz) Received: from zapp (global-5-141.nat-2.net.cam.ac.uk [131.111.5.141]) by kif.fubar.geek.nz (Postfix) with ESMTPSA id 0F1CBD78E6; Wed, 6 Jul 2016 15:49:28 +0000 (UTC) Date: Wed, 6 Jul 2016 16:49:26 +0100 From: Andrew Turner To: Nathan Whitehorn Cc: "freebsd-hackers@freebsd.org" Subject: Re: Review request: sparse CPU ID maps Message-ID: <20160706164926.7c3d116c@zapp> In-Reply-To: <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> References: <57761101.3030101@freebsd.org> <5345fb94-91b8-5019-037e-d4825a694cfd@freebsd.org> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd11.0) MIME-Version: 1.0 
Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 15:49:29 -0000 On Sat, 2 Jul 2016 17:08:54 -0700 Nathan Whitehorn wrote: > A reasonable first pass at checking for this kind of bug is doing > grep -lR '< mp_ncpus'. Running that on sys/arm and sys/arm64 shows > the following files: > arm/mv/armadaxp/armadaxp_mp.c > arm/include/counter.h > arm/broadcom/bcm2835/bcm2836.c > arm/broadcom/bcm2835/bcm2836_mp.c > arm/freescale/imx/imx6_mp.c > arm/allwinner/aw_mp.c > arm/rockchip/rk30xx_mp.c > arm/amlogic/aml8726/aml8726_mp.c > arm/samsung/exynos/exynos5_mp.c > arm/arm/mp_machdep.c > arm/nvidia/tegra124/tegra124_mp.c I'm planning forcing people to clean up the arm code in 12. I can add this to the list of things that need to be fixed. > arm64/include/counter.h > arm64/arm64/gic_v3.c > arm64/arm64/gic_v3_its.c > arm64/arm64/gicv3_its.c I'll look at these in a few days when the code freeze is lifted. > > All of them should, in some sense, be CPU_FOREACH(), but it may not > matter. For example, it may not be possible to have sparse CPU IDs on > some or all of those SOCs. At least the generic ones (counter, > mp_machdep.c, gic (why are there both gic_v3_its.c and gicv3_its.c?)) > should be changed, I think. On arm it depends on the SoC. As far as I know no arm SoCs support sparse CPU IDs as they assign the ID based on the internal ID and, on a single cluster of CPUS, this seems to be contiguous. To boot on all CPUs on a multi-cluster SoC (e.g. big.LITTLE) we would need to rework the assignment of cpuids. As such I would expect us to keep a contiguous space. The place I would expect us to get a non-contiguous range on arm is if we grew support to offline CPUs. 
I think this will be needed on a few SoCs if we wish to run on many of the mobile chips. This may be needed for thermal and power reasons as many are only able to run for a short length of time before thermal throttling. On arm64 I'm planning on reworking the cpuid allocation code such that we may get sparse values, however I don't expect to have time for this in the next few months. Andrew From owner-freebsd-hackers@freebsd.org Wed Jul 6 14:51:30 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7F8CFB75771; Wed, 6 Jul 2016 14:51:30 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: from mail-yw0-x244.google.com (mail-yw0-x244.google.com [IPv6:2607:f8b0:4002:c05::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E4401869; Wed, 6 Jul 2016 14:51:30 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: by mail-yw0-x244.google.com with SMTP id i12so11388258ywa.0; Wed, 06 Jul 2016 07:51:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=2W18IY4L2vU7WGhVf+mqioV0jxeyxVx64fPGK62GrME=; b=iCgXzL70stA25xEQyFCWP3+nAdvJukcQQs/2wUHO5DkQ9IEcDWjzk84DgGpfVa9Ozn i2d5iawMcB9mh0HySZxJlAot+5dpANnDjQ1pndjmN52XxtGzdq5wdHFoQ1LNs7MtbEmh 2lxYR75dPBYe8OQkfClyr1ab7kihFeQaKc5NPkolgYPnTzH63URwFtu0hx3rHDBSTrvR oMMrm0cuHeuQnCAvwcnrdLnKZl+t/2zmbJh/evJrpK08lZXCX1mgCi1tDhyC97pzJKjb GpcnkFYpqufmUjRo/e8liNrceyO8TWLB6Af10yHnwskeTM1fAtjbe3pKf2uSabCtmOTI +ojg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; 
bh=2W18IY4L2vU7WGhVf+mqioV0jxeyxVx64fPGK62GrME=; b=S8oHXY36g1Lz8Z2ZnqUng0181qQrcbFuXTc94l4y/xq+dceO/rNEPHJDvOixd26tuy 2ujJ8EeRQrB8+RTLL4Lwl80FXEOU6ho/mTKJXpxyGR9VELfxlJUlV3xxLhrbS5qa6iEI kbPj6zI/h+jsimkqy9Hk4u8xkoHyVG77ZwTuEwcAyvPBYip2c8a3bSnmCSTEviD28Xe3 1lFk8SF3+Pm2Nb1wBoI+d7oiyr389acOYLAIMJDnPtRq3hRulLwn50ZaCpKDCKEzP6er uHAYvfNdghNwceftJ3Z7H3Vgxx0j/PnuFnCfJYcl4BCINHiYSqnJ8VMD8kr1iQDPNLWX lURg== X-Gm-Message-State: ALyK8tI1ddVDAcxk5v/zAHO7K/J42IhO+O7ZMtHS9NnoxJ8u6zcAPmxX1GmqD9yWvPyCfSSYjtn+I67NlXRvKQ== X-Received: by 10.129.82.21 with SMTP id g21mr14762175ywb.66.1467816688930; Wed, 06 Jul 2016 07:51:28 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 07:51:28 -0700 (PDT) In-Reply-To: References: From: David Cross Date: Wed, 6 Jul 2016 10:51:28 -0400 Message-ID: Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) To: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org X-Mailman-Approved-At: Wed, 06 Jul 2016 16:16:01 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 14:51:30 -0000 Ok.. to reply to my own message, I using ktr and debugging printfs I have found the culprit.. but I am still at a loss to 'why', or what the appropriate fix is. 

Let's go back to the panic (simplified):

#0 0xffffffff8043f160 at kdb_backtrace+0x60
#1 0xffffffff80401454 at vpanic+0x124
#2 0xffffffff804014e3 at panic+0x43
#3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a
#4 0xffffffff80499cc1 at brelse+0x151
#5 0xffffffff804979b1 at bufwrite+0x81
#6 0xffffffff80623c80 at ffs_write+0x4b0
#7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
#8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
#9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
#10 0xffffffff80661cc1 at vnode_pager_putpages+0x91
#11 0xffffffff806588e6 at vm_pageout_flush+0x116
#12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
#13 0xffffffff80651519 at vm_object_page_clean+0x179
#14 0xffffffff80651102 at vm_object_terminate+0xa2
#15 0xffffffff806621a5 at vnode_destroy_vobject+0x85
#16 0xffffffff8062a52f at ufs_reclaim+0x1f
#17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142

Via KTR logging I determined that the dangling dependency was on a
freshly allocated buf, *after* vinvalbuf in the vgonel() (so in VOP_RECLAIM
itself), called by the vnode lru cleanup process. I further noticed that it
was in a newbuf that recycled a bp (unimportant, except it let me narrow
down my logging to something manageable); from there I get this stacktrace
(simplified):

#0 0xffffffff8043f160 at kdb_backtrace+0x60
#1 0xffffffff8049c98e at getnewbuf+0x4be
#2 0xffffffff804996a0 at getblk+0x830
#3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327
#4 0xffffffff80623b0b at ffs_write+0x33b
#5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4
#6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293
#7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142
#8 0xffffffff80661cc1 at vnode_pager_putpages+0x91
#9 0xffffffff806588e6 at vm_pageout_flush+0x116
#10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2
#11 0xffffffff80651519 at vm_object_page_clean+0x179
#12 0xffffffff80651102 at vm_object_terminate+0xa2
#13 0xffffffff806621a5 at vnode_destroy_vobject+0x85
#14 0xffffffff8062a52f at ufs_reclaim+0x1f
#15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142
#16 0xffffffff804b6c6e at vgonel+0x2ee
#17 0xffffffff804ba6f5 at vnlru_proc+0x4b5

addr2line on the ffs_balloc_ufs2 gives:
/usr/src/sys/ufs/ffs/ffs_balloc.c:778:

        bp = getblk(vp, lbn, nsize, 0, 0, gbflags);
        bp->b_blkno = fsbtodb(fs, newb);
        if (flags & BA_CLRBUF)
                vfs_bio_clrbuf(bp);
        if (DOINGSOFTDEP(vp))
                softdep_setup_allocdirect(ip, lbn, newb, 0,
                    nsize, 0, bp);

Boom, freshly allocated buffer with a dependency; nothing in VOP_RECLAIM
handles this. This is after vinvalbuf is called; it expects that everything
is flushed to disk and it's just about releasing structures (is my read of
the code).

Now, perhaps this is a good assumption? The question then is how this
buffer is hanging out there, surviving a vinvalbuf. I will note that my
test case that finds this runs and terminates *minutes* before... it's not
just hanging out there in a race, it's surviving background sync, fsync,
etc... wtf? Also, I *can* unmount the FS without an error, so that
codepath is either ignoring this buffer, or it's forcing a sync in a way
that doesn't panic?

Anyone have next steps? I am making progress here, but it's really slow
going; this is probably the most complex portion of the kernel and some
pointers would be helpful.

On Sat, Jul 2, 2016 at 2:31 PM, David Cross wrote:

> Ok, I have been trying to trace this down for awhile..I know quite a bit
> about it.. but there's a lot I don't know, or I would have a patch. I have
> been trying to solve this on my own, but bringing in some outside
> assistance will let me move on with my life.
>
> First up: The stacktrace (from a debugging kernel, with coredump
>
> #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
> #1 0xffffffff8071018a in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:486
> #2 0xffffffff80710afc in vpanic (
>     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d",
>     ap=0xfffffe023ae5cf40)
>     at /usr/src/sys/kern/kern_shutdown.c:889
> #3 0xffffffff807108c0 in panic (
>     fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling deps
> b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d")
>     at /usr/src/sys/kern/kern_shutdown.c:818
> #4 0xffffffff80a7c841 in softdep_deallocate_dependencies (
>     bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099
> #5 0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at
> buf.h:428
> #6 0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148)
>     at /usr/src/sys/kern/vfs_bio.c:1599
> #7 0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148)
>     at /usr/src/sys/kern/vfs_bio.c:1180
> #8 0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395
> #9 0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8)
>     at /usr/src/sys/ufs/ffs/ffs_vnops.c:800
> #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480,
>     a=0xfffffe023ae5d2b8) at vnode_if.c:999
> #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000,
>     uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000)
>     at vnode_if.h:413
> #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages
> (vp=0xfffff80077e7a000,
>     ma=0xfffffe023ae5d660, bytecount=16384, flags=1,
> rtvals=0xfffffe023ae5d580)
>     at /usr/src/sys/vm/vnode_pager.c:1138
> #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478)
>     at /usr/src/sys/kern/vfs_default.c:760
> #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218,
>     a=0xfffffe023ae5d478) at vnode_if.c:2861
> #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000,
>     m=0xfffffe023ae5d660, count=16384, sync=1, rtvals=0xfffffe023ae5d580,
>     offset=0) at vnode_if.h:1189
> #16 0xffffffff80b196f3 in vnode_pager_putpages (object=0xfffff8014a1fce00,
>     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
>     at /usr/src/sys/vm/vnode_pager.c:1016
> #17 0xffffffff80b0a605 in vm_pager_put_pages (object=0xfffff8014a1fce00,
>     m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580)
>     at vm_pager.h:144
> #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660,
> count=4,
>     flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8, eio=0xfffffe023ae5d77c)
>     at /usr/src/sys/vm/vm_pageout.c:533
> #19 0xffffffff80afec76 in vm_object_page_collect_flush (
>     object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1,
> flags=1,
>     clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c)
>     at /usr/src/sys/vm/vm_object.c:971
> #20 0xffffffff80afe91e in vm_object_page_clean (object=0xfffff8014a1fce00,
>     start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897
> #21 0xffffffff80afe1fa in vm_object_terminate (object=0xfffff8014a1fce00)
>     at /usr/src/sys/vm/vm_object.c:735
> #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000)
>     at /usr/src/sys/vm/vnode_pager.c:164
> #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000)
>     at /usr/src/sys/ufs/ufs/ufs_inode.c:190
> #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968)
>     at /usr/src/sys/ufs/ufs/ufs_inode.c:219
> #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0,
>     a=0xfffffe023ae5d968) at vnode_if.c:2019
> #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000,
>     td=0xfffff80008931960) at vnode_if.h:830
> #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000)
>     at /usr/src/sys/kern/vfs_subr.c:2943
> #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000)
>     at /usr/src/sys/kern/vfs_subr.c:882
> #29 0xffffffff80828ea9 in vnlru_proc () at
> /usr/src/sys/kern/vfs_subr.c:1000
> #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50
> ,
>     arg=0x0, frame=0xfffffe023ae5dc00) at
> /usr/src/sys/kern/kern_fork.c:1027
> #31 0xffffffff80b21dce in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:611
> #32 0x0000000000000000 in ?? ()
>
> This is a kernel compiled -O -g, its "almost" GENERIC; the only difference
> is some removed drivers, I have reproduced this on a few different kernels,
> including a BHYVE one so I can poke at it and not take out the main
> machine. The reproduction as it currently stands needs to have jails
> running, but I don't believe this is a jail interaction, I think its just
> that the process that sets up the problem happens to be running in a jail.
> The step is "start jail; run "find /mountpoint -xdev >/dev/null" on the
> filesystem, when the vnlru forces the problem vnode out the system panics.
>
> I made a few modifications to the kernel to spit out information about the
> buf that causes the issue, but that is it.
>
> Information about the buf in question; it has a single softdependency
> worklist for direct allocation:
> (kgdb) print *bp->b_dep->lh_first
> $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378},
>     wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841}
>
> The file that maps to that buffer:
> ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002
> -rw------- 1 cyrus cyrus 24K Jul 1 20:32
> MOUNTPOINT/jails/mail/var/imap/db/__db.002
>
> Any help is appreciated, until then I will keep banging my head against
> the proverbial wall on this :)
>

From owner-freebsd-hackers@freebsd.org Wed Jul 6 15:30:21 2016
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 334C8B750BB;
 Wed, 6 Jul 2016 15:30:21 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: from mail-yw0-x233.google.com (mail-yw0-x233.google.com
 [IPv6:2607:f8b0:4002:c05::233])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id D78D31E15;
 Wed, 6 Jul 2016 15:30:20 +0000 (UTC)
 (envelope-from dcrosstech@gmail.com)
Received: by mail-yw0-x233.google.com with SMTP id v77so90432476ywg.0;
 Wed, 06 Jul 2016 08:30:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc;
 bh=2zIgEdwfw7Z1V6OGww1ImYgIa1uYbU7N6obh6XYYBrc=;
 b=SmIhSb1Q+w7zEtvd+HQHM5Jf2FdhefWaJtwsUrujd0SJQPQ/GSwrYx2Ej+50rrx0j+
 zejoHuQ5OCkdAtQdZnKGZel/vG8ByU4C3DGxkbsl4+l2Iezv91Eiz/AF66KbR2guBvGL
 o4HPsnmR10zGDnIEx9mm2w4dIeSdxL0FZjObOI2Zbq14fope0PoG+0hxbaG1JU15XTWu
 MWzDC/KPChrc963y51oiMD3ODba3zJjz17CmEbk7Kezzd5OEvJHy8cWh2jdgRiyDChMs
 vAGt9CB+bMO2cCX7Kqw11Qu3zPHsaiAgxNV4T1Fwa/3ErRq4nGZbtqi9ym6ZsrgjwlPH
 ZQrg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=2zIgEdwfw7Z1V6OGww1ImYgIa1uYbU7N6obh6XYYBrc=;
 b=NUF5ViTbPB4FlQLzdJfWKQKzfdUDM/gbs3NTvoLWQH4BjCypvYMhw8+JCJf9fzjq4C
 M6plTM7uZJHcj9U9Q+zn3/ulilXFsuXf4aPzTklGhXOofPE8pBnJ6IHGZPkKwFDPdrw+
 /tesYSccOJEX6VNX55XgzlAnCPXPorx3vkdgMgqIQtpYuIYhE9QjMWNFUcAz7iJaOo+3
 wW9dkyIfHN00SK1WrxCrYNfeJQIfAKPDlO76ZZ5QtPb7Fi36w83gsB32RDWkQk00qkXA
 1lOs7A+RR5SSUtFAPjH9VYDzMKDJ77SrzpYZFVM+/ej4XJfLCW/9mm25pgn7bBVcgZaW
 kApA==
X-Gm-Message-State: ALyK8tIuoJs71uy2FHLo/dBN0f99pH1lgDg8vOu1OjvbsArcnGfAi0qnvKZs8vC99ZYPpUTOrHE8AKV5B/HxVA==
X-Received: by 10.129.5.215 with SMTP id 206mr16078637ywf.210.1467819019999;
 Wed, 06 Jul 2016 08:30:19 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 08:30:19 -0700 (PDT)
In-Reply-To: <20160706151822.GC38613@kib.kiev.ua>
References: <20160706151822.GC38613@kib.kiev.ua>
From: David Cross
Date: Wed, 6 Jul 2016 11:30:19 -0400
Message-ID:
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: Konstantin Belousov
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Wed, 06 Jul 2016 16:23:31 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 06 Jul 2016 15:30:21 -0000

No kernel messages before (if there were I would have written this off a
long time ago).
And as of right now, this is probably the most fsck-ed filesystem on the
planet!.. I have an 'image' that I am working from that is ggate mounted,
so I can access it in a bhyve VM to ease debugging and I am not crashing my
real machine (with the real filesystem) all the time.

One of my initial guesses was that this was a CG allocation error, but a
dumpfs seems to show plenty of blocks in the CG to meet this need.

Quick note on the test case: I haven't totally isolated it yet, but the
minimal reproduction is a 'ctl_cyrusdb -r', which runs a bdb5 recover op. A
ktrace on that shows that it unlinks 3 files, opens them, lseeks them,
writes a block, and then mmaps them (but leaves them open). At process
termination it munmaps, and then closes. I have tried to write a shorter
reproduction that opens, seeks, mmaps (with the same flags), writes the
mmapped memory, munmaps, closes and exits, but this has been insufficient to
reproduce the issue. There is likely some specific pattern in the bdb5 code
tickling this, and behind the mmap-ed interface it is all opaque, and the
bdb5 code is pretty complex itself.

On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov wrote:

> On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote:
> > Ok..
to reply to my own message, I using ktr and debugging printfs I have > > found the culprit.. but I am still at a loss to 'why', or what the > > appropriate fix is. > > > > Lets go back to the panic (simplified) > > > > #0 0xffffffff8043f160 at kdb_backtrace+0x60 > > #1 0xffffffff80401454 at vpanic+0x124 > > #2 0xffffffff804014e3 at panic+0x43 > > #3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a > > #4 0xffffffff80499cc1 at brelse+0x151 > > #5 0xffffffff804979b1 at bufwrite+0x81 > > #6 0xffffffff80623c80 at ffs_write+0x4b0 > > #7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4 > > #8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293 > > #9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142 > > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91 > > #11 0xffffffff806588e6 at vm_pageout_flush+0x116 > > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2 > > #13 0xffffffff80651519 at vm_object_page_clean+0x179 > > #14 0xffffffff80651102 at vm_object_terminate+0xa2 > > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85 > > #16 0xffffffff8062a52f at ufs_reclaim+0x1f > > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142 > > > > Via KTR logging I determined that the dangling dependedency was on a > > freshly allocated buf, *after* vinvalbuf in the vgonel() (so in > VOP_RECLAIM > > itself), called by the vnode lru cleanup process; I further noticed that > it > > was in a newbuf that recycled a bp (unimportant, except it let me narrow > > down my logging to something managable), from there I get this stacktrace > > (simplified) > > > > #0 0xffffffff8043f160 at kdb_backtrace+0x60 > > #1 0xffffffff8049c98e at getnewbuf+0x4be > > #2 0xffffffff804996a0 at getblk+0x830 > > #3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327 > > #4 0xffffffff80623b0b at ffs_write+0x33b > > #5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4 > > #6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293 > > #7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142 > > #8 0xffffffff80661cc1 at 
vnode_pager_putpages+0x91 > > #9 0xffffffff806588e6 at vm_pageout_flush+0x116 > > #10 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2 > > #11 0xffffffff80651519 at vm_object_page_clean+0x179 > > #12 0xffffffff80651102 at vm_object_terminate+0xa2 > > #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85 > > #14 0xffffffff8062a52f at ufs_reclaim+0x1f > > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142 > > #16 0xffffffff804b6c6e at vgonel+0x2ee > > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5 > > > > addr2line on the ffs_balloc_ufs2 gives: > > /usr/src/sys/ufs/ffs/ffs_balloc.c:778: > > > > bp = getblk(vp, lbn, nsize, 0, 0, gbflags); > > bp->b_blkno = fsbtodb(fs, newb); > > if (flags & BA_CLRBUF) > > vfs_bio_clrbuf(bp); > > if (DOINGSOFTDEP(vp)) > > softdep_setup_allocdirect(ip, lbn, newb, > 0, > > nsize, 0, bp); > > > > > > Boom, freshly allocated buffer with a dependecy; nothing in VOP_RECLAIM > > handles this, this is after vinvalbuf is called, it expects that > everything > > is flushed to disk and its just about releasing structures (is my read of > > the code). > > > > Now, perhaps this is a good assumption? the question then is how is this > > buffer hanging out there surviving a a vinvalbuf. I will note that my > > test-case that finds this runs and terminates *minutes* before... its not > > just hanging out there in a race, its surviving background sync, fsync, > > etc... wtf? Also, I *can* unmount the FS without an error, so that > > codepath is either ignoring this buffer, or its forcing a sync in a way > > that doesn't panic? > Most typical cause for the buffer dependencies not flushed is a buffer > write error. At least you could provide the printout of the buffer to > confirm or reject this assumption. > > Were there any kernel messages right before the panic ? Just in case, > did you fsck the volume before using it, after the previous panic ? > > > > > Anyone have next steps? 
I am making progress here, but its really slow > > going, this is probably the most complex portion of the kernel and some > > pointers would be helpful. > > > > On Sat, Jul 2, 2016 at 2:31 PM, David Cross > wrote: > > > > > Ok, I have been trying to trace this down for awhile..I know quite a > bit > > > about it.. but there's a lot I don't know, or I would have a patch. I > have > > > been trying to solve this on my own, but bringing in some outside > > > assistance will let me move on with my life. > > > > > > First up: The stacktrace (from a debugging kernel, with coredump > > > > > > #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 > > > #1 0xffffffff8071018a in kern_reboot (howto=260) > > > at /usr/src/sys/kern/kern_shutdown.c:486 > > > #2 0xffffffff80710afc in vpanic ( > > > fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling > deps > > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d", > > > ap=0xfffffe023ae5cf40) > > > at /usr/src/sys/kern/kern_shutdown.c:889 > > > #3 0xffffffff807108c0 in panic ( > > > fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling > deps > > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d") > > > at /usr/src/sys/kern/kern_shutdown.c:818 > > > #4 0xffffffff80a7c841 in softdep_deallocate_dependencies ( > > > bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099 > > > #5 0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at > > > buf.h:428 > > > #6 0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148) > > > at /usr/src/sys/kern/vfs_bio.c:1599 > > > #7 0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148) > > > at /usr/src/sys/kern/vfs_bio.c:1180 > > > #8 0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395 > > > #9 0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8) > > > at /usr/src/sys/ufs/ffs/ffs_vnops.c:800 > > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480, > > > a=0xfffffe023ae5d2b8) at vnode_if.c:999 > > > #11 
0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000, > > > uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000) > > > at vnode_if.h:413 > > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages > > > (vp=0xfffff80077e7a000, > > > ma=0xfffffe023ae5d660, bytecount=16384, flags=1, > > > rtvals=0xfffffe023ae5d580) > > > at /usr/src/sys/vm/vnode_pager.c:1138 > > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478) > > > at /usr/src/sys/kern/vfs_default.c:760 > > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218, > > > a=0xfffffe023ae5d478) at vnode_if.c:2861 > > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000, > > > m=0xfffffe023ae5d660, count=16384, sync=1, > rtvals=0xfffffe023ae5d580, > > > offset=0) at vnode_if.h:1189 > > > #16 0xffffffff80b196f3 in vnode_pager_putpages > (object=0xfffff8014a1fce00, > > > m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580) > > > at /usr/src/sys/vm/vnode_pager.c:1016 > > > #17 0xffffffff80b0a605 in vm_pager_put_pages > (object=0xfffff8014a1fce00, > > > m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580) > > > at vm_pager.h:144 > > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660, > > > count=4, > > > flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8, > eio=0xfffffe023ae5d77c) > > > at /usr/src/sys/vm/vm_pageout.c:533 > > > #19 0xffffffff80afec76 in vm_object_page_collect_flush ( > > > object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1, > > > flags=1, > > > clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c) > > > at /usr/src/sys/vm/vm_object.c:971 > > > #20 0xffffffff80afe91e in vm_object_page_clean > (object=0xfffff8014a1fce00, > > > start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897 > > > #21 0xffffffff80afe1fa in vm_object_terminate > (object=0xfffff8014a1fce00) > > > at /usr/src/sys/vm/vm_object.c:735 > > > #22 0xffffffff80b1a0f1 in vnode_destroy_vobject (vp=0xfffff80077e7a000) > > > at 
/usr/src/sys/vm/vnode_pager.c:164 > > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000) > > > at /usr/src/sys/ufs/ufs/ufs_inode.c:190 > > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968) > > > at /usr/src/sys/ufs/ufs/ufs_inode.c:219 > > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0, > > > a=0xfffffe023ae5d968) at vnode_if.c:2019 > > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000, > > > td=0xfffff80008931960) at vnode_if.h:830 > > > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000) > > > at /usr/src/sys/kern/vfs_subr.c:2943 > > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000) > > > at /usr/src/sys/kern/vfs_subr.c:882 > > > #29 0xffffffff80828ea9 in vnlru_proc () at > > > /usr/src/sys/kern/vfs_subr.c:1000 > > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50 > > > , > > > arg=0x0, frame=0xfffffe023ae5dc00) at > > > /usr/src/sys/kern/kern_fork.c:1027 > > > #31 0xffffffff80b21dce in fork_trampoline () > > > at /usr/src/sys/amd64/amd64/exception.S:611 > > > #32 0x0000000000000000 in ?? () > > > > > > This is a kernel compiled -O -g, its "almost" GENERIC; the only > difference > > > is some removed drivers, I have reproduced this on a few different > kernels, > > > including a BHYVE one so I can poke at it and not take out the main > > > machine. The reproduction as it currently stands needs to have jails > > > running, but I don't believe this is a jail interaction, I think its > just > > > that the process that sets up the problem happens to be running in a > jail. > > > The step is "start jail; run "find /mountpoint -xdev >/dev/null" on the > > > filesystem, when the vnlru forces the problem vnode out the system > panics. > > > > > > I made a few modifications to the kernel to spit out information about > the > > > buf that causes the issue, but that is it. 
> > > > > > Information about the buf in question; it has a single softdependency > > > worklist for direct allocation: > > > (kgdb) print *bp->b_dep->lh_first > > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378}, > > > wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841} > > > > > > The file that maps to that buffer: > > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002 > > > -rw------- 1 cyrus cyrus 24K Jul 1 20:32 > > > MOUNTPOINT/jails/mail/var/imap/db/__db.002 > > > > > > Any help is appreciated, until then I will keep banging my head against > > > the proverbial wall on this :) > > > > > _______________________________________________ > > freebsd-stable@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-stable > > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org > " > From owner-freebsd-hackers@freebsd.org Wed Jul 6 16:02:01 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8688B7591D; Wed, 6 Jul 2016 16:02:01 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: from mail-yw0-x22c.google.com (mail-yw0-x22c.google.com [IPv6:2607:f8b0:4002:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 932C61F1F; Wed, 6 Jul 2016 16:02:01 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: by mail-yw0-x22c.google.com with SMTP id b72so91393961ywa.3; Wed, 06 Jul 2016 09:02:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=UrrFRO5z0hKpEZcOOJfaXV2i/GLH5kmJDlYFWxfHa3I=; b=Aafi8DHqDhuc+7z7gRdCLn2ReKH2MV6Q5RM6C+ZkAkj1O8P69MIv0ZFkd7hm/8irqt 
 SqgBXWU8b8PLkvArerhesHh173Ti+sWa4urHdC4y1M49t+W4bEuMnGplMNKNAXlHKmBW
 wJVBLG1FjWoP34noF541b/905BK3ncs4ip0vWEWJQ7/Lykf1SNHOiWsEMbEA1I33JE5B
 Ksxnad2Ri5GN7ENHPWP/rtRilTA5w3wZdIWVVinJsR0Ff0dRmQxkK3CqqOZpda5n+fcB
 P/fC3dHbbwk22a4zjXxE+msb6Rgfs1V7sN7gVtkkojDvvmxpwe38xFk3MKi0cPUiKroI
 2+cQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=UrrFRO5z0hKpEZcOOJfaXV2i/GLH5kmJDlYFWxfHa3I=;
 b=Qm53g9XgoLFPaBGYQcC8U2dV8pnTnkT9ZA3pgLoZzRswVRVgrg7v5fBR+tqdmGIg7w
 DdyOlTzWxW5HG0ewTRMrNvVXqx6owoh//rYD4tOUgHJZK3l09okrjo1FNNOIe+Bx9yCS
 W1kpaxFgR0qyQeBsQgCSKFRYg34RsBekvcJJ0QSo4+1gMdSAHJTClInWX0BuvI1AidH0
 UwMGaLkztwjnwVPj3/PJ+RC1e2UcsuURmN7zaktzaTqpXAYFdwOzH9KNAIrrCwRrUYBR
 Uj+rbzzrO9TtdXVSLKETpsseJAlXPh2FobLDvpiur2gw2lNvPALF+cya1CmONLfgGbeb
 t4Ww==
X-Gm-Message-State: ALyK8tLO6d+wAhAfnlW3xCfaX06+1tMks4/1IsIFgXqZQlDEggugt5biuqRApBxGzyB0NkflEqA8fdP21FKnhQ==
X-Received: by 10.37.211.136 with SMTP id e130mr15105100ybf.62.1467820920698;
 Wed, 06 Jul 2016 09:02:00 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 09:02:00 -0700 (PDT)
In-Reply-To:
References: <20160706151822.GC38613@kib.kiev.ua>
From: David Cross
Date: Wed, 6 Jul 2016 12:02:00 -0400
Message-ID:
Subject: Re: Reproducable panic in FFS with softupdates and no journaling
 (10.3-RELEASE-pLATEST)
To: Konstantin Belousov
Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
X-Mailman-Approved-At: Wed, 06 Jul 2016 16:35:55 +0000
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.22
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 06 Jul 2016 16:02:01 -0000

Oh, whoops; how do I print out the buffer?
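
For readers following along, the shorter reproduction attempt described
earlier in the thread (unlink, open, lseek, write one block, mmap, write
through the mapping, munmap, close) could be sketched roughly as below.
This is only an illustration under stated assumptions: the function name
repro_attempt, the 4 KiB block size, the seek offset, and the MAP_SHARED
with PROT_READ | PROT_WRITE flags are guesses, since the thread gives
neither the real offsets nor the flags bdb5 uses. Only the 24K region size
is taken from the ls -lh output quoted in the thread. By the author's own
account this sequence was not sufficient to trigger the panic, so treat it
as a starting point rather than a reproducer.

```c
/*
 * Sketch of the simplified reproduction attempt: unlink, open, lseek,
 * write one block, mmap, dirty the mapping, munmap, close.
 * Sizes, offsets, and mmap flags are illustrative assumptions.
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define REGION_SIZE (24 * 1024) /* the 24K __db.002 size from the report */
#define BLOCK_SIZE  4096        /* assumed size of the single written block */

/* Returns 0 on success, -1 on any failure. */
int
repro_attempt(const char *path)
{
	char block[BLOCK_SIZE];
	void *map;
	int fd;

	(void)unlink(path);	/* a missing file is fine */
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return (-1);

	/*
	 * Assumption: write only the last block, so the start of the file
	 * is a hole. Pages dirtied in the hole may then need blocks
	 * allocated when they are written back.
	 */
	memset(block, 0, sizeof(block));
	if (lseek(fd, REGION_SIZE - BLOCK_SIZE, SEEK_SET) == -1 ||
	    write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
		close(fd);
		return (-1);
	}

	map = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
	    fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	/* Dirty every page of the region through the shared mapping. */
	memset(map, 0xA5, REGION_SIZE);

	if (munmap(map, REGION_SIZE) == -1) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}
```

Calling repro_attempt() on a scratch file on the affected filesystem, and
then forcing vnode reclamation with the find walk from the original report,
would mirror the reported steps. Writing only the last block is a
deliberate guess: it leaves the dirtied pages without backing blocks, which
is one way to reach the ffs_balloc_ufs2 call under
vnode_pager_generic_putpages seen in the second backtrace.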
On Wed, Jul 6, 2016 at 11:30 AM, David Cross wrote: > No kernel messages before (if there were I would have written this off a > long time ago); > And as of right now, this is probably the most fsck-ed filesystem on the > planet!.. I have an 'image' that I am going on that is ggate mounted, so I > can access it in a bhyve VM to ease debuging so I am not crashing my real > machine (with the real filesystem) all the time. > > One of my initial guesses was that this was a CG allocation error, but a > dumpfs seems to show plenty of blocks in the CG to meet this need. > > Quick note on the testcase, I haven't totally isolated it yet, but the > minimal reproduction is a 'ctl_cyrusdb -r", which runs a bdb5 recover op, a > ktrace on that shows that it unlinks 3 files, opens them, lseeks then, > writes a block, and then mmaps them (but leaves them open). At process > termination is munmaps, and then closes. I have tried to write a shorter > reproduction that opens, seeks, mmaps (with the same flags), writes the > mmaped memory, munmaps, closes and exits, but this has been insufficient to > reproduce the issue; There is likely some specific pattern in the bdb5 code > tickling this, and behind the mmap-ed interface it is all opaque, and the > bdb5 code is pretty complex itself > > On Wed, Jul 6, 2016 at 11:18 AM, Konstantin Belousov > wrote: > >> On Wed, Jul 06, 2016 at 10:51:28AM -0400, David Cross wrote: >> > Ok.. to reply to my own message, I using ktr and debugging printfs I >> have >> > found the culprit.. but I am still at a loss to 'why', or what the >> > appropriate fix is. 
>> > >> > Lets go back to the panic (simplified) >> > >> > #0 0xffffffff8043f160 at kdb_backtrace+0x60 >> > #1 0xffffffff80401454 at vpanic+0x124 >> > #2 0xffffffff804014e3 at panic+0x43 >> > #3 0xffffffff8060719a at softdep_deallocate_dependencies+0x6a >> > #4 0xffffffff80499cc1 at brelse+0x151 >> > #5 0xffffffff804979b1 at bufwrite+0x81 >> > #6 0xffffffff80623c80 at ffs_write+0x4b0 >> > #7 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4 >> > #8 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293 >> > #9 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142 >> > #10 0xffffffff80661cc1 at vnode_pager_putpages+0x91 >> > #11 0xffffffff806588e6 at vm_pageout_flush+0x116 >> > #12 0xffffffff806517e2 at vm_object_page_collect_flush+0x1c2 >> > #13 0xffffffff80651519 at vm_object_page_clean+0x179 >> > #14 0xffffffff80651102 at vm_object_terminate+0xa2 >> > #15 0xffffffff806621a5 at vnode_destroy_vobject+0x85 >> > #16 0xffffffff8062a52f at ufs_reclaim+0x1f >> > #17 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142 >> > >> > Via KTR logging I determined that the dangling dependedency was on a >> > freshly allocated buf, *after* vinvalbuf in the vgonel() (so in >> VOP_RECLAIM >> > itself), called by the vnode lru cleanup process; I further noticed >> that it >> > was in a newbuf that recycled a bp (unimportant, except it let me narrow >> > down my logging to something managable), from there I get this >> stacktrace >> > (simplified) >> > >> > #0 0xffffffff8043f160 at kdb_backtrace+0x60 >> > #1 0xffffffff8049c98e at getnewbuf+0x4be >> > #2 0xffffffff804996a0 at getblk+0x830 >> > #3 0xffffffff805fb207 at ffs_balloc_ufs2+0x1327 >> > #4 0xffffffff80623b0b at ffs_write+0x33b >> > #5 0xffffffff806ce9a4 at VOP_WRITE_APV+0x1c4 >> > #6 0xffffffff806639e3 at vnode_pager_generic_putpages+0x293 >> > #7 0xffffffff806d2102 at VOP_PUTPAGES_APV+0x142 >> > #8 0xffffffff80661cc1 at vnode_pager_putpages+0x91 >> > #9 0xffffffff806588e6 at vm_pageout_flush+0x116 >> > #10 0xffffffff806517e2 at 
vm_object_page_collect_flush+0x1c2 >> > #11 0xffffffff80651519 at vm_object_page_clean+0x179 >> > #12 0xffffffff80651102 at vm_object_terminate+0xa2 >> > #13 0xffffffff806621a5 at vnode_destroy_vobject+0x85 >> > #14 0xffffffff8062a52f at ufs_reclaim+0x1f >> > #15 0xffffffff806d0782 at VOP_RECLAIM_APV+0x142 >> > #16 0xffffffff804b6c6e at vgonel+0x2ee >> > #17 0xffffffff804ba6f5 at vnlru_proc+0x4b5 >> > >> > addr2line on the ffs_balloc_ufs2 gives: >> > /usr/src/sys/ufs/ffs/ffs_balloc.c:778: >> > >> > bp = getblk(vp, lbn, nsize, 0, 0, gbflags); >> > bp->b_blkno = fsbtodb(fs, newb); >> > if (flags & BA_CLRBUF) >> > vfs_bio_clrbuf(bp); >> > if (DOINGSOFTDEP(vp)) >> > softdep_setup_allocdirect(ip, lbn, >> newb, 0, >> > nsize, 0, bp); >> > >> > >> > Boom, freshly allocated buffer with a dependecy; nothing in VOP_RECLAIM >> > handles this, this is after vinvalbuf is called, it expects that >> everything >> > is flushed to disk and its just about releasing structures (is my read >> of >> > the code). >> > >> > Now, perhaps this is a good assumption? the question then is how is >> this >> > buffer hanging out there surviving a a vinvalbuf. I will note that my >> > test-case that finds this runs and terminates *minutes* before... its >> not >> > just hanging out there in a race, its surviving background sync, fsync, >> > etc... wtf? Also, I *can* unmount the FS without an error, so that >> > codepath is either ignoring this buffer, or its forcing a sync in a way >> > that doesn't panic? >> Most typical cause for the buffer dependencies not flushed is a buffer >> write error. At least you could provide the printout of the buffer to >> confirm or reject this assumption. >> >> Were there any kernel messages right before the panic ? Just in case, >> did you fsck the volume before using it, after the previous panic ? >> >> > >> > Anyone have next steps? 
I am making progress here, but its really slow >> > going, this is probably the most complex portion of the kernel and some >> > pointers would be helpful. >> > >> > On Sat, Jul 2, 2016 at 2:31 PM, David Cross >> wrote: >> > >> > > Ok, I have been trying to trace this down for awhile..I know quite a >> bit >> > > about it.. but there's a lot I don't know, or I would have a patch. >> I have >> > > been trying to solve this on my own, but bringing in some outside >> > > assistance will let me move on with my life. >> > > >> > > First up: The stacktrace (from a debugging kernel, with coredump >> > > >> > > #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >> > > #1 0xffffffff8071018a in kern_reboot (howto=260) >> > > at /usr/src/sys/kern/kern_shutdown.c:486 >> > > #2 0xffffffff80710afc in vpanic ( >> > > fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling >> deps >> > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d", >> > > ap=0xfffffe023ae5cf40) >> > > at /usr/src/sys/kern/kern_shutdown.c:889 >> > > #3 0xffffffff807108c0 in panic ( >> > > fmt=0xffffffff80c7a325 "softdep_deallocate_dependencies: dangling >> deps >> > > b_ioflags: %d, b_bufsize: %ld, b_flags: %d, bo_flag: %d") >> > > at /usr/src/sys/kern/kern_shutdown.c:818 >> > > #4 0xffffffff80a7c841 in softdep_deallocate_dependencies ( >> > > bp=0xfffffe01f030e148) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14099 >> > > #5 0xffffffff807f793f in buf_deallocate (bp=0xfffffe01f030e148) at >> > > buf.h:428 >> > > #6 0xffffffff807f59c9 in brelse (bp=0xfffffe01f030e148) >> > > at /usr/src/sys/kern/vfs_bio.c:1599 >> > > #7 0xffffffff807f3132 in bufwrite (bp=0xfffffe01f030e148) >> > > at /usr/src/sys/kern/vfs_bio.c:1180 >> > > #8 0xffffffff80ab226a in bwrite (bp=0xfffffe01f030e148) at buf.h:395 >> > > #9 0xffffffff80aafb1b in ffs_write (ap=0xfffffe023ae5d2b8) >> > > at /usr/src/sys/ufs/ffs/ffs_vnops.c:800 >> > > #10 0xffffffff80bdf0ed in VOP_WRITE_APV (vop=0xffffffff80f15480, >> > > 
a=0xfffffe023ae5d2b8) at vnode_if.c:999 >> > > #11 0xffffffff80b1d02e in VOP_WRITE (vp=0xfffff80077e7a000, >> > > uio=0xfffffe023ae5d378, ioflag=8323232, cred=0xfffff80004235000) >> > > at vnode_if.h:413 >> > > #12 0xffffffff80b1ce97 in vnode_pager_generic_putpages >> > > (vp=0xfffff80077e7a000, >> > > ma=0xfffffe023ae5d660, bytecount=16384, flags=1, >> > > rtvals=0xfffffe023ae5d580) >> > > at /usr/src/sys/vm/vnode_pager.c:1138 >> > > #13 0xffffffff80805a57 in vop_stdputpages (ap=0xfffffe023ae5d478) >> > > at /usr/src/sys/kern/vfs_default.c:760 >> > > #14 0xffffffff80be201e in VOP_PUTPAGES_APV (vop=0xffffffff80f00218, >> > > a=0xfffffe023ae5d478) at vnode_if.c:2861 >> > > #15 0xffffffff80b1d7e3 in VOP_PUTPAGES (vp=0xfffff80077e7a000, >> > > m=0xfffffe023ae5d660, count=16384, sync=1, >> rtvals=0xfffffe023ae5d580, >> > > offset=0) at vnode_if.h:1189 >> > > #16 0xffffffff80b196f3 in vnode_pager_putpages >> (object=0xfffff8014a1fce00, >> > > m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580) >> > > at /usr/src/sys/vm/vnode_pager.c:1016 >> > > #17 0xffffffff80b0a605 in vm_pager_put_pages >> (object=0xfffff8014a1fce00, >> > > m=0xfffffe023ae5d660, count=4, flags=1, rtvals=0xfffffe023ae5d580) >> > > at vm_pager.h:144 >> > > #18 0xffffffff80b0a18c in vm_pageout_flush (mc=0xfffffe023ae5d660, >> > > count=4, >> > > flags=1, mreq=0, prunlen=0xfffffe023ae5d6f8, >> eio=0xfffffe023ae5d77c) >> > > at /usr/src/sys/vm/vm_pageout.c:533 >> > > #19 0xffffffff80afec76 in vm_object_page_collect_flush ( >> > > object=0xfffff8014a1fce00, p=0xfffff8023a882370, pagerflags=1, >> > > flags=1, >> > > clearobjflags=0xfffffe023ae5d780, eio=0xfffffe023ae5d77c) >> > > at /usr/src/sys/vm/vm_object.c:971 >> > > #20 0xffffffff80afe91e in vm_object_page_clean >> (object=0xfffff8014a1fce00, >> > > start=0, end=0, flags=1) at /usr/src/sys/vm/vm_object.c:897 >> > > #21 0xffffffff80afe1fa in vm_object_terminate >> (object=0xfffff8014a1fce00) >> > > at /usr/src/sys/vm/vm_object.c:735 >> > > 
#22 0xffffffff80b1a0f1 in vnode_destroy_vobject >> (vp=0xfffff80077e7a000) >> > > at /usr/src/sys/vm/vnode_pager.c:164 >> > > #23 0xffffffff80abb191 in ufs_prepare_reclaim (vp=0xfffff80077e7a000) >> > > at /usr/src/sys/ufs/ufs/ufs_inode.c:190 >> > > #24 0xffffffff80abb1f9 in ufs_reclaim (ap=0xfffffe023ae5d968) >> > > at /usr/src/sys/ufs/ufs/ufs_inode.c:219 >> > > #25 0xffffffff80be0ade in VOP_RECLAIM_APV (vop=0xffffffff80f15ec0, >> > > a=0xfffffe023ae5d968) at vnode_if.c:2019 >> > > #26 0xffffffff80827849 in VOP_RECLAIM (vp=0xfffff80077e7a000, >> > > td=0xfffff80008931960) at vnode_if.h:830 >> > > #27 0xffffffff808219a9 in vgonel (vp=0xfffff80077e7a000) >> > > at /usr/src/sys/kern/vfs_subr.c:2943 >> > > #28 0xffffffff808294e8 in vlrureclaim (mp=0xfffff80008b2e000) >> > > at /usr/src/sys/kern/vfs_subr.c:882 >> > > #29 0xffffffff80828ea9 in vnlru_proc () at >> > > /usr/src/sys/kern/vfs_subr.c:1000 >> > > #30 0xffffffff806b66c5 in fork_exit (callout=0xffffffff80828c50 >> > > , >> > > arg=0x0, frame=0xfffffe023ae5dc00) at >> > > /usr/src/sys/kern/kern_fork.c:1027 >> > > #31 0xffffffff80b21dce in fork_trampoline () >> > > at /usr/src/sys/amd64/amd64/exception.S:611 >> > > #32 0x0000000000000000 in ?? () >> > > >> > > This is a kernel compiled -O -g, its "almost" GENERIC; the only >> difference >> > > is some removed drivers, I have reproduced this on a few different >> kernels, >> > > including a BHYVE one so I can poke at it and not take out the main >> > > machine. The reproduction as it currently stands needs to have jails >> > > running, but I don't believe this is a jail interaction, I think its >> just >> > > that the process that sets up the problem happens to be running in a >> jail. >> > > The step is "start jail; run "find /mountpoint -xdev >/dev/null" on >> the >> > > filesystem, when the vnlru forces the problem vnode out the system >> panics. 
>> > > >> > > I made a few modifications to the kernel to spit out information >> about the >> > > buf that causes the issue, but that is it. >> > > >> > > Information about the buf in question; it has a single softdependency >> > > worklist for direct allocation: >> > > (kgdb) print *bp->b_dep->lh_first >> > > $6 = {wk_list = {le_next = 0x0, le_prev = 0xfffffe01f030e378}, >> > > wk_mp = 0xfffff80008b2e000, wk_type = 4, wk_state = 163841} >> > > >> > > The file that maps to that buffer: >> > > ls -lh MOUNTPOINT/jails/mail/var/imap/db/__db.002 >> > > -rw------- 1 cyrus cyrus 24K Jul 1 20:32 >> > > MOUNTPOINT/jails/mail/var/imap/db/__db.002 >> > > >> > > Any help is appreciated, until then I will keep banging my head >> against >> > > the proverbial wall on this :) >> > > >> > _______________________________________________ >> > freebsd-stable@freebsd.org mailing list >> > https://lists.freebsd.org/mailman/listinfo/freebsd-stable >> > To unsubscribe, send any mail to " >> freebsd-stable-unsubscribe@freebsd.org" >> > > From owner-freebsd-hackers@freebsd.org Wed Jul 6 17:38:04 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 429DBB75D61; Wed, 6 Jul 2016 17:38:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CF32B151D; Wed, 6 Jul 2016 17:38:03 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u66Hbwgs063744 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 6 Jul 2016 20:37:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u66Hbwgs063744 
Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u66HbwNk063743; Wed, 6 Jul 2016 20:37:58 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 6 Jul 2016 20:37:58 +0300 From: Konstantin Belousov To: David Cross Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) Message-ID: <20160706173758.GF38613@kib.kiev.ua> References: <20160706151822.GC38613@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 17:38:04 -0000 On Wed, Jul 06, 2016 at 12:02:00PM -0400, David Cross wrote: > Oh, whoops; how do I printout the buffer? 
In kgdb, p/x *(struct buf *)address From owner-freebsd-hackers@freebsd.org Wed Jul 6 18:21:22 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 21921B758E1; Wed, 6 Jul 2016 18:21:22 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: from mail-yw0-x22b.google.com (mail-yw0-x22b.google.com [IPv6:2607:f8b0:4002:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D0B001E4D; Wed, 6 Jul 2016 18:21:21 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: by mail-yw0-x22b.google.com with SMTP id i12so93695696ywa.1; Wed, 06 Jul 2016 11:21:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=ikdf9Rq5KobUSGbfr4AZiaSUJp18yozYm7mOPTwnuUY=; b=DEqHrWCXbjcIYFI1pGyosuciVRwkrQH/ArI4mdJjdasWdIr0xdtZkpcO2QXD+o0gnS WJAnoNND6plO8Njl2BVvxNjasYKWt986YwtpFF6GCHvzNSwWo1zqeW7EzQcnd6qmMHq9 YXG7IKYGyh1g0GtgfahKUwnUzPX5c6T7dI9+E23Og7/cj3VnXCtrGc8Q3P0UHcrq/Y5T kFAO0StGgoskh2HGWPezYJYTCCcwoYdraqtEoCWw60/ej6cGN9J7ptl+44SP0gP4Qsxk AH/WWryVPoJhrweV9ctEkk7Rr8hsUyOB3yRYwCLAlSarduyTrsm0BuUYF5k+ra111xp1 Yw5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=ikdf9Rq5KobUSGbfr4AZiaSUJp18yozYm7mOPTwnuUY=; b=Lk7j6Afh5UxY+3bMfhdFN1pnRP12mP0tDr46r4X182IMQQcmzIrGXERSXOOprYAtnn omdBR/azu5KChkAoKsJKhJFg0zjVYCMxeiW7QQ4Ht2i9DVNAXDAVwpgjcxf2H0NuXwff 8Dplat+eaD4PJn2Fh4aca+X8oGDVmj8BHKaa7U/jXZ0PnE9Db8hFFazTDJ9g6kBq1Ocl lkTkOdE4kPzkqYj5dHZCmi8NECuanjCTVqRO0VVJpmV0189E5/d1/iw70FeOG9ijh40S 
444rcMhrfz+p9V0amPdNyMdC5qF55zYb8VPczP++N8L2ZzQF1fuso8dgMeHG+29QlhYS 3Opw== X-Gm-Message-State: ALyK8tIr8B4SmAoSIsmDAFIZfEaoMsXeq1XVYe3MzxjanVoR6pZWrN0u6Q8gz3QCxRffJO4rwwMiHuCp5JuTQQ== X-Received: by 10.129.102.195 with SMTP id a186mr15678812ywc.76.1467829281073; Wed, 06 Jul 2016 11:21:21 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.212.66 with HTTP; Wed, 6 Jul 2016 11:21:20 -0700 (PDT) In-Reply-To: <20160706173758.GF38613@kib.kiev.ua> References: <20160706151822.GC38613@kib.kiev.ua> <20160706173758.GF38613@kib.kiev.ua> From: David Cross Date: Wed, 6 Jul 2016 14:21:20 -0400 Message-ID: Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) To: Konstantin Belousov Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org X-Mailman-Approved-At: Wed, 06 Jul 2016 18:22:55 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 18:21:22 -0000 (kgdb) up 5 #5 0xffffffff804aafa1 in brelse (bp=0xfffffe00f77457d0) at buf.h:428 428 (*bioops.io_deallocate)(bp); Current language: auto; currently minimal (kgdb) p/x *(struct buf *)0xfffffe00f77457d0 $1 = {b_bufobj = 0xfffff80002e88480, b_bcount = 0x4000, b_caller1 = 0x0, b_data = 0xfffffe00f857b000, b_error = 0x0, b_iocmd = 0x0, b_ioflags = 0x0, b_iooffset = 0x0, b_resid = 0x0, b_iodone = 0x0, b_blkno = 0x115d6400, b_offset = 0x0, b_bobufs = {tqe_next = 0x0, tqe_prev = 0xfffff80002e884d0}, b_vflags = 0x0, b_freelist = {tqe_next = 0xfffffe00f7745a28, tqe_prev = 0xffffffff80c2afc0}, b_qindex = 0x0, b_flags = 0x20402800, b_xflags = 0x2, b_lock = {lock_object = {lo_name = 0xffffffff8075030b, lo_flags = 0x6730000, lo_data = 0x0, lo_witness = 0xfffffe0000602f00}, lk_lock = 0xfffff800022e8000, lk_exslpfail = 
0x0, lk_timo = 0x0, lk_pri = 0x60}, b_bufsize = 0x4000, b_runningbufspace = 0x0, b_kvabase = 0xfffffe00f857b000, b_kvaalloc = 0x0, b_kvasize = 0x4000, b_lblkno = 0x0, b_vp = 0xfffff80002e883b0, b_dirtyoff = 0x0, b_dirtyend = 0x0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0x0, b_pager = { pg_reqpage = 0x0}, b_cluster = {cluster_head = {tqh_first = 0x0, tqh_last = 0x0}, cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}}, b_pages = {0xfffff800b99b30b0, 0xfffff800b99b3118, 0xfffff800b99b3180, 0xfffff800b99b31e8, 0x0 }, b_npages = 0x4, b_dep = { lh_first = 0xfffff800023d8c00}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, b_pin_count = 0x0} This is the freshly allocated buf that causes the panic; is this what is needed? I "know" which vnode will cause the panic on vnlru cleanup, but I don't know how to walk the memory list without a 'hook'.. as in, i can setup the kernel in a state that I know will panic when the vnode is cleaned up, I can force a panic 'early' (kill -9 1), and then I could get that vnode.. if I could get the vnode list to walk. On Wed, Jul 6, 2016 at 1:37 PM, Konstantin Belousov wrote: > On Wed, Jul 06, 2016 at 12:02:00PM -0400, David Cross wrote: > > Oh, whoops; how do I printout the buffer? 
> > In kgdb, p/x *(struct buf *)address > From owner-freebsd-hackers@freebsd.org Wed Jul 6 18:49:54 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B39FB75DCC for ; Wed, 6 Jul 2016 18:49:54 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0560A1CFC for ; Wed, 6 Jul 2016 18:49:53 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 8C21622309B for ; Wed, 6 Jul 2016 13:49:45 -0500 (CDT) To: freebsd-hackers@freebsd.org From: Karl Denninger Subject: Huh? Message-ID: Date: Wed, 6 Jul 2016 13:49:26 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms030607060708050606010802" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 18:49:54 -0000 This is a cryptographically signed message in MIME format. --------------ms030607060708050606010802 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Ok, what did I break... On my development box with 11-Alpha6: root@Dbms2:/usr/src # svn update . 
Updating '.': svn: E170013: Unable to connect to a repository at URL 'https://svn.freebsd.org/base/head' svn: E000065: Error running context: No route to host svnlite works..... so yeah, that path is good (obviously) --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms030607060708050606010802--
From owner-freebsd-hackers@freebsd.org Wed Jul 6 19:52:07 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05D14B75C0E for ; Wed, 6 Jul 2016 19:52:07 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: from mail-qt0-x233.google.com (mail-qt0-x233.google.com [IPv6:2607:f8b0:400d:c0d::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B70B51986 for ; Wed, 6 Jul 2016 19:52:06 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: by mail-qt0-x233.google.com with SMTP id m2so122191195qtd.1 for ; Wed, 06 Jul 2016 12:52:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=0I+Pibr4lZIuxzTF/6utmkLulpZ2CRz3WbaWfv7lj6g=; b=Vx3s9mjLmko1Nzq9TdbVXbEZN9HY6eJef5ykBthiVtrkRtt7duH11G/zRB6zcnP5UQ RNXqyQFD0Sl55mwbh0eCdK1jtHflOIZSApu3PBO1tc2xneZP4bgNGUyXRzZhA+2m+ca+
Ai+EyspSieqGcWmSTB8shuxTo3Im3lvzUjIXCQyj2VdgY2kvktvIUkd50SOIul/oysmD jT2Py3oXbWTuq19f/ii2aLjf/JHGSNN0Cd3AU4o7lIcQx/99VkQNlsYv+gVxWmPUlPHC LF54gScWUtI8A8D4KecavLPBQ8wVMHSyJsAbkB91Erp05qTZuaEOmxZxbniYZD0UBS81 xjYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=0I+Pibr4lZIuxzTF/6utmkLulpZ2CRz3WbaWfv7lj6g=; b=WdEFSoPGSwrGPi9S5htbdf2BwGJr8jQrqSw+kKPdK/uxAdM13taLgZbwEK6CRinola tg+wvXujWxSbrlVXg/vlda3lMQYlcncN9Ke2tKVIjAmeqEp0dplrB4SQnShWQfyOyK7+ 4yumEH7toU/zp9FIlMRNleywbIsVqrrzG+uSVLgHYlodaq+LfVRKFC2wAAtaBo68d8ID 6YKF/N6NGSTMkvIWYeDz8CxOEJmv/9iKhjqBbGYXwuI99Omo7HvLFSVG3a10bJWgRyRP iAZ2QyXZUzgUOxsJYwz7JR0MUdl3G2NVDzOiOcuq8mojcKCY6Yn/2YuqAa427wy53uRi GG9A== X-Gm-Message-State: ALyK8tK/6uYI9eAyiMNyLn46Del7OfTjm17FKvDXRMuvWB27X66huPCgxTrW52aWD91Su8PwfTqmpCQw1No0Fg== X-Received: by 10.200.34.157 with SMTP id f29mr38321468qta.46.1467834725869; Wed, 06 Jul 2016 12:52:05 -0700 (PDT) MIME-Version: 1.0 Received: by 10.55.148.131 with HTTP; Wed, 6 Jul 2016 12:52:05 -0700 (PDT) In-Reply-To: References: From: Ngie Cooper Date: Wed, 6 Jul 2016 12:52:05 -0700 Message-ID: Subject: Re: Huh? To: Karl Denninger Cc: "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 19:52:07 -0000 On Wed, Jul 6, 2016 at 11:49 AM, Karl Denninger wrote: > Ok, what did I break... > > On my development box with 11-Alpha6: > > root@Dbms2:/usr/src # svn update . > Updating '.': > svn: E170013: Unable to connect to a repository at URL > 'https://svn.freebsd.org/base/head' > svn: E000065: Error running context: No route to host > > svnlite works..... 
so yeah, that path is good (obviously) Could you please run "svn --version" and put the output here? Thanks, -Ngie From owner-freebsd-hackers@freebsd.org Wed Jul 6 20:24:55 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7A48B751C8 for ; Wed, 6 Jul 2016 20:24:55 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6]) by mx1.freebsd.org (Postfix) with ESMTP id 9FE1018B6 for ; Wed, 6 Jul 2016 20:24:55 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from [10.1.1.2] (unknown [10.1.1.2]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id 7D096D169 for ; Wed, 6 Jul 2016 20:24:54 +0000 (UTC) Subject: Re: Huh? To: freebsd-hackers@freebsd.org References: From: Allan Jude Message-ID: <5b1f6b13-e3b9-d4ad-6909-5fd728b94482@freebsd.org> Date: Wed, 6 Jul 2016 16:24:54 -0400 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 20:24:55 -0000 On 2016-07-06 14:49, Karl Denninger wrote: > Ok, what did I break... > > On my development box with 11-Alpha6: > > root@Dbms2:/usr/src # svn update . > Updating '.': > svn: E170013: Unable to connect to a repository at URL > 'https://svn.freebsd.org/base/head' > svn: E000065: Error running context: No route to host > > svnlite works..... so yeah, that path is good (obviously) > > Do you have broken ipv6? svn will try v6 first, and you maybe don't have a route. 
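For reference, a client that "tries v6 first" can still reach an IPv4-only path if it walks every address the resolver returns instead of stopping at the first failure. The following is a minimal sketch of that fallback loop in C. It assumes nothing about how svn or its libraries actually connect; try_each_address() and connect_host() are hypothetical names chosen for illustration.

```c
/* Sketch only: NOT svn's code. try_each_address() and connect_host()
 * are hypothetical helpers used to illustrate address fallback. */
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Walk a getaddrinfo()-style result list in order and return the first
 * socket that connects, or -1 if every candidate fails. */
int try_each_address(const struct addrinfo *res)
{
    for (const struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;           /* e.g. address family not supported */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            return fd;          /* connected: stop at the first success */
        close(fd);              /* e.g. "No route to host": try the next */
    }
    return -1;
}

/* Resolve host:port for both address families and try every answer. */
int connect_host(const char *host, const char *port)
{
    struct addrinfo hints, *res;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* ask for both IPv6 and IPv4 */
    hints.ai_socktype = SOCK_STREAM;

    int err = getaddrinfo(host, port, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
        return -1;
    }
    int fd = try_each_address(res);
    freeaddrinfo(res);
    return fd;
}
```

With a loop like this, a call such as connect_host("svn.freebsd.org", "443") would be expected to move on to the IPv4 address once the IPv6 attempt fails with "No route to host", rather than giving up.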
-- Allan Jude From owner-freebsd-hackers@freebsd.org Wed Jul 6 20:29:08 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 049C6B7531E for ; Wed, 6 Jul 2016 20:29:08 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CB91D1AC3 for ; Wed, 6 Jul 2016 20:29:06 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 8D47F22379F for ; Wed, 6 Jul 2016 15:29:05 -0500 (CDT) Subject: Re: Huh? To: freebsd-hackers@freebsd.org References: <5b1f6b13-e3b9-d4ad-6909-5fd728b94482@freebsd.org> From: Karl Denninger Message-ID: Date: Wed, 6 Jul 2016 15:28:46 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <5b1f6b13-e3b9-d4ad-6909-5fd728b94482@freebsd.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms020404030302000605070109" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 20:29:08 -0000 This is a cryptographically signed message in MIME format. 
--------------ms020404030302000605070109 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 7/6/2016 15:24, Allan Jude wrote: > On 2016-07-06 14:49, Karl Denninger wrote: >> Ok, what did I break... >> >> On my development box with 11-Alpha6: >> >> root@Dbms2:/usr/src # svn update . >> Updating '.': >> svn: E170013: Unable to connect to a repository at URL >> 'https://svn.freebsd.org/base/head' >> svn: E000065: Error running context: No route to host >> >> svnlite works..... so yeah, that path is good (obviously) >> >> > Do you have broken ipv6? > > svn will try v6 first, and you maybe don't have a route. > Actually it tries ipv6 first and never tries ipv4! I have no Ipv6 service here so the "no route" is correct. However, the resolver DID return an Ipv4 address as well (I snooped the line with tcpdump) but svn never attempts the v4 connection. That sure looks broken to me. --=20 Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms020404030302000605070109--
From owner-freebsd-hackers@freebsd.org Wed Jul 6 22:13:07 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6CA42B75D3C for ; Wed, 6 Jul 2016 22:13:07 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell1.rawbw.com (shell1.rawbw.com [198.144.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5E655160A for ; Wed, 6 Jul 2016 22:13:07 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from yuri.doctorlan.com
(c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190]) (authenticated bits=0) by shell1.rawbw.com (8.15.1/8.15.1) with ESMTPSA id u66MD0oo015679 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Wed, 6 Jul 2016 15:13:01 -0700 (PDT) (envelope-from yuri@rawbw.com) X-Authentication-Warning: shell1.rawbw.com: Host c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190] claimed to be yuri.doctorlan.com From: Yuri Subject: Why kinfo_getvmmap is sometimes so expensive? To: Freebsd hackers list Message-ID: Date: Wed, 6 Jul 2016 15:12:59 -0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 22:13:07 -0000 The function getProcessSizeBytes, calculating the total size of the process, runs once per second. I have two processes of the same kind, but with the different run history. Process #1 didn't do much work, its total size is 1.5 GB, google perftools library says that it currently has 1.2GB allocated. Process #2 did a lot of work, its total size is 6.9 GB, but most of the used memory was freed, and google perftools library also says that it currently has only 1.2GB allocated. Both processes have about 140 lines in /proc//map. What bothers me is that getProcessSizeBytes run once per second makes process #1 to consume ~0.5% CPU, and process #2 to consume ~14% CPU. When I stop running getProcessSizeBytes, CPU times of both processes go to zero. Obviously, google perftools doesn't unmap the memory, and the totals of block sizes in /proc//map is much higher for process #2 with about the same block count. But why does this cause 14% of CPU consumption? 
And why another, similar process that goes through about the same number of blocks only has 0.5% CPU consumption?

uint64_t getProcessSizeBytes() {
  int i, cnt = 0;
  struct kinfo_vmentry *kvm0, *kvm;
  m_uint64_t memSz = 0;

  kvm0 = ::kinfo_getvmmap(::getpid(), &cnt);
  for (i = 0, kvm = kvm0; i < cnt; i++, kvm++)
    memSz += (kvm->kve_end-kvm->kve_start);
  free(kvm0);

  return (memSz);
}

FreeBSD 10.3

Yuri

From owner-freebsd-hackers@freebsd.org Thu Jul 7 00:12:29 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4B63B758D1; Thu, 7 Jul 2016 00:12:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 857D5126D; Thu, 7 Jul 2016 00:12:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u670CLT1011655 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 7 Jul 2016 03:12:21 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u670CLT1011655 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u670CIZE011654; Thu, 7 Jul 2016 03:12:18 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 7 Jul 2016 03:12:18 +0300 From: Konstantin Belousov To: David Cross Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) Message-ID: <20160707001218.GI38613@kib.kiev.ua> References: <20160706151822.GC38613@kib.kiev.ua> <20160706173758.GF38613@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 00:12:30 -0000 On Wed, Jul 06, 2016 at 02:21:20PM -0400, David Cross wrote: > (kgdb) up 5 > #5 0xffffffff804aafa1 in brelse (bp=0xfffffe00f77457d0) at buf.h:428 > 428 (*bioops.io_deallocate)(bp); > Current language: auto; currently minimal > (kgdb) p/x *(struct buf *)0xfffffe00f77457d0 > $1 = {b_bufobj = 0xfffff80002e88480, b_bcount = 0x4000, b_caller1 = 0x0, > b_data = 0xfffffe00f857b000, b_error = 0x0, b_iocmd = 0x0, b_ioflags = > 0x0, > b_iooffset = 0x0, b_resid = 0x0, b_iodone = 0x0, b_blkno = 0x115d6400, > b_offset = 0x0, b_bobufs = {tqe_next = 0x0, tqe_prev = > 0xfffff80002e884d0}, > b_vflags = 0x0, b_freelist = {tqe_next = 0xfffffe00f7745a28, > tqe_prev = 0xffffffff80c2afc0}, b_qindex = 0x0, b_flags = 0x20402800, > b_xflags = 0x2, b_lock = {lock_object = {lo_name = 0xffffffff8075030b, > lo_flags = 0x6730000, lo_data = 0x0, lo_witness = > 0xfffffe0000602f00}, > lk_lock = 0xfffff800022e8000, lk_exslpfail = 0x0, lk_timo = 0x0, > lk_pri = 0x60}, b_bufsize = 0x4000, b_runningbufspace = 0x0, > b_kvabase = 0xfffffe00f857b000, b_kvaalloc = 0x0, b_kvasize = 0x4000, > b_lblkno = 0x0, b_vp = 0xfffff80002e883b0, b_dirtyoff = 0x0, > b_dirtyend = 0x0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0x0, b_pager > = { > pg_reqpage = 0x0}, b_cluster = {cluster_head = {tqh_first = 0x0, > tqh_last = 0x0}, cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}}, > b_pages = {0xfffff800b99b30b0, 0xfffff800b99b3118, 
0xfffff800b99b3180, > 0xfffff800b99b31e8, 0x0 }, b_npages = 0x4, b_dep = { > lh_first = 0xfffff800023d8c00}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, > b_fsprivate3 = 0x0, b_pin_count = 0x0} > > > This is the freshly allocated buf that causes the panic; is this what is > needed? I "know" which vnode will cause the panic on vnlru cleanup, but I > don't know how to walk the memory list without a 'hook'.. as in, i can > setup the kernel in a state that I know will panic when the vnode is > cleaned up, I can force a panic 'early' (kill -9 1), and then I could get > that vnode.. if I could get the vnode list to walk.

Was the state printed after the panic occurred ?

What is strange is that buffer was not even tried for i/o, AFAIS. Apart from empty b_error/b_iocmd, the b_lblkno is zero, which means that the buffer was never allocated on the disk. The b_blkno looks strangely high. Can you print *(bp->b_vp) ? If it is UFS vnode, do p *(struct inode)(->v_data). I am esp. interested in the vnode size.

Can you reproduce the problem on HEAD ?

From owner-freebsd-hackers@freebsd.org Thu Jul 7 00:19:19 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0D8E5B75B38 for ; Thu, 7 Jul 2016 00:19:19 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 834E615A5 for ; Thu, 7 Jul 2016 00:19:18 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u670JEoX012886 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 7 Jul 2016 03:19:14 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u670JEoX012886 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u670JDvO012885; Thu, 7 Jul 2016 03:19:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 7 Jul 2016 03:19:13 +0300 From: Konstantin Belousov To: Yuri Cc: Freebsd hackers list Subject: Re: Why kinfo_getvmmap is sometimes so expensive? 
Message-ID: <20160707001913.GJ38613@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 00:19:19 -0000 On Wed, Jul 06, 2016 at 03:12:59PM -0700, Yuri wrote: > The function getProcessSizeBytes, calculating the total size of the > process, runs once per second. I have two processes of the same kind, > but with the different run history. > > Process #1 didn't do much work, its total size is 1.5 GB, google > perftools library says that it currently has 1.2GB allocated. > > Process #2 did a lot of work, its total size is 6.9 GB, but most of the > used memory was freed, and google perftools library also says that it > currently has only 1.2GB allocated. > > Both processes have about 140 lines in /proc//map. > > > What bothers me is that getProcessSizeBytes run once per second makes > process #1 to consume ~0.5% CPU, and process #2 to consume ~14% CPU. > When I stop running getProcessSizeBytes, CPU times of both processes go > to zero. > > > Obviously, google perftools doesn't unmap the memory, and the totals of > block sizes in /proc//map is much higher for process #2 with about > the same block count. But why does this cause 14% of CPU consumption? > And why another, similar process that goes through about the same number > of blocks only has 0.5% CPU consumption? To calculate residency count for the process map entries, kernel has to iterate over all pages. 
This operation was somewhat optimized in 10.3 and HEAD, particularly for large sparse mappings. But for large populated mappings there is no other way than to check each page. You may confirm my hypothesis by setting the sysctl kern.proc_vmmap_skip_res_cnt to 1 and seeing whether the CPU consumption changes. Of course, you will not get the resident count in the returned structure after the knob is tweaked.

From owner-freebsd-hackers@freebsd.org Thu Jul 7 01:29:59 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 19032B21ACA for ; Thu, 7 Jul 2016 01:29:59 +0000 (UTC) (envelope-from freebsd-hackers@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B88A2118A for ; Thu, 7 Jul 2016 01:29:58 +0000 (UTC) (envelope-from freebsd-hackers@m.gmane.org) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1bKy8S-00021F-GS for freebsd-hackers@freebsd.org; Thu, 07 Jul 2016 03:29:48 +0200 Received: from ip184-189-249-34.sb.sd.cox.net ([184.189.249.34]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 07 Jul 2016 03:29:48 +0200 Received: from julian by ip184-189-249-34.sb.sd.cox.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 07 Jul 2016 03:29:48 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-hackers@freebsd.org From: Julian Hsiao Subject: Re: ggatel(8) extension for binding multiple files Date: Thu, 7 Jul 2016 01:29:34 +0000 (UTC) Lines: 913 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: sea.gmane.org User-Agent: Loom/3.14 (http://gmane.org/) 
X-Loom-IP: 184.189.249.34 (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:48.0) Gecko/20100101 Firefox/48.0) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 01:29:59 -0000 Here's an updated patch, with some minor refactors, and addresses some known issues: > - I use alloca(3) instead of malloc(3) in map_bundle() because using the > latter causes incorrect behavior somehow. It's probably buffer > overruns and / or UBs somewhere in my code. Fixed: I was using realloc(3) incorrectly elsewhere. > - Both ggatel(8) and md(4) implement BIO_DELETE by zeroing the requested > range [...] I didn't implement it. This is now implemented. By default BIO_DELETE will write zeros, and this can be disabled with the -n option. Incidentally, while testing, I found out that ggatel(8)'s BIO_DELETE code is actually broken. It works in my extension, but this patch doesn't fix the original code. I've filed a bug report[0] outlining the issue. Lastly, I've added a license block so the few people who might find this feature useful are free to use it. 
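
To illustrate the BIO_DELETE behaviour described above, here is a minimal standalone sketch. It is not code from the patch: it models the backing store as a plain byte array rather than an mmap'd div, and the name emulate_delete and its parameters are assumptions made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): emulate a delete request
 * on an in-memory "medium" of media_len bytes.
 * Returns 0 on success or an errno value on failure.
 */
static int
emulate_delete(unsigned char *media, size_t media_len, size_t off,
    size_t len, bool delete_zero)
{
	assert(media != NULL);

	if (!delete_zero)
		return (EOPNOTSUPP);	/* -n given: delete is not supported */
	if (off > media_len || len > media_len - off)
		return (EINVAL);	/* range falls outside the medium */

	/* Default: the deleted range reads back as zeros afterwards. */
	memset(media + off, 0, len);
	return (0);
}
```

With delete_zero set to false (the -n case) the data is left untouched, and the caller can hand EOPNOTSUPP back as the request's error, which is what the patch does through gctl_error.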
Julian Hsiao [0] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210864 Index: sbin/ggate/ggatel/Makefile =================================================================== diff --git a/stable/10/sbin/ggate/ggatel/Makefile b/stable/10/sbin/ggate/ggatel/Makefile --- a/stable/10/sbin/ggate/ggatel/Makefile (revision 302332) +++ b/stable/10/sbin/ggate/ggatel/Makefile (working copy) @@ -4,7 +4,7 @@ PROG= ggatel MAN= ggatel.8 -SRCS= ggatel.c ggate.c +SRCS= ggatel.c ggatel2.c ggate.c CFLAGS+= -DLIBGEOM CFLAGS+= -I${.CURDIR}/../shared Index: sbin/ggate/ggatel/ggatel.c =================================================================== diff --git a/stable/10/sbin/ggate/ggatel/ggatel.c b/stable/10/sbin/ggate/ggatel/ggatel.c --- a/stable/10/sbin/ggate/ggatel/ggatel.c (revision 302332) +++ b/stable/10/sbin/ggate/ggatel/ggatel.c (working copy) @@ -46,6 +46,10 @@ #include #include "ggate.h" +int check_divs(const char *const, unsigned int *const, size_t *const, + size_t *const); +void g_gatel_serve_bundle(const int , const unsigned int, const size_t, + const size_t, const int, const int, const unsigned int); static enum { UNSET, CREATE, DESTROY, LIST, RESCUE } action = UNSET; @@ -55,12 +59,13 @@ static int force = 0; static unsigned sectorsize = 0; static unsigned timeout = G_GATE_TIMEOUT; +static unsigned delete_zero = 1; static void usage(void) { - fprintf(stderr, "usage: %s create [-v] [-o ] " + fprintf(stderr, "usage: %s create [-v] [-n] [-o ] " "[-s sectorsize] [-t timeout] [-u unit] \n", getprogname()); fprintf(stderr, " %s rescue [-v] [-o ] <-u unit> " "\n", getprogname()); @@ -149,6 +154,11 @@ } break; case BIO_DELETE: + if (!delete_zero) { + error = EOPNOTSUPP; + break; + } + // FIXME: Bug 210864 case BIO_WRITE: if (pwrite(fd, ggio.gctl_data, ggio.gctl_length, ggio.gctl_offset) == -1) { @@ -168,17 +178,39 @@ g_gatel_create(void) { struct g_gate_ctl_create ggioc; - int fd; + int fd, isdir = -1; + size_t div_size, num_divs; fd = open(path, g_gate_openflags(flags) | 
O_DIRECT | O_FSYNC); - if (fd == -1) - err(EXIT_FAILURE, "Cannot open %s", path); + if (fd == -1) { + if (errno == EISDIR) { + isdir = 1; + } else { + err(EXIT_FAILURE, "Cannot open %s", path); + } + } else { + struct stat sb; + if (fstat(fd, &sb) == -1) { + err(EXIT_FAILURE, "stat(%s) failed", path); + } + isdir = S_ISDIR(sb.st_mode); + } + assert(isdir != -1); + memset(&ggioc, 0, sizeof(ggioc)); ggioc.gctl_version = G_GATE_VERSION; ggioc.gctl_unit = unit; - ggioc.gctl_mediasize = g_gate_mediasize(fd); - if (sectorsize == 0) - sectorsize = g_gate_sectorsize(fd); + if (isdir) { + if (fd != -1 && close(fd) == -1) { + err(EXIT_FAILURE, "close(%s) failed", path); + } + fd = check_divs(path, &sectorsize, &div_size, &num_divs); + ggioc.gctl_mediasize = (off_t) div_size * num_divs; + } else { + ggioc.gctl_mediasize = g_gate_mediasize(fd); + if (sectorsize == 0) + sectorsize = g_gate_sectorsize(fd); + } ggioc.gctl_sectorsize = sectorsize; ggioc.gctl_timeout = timeout; ggioc.gctl_flags = flags; @@ -188,7 +220,12 @@ if (unit == -1) printf("%s%u\n", G_GATE_PROVIDER_NAME, ggioc.gctl_unit); unit = ggioc.gctl_unit; - g_gatel_serve(fd); + if (isdir) { + g_gatel_serve_bundle(fd, sectorsize, div_size, num_divs, unit, + g_gate_openflags(flags), delete_zero); + } else { + g_gatel_serve(fd); + } } static void @@ -230,7 +267,7 @@ for (;;) { int ch; - ch = getopt(argc, argv, "fo:s:t:u:v"); + ch = getopt(argc, argv, "fo:s:t:u:vn"); if (ch == -1) break; switch (ch) { @@ -280,6 +317,11 @@ usage(); g_gate_verbose++; break; + case 'n': + if (action != CREATE) + usage(); + delete_zero = 0; + break; default: usage(); } Index: sbin/ggate/ggatel/ggatel2.c =================================================================== diff --git a/stable/10/sbin/ggate/ggatel/ggatel2.c b/stable/10/sbin/ggate/ggatel/ggatel2.c new file mode 10644 --- /dev/null (nonexistent) +++ b/stable/10/sbin/ggate/ggatel/ggatel2.c (working copy) @@ -0,0 +1,724 @@ +/* +Copyright (c) 2016, Julian Hsiao +All rights reserved. 
+ +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include "ggate.h" + +/* ======== */ + +/* + Doesn't work with 3.4.1; clang-devel is currently 3.9.0, and I couldn't be + bothered to find out which version support was first added. 
+*/ +#if defined(__clang__) && \ + (__clang_major__ > 3 || \ + (__clang_major__ == 3 && __clang_minor__ >= 9)) +#pragma clang diagnostic push +#pragma clang diagnostic error "-Weverything" +#endif + +/* ======== */ + +int check_divs(const char *const, unsigned int *const, size_t *const, + size_t *const); +void g_gatel_serve_bundle(const int , const unsigned int, const size_t, + const size_t, const int, const int, const unsigned int); + +/* ======== */ + +static void +g_gate_verbose_log(const int v, const int p, const char *const m, ...) +{ + if (g_gate_verbose >= v) { + va_list ap; + va_start(ap, m); + g_gate_vlog(p, m, ap); + va_end(ap); + } +} + +#ifdef NDEBUG +__attribute__((unused)) +#endif +static inline bool +mul_overflow(const size_t a, const size_t b) +{ + return(a != 0 && (a * b) / a != b); +} + +static unsigned int +MINDIV_SIZE(void) +{ + static unsigned int ps; + if (ps == 0) { + const long ps2 = sysconf(_SC_PAGESIZE); + static_assert(sizeof(size_t) >= sizeof(long), ""); + assert(ps2 > 0); + assert(ps2 <= UINT_MAX); + ps = (unsigned int) ps2; + } + return(ps); +} + +static size_t +DIV_NAME_BUFSIZE(void) +{ + static size_t dnbs; + if (dnbs == 0) { + dnbs = (size_t) (ceil(log(SIZE_MAX) / log(16))); + ++dnbs; + dnbs *= sizeof(char); + + } + return(dnbs); +} + +/* ======== */ + +static void +numtohexstr(const size_t num, char *const buf, const size_t buflen) +{ +#ifdef NDEBUG + __attribute__((unused)) +#endif + const int r = snprintf(buf, buflen, "%zx", num); + assert(r > 0); + assert((unsigned int) r < buflen); +} + +int +check_divs(const char *const bundle, unsigned int *const blk_size, + size_t *const div_size, size_t *const num_divs) +{ + assert(blk_size != NULL); + assert(div_size != NULL); + assert(num_divs != NULL); + + int dfd; + if ((dfd = open(bundle, O_RDONLY | O_DIRECTORY | O_CLOEXEC)) == -1) { + err(5, "open(%s) failed", bundle); + } + char *buf = malloc(DIV_NAME_BUFSIZE()); + if (buf == NULL) { + err(5, "malloc(DIV_NAME_BUFSIZE) failed"); + } + 
+ for (*num_divs = 0; *num_divs < SIZE_MAX; ++(*num_divs)) { + int fd; + struct stat sb = { .st_dev = 0 }; + + numtohexstr(*num_divs, buf, DIV_NAME_BUFSIZE()); + if ((fd = openat(dfd, buf, O_RDONLY | O_CLOEXEC)) == -1) { + if (errno == ENOENT) { + break; + } + err(5, "open(%s/%s) failed", bundle, buf); + } + if (fstat(fd, &sb) == -1) { + err(5, "fstat(%s/%s) failed", bundle, buf); + } + + if (S_ISCHR(sb.st_mode)) { + if (ioctl(fd, DIOCGMEDIASIZE, &sb.st_size) == -1) { + err(5, "ioctl(%s/%s, DIOCGMEDIASIZE) failed", bundle, buf); + } + + unsigned int bs; + if (ioctl(fd, DIOCGSECTORSIZE, &bs) == -1) { + err(5, "ioctl(%s/%s, DIOCGSECTORSIZE) failed", bundle, buf); + } + if (*blk_size == 0) { + *blk_size = bs; + } else if (*blk_size != bs) { + errx(5, "sector size of %s/%s (%u bytes) is not the same as " + "requested size or that of other divs (%u bytes).", + bundle, buf, bs, *blk_size); + } + } else if (!S_ISREG(sb.st_mode)) { + errx(5, "%s/%s must be a file or character device.", bundle, buf); + } + + if (close(fd) == -1) { + err(5, "close(%s/%s) failed", bundle, buf); + } + + assert(sb.st_size > 0); + static_assert(sizeof(size_t) >= sizeof(sb.st_size), ""); + const size_t st_size = (size_t) sb.st_size; + + if (st_size < MINDIV_SIZE()) { + errx(5, "size of %s/%s is less than %u bytes.", + bundle, buf, MINDIV_SIZE()); + } + if (st_size % MINDIV_SIZE() != 0) { + errx(5, "size of %s/%s is not a multiple of %u.", + bundle, buf, MINDIV_SIZE()); + } + + if (*num_divs == 0) { + *div_size = st_size; + } else if (st_size != *div_size) { + errx(5, "%s/%s is not the same size as other divs (%zu bytes).", + bundle, buf, *div_size); + } + } + + if (*num_divs == 0) { + errx(5, "No divs found in %s.", bundle); + } + + *blk_size = (*blk_size == 0) ? 
MINDIV_SIZE() : *blk_size; + if (*blk_size < MINDIV_SIZE()) { + errx(5, "sector size must be at least %u bytes.", MINDIV_SIZE()); + } + if (*blk_size % MINDIV_SIZE() != 0) { + errx(5, "sector size must be a multiple of %u bytes.", MINDIV_SIZE()); + } + if (*blk_size > *div_size) { + errx(5, "sector size cannot be greater than div size (%zu bytes).", + *div_size); + } + + struct g_gate_ctl_create ggcc = { .gctl_mediasize = 0 }; + static_assert(sizeof(long) == sizeof(ggcc.gctl_mediasize), ""); + assert(!mul_overflow(*div_size, *num_divs)); + assert(*div_size * *num_divs <= LONG_MAX); + static_assert(sizeof(unsigned int) == sizeof(ggcc.gctl_sectorsize), ""); + g_gate_verbose_log(1, LOG_DEBUG, "blk_size = %u", *blk_size); + g_gate_verbose_log(1, LOG_DEBUG, "div_size = %zu", *div_size); + g_gate_verbose_log(1, LOG_DEBUG, "num_divs = %zu", *num_divs); + + free(buf); + return(dfd); +} + +static void +map_fd(void *const addr, const int fd, const int prot, +#ifdef NDEBUG + __attribute__((unused)) +#endif + const size_t div_size) +{ + assert(div_size != 0); + + struct stat sb; + if (fstat(fd, &sb) == -1) { + err(1, "fstat() failed"); + } + + assert(sb.st_size > 0); + static_assert(sizeof(size_t) >= sizeof(sb.st_size), ""); + const size_t st_size = (size_t) sb.st_size; + assert(st_size == div_size); + assert(st_size % MINDIV_SIZE() == 0); + + void *m; + const int flags = MAP_SHARED | MAP_FIXED | MAP_NOCORE/* | MAP_NOSYNC*/; + if ((m = mmap(addr, st_size, prot, flags, fd, 0)) == MAP_FAILED) { + err(1, "mmap() failed"); + } +} + +static void +map_bundle(const uintptr_t base, const uintptr_t addr, const size_t div_size, + const int bundlefd, const int open_flags) +{ + assert(base % MINDIV_SIZE() == 0); + assert(addr % MINDIV_SIZE() == 0); + assert(addr >= base); + assert((addr - base) % MINDIV_SIZE() == 0); + + int divfd; + + static char *div; + if (div == NULL) { + div = malloc(DIV_NAME_BUFSIZE()); + assert(div != NULL); + } + numtohexstr((addr - base) / div_size, div, 
DIV_NAME_BUFSIZE()); + + g_gate_verbose_log(3, LOG_DEBUG, + "-> [ 0x%09" PRIxPTR ", 0x%09" PRIxPTR " ): 0x%lx; %s", + addr, addr + div_size, div_size, div); + + if ((divfd = openat(bundlefd, div, open_flags | O_CLOEXEC)) == -1) { + err(6, "open(%s) failed", div); + } + + int prot; + switch (open_flags & O_ACCMODE) { + case O_RDWR: + prot = PROT_READ | PROT_WRITE; + break; + case O_RDONLY: + prot = PROT_READ; + break; + case O_WRONLY: + prot = PROT_WRITE; + break; + default: + errx(6, "unknown open() flags: %d", open_flags); + break; + } + map_fd((void *) addr, divfd, prot, div_size); + if (close(divfd) == -1) { + err(6, "close(%s) failed", div); + } +} + +static void * +resv_vaddr(void *const addr, const size_t len) +{ + void *p; + const int prot = PROT_NONE; + const int flags = MAP_PRIVATE | MAP_ANON | + ((addr == NULL) ? 0 : MAP_FIXED); + if ((p = mmap(addr, len, prot, flags, -1, 0)) == MAP_FAILED) { + err(2, "mmap() failed"); + } + + return(p); +} + +/* ======== */ + +static sigjmp_buf memfault_env; +static siginfo_t memfault_info; + +__attribute__((noreturn)) static void +memfault_hdl(int sig, siginfo_t *info, __attribute__((unused)) void *uap) +{ + memfault_info = *info; + siglongjmp(memfault_env, sig); +} + +static void +install_memfault_hdl() +{ + struct sigaction a = { + .sa_sigaction = memfault_hdl, + .sa_flags = SA_SIGINFO + }; + if (sigaction(SIGSEGV, &a, NULL) == -1) { + err(3, "sigaction(SIGSEGV) failed"); + } + struct sigaction b = { + .sa_sigaction = memfault_hdl, + .sa_flags = SA_SIGINFO + }; + if (sigaction(SIGBUS, &b, NULL) == -1) { + err(3, "sigaction(SIGBUS) failed"); + } + + // Use uncatchable signal as memfault_hdl installed flag + memfault_info.si_signo = SIGKILL; +} + +static void +check_expected_memfault(const void *const as, const void *const ae) +{ + if (memfault_info.si_signo == SIGKILL) { + errx(4, "memfault_info not initialized!"); + } else if (memfault_info.si_signo != SIGSEGV && + memfault_info.si_signo != SIGBUS) { + errx(4, 
"unexpected %s", strsignal(memfault_info.si_signo)); + } else if ( + !((as <= memfault_info.si_addr) && (memfault_info.si_addr < ae)) + ) { + errx(4, "unexpected address %p", memfault_info.si_addr); + } +} + +/* ======== */ + +static void +msync2(const void *const a, const size_t n, const int f) +{ + const uintptr_t b = (uintptr_t) a; + const size_t m = b % MINDIV_SIZE(); + const uintptr_t c = b - m; + const size_t nn = n + m; + + g_gate_verbose_log(3, LOG_DEBUG, + " msync(0x%09" PRIxPTR ", 0x%08lx)", c, nn); + + if (nn > 0 && msync((void *) c, nn, f) == -1) { + err(8, "msync() failed"); + } +} + +__attribute__((unused)) static void * +memcpy_msync(void *const d, const void *const s, const size_t n) +{ + const uintptr_t dd = (uintptr_t) d; + g_gate_verbose_log(3, LOG_DEBUG, + "memcpy(0x%09" PRIxPTR ", ..., 0x%08lx)", dd, n); + + memcpy(d, s, n); + msync2(d, n, MS_SYNC); + return(d); +} + +/* ======== */ + +#pragma clang diagnostic push +#pragma clang diagnostic ignored "-Wpadded" +struct bundle_spec { + size_t resv; + uintptr_t as; + uintptr_t ae; + + int bundlefd; + size_t div_size; + size_t num_divs; + unsigned int blk_size; + + int open_flags; + bool bio_delete; +}; +#pragma clang diagnostic pop + +/* Too lazy to do LRU */ +#define mapped_addrs_size 100 +static void *mapped_addrs[mapped_addrs_size]; + +static void +update_mapped_addrs(const size_t div_size, void *const a) +{ + assert(mapped_addrs_size <= UINT_MAX); + + const size_t i = arc4random_uniform((unsigned int) mapped_addrs_size); + if (mapped_addrs[i] != NULL) { + resv_vaddr(mapped_addrs[i], div_size); + + const uintptr_t mai = (uintptr_t) mapped_addrs[i]; + g_gate_verbose_log(3, LOG_DEBUG, + "<- [ 0x%09" PRIxPTR ", 0x%09" PRIxPTR " ): 0x%lx", + mai, mai + div_size, div_size); + } + mapped_addrs[i] = a; +} + +static void +do_read(const struct bundle_spec *const bspec, + struct g_gate_ctl_io *const ggio) +{ + static_assert(sizeof(ggio->gctl_length) <= sizeof(size_t), ""); + 
static_assert(sizeof(ggio->gctl_offset) <= sizeof(uintptr_t), "");
+    assert(ggio->gctl_length >= 0);
+    assert(ggio->gctl_offset >= 0);
+
+    assert(memfault_info.si_signo == SIGKILL);
+
+    assert(!mul_overflow(bspec->div_size, mapped_addrs_size));
+    const size_t max_len = bspec->div_size * mapped_addrs_size;
+    if ((size_t) ggio->gctl_length > max_len) {
+        ggio->gctl_error = ENOMEM;
+        return;
+    }
+
+    const uintptr_t a = bspec->as + (uintptr_t) ggio->gctl_offset;
+    assert(a >= bspec->as);
+    assert(a < bspec->ae);
+    const uintptr_t b = a + (uintptr_t) ggio->gctl_length;
+    assert(b <= bspec->ae);
+    const size_t m = ((size_t) ggio->gctl_offset) % bspec->div_size;
+
+    for (;;) {
+        volatile uintptr_t c = (volatile uintptr_t) NULL;
+        if (sigsetjmp(memfault_env, 1) == 0) {
+            for (c = a - m; c < b; c += bspec->div_size) {
+                static_assert(CHAR_BIT == 8, "");
+                volatile unsigned char *d1 = (volatile unsigned char *) c;
+                __attribute__((unused)) volatile unsigned char d2 = *d1;
+            }
+            break;
+        } else {
+            check_expected_memfault((void *) bspec->as, (void *) bspec->ae);
+            const uintptr_t si_addr = (uintptr_t) memfault_info.si_addr;
+            memfault_info.si_signo = SIGKILL;
+
+            assert((void *) c != NULL);
+            assert(si_addr == c);
+
+            // UB??
+            assert(bspec->as <= si_addr);
+            const size_t n = (si_addr - bspec->as) % bspec->div_size;
+            const uintptr_t d = si_addr - n;
+            assert(d <= si_addr);
+            assert(si_addr < d + 2 * bspec->div_size);
+
+            assert(d >= bspec->as);
+            assert(d < bspec->ae);
+            assert((d - bspec->as) % bspec->div_size == 0);
+
+            update_mapped_addrs(bspec->div_size, (void *) d);
+
+            assert(d % bspec->blk_size == 0);
+            map_bundle(bspec->as, d, bspec->div_size, bspec->bundlefd,
+                bspec->open_flags);
+        }
+    }
+
+    g_gate_verbose_log(2, LOG_DEBUG, "do_read(0x%lx, 0x%lx)",
+        ggio->gctl_offset, ggio->gctl_length);
+    ggio->gctl_data = (void *) a;
+    ggio->gctl_error = 0;
+}
+
+static void
+do_write(const struct bundle_spec *const bspec,
+    struct g_gate_ctl_io *const ggio, const bool zero)
+{
+    static void *zs_;
+    if (zero && zs_ == NULL) {
+        zs_ = calloc(1, bspec->blk_size);
+        assert(zs_ != NULL);
+    }
+    const void *const zeros = zs_;
+
+    static_assert(sizeof(ggio->gctl_length) <= sizeof(size_t), "");
+    static_assert(sizeof(ggio->gctl_offset) <= sizeof(uintptr_t), "");
+    assert(ggio->gctl_length >= 0);
+    assert(ggio->gctl_offset >= 0);
+
+    assert(memfault_info.si_signo == SIGKILL);
+    assert(((size_t) ggio->gctl_length) % bspec->blk_size == 0);
+
+    uintptr_t d = bspec->as + (uintptr_t) ggio->gctl_offset;
+    assert(d >= bspec->as);
+    assert(d < bspec->ae);
+    uintptr_t s = (uintptr_t) ggio->gctl_data;
+    size_t len = (size_t) ggio->gctl_length;
+
+    for (;;) {
+        if (sigsetjmp(memfault_env, 1) == 0) {
+            g_gate_verbose_log(3, LOG_DEBUG,
+                "memcpy(0x%09" PRIxPTR ", ..., 0x%08lx)", d, len);
+            if (zero) {
+                assert(zeros != NULL);
+                for (size_t i = 0; i < len; i += bspec->blk_size) {
+                    memcpy((void *) (d + i), zeros, bspec->blk_size);
+                }
+            } else {
+                memcpy((void *) d, (void *) s, len);
+            }
+            break;
+        } else {
+            check_expected_memfault((void *) bspec->as, (void *) bspec->ae);
+            const uintptr_t si_addr = (uintptr_t) memfault_info.si_addr;
+            memfault_info.si_signo = SIGKILL;
+
+            // UB??
+            assert(bspec->as <= si_addr);
+            const size_t m = (si_addr - bspec->as) % bspec->div_size;
+            const uintptr_t a = si_addr - m;
+            assert(a <= si_addr);
+            assert(si_addr < a + 2 * bspec->div_size);
+
+            update_mapped_addrs(bspec->div_size, (void *) a);
+
+            assert(a % bspec->blk_size == 0);
+            map_bundle(bspec->as, a, bspec->div_size, bspec->bundlefd,
+                bspec->open_flags);
+
+            // More UB??
+            assert(d <= si_addr);
+            assert(len >= si_addr - d);
+            len = len - (si_addr - d);
+            s = s + (si_addr - d);
+            d = si_addr;
+        }
+    }
+
+    g_gate_verbose_log(2, LOG_DEBUG, "do_write(0x%lx, 0x%lx)",
+        ggio->gctl_offset, ggio->gctl_length);
+    ggio->gctl_error = 0;
+}
+
+static void
+do_delete(const struct bundle_spec *const bspec,
+    struct g_gate_ctl_io *const ggio)
+{
+    if (bspec->bio_delete) {
+        g_gate_verbose_log(2, LOG_DEBUG, "do_delete() => do_write()");
+        do_write(bspec, ggio, true);
+    } else {
+        g_gate_verbose_log(2, LOG_DEBUG, "do_delete() => EOPNOTSUPP");
+        ggio->gctl_error = EOPNOTSUPP;
+    }
+}
+
+static void
+do_flush(const struct bundle_spec *const bspec,
+    __attribute__((unused)) const struct g_gate_ctl_io *const ggio)
+{
+    if (g_gate_verbose >= 4) {
+        for (size_t i = 0; i < mapped_addrs_size; ++i) {
+            const void *ma = mapped_addrs[i];
+            if (ma != NULL) {
+                g_gate_verbose_log(4, LOG_DEBUG,
+                    "0x%09" PRIxPTR, (uintptr_t) ma);
+            }
+        }
+    }
+
+    g_gate_verbose_log(2, LOG_DEBUG, "do_flush()");
+    msync2((void *) bspec->as, bspec->resv, MS_SYNC);
+}
+
+__attribute__((noreturn)) void
+g_gatel_serve_bundle(const int dfd_, const unsigned int ss_, const size_t ds_,
+    const size_t nd_, const int unit, const int fs_, const unsigned int dz_)
+{
+    freopen("/dev/null", "r", stdin);
+    if (g_gate_verbose == 0) {
+        if (daemon(0, 0) == -1) {
+            g_gate_destroy(unit, 1);
+            err(EXIT_FAILURE, "Cannot daemonize");
+        }
+        freopen("/dev/null", "w", stdout);
+        freopen("/dev/null", "w", stderr);
+    }
+    g_gate_verbose_log(1, LOG_DEBUG, "Worker created: %u.", getpid());
+
+    assert(dfd_ != -1);
+    assert(!mul_overflow(ds_, nd_));
+    struct bundle_spec bspec = {
+        .resv = ds_ * nd_,
+        .bundlefd = dfd_,
+        .div_size = ds_,
+        .num_divs = nd_,
+        .blk_size = ss_,
+        .open_flags = fs_,
+        .bio_delete = (bool) dz_
+    };
+    bspec.as = (uintptr_t) (resv_vaddr(NULL, bspec.resv));
+    bspec.ae = bspec.as + bspec.resv;
+
+    g_gate_verbose_log(1, LOG_DEBUG,
+        "[ 0x%09" PRIxPTR ", 0x%09" PRIxPTR " ): 0x%lx",
+        bspec.as, bspec.ae, bspec.resv);
+
+    assert(bspec.blk_size > 0);
+    static_assert(sizeof(bspec.blk_size) <= sizeof(size_t), "");
+    void *ggio_data_buf;
+    size_t ggio_data_buf_len = (size_t) bspec.blk_size;
+    if ((ggio_data_buf = malloc(ggio_data_buf_len)) == NULL) {
+        err(EXIT_FAILURE, "malloc() failed");
+    }
+
+    install_memfault_hdl();
+
+    for (;;) {
+        struct g_gate_ctl_io ggio = {
+            .gctl_version = G_GATE_VERSION,
+            .gctl_unit = unit,
+            .gctl_data = (void *) ggio_data_buf,
+            .gctl_length = (off_t) ggio_data_buf_len
+        };
+
+        g_gate_ioctl(G_GATE_CMD_START, &ggio);
+
+        switch (ggio.gctl_error) {
+        case 0:
+            break;
+        case ECANCELED:
+            g_gate_close_device();
+            if (close(bspec.bundlefd) == -1) {
+                err(EXIT_FAILURE, "close() failed");
+            }
+            do_flush(&bspec, &ggio);
+            free(ggio_data_buf);
+            g_gate_verbose_log(1, LOG_DEBUG, "Finished.");
+            exit(EXIT_SUCCESS);
+        case ENOMEM:
+            assert(ggio.gctl_cmd == BIO_WRITE
+                || (ggio.gctl_cmd == BIO_DELETE && bspec.bio_delete));
+            assert(ggio.gctl_length > 0);
+            static_assert(sizeof(ggio.gctl_length) <=
+                sizeof(ggio_data_buf_len), "");
+            ggio_data_buf_len = (size_t) ggio.gctl_length;
+            ggio_data_buf = realloc(ggio_data_buf, ggio_data_buf_len);
+            if (ggio_data_buf == NULL) {
+                err(EXIT_FAILURE, "realloc() failed");
+            }
+            continue;
+        case ENXIO:
+        default:
+            g_gate_xlog("ioctl(/dev/%s): %s.", G_GATE_CTL_NAME,
+                strerror(ggio.gctl_error));
+        }
+
+        switch(ggio.gctl_cmd) {
+        case BIO_READ:
+            do_read(&bspec, &ggio);
+            break;
+        case BIO_WRITE:
+            do_write(&bspec, &ggio, false);
+            break;
+        case BIO_DELETE:
+            do_delete(&bspec, &ggio);
+            break;
+        case BIO_FLUSH:
+            do_flush(&bspec, &ggio);
+            break;
+        default:
+            g_gate_verbose_log(1, LOG_DEBUG, "unsupported: %d", ggio.gctl_cmd);
+            ggio.gctl_error = EOPNOTSUPP;
+            break;
+        }
+
+        g_gate_ioctl(G_GATE_CMD_DONE, &ggio);
+        g_gate_verbose_log(3, LOG_DEBUG, "========");
+    }
+}

Property changes on: stable/10/sbin/ggate/ggatel/ggatel2.c
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+FreeBSD=%H
\ No newline at end of property
Added: svn:mime-type
## -0,0 +1 ##
+text/plain
\ No newline at end of property

From owner-freebsd-hackers@freebsd.org Thu Jul 7 01:55:58 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C1388B7561F for ; Thu, 7 Jul 2016 01:55:58 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell1.rawbw.com (shell1.rawbw.com [198.144.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id B11AE15F7 for ; Thu, 7 Jul 2016 01:55:58 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from yuri.doctorlan.com (c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190]) (authenticated bits=0) by shell1.rawbw.com (8.15.1/8.15.1) with ESMTPSA id u671tvPj042010 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Wed, 6 Jul 2016 18:55:57 -0700 (PDT) (envelope-from yuri@rawbw.com) X-Authentication-Warning: shell1.rawbw.com: Host c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190] claimed to be yuri.doctorlan.com Subject: Re: Why kinfo_getvmmap is sometimes so expensive?
To: Konstantin Belousov References: <20160707001913.GJ38613@kib.kiev.ua> Cc: Freebsd hackers list From: Yuri Message-ID: <0b5c9018-2b12-e993-a6df-06ecad6a7b07@rawbw.com> Date: Wed, 6 Jul 2016 18:55:56 -0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <20160707001913.GJ38613@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 01:55:58 -0000

On 07/06/2016 17:19, Konstantin Belousov wrote:
> To calculate residency count for the process map entries, kernel has to
> iterate over all pages. This operation was somewhat optimized in 10.3
> and HEAD, particularly for the large sparse mappings. But for large populated
> mappings there is no other way than to check each page.
>
> You may confirm my hypothesis by setting sysctl
> kern.proc_vmmap_skip_resident_count to 1 and see whether the CPU
> consumption changed. Of course, you will not get the resident count
> in the returned structure, after the knob is tweaked.

Yes, this explains it. Setting kern.proc_vmmap_skip_resident_count=1 made CPU consumption go down. So it is better to parse /proc//map to get the process size.
Thank you,
Yuri

From owner-freebsd-hackers@freebsd.org Thu Jul 7 04:34:08 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3B84CB75CD4; Thu, 7 Jul 2016 04:34:08 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 214D21CB0; Thu, 7 Jul 2016 04:34:07 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467866046079240.9597104517618; Wed, 6 Jul 2016 21:34:06 -0700 (PDT) Date: Wed, 06 Jul 2016 21:34:06 -0700 From: Matthew Macy To: "freebsd-current@freebsd.org" , "freebsd-hackers@freebsd.org" Message-ID: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> Subject: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 04:34:08 -0000

As a first step towards managing Linux user space in a chrooted /compat/linux (initially for i915 testing with intel gpu tools, later to get widevine and steam to work), I'm trying to get apt to work. I've fixed a number of issues to date in pseudofs/linprocfs, but now I'm running into a bug caused by differences in SIGCHLD handling between Linux and FreeBSD. The situation is that apt will spawn dpkg and wait on a pipe read. On Linux, when dpkg exits, the SIGCHLD to apt causes a short read on the pipe, which lets apt continue. On FreeBSD the SIGCHLD is silently ignored. I've even experimented with doing a kill -20, to no effect.

It would be easy enough to check sysvec against linux in pipe_read and break out of the loop when it's awakened from msleep (assuming there aren't deeper issues with signal propagation for anything other than SIGINT/SIGKILL) and then do a short read. However, I'm assuming that anyone who has worked in this area probably has a cleaner solution.

Thanks in advance.

-M

From owner-freebsd-hackers@freebsd.org Thu Jul 7 04:44:04 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2605DB7502B; Thu, 7 Jul 2016 04:44:04 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 0727D1360; Thu, 7 Jul 2016 04:44:04 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u674hsgK007808; Wed, 6 Jul 2016 21:43:58 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607070443.u674hsgK007808@gw.catspoiler.org> Date: Wed, 6 Jul 2016 21:43:54 -0700 (PDT) From: Don Lewis Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt To: mmacy@nextbsd.org cc: freebsd-current@freebsd.org, freebsd-hackers@freebsd.org In-Reply-To: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id:
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 04:44:04 -0000

On 6 Jul, Matthew Macy wrote:
> As a first step towards managing linux user space in a chrooted
> /compat/linux, initially for i915 testing with intel gpu tools, later
> on to get widevine and steam to work I'm trying to get apt to work.
> I've fixed a number of issues to date in pseudofs/linprocfs but now
> I'm running in to a bug caused by differences in SIGCHLD handling
> between Linux and FreeBSD. The situation is that apt will spawn dpkg
> and wait on a pipe read. On Linux when dpkg exits the SIGCHLD to apt
> causes a short read on the pipe which lets apt then continue. On
> FreeBSD a SIGCHLD is silently ignored. I've even experimented with
> doing a kill -20 to no effect.
>
> It would be easy enough to check sysvec against linux in pipe_read and
> break out of the loop when it's awakened from msleep (assuming there
> aren't deeper issues with signal propagation for anything other than
> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that
> anyone who has worked in this area probably has a cleaner solution.

It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't be in this case.
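
The SA_RESTART point can be checked outside apt with a small stand-alone program. What follows is a minimal sketch, not code from this thread: the helper name sigchld_interrupts_read(), the 100 ms child delay and the 500 ms watchdog timer are all assumptions made for illustration. It blocks in read() on a pipe that never receives data, lets a child exit, and reports whether the SIGCHLD alone made read() fail with EINTR.

```c
/* Ask for POSIX.1-2008 plus XSI declarations (sigaction, SA_RESTART, setitimer). */
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld;
static volatile sig_atomic_t got_sigalrm;

static void on_sigchld(int signo) { (void)signo; got_sigchld = 1; }
static void on_sigalrm(int signo) { (void)signo; got_sigalrm = 1; }

/*
 * Hypothetical helper (not from the thread).
 * Returns 1 if SIGCHLD alone made the blocked read() fail with EINTR,
 * 0 if read() was restarted and only the watchdog timer ended it,
 * -1 on a setup error.
 */
static int
sigchld_interrupts_read(int use_sa_restart)
{
    struct sigaction sa, old_chld, old_alrm;
    struct itimerval timer;
    struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    int fds[2], saved_errno;
    ssize_t n;
    pid_t pid;
    char byte;

    got_sigchld = 0;
    got_sigalrm = 0;

    /* SIGCHLD handler, with or without SA_RESTART as requested. */
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigchld;
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    if (sigaction(SIGCHLD, &sa, &old_chld) == -1)
        return -1;

    /* Watchdog handler: never SA_RESTART, so it always ends the read(). */
    sa.sa_handler = on_sigalrm;
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old_alrm) == -1)
        return -1;

    /* The parent keeps the write end open, so read() never sees EOF. */
    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: stand-in for dpkg; it never touches the pipe. */
        close(fds[0]);
        close(fds[1]);
        nanosleep(&delay, NULL);
        _exit(0);
    }

    /* Arm a one-shot 500 ms watchdog, well after the child has exited. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 500 * 1000;
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(fds[0], &byte, 1);
    saved_errno = errno;

    /* Disarm the watchdog, reap the child, restore the old handlers. */
    memset(&timer, 0, sizeof(timer));
    setitimer(ITIMER_REAL, &timer, NULL);
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        continue;
    close(fds[0]);
    close(fds[1]);
    sigaction(SIGCHLD, &old_chld, NULL);
    sigaction(SIGALRM, &old_alrm, NULL);

    return (n == -1 && saved_errno == EINTR && got_sigchld && !got_sigalrm);
}
```

Called from a main(), sigchld_interrupts_read(0) is expected to return 1 (no SA_RESTART, so read() fails with EINTR), while sigchld_interrupts_read(1) is expected to return 0 because the read() is restarted and only the watchdog ends it. If SIGCHLD is left at its default disposition, no handler runs and the read() is not interrupted at all.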
From owner-freebsd-hackers@freebsd.org Thu Jul 7 04:52:25 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A8A85B7546D; Thu, 7 Jul 2016 04:52:25 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-io0-x22e.google.com (mail-io0-x22e.google.com [IPv6:2607:f8b0:4001:c06::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6DF22196D; Thu, 7 Jul 2016 04:52:25 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: by mail-io0-x22e.google.com with SMTP id i186so11787215iof.1; Wed, 06 Jul 2016 21:52:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc; bh=Qr7zrWnkMf4gX8LnCeEGEl5SSsZupuh3hdIGcY8xfso=; b=szWSVLGT1dtnjhZy5L0sQ669CCPbdpz24VTJg1Sj17NekpzwUxfk4jgMKGPIX6aJ7T JuTX8SWqYrGkqVxdS9cYP5Z3rxr1fnWgoCU+mkis3a7HyWwMZZF8uH9IniIVZuohLuHm nrcEIU8Dy3fuaFjwUK73Jqxr/msX+3x+KHQkkgZ606dsDEwj5mm787Xs8eOFqxkAtZ0y 8tU9NEnOXTxyFfxlVBU77FuGpPVumFdwKHTYDelcX7cqeLfABP/c2W78pMfUumR3jewo zJlqPsBCbt7/OucV57DdK0Wehymuc+EJTtTUcOe73p3L8Rvy7GjwOe8dAjFA+vzRnko4 eq2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc; bh=Qr7zrWnkMf4gX8LnCeEGEl5SSsZupuh3hdIGcY8xfso=; b=JeoNsZwJjDdHP6O9tgEkQp7t2aq+xpMHEztytsZ13XAV6Bxgni3Cvn2cRXlz+XRhG4 ox9GKj780BiKUJ9XucWRVPMPws9mfL7cdayCWEALzC//bJ1Y/Nsl0z9jo6dFILzCKeuA azLXLf61eTK8koat7+gotRl+e6QAjnqqZcdLAbjinWZS6+9o02upwcBGjrY3Ob/5fNpL Vy8Mq4QAGIdk0iRjDy3uelYM0fE8Nzt8Ttp2SdVKlAi0M/YiHbtWVFb1cgWwMNfAejFO Ua34RlaQ5ynnEnnasAGV6YZQ7Oo8Pkg3WSjFrko1zjpsIc05Nu6WslXZxrAF0iicKBmb 4YAQ== 
X-Gm-Message-State: ALyK8tKzx8zTAYzHFnRqyUZYSu7NjXuneyI9tqFOIrSr3kbtjDuQBlb3kBKKx8caM2nKhNAUIzSXxQ9YaCsLdw== MIME-Version: 1.0 X-Received: by 10.107.162.65 with SMTP id l62mr701724ioe.138.1467867144731; Wed, 06 Jul 2016 21:52:24 -0700 (PDT) Sender: kmacybsd@gmail.com Received: by 10.107.134.218 with HTTP; Wed, 6 Jul 2016 21:52:24 -0700 (PDT) In-Reply-To: <201607070443.u674hsgK007808@gw.catspoiler.org> References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> <201607070443.u674hsgK007808@gw.catspoiler.org> Date: Wed, 6 Jul 2016 21:52:24 -0700 X-Google-Sender-Auth: XwvOobuQKCFHqshmFbpAnOyFxzo Message-ID: Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt From: "K. Macy" To: Don Lewis Cc: "mmacy@nextbsd.org" , "freebsd-current@freebsd.org" , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 04:52:25 -0000 On Wednesday, July 6, 2016, Don Lewis wrote: > On 6 Jul, Matthew Macy wrote: > > As a first step towards managing linux user space in a chrooted > > /compat/linux, initially for i915 testing with intel gpu tools, later > > on to get widevine and steam to work I'm trying to get apt to work. > > I've fixed a number of issues to date in pseudofs/linprocfs but now > > I'm running in to a bug caused by differences in SIGCHLD handling > > between Linux and FreeBSD. The situation is that apt will spawn dpkg > > and wait on a pipe read. On Linux when dpkg exits the SIGCHLD to apt > > causes a short read on the pipe which lets apt then continue. On > > FreeBSD a SIGCHLD is silently ignored. I've even experimented with > > doing a kill -20 to no effect. 
> > > > It would be easy enough to check sysvec against linux in pipe_read and > > break out of the loop when it's awakened from msleep (assuming there > > aren't deeper issues with signal propagation for anything other than > > SIGINT/SIGKILL) and then do a short read. However, I'm assuming that > > anyone who has worked in this area probably has a cleaner solution. > > It shoulds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't > be in this case. Good point. Thinking more about it, this seems like a bug in FreeBSD. Not a valid behavioral difference. -M > > _______________________________________________ > freebsd-current@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org > " > From owner-freebsd-hackers@freebsd.org Thu Jul 7 05:15:14 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26C7DB75CE1; Thu, 7 Jul 2016 05:15:14 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-6.mit.edu (dmz-mailsec-scanner-6.mit.edu [18.7.68.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B4E1F1CE5; Thu, 7 Jul 2016 05:15:13 +0000 (UTC) (envelope-from kaduk@mit.edu) X-AuditID: 12074423-cafff70000006b63-be-577de42b8390 Received: from mailhub-auth-1.mit.edu ( [18.9.21.35]) (using TLS with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by (Symantec Messaging Gateway) with SMTP id F4.B1.27491.B24ED775; Thu, 7 Jul 2016 01:10:03 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id u675A2RY001487; Thu, 7 Jul 2016 01:10:03 -0400 Received: from multics.mit.edu 
(system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id u6759xsK021350 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 7 Jul 2016 01:10:02 -0400 Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id u6759wPw002618; Thu, 7 Jul 2016 01:09:58 -0400 (EDT) Date: Thu, 7 Jul 2016 01:09:58 -0400 (EDT) From: Benjamin Kaduk X-X-Sender: kaduk@multics.mit.edu To: freebsd-hackers@FreeBSD.org cc: freebsd-current@FreeBSD.org Subject: Last call for 2016Q2 quarterly status reports In-Reply-To: Message-ID: References: User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrGIsWRmVeSWpSXmKPExsUixCmqrKv9pDbcYO0ha4s5bz4wWWzf/I/R gcljxqf5LAGMUVw2Kak5mWWpRfp2CVwZZz6cYS3o5q34vKKRrYHxAlcXIyeHhICJxIKJG5i7 GLk4hATamCRenPrBAuFsYJT4fusxVOYgk8SE9VeBHA4gp17i2LMQkG4WAS2JH79mMYHYbAJq EutXXGOGmKoosfnUJDBbREBeYl/Te3YQmxnI3rJ6MhvIGGEBM4nbR1lAwpwCThKPpv0GK+EV cJA48eE7mC0k4CjxoeMiK4gtKqAjsXr/FBaIGkGJkzOfsECM1JJYPn0bywRGwVlIUrOQpBYw Mq1ilE3JrdLNTczMKU5N1i1OTszLSy3SNdPLzSzRS00p3cQICk52F+UdjC/7vA8xCnAwKvHw /sirDRdiTSwrrsw9xCjJwaQkyrvnLlCILyk/pTIjsTgjvqg0J7X4EKMEB7OSCO/eR0A53pTE yqrUonyYlDQHi5I4LyMDA4OQQHpiSWp2ampBahFMVoaDQ0mCl/UxUKNgUWp6akVaZk4JQpqJ gxNkOA/QcDaQGt7igsTc4sx0iPwpRkUpcV4vkIQASCKjNA+uF5w8djOpvmIUB3pFmHc3yG08 wMQD1/0KaDAT0OCfLtUgg0sSEVJSDYwTYuTnlUfGLD4Rv6ve6NOBw90b25a+kngUvLHvfYNa WFH1wph/jKz5Ild0y4o4q3yOc4dqvrn1frnX/CdcsYnSwatXL1f95NYUlizcVydlq8f2+r9O xtR5CaZ/vrJ29D2PWef/wGSWc9XavKCOdR8OGq16WpZy8uAVxQTVs5NDUkTi39ws8lViKc5I NNRiLipOBADNoPxK+QIAAA== X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 05:15:14 -0000 Reminder: we're still looking for more submissions 
for the 2016Q2 status report! Please let us know if you wish to write an entry, even if it will not be finished by today.

Thanks,

Ben (for the monthly@ team)

On Tue, 21 Jun 2016, Benjamin Kaduk wrote:

> Dear FreeBSD Community,
>
> The deadline for the next FreeBSD Quarterly Status update is July 7,
> 2016, for work done in April through June.
>
> Status report submissions do not have to be very long. They may be about
> anything happening in the FreeBSD project and community, and provide a
> great way to inform FreeBSD users and developers about what you're working
> on. Submission of reports is not restricted to committers. Anyone doing
> anything interesting and FreeBSD-related can -- and should -- write one!
>
> The preferred and easiest submission method is to use the XML generator
> [1] with the results emailed to the status report team at monthly at
> FreeBSD.org . There is also an XML template [2] which can be filled out
> manually and attached if preferred. For the expected content and style,
> please study our guidelines on how to write a good status report [3].
> You can also review previous issues [4][5] for ideas on the style and
> format.
>
> We are looking forward to all of your 2016Q2 reports!
>
> Thanks,
>
> Ben (on behalf of monthly@)
>
> [1] http://www.freebsd.org/cgi/monthly.cgi
> [2] http://www.freebsd.org/news/status/report-sample.xml
> [3] http://www.freebsd.org/news/status/howto.html
> [4] http://www.freebsd.org/news/status/report-2015-10-2015-12.html
> [5] http://www.freebsd.org/news/status/report-2016-01-2016-03.html
>
>

From owner-freebsd-hackers@freebsd.org Thu Jul 7 06:28:50 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7D0CBB76D6D for ; Thu, 7 Jul 2016 06:28:50 +0000 (UTC) (envelope-from mailing-machine@vniz.net) Received: from mail-lf0-f52.google.com (mail-lf0-f52.google.com [209.85.215.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0B5B91ECE for ; Thu, 7 Jul 2016 06:28:49 +0000 (UTC) (envelope-from mailing-machine@vniz.net) Received: by mail-lf0-f52.google.com with SMTP id h129so4893520lfh.1 for ; Wed, 06 Jul 2016 23:28:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=EdmFlZ1aF8fhmSd8eppdAdvgKXWYd1kULvgaG+xbvvg=; b=XnvbpfAM65c/PFerBXshQgjP1lIazTEYWCmPP59fxDonCnLxiXlhDefLsfUP1SmYdK 91BZ63g6t5aQkC4x9lUdmXPXqxFMKy0OCuCLOKxo5DwJBEFWSw7+EqodE57GYfVq9jZs E25aWc3B3ui38LRzJ2D0P23Uphp7gMmvNgnOKT9HepEE+jV6Rk+R2q/SqPFTMTwjm+zv d3Jf6x65szJtP8f8c2n1fg1wvHLmY2VfMe68H39wZHgfbHxifQOco7H7BJOhIDcrfbs0 kQLGcFwnYBY3Dwh04vbCjDp7dXiQ/3gUFqSyVy/2FpHfk/i2t8R5gY0UVSpKMUgnaAVk CnMw== X-Gm-Message-State: ALyK8tJDWkdSoE0TUfwJSWqhjq2B4UrdBBQtduX2kralrl7jjY6A2XrTMrP8T4sXScWPvg== X-Received: by 10.25.21.16 with SMTP id l16mr5460245lfi.99.1467872922271; Wed, 06 Jul 2016
23:28:42 -0700 (PDT) Received: from [192.168.1.2] ([89.169.173.68]) by smtp.gmail.com with ESMTPSA id a199sm705946lfe.35.2016.07.06.23.28.41 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jul 2016 23:28:41 -0700 (PDT) Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt To: "K. Macy" , Don Lewis References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> <201607070443.u674hsgK007808@gw.catspoiler.org> Cc: "mmacy@nextbsd.org" , "freebsd-current@freebsd.org" , "freebsd-hackers@freebsd.org" From: Andrey Chernov Message-ID: <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org> Date: Thu, 7 Jul 2016 09:28:40 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 06:28:50 -0000 On 07.07.2016 7:52, K. Macy wrote: > On Wednesday, July 6, 2016, Don Lewis wrote: > >> On 6 Jul, Matthew Macy wrote: >>> As a first step towards managing linux user space in a chrooted >>> /compat/linux, initially for i915 testing with intel gpu tools, later >>> on to get widevine and steam to work I'm trying to get apt to work. >>> I've fixed a number of issues to date in pseudofs/linprocfs but now >>> I'm running in to a bug caused by differences in SIGCHLD handling >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg >>> and wait on a pipe read. On Linux when dpkg exits the SIGCHLD to apt >>> causes a short read on the pipe which lets apt then continue. On >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with >>> doing a kill -20 to no effect. 
>>>
>>> It would be easy enough to check sysvec against linux in pipe_read and
>>> break out of the loop when it's awakened from msleep (assuming there
>>> aren't deeper issues with signal propagation for anything other than
>>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that
>>> anyone who has worked in this area probably has a cleaner solution.
>>
>> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
>> be in this case.
>
>
>
> Good point.
>
> Thinking more about it, this seems like a bug in FreeBSD. Not a valid
> behavioral difference.

You had better consult POSIX before blindly changing our native code toward any Linuxisms. I don't have time right now to check whether this really is a bug according to POSIX, but please read, or at least search for every mention of SIGCHLD in:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
It explains the SIGCHLD actions in great detail. And this one too:
http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html

From owner-freebsd-hackers@freebsd.org Thu Jul 7 06:40:59 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5CB49B75645; Thu, 7 Jul 2016 06:40:59 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3DA14179B; Thu, 7 Jul 2016 06:40:58 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467873657345708.1038340425864; Wed, 6 Jul 2016 23:40:57 -0700 (PDT) Date: Wed, 06 Jul 2016 23:40:57 -0700 From: Matthew Macy To: "Andrey Chernov" Cc: "K.
Macy" , "Don Lewis" , "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" Message-ID: <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org> In-Reply-To: <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org> References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> <201607070443.u674hsgK007808@gw.catspoiler.org> <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org> Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 06:40:59 -0000 ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov wrote ---- > On 07.07.2016 7:52, K. Macy wrote: > > On Wednesday, July 6, 2016, Don Lewis wrote: > > > >> On 6 Jul, Matthew Macy wrote: > >>> As a first step towards managing linux user space in a chrooted > >>> /compat/linux, initially for i915 testing with intel gpu tools, later > >>> on to get widevine and steam to work I'm trying to get apt to work. > >>> I've fixed a number of issues to date in pseudofs/linprocfs but now > >>> I'm running in to a bug caused by differences in SIGCHLD handling > >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg > >>> and wait on a pipe read. On Linux when dpkg exits the SIGCHLD to apt > >>> causes a short read on the pipe which lets apt then continue. On > >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with > >>> doing a kill -20 to no effect. 
> >>> > >>> It would be easy enough to check sysvec against linux in pipe_read and > >>> break out of the loop when it's awakened from msleep (assuming there > >>> aren't deeper issues with signal propagation for anything other than > >>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that > >>> anyone who has worked in this area probably has a cleaner solution. > >> > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't > >> be in this case. > > > > > > > > Good point. > > > > Thinking more about it, this seems like a bug in FreeBSD. Not a valid > > behavioral difference. > > You better need consult with POSIX before fixing things toward any > Linuxisms blindly in our native code. I don't have a time now to see, is > it really a bug according to POSIX, but please read or just find all > SIGCHLD there: > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html > it explain SIGCHLD actions in deep details. > And that one too: > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html I was pretty clear in my initial email that I'm only interested in changing behavior for Linux programs. And I was asking for help with that, not a link to SUSv3 or POSIX. On closer reading of the man pages it looks like linux's clone is supposed to change the disposition for the exit signal. I'll have to write a test program to reproduce this behavior in isolation. 
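
As a starting point for such a test program, it helps to report the SIGCHLD disposition the process actually ends up with. The sketch below is only an illustration under stated assumptions, not code from this thread: sigchld_disposition() and set_sigchld() are hypothetical helper names, and it uses nothing but plain POSIX sigaction(); it does not exercise Linux's clone() at all.

```c
/* Ask for POSIX.1-2008 plus XSI declarations (sigaction, SA_RESTART). */
#define _XOPEN_SOURCE 700

#include <signal.h>
#include <string.h>

/* Do-nothing handler used only so that SIGCHLD counts as "caught". */
static void noop_handler(int signo) { (void)signo; }

/*
 * Hypothetical helper: query (without changing) the current SIGCHLD
 * disposition and describe it as a static string.
 */
static const char *
sigchld_disposition(void)
{
    struct sigaction cur;

    /* A NULL second argument makes sigaction() a pure query. */
    if (sigaction(SIGCHLD, NULL, &cur) == -1)
        return "error";

    /* sa_handler is only meaningful when SA_SIGINFO is not set. */
    if (!(cur.sa_flags & SA_SIGINFO)) {
        if (cur.sa_handler == SIG_DFL)
            return "default";
        if (cur.sa_handler == SIG_IGN)
            return "ignored";
    }
    return (cur.sa_flags & SA_RESTART) ?
        "handler, SA_RESTART" : "handler, no SA_RESTART";
}

/* Hypothetical helper: install a SIGCHLD disposition with the given flags. */
static int
set_sigchld(void (*handler)(int), int flags)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = handler;
    sa.sa_flags = flags;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Printing sigchld_disposition() at the top of the same test binary, once run natively and once under the Linux emulation, would show whether the two environments even start from the same disposition before any pipe read is attempted.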
-M > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@freebsd.org Thu Jul 7 06:48:57 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88366B75A32 for ; Thu, 7 Jul 2016 06:48:57 +0000 (UTC) (envelope-from mailing-machine@vniz.net) Received: from mail-lf0-f52.google.com (mail-lf0-f52.google.com [209.85.215.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 161D21CE3 for ; Thu, 7 Jul 2016 06:48:56 +0000 (UTC) (envelope-from mailing-machine@vniz.net) Received: by mail-lf0-f52.google.com with SMTP id q132so5109104lfe.3 for ; Wed, 06 Jul 2016 23:48:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=EtdIF6uNxMw8QIZlVINtfPUXzm2xUBy+KZBCTastsVQ=; b=MNN8TkpohFDgUr8xmB28SWmxt4X5D7imdouGOMWbMpOS4ZinC8PckWxbKxEn9j71R+ gwxfQL+N4iwTI17oaZdaTl5jbIGxkB2cICr+5f5//Tf5TRqS10yI9XiapIYblIth8ET3 Fc4x+ficDH1dXcygwglUn7YToZYBU2EOWeSmilduFiaew9azmXmWGKbR2UxQM0euYZ77 mzs3G2mZLnZiJNG1MwZ/d5WY/pr5f3wDeKWjOvhGn7k+Ll5ym85fsOVQx+Y1EiCLU2nE LUL2dv3mH4ko/7p/36cNirswGwggCG3qj4XbMV1ysejFvTsawXHyGBhHsdJvxkOHFMf9 VgxQ== X-Gm-Message-State: ALyK8tJGXf9ZaF8r8UPKj6YkXnsAsS/uvvu0UPWRpNMefDgEdK1xRTbsedyTxZQT84+lhQ== X-Received: by 10.25.133.87 with SMTP id h84mr5758688lfd.210.1467874134926; Wed, 06 Jul 2016 23:48:54 -0700 (PDT) Received: from [192.168.1.2] ([89.169.173.68]) by smtp.gmail.com with ESMTPSA id 
n7sm7077524lfb.31.2016.07.06.23.48.53 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Jul 2016 23:48:54 -0700 (PDT) Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt To: Matthew Macy References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> <201607070443.u674hsgK007808@gw.catspoiler.org> <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org> <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org> Cc: "K. Macy" , Don Lewis , "freebsd-hackers@freebsd.org" , "freebsd-current@freebsd.org" From: Andrey Chernov Message-ID: <2249b671-765a-13e5-3b19-862416f6f73d@freebsd.org> Date: Thu, 7 Jul 2016 09:48:53 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 06:48:57 -0000 On 07.07.2016 9:40, Matthew Macy wrote: > > > > ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov wrote ---- > > On 07.07.2016 7:52, K. Macy wrote: > > > On Wednesday, July 6, 2016, Don Lewis wrote: > > > > > >> On 6 Jul, Matthew Macy wrote: > > >>> As a first step towards managing linux user space in a chrooted > > >>> /compat/linux, initially for i915 testing with intel gpu tools, later > > >>> on to get widevine and steam to work I'm trying to get apt to work. > > >>> I've fixed a number of issues to date in pseudofs/linprocfs but now > > >>> I'm running in to a bug caused by differences in SIGCHLD handling > > >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg > > >>> and wait on a pipe read. 
On Linux when dpkg exits the SIGCHLD to apt
> > >>> causes a short read on the pipe which lets apt then continue. On
> > >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with
> > >>> doing a kill -20 to no effect.
> > >>>
> > >>> It would be easy enough to check sysvec against linux in pipe_read and
> > >>> break out of the loop when it's awakened from msleep (assuming there
> > >>> aren't deeper issues with signal propagation for anything other than
> > >>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that
> > >>> anyone who has worked in this area probably has a cleaner solution.
> > >>
> > >> It sounds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't
> > >> be in this case.
> > >
> > >
> > >
> > > Good point.
> > >
> > > Thinking more about it, this seems like a bug in FreeBSD. Not a valid
> > > behavioral difference.
> >
> > You better need consult with POSIX before fixing things toward any
> > Linuxisms blindly in our native code. I don't have a time now to see, is
> > it really a bug according to POSIX, but please read or just find all
> > SIGCHLD there:
> > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
> > it explain SIGCHLD actions in deep details.
> > And that one too:
> > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html
> >
>
> I was pretty clear in my initial email that I'm only interested in changing behavior for Linux programs.

Of course, but if it is a FreeBSD bug, it should be fixed in our native code first, before making any changes in the Linuxulator.

> And I was asking for help with that, not a link to SUSv3 or POSIX.

Sorry if I was not helpful. Before you try to change something in the Linuxulator, you need to know whether FreeBSD does it right (and if it does it wrong, fix the FreeBSD native code first). I am just insisting on the proper order of steps for fixing it.
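One way to check the native behavior in isolation, before touching any emulation code, is a small standalone test. The sketch below is not from the thread. It assumes a hypothetical helper, sigchld_interrupts_read(), that installs a SIGCHLD handler with or without SA_RESTART, forks a child that exits after a short delay, and then blocks in read() on a pipe that never receives data. A SIGALRM handler installed without SA_RESTART acts as a watchdog, so a restarted read still returns. The 200 ms delay and the 1 s alarm are arbitrary, and the result is wrong in the unlikely case that the child exits before the parent reaches read().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Flags set by the handlers; names are illustrative only. */
static volatile sig_atomic_t got_sigchld;
static volatile sig_atomic_t got_sigalrm;

static void on_sigchld(int sig) { (void)sig; got_sigchld = 1; }
static void on_sigalrm(int sig) { (void)sig; got_sigalrm = 1; }

/*
 * Hypothetical test helper (not an existing API).
 * Returns 1 if SIGCHLD alone interrupted the blocking pipe read,
 * 0 if the read was restarted and only the SIGALRM watchdog ended it,
 * -1 on setup failure or an unexpected read() result.
 */
int
sigchld_interrupts_read(int use_restart)
{
	struct sigaction sa;
	struct timespec delay = { 0, 200 * 1000 * 1000 };	/* 200 ms */
	char buf[16];
	int fds[2], saved_errno;
	pid_t pid;
	ssize_t n;

	assert(use_restart == 0 || use_restart == 1);
	got_sigchld = 0;
	got_sigalrm = 0;

	/* SIGCHLD handler: the flag under test is SA_RESTART. */
	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = on_sigchld;
	sa.sa_flags = use_restart ? SA_RESTART : 0;
	if (sigaction(SIGCHLD, &sa, NULL) != 0)
		return (-1);

	/* Watchdog: never restartable, so read() always comes back. */
	sa.sa_handler = on_sigalrm;
	sa.sa_flags = 0;
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return (-1);

	if (pipe(fds) != 0)
		return (-1);

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: let the parent block in read(), then exit. */
		nanosleep(&delay, NULL);
		_exit(0);
	}

	/* Parent keeps the write end open, so read() cannot see EOF. */
	alarm(1);
	n = read(fds[0], buf, sizeof(buf));
	saved_errno = errno;
	alarm(0);

	/* Reap the child; retry if the wait itself is interrupted. */
	while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
		;
	close(fds[0]);
	close(fds[1]);

	if (n != -1 || saved_errno != EINTR)
		return (-1);
	return (got_sigchld && !got_sigalrm);
}
```

Going by the POSIX description of SA_RESTART, I would expect sigchld_interrupts_read(0) to return 1 and sigchld_interrupts_read(1) to return 0 on a native build. Running the same source as a Linux binary under /compat/linux should then show whether the emulated path differs.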
From owner-freebsd-hackers@freebsd.org Thu Jul 7 06:59:46 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 93A79B75E33; Thu, 7 Jul 2016 06:59:46 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 73EF015F1; Thu, 7 Jul 2016 06:59:46 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467874783786638.9918064901652; Wed, 6 Jul 2016 23:59:43 -0700 (PDT) Date: Wed, 06 Jul 2016 23:59:43 -0700 From: Matthew Macy To: "Andrey Chernov" Cc: "freebsd-hackers@freebsd.org" , "Don Lewis" , "freebsd-current@freebsd.org" , "K. Macy" Message-ID: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org> In-Reply-To: <2249b671-765a-13e5-3b19-862416f6f73d@freebsd.org> References: <155c3a25e3f.11fb4143170445.2284890475527649192@nextbsd.org> <201607070443.u674hsgK007808@gw.catspoiler.org> <558d9bff-0f9b-2b08-b057-32b2a41953ff@freebsd.org> <155c41681bb.1141d206175455.3130944807853755277@nextbsd.org> <2249b671-765a-13e5-3b19-862416f6f73d@freebsd.org> Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 06:59:46 -0000 ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov wrote ---- > On 07.07.2016 9:40, Matthew Macy wrote: > > > > > > > > ---- 
On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov wrote ---- > > > On 07.07.2016 7:52, K. Macy wrote: > > > > On Wednesday, July 6, 2016, Don Lewis wrote: > > > > > > > >> On 6 Jul, Matthew Macy wrote: > > > >>> As a first step towards managing linux user space in a chrooted > > > >>> /compat/linux, initially for i915 testing with intel gpu tools, later > > > >>> on to get widevine and steam to work I'm trying to get apt to work. > > > >>> I've fixed a number of issues to date in pseudofs/linprocfs but now > > > >>> I'm running in to a bug caused by differences in SIGCHLD handling > > > >>> between Linux and FreeBSD. The situation is that apt will spawn dpkg > > > >>> and wait on a pipe read. On Linux when dpkg exits the SIGCHLD to apt > > > >>> causes a short read on the pipe which lets apt then continue. On > > > >>> FreeBSD a SIGCHLD is silently ignored. I've even experimented with > > > >>> doing a kill -20 to no effect. > > > >>> > > > >>> It would be easy enough to check sysvec against linux in pipe_read and > > > >>> break out of the loop when it's awakened from msleep (assuming there > > > >>> aren't deeper issues with signal propagation for anything other than > > > >>> SIGINT/SIGKILL) and then do a short read. However, I'm assuming that > > > >>> anyone who has worked in this area probably has a cleaner solution. > > > >> > > > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD but shouldn't > > > >> be in this case. > > > > > > > > > > > > > > > > Good point. > > > > > > > > Thinking more about it, this seems like a bug in FreeBSD. Not a valid > > > > behavioral difference. > > > > > > You better need consult with POSIX before fixing things toward any > > > Linuxisms blindly in our native code. I don't have a time now to see, is > > > it really a bug according to POSIX, but please read or just find all > > > SIGCHLD there: > > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html > > > it explain SIGCHLD actions in deep details. 
> > > And that one too: > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html > > > > > > > > I was pretty clear in my initial email that I'm only interested in changing behavior for Linux programs. > > Of course, but in case it is FreeBSD bug, it should be fixed in our > native code first before making any changes in Linuxator. > > > And I was asking for help with that, not a link to SUSv3 or POSIX. > > In case I was not helpful, sorry for that. Before you try to change > something in Linuxator you need to be sure that FreeBSD does it right > (or wrong, then fix FreeBSD native code first). I am just insisting on > proper steps of fixing it. > I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD to deliberately interrupt a pipe read is such a weird idiom. I'll test fork vs clone on Linux and see how OS X responds to a SIGCHLD during a pipe read. Thanks. -M > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@freebsd.org Thu Jul 7 07:15:03 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6DB66B212D4; Thu, 7 Jul 2016 07:15:03 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 47D0F1DC5; Thu, 7 Jul 2016 07:15:03 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u677EqVx008159; Thu, 7 Jul 2016 00:14:56 
-0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607070714.u677EqVx008159@gw.catspoiler.org> Date: Thu, 7 Jul 2016 00:14:52 -0700 (PDT) From: Don Lewis Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt To: mmacy@nextbsd.org cc: ache@freebsd.org, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org, kmacy@freebsd.org In-Reply-To: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 07:15:03 -0000 On 6 Jul, Matthew Macy wrote: > > > > ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov > wrote ---- > > On 07.07.2016 9:40, Matthew Macy wrote: > > > > > > > > > > > > ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov > > > wrote ---- > > > > On 07.07.2016 7:52, K. Macy wrote: > > > > > On Wednesday, July 6, 2016, Don Lewis > > > > > wrote: > > > > > > > > > >> On 6 Jul, Matthew Macy wrote: > > > > >>> As a first step towards managing linux user space in a > > > > >>> chrooted > > > > >>> /compat/linux, initially for i915 testing with intel gpu > > > > >>> tools, later on to get widevine and steam to work I'm > > > > >>> trying to get apt to work. I've fixed a number of issues > > > > >>> to date in pseudofs/linprocfs but now I'm running in to > > > > >>> a bug caused by differences in SIGCHLD handling between > > > > >>> Linux and FreeBSD. The situation is that apt will spawn > > > > >>> dpkg and wait on a pipe read. On Linux when dpkg exits > > > > >>> the SIGCHLD to apt causes a short read on the pipe > > > > >>> which lets apt then continue. On FreeBSD a SIGCHLD is > > > > >>> silently ignored. I've even experimented with doing a > > > > >>> kill -20 to no effect. 
> > > > >>> > > > > >>> It would be easy enough to check sysvec against linux in > > > > >>> pipe_read and break out of the loop when it's awakened > > > > >>> from msleep (assuming there aren't deeper issues with > > > > >>> signal propagation for anything other than > > > > >>> SIGINT/SIGKILL) and then do a short read. However, I'm > > > > >>> assuming that anyone who has worked in this area > > > > >>> probably has a cleaner solution. > > > > >> > > > > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD > > > > >> but shouldn't be in this case. > > > > > > > > > > > > > > > > > > > > Good point. > > > > > > > > > > Thinking more about it, this seems like a bug in FreeBSD. > > > > > Not a valid behavioral difference. > > > > > > > > You better need consult with POSIX before fixing things toward > > > > any Linuxisms blindly in our native code. I don't have a > > > > time now to see, is it really a bug according to POSIX, but > > > > please read or just find all SIGCHLD there: > > > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html > > > > it explain SIGCHLD actions in deep details. > > > > And that one too: > > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html > > > > > > > > > > > > I was pretty clear in my initial email that I'm only interested > > > in changing behavior for Linux programs. > > > > Of course, but in case it is FreeBSD bug, it should be fixed in our > > native code first before making any changes in Linuxator. > > > > > And I was asking for help with that, not a link to SUSv3 or POSIX. > > > > In case I was not helpful, sorry for that. Before you try to change > > something in Linuxator you need to be sure that FreeBSD does it > > right (or wrong, then fix FreeBSD native code first). I am just > > insisting on proper steps of fixing it. > > > > > I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD > to deliberately interrupt a pipe read is such a weird idiom. 
I'll test > fork vs clone on Linux and see how OS X responds to a SIGCHLD during a > pipe read. It really depends on how signal handling has been set up. From my understanding of the FreeBSD man pages and the Open Group documents, the default handling for SIGCHLD is to just ignore it, in which case it shouldn't interrupt the pipe read. If the process has set up a SIGCHLD signal handler, then what happens with the read should depend on whether or not SA_RESTART was passed to sigaction(). I would expect that Linux would be the same as FreeBSD and the Open Group specs. How does apt set up its handling of SIGCHLD? From owner-freebsd-hackers@freebsd.org Thu Jul 7 07:32:09 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97185B2196E for ; Thu, 7 Jul 2016 07:32:09 +0000 (UTC) (envelope-from mailing-machine@vniz.net) Received: from mail-lf0-f41.google.com (mail-lf0-f41.google.com [209.85.215.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E00F1A56 for ; Thu, 7 Jul 2016 07:32:09 +0000 (UTC) (envelope-from mailing-machine@vniz.net) Received: by mail-lf0-f41.google.com with SMTP id l188so5727152lfe.2 for ; Thu, 07 Jul 2016 00:32:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=rp2PWzViNTn2dxaIhWKfuSPG+jg9eEQjyzxatq+deAs=; b=my/nEhFNk8hdGFBskhXRYw3s6q9dN5zZGUaLbrzkcOpntJnUPqTE7CoQNCqE6fjoAP 3NdJfCw4pmwN7ZW1h8Tol22x31qmjRaKc/hpi5U6vV61Hddu3U4GnOPPfSgMohLsSinZ 7ps15qiAdwdLQdVfASC59VlcAbgmtas4Y55n3N/OueWjLmbSXfMUJgJlV4HgUCywauxk 
e2QSLCuZLa+RMj7Ow/gNA2Z1GjPYdfhoapbcc12AVwcMLZf75JBO+umJjJx2DhYE9ap4 FZr2Tdrw4unhWblOaAG0vNV5wMeSHFMVrp/dvq5UBjw2vB2hAYoInIRF2JuMRKzix+yb eAnw== X-Gm-Message-State: ALyK8tKqpaD8pTkb7HBRjTL6R4RFRdWAHXXzT+91niGyHLpH18763ha11hToRBqQdir8BQ== X-Received: by 10.25.21.106 with SMTP id l103mr6391802lfi.27.1467876726567; Thu, 07 Jul 2016 00:32:06 -0700 (PDT) Received: from [192.168.1.2] ([89.169.173.68]) by smtp.gmail.com with ESMTPSA id a199sm750772lfe.35.2016.07.07.00.32.05 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 07 Jul 2016 00:32:06 -0700 (PDT) Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt To: Don Lewis , mmacy@nextbsd.org References: <201607070714.u677EqVx008159@gw.catspoiler.org> Cc: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org, kmacy@freebsd.org From: Andrey Chernov Message-ID: <325f545e-a32d-59d8-86d3-079ecdf21df2@freebsd.org> Date: Thu, 7 Jul 2016 10:32:05 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <201607070714.u677EqVx008159@gw.catspoiler.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 07:32:09 -0000 On 07.07.2016 10:14, Don Lewis wrote: > On 6 Jul, Matthew Macy wrote: >> >> >> >> ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov >> wrote ---- >> > On 07.07.2016 9:40, Matthew Macy wrote: >> > > >> > > >> > > >> > > ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov >> > > wrote ---- >> > > > On 07.07.2016 7:52, K. 
Macy wrote: >> > > > > On Wednesday, July 6, 2016, Don Lewis >> > > > > wrote: >> > > > > >> > > > >> On 6 Jul, Matthew Macy wrote: >> > > > >>> As a first step towards managing linux user space in a >> > > > >>> chrooted >> > > > >>> /compat/linux, initially for i915 testing with intel gpu >> > > > >>> tools, later on to get widevine and steam to work I'm >> > > > >>> trying to get apt to work. I've fixed a number of issues >> > > > >>> to date in pseudofs/linprocfs but now I'm running in to >> > > > >>> a bug caused by differences in SIGCHLD handling between >> > > > >>> Linux and FreeBSD. The situation is that apt will spawn >> > > > >>> dpkg and wait on a pipe read. On Linux when dpkg exits >> > > > >>> the SIGCHLD to apt causes a short read on the pipe >> > > > >>> which lets apt then continue. On FreeBSD a SIGCHLD is >> > > > >>> silently ignored. I've even experimented with doing a >> > > > >>> kill -20 to no effect. >> > > > >>> >> > > > >>> It would be easy enough to check sysvec against linux in >> > > > >>> pipe_read and break out of the loop when it's awakened >> > > > >>> from msleep (assuming there aren't deeper issues with >> > > > >>> signal propagation for anything other than >> > > > >>> SIGINT/SIGKILL) and then do a short read. However, I'm >> > > > >>> assuming that anyone who has worked in this area >> > > > >>> probably has a cleaner solution. >> > > > >> >> > > > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD >> > > > >> but shouldn't be in this case. >> > > > > >> > > > > >> > > > > >> > > > > Good point. >> > > > > >> > > > > Thinking more about it, this seems like a bug in FreeBSD. >> > > > > Not a valid behavioral difference. >> > > > >> > > > You better need consult with POSIX before fixing things toward >> > > > any Linuxisms blindly in our native code. 
I don't have a >> > > > time now to see, is it really a bug according to POSIX, but >> > > > please read or just find all SIGCHLD there: >> > > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html >> > > > it explain SIGCHLD actions in deep details. >> > > > And that one too: >> > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html >> > > >> > > >> > > >> > > I was pretty clear in my initial email that I'm only interested >> > > in changing behavior for Linux programs. >> > >> > Of course, but in case it is FreeBSD bug, it should be fixed in our >> > native code first before making any changes in Linuxator. >> > >> > > And I was asking for help with that, not a link to SUSv3 or POSIX. >> > >> > In case I was not helpful, sorry for that. Before you try to change >> > something in Linuxator you need to be sure that FreeBSD does it >> > right (or wrong, then fix FreeBSD native code first). I am just >> > insisting on proper steps of fixing it. >> > >> >> >> I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD >> to deliberately interrupt a pipe read is such a weird idiom. I'll test >> fork vs clone on Linux and see how OS X responds to a SIGCHLD during a >> pipe read. > > It really depends on how signal handling has been set up. From my > understanding of the FreeBSD man pages and the Open Group documents, the > default handling for SIGCHLD is to just ignore it, in which case it > shouldn't interrupt the pipe read. If the process has set up a SIGCHLD > signal handler, then what happens with the read should depend on whether > or not SA_RESTART was passed to sigaction(). I would expect that Linux > would be the same as FreeBSD and the Open Group specs. 
Linux, as a SysV derivative, has always differed with regard to SA_RESTART and the other SA_* flags for signal(); see the differences described at the end of:
http://linux.die.net/man/2/signal

From owner-freebsd-hackers@freebsd.org Thu Jul 7 08:31:05 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 92755B7686D for ; Thu, 7 Jul 2016 08:31:05 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 744C216F3 for ; Thu, 7 Jul 2016 08:31:05 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from Julian-MBP3.local (ppp121-45-236-103.lns20.per1.internode.on.net [121.45.236.103]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id u678UxYA066256 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO) for ; Thu, 7 Jul 2016 01:31:02 -0700 (PDT) (envelope-from julian@freebsd.org) Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14)) To: freebsd-hackers@freebsd.org References: <20160705114808.GN38613@kib.kiev.ua> From: Julian Elischer Message-ID: <39cd0468-8301-06eb-4363-a57b18c60dbb@freebsd.org> Date: Thu, 7 Jul 2016 16:30:54 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Thu, 07 Jul 2016 08:31:05 -0000 On 5/07/2016 10:43 PM, Maxim Sobolev wrote: > Seems like candidate for the MFC into releng/10.3 and appropriate errata > entry? > > -Max quite possibly. it sounds like a problem that needs to be fixed. > > On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov > wrote: > >> On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote: >>> Hi all, investigating some random postgresql-9.1.21 server crashes on >>> FreeBSD 10.3, we've started seeing those after upgrading from postgres >>> 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very >>> unlikely. I suspect that postgres is at fault, however I am also curious >>> how could it be that kernel is not capable of generating core file when >>> application does something silly? Is it that some ELF-related data >>> structures got corrupted or something else? Are we protecting the page >>> where ELF header is mapped with R/O flag? I am looking at possibly >>> recreating this by poking around elf header(s), seeing if I can corrupt >> it >>> in a similar manner reliably, any pointers or suggestions are >> appreciated. >>> Jun 27 04:10:18 dal12 kernel: Failed to write core file for process >>> postgres (error 14) >>> Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on >>> signal 11 >>> Jul 1 05:21:46 dal12 kernel: Failed to write core file for process >>> postgres (error 14) >>> Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on >> signal >>> 11 >>> >>> #define EFAULT 14 /* Bad address */ >>> >>> The resulting files are truncated and is not really usable for anything. >>> We've seen the same issue >>> >>> -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 >> postgres.41361.core >>> -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 >> postgres.1722.core >>> [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core >>> GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD] >>> Copyright (C) 2016 Free Software Foundation, Inc. 
>>> License GPLv3+: GNU GPL version 3 or later < >> http://gnu.org/licenses/gpl.html >>> This is free software: you are free to change and redistribute it. >>> There is NO WARRANTY, to the extent permitted by law. Type "show >> copying" >>> and "show warranty" for details. >>> This GDB was configured as "x86_64-portbld-freebsd10.3". >>> Type "show configuration" for configuration details. >>> For bug reporting instructions, please see: >>> . >>> Find the GDB manual and other documentation resources online at: >>> . >>> For help, type "help". >>> Type "apropos word" to search for commands related to "word"... >>> Reading symbols from postgres...(no debugging symbols found)...done. >>> BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core >> file >>> size >= 517120000, found: 1310720. >>> [New LWP 100261] >>> Core was generated by `postgres'. >>> Program terminated with signal SIGSEGV, Segmentation fault. >>> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3 >>> (gdb) where >>> #0 0x0000000800cfba67 in ?? 
() from /lib/libthr.so.3 >>> Backtrace stopped: Cannot access memory at address 0x7fffffffdd08 >>> (gdb) q >>> >> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html >> >> > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@freebsd.org Thu Jul 7 10:18:12 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B5E1EB753F1 for ; Thu, 7 Jul 2016 10:18:12 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citapm.icyb.net.ua (citapm.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1406C1E7A for ; Thu, 7 Jul 2016 10:18:11 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citapm.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA00458; Thu, 07 Jul 2016 13:18:09 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1bL6Nl-000D2Q-OC; Thu, 07 Jul 2016 13:18:09 +0300 Subject: Re: ZFS ARC and mmap/page cache coherency question To: Paul Koch , Andrew Bates References: <20160630140625.3b4aece3@splash.akips.com> <20160701113243.307739cc@splash.akips.com> Cc: "freebsd-hackers@freebsd.org" From: Andriy Gapon Message-ID: <5ccbe625-1df6-74cb-a3ba-e35182f53a77@FreeBSD.org> Date: Thu, 7 Jul 2016 13:17:12 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <20160701113243.307739cc@splash.akips.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 
Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 10:18:12 -0000

On 01/07/2016 04:32, Paul Koch wrote:
> akips recordsize 128K default

I wonder whether setting this to 4K (or to the logical block / page size of your application, if that is larger than 4K) would help. The setting takes effect only for new files.

-- 
Andriy Gapon

From owner-freebsd-hackers@freebsd.org Thu Jul 7 14:04:35 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0DFB7B74828; Thu, 7 Jul 2016 14:04:35 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ABCE11A5A; Thu, 7 Jul 2016 14:04:34 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u67E4OcV014345 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 7 Jul 2016 17:04:24 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u67E4OcV014345 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u67E4OAn014344; Thu, 7 Jul 2016 17:04:24 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 7 Jul 2016 17:04:24 +0300 From: Konstantin Belousov To: Don Lewis Cc: mmacy@nextbsd.org, ache@freebsd.org, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org, kmacy@freebsd.org Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt Message-ID: <20160707140424.GM38613@kib.kiev.ua>
References: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org> <201607070714.u677EqVx008159@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201607070714.u677EqVx008159@gw.catspoiler.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 14:04:35 -0000 On Thu, Jul 07, 2016 at 12:14:52AM -0700, Don Lewis wrote: > On 6 Jul, Matthew Macy wrote: > > > > > > > > ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov > > wrote ---- > > > On 07.07.2016 9:40, Matthew Macy wrote: > > > > > > > > > > > > > > > > ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov > > > > wrote ---- > > > > > On 07.07.2016 7:52, K. Macy wrote: > > > > > > On Wednesday, July 6, 2016, Don Lewis > > > > > > wrote: > > > > > > > > > > > >> On 6 Jul, Matthew Macy wrote: > > > > > >>> As a first step towards managing linux user space in a > > > > > >>> chrooted > > > > > >>> /compat/linux, initially for i915 testing with intel gpu > > > > > >>> tools, later on to get widevine and steam to work I'm > > > > > >>> trying to get apt to work. I've fixed a number of issues > > > > > >>> to date in pseudofs/linprocfs but now I'm running in to > > > > > >>> a bug caused by differences in SIGCHLD handling between > > > > > >>> Linux and FreeBSD. The situation is that apt will spawn > > > > > >>> dpkg and wait on a pipe read. On Linux when dpkg exits > > > > > >>> the SIGCHLD to apt causes a short read on the pipe > > > > > >>> which lets apt then continue. 
On FreeBSD a SIGCHLD is > > > > > >>> silently ignored. I've even experimented with doing a > > > > > >>> kill -20 to no effect. > > > > > >>> > > > > > >>> It would be easy enough to check sysvec against linux in > > > > > >>> pipe_read and break out of the loop when it's awakened > > > > > >>> from msleep (assuming there aren't deeper issues with > > > > > >>> signal propagation for anything other than > > > > > >>> SIGINT/SIGKILL) and then do a short read. However, I'm > > > > > >>> assuming that anyone who has worked in this area > > > > > >>> probably has a cleaner solution. > > > > > >> > > > > > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD > > > > > >> but shouldn't be in this case. > > > > > > > > > > > > > > > > > > > > > > > > Good point. > > > > > > > > > > > > Thinking more about it, this seems like a bug in FreeBSD. > > > > > > Not a valid behavioral difference. > > > > > > > > > > You better need consult with POSIX before fixing things toward > > > > > any Linuxisms blindly in our native code. I don't have a > > > > > time now to see, is it really a bug according to POSIX, but > > > > > please read or just find all SIGCHLD there: > > > > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html > > > > > it explain SIGCHLD actions in deep details. > > > > > And that one too: > > > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html > > > > > > > > > > > > > > > > I was pretty clear in my initial email that I'm only interested > > > > in changing behavior for Linux programs. > > > > > > Of course, but in case it is FreeBSD bug, it should be fixed in our > > > native code first before making any changes in Linuxator. > > > > > > > And I was asking for help with that, not a link to SUSv3 or POSIX. > > > > > > In case I was not helpful, sorry for that. Before you try to change > > > something in Linuxator you need to be sure that FreeBSD does it > > > right (or wrong, then fix FreeBSD native code first). 
I am just > > > insisting on proper steps of fixing it. > > > > > > > > > I'm sorry for snapping. I misunderstood your intent. Using a SIGCHLD > > to deliberately interrupt a pipe read is such a weird idiom. I'll test > > fork vs clone on Linux and see how OS X responds to a SIGCHLD during a > > pipe read. > > It really depends on how signal handling has been set up. From my > understanding of the FreeBSD man pages and the Open Group documents, the > default handling for SIGCHLD is to just ignore it, in which case it > shouldn't interrupt the pipe read. If the process has set up a SIGCHLD > signal handler, then what happens with the read should depend on whether > or not SA_RESTART was passed to sigaction(). I would expect that Linux > would be the same as FreeBSD and the Open Group specs. > > How does apt set up its handling of SIGCHLD?

The traditional (and permitted) BSD handling of a signal whose disposition is SIG_IGN is to discard the signal at the time it is generated. Such a signal therefore cannot interrupt a syscall, regardless of SA_RESTART. For the interruption to work, some signal handler must be installed. AFAIR both SysV and Linux do not discard ignored signals, but process them up to the delivery point. Of course, a test demonstrating the difference is required to actually diagnose this and draw conclusions.
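A minimal sketch of such a test, in C, assuming nothing about how apt itself sets up SIGCHLD. The names `sigchld_interrupts_read`, `sigchld_mode`, `sleep_ms` and `reap` are made up for illustration. One child exits after about 100 ms (generating SIGCHLD) while the parent blocks in read() on an empty pipe; a second child writes one byte after about 500 ms so that the read always terminates. The function reports whether the read failed with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names, used only for this illustration. */
enum sigchld_mode { MODE_DEFAULT, MODE_HANDLER, MODE_HANDLER_RESTART };

static void on_sigchld(int sig) { (void)sig; }  /* only needs to exist */

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* waitpid() may itself be interrupted by SIGCHLD, so retry on EINTR. */
static void reap(pid_t pid)
{
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
}

/*
 * Returns 1 if a blocking read() on a pipe failed with EINTR because of
 * the SIGCHLD from an exiting child, 0 if the read completed normally,
 * and -1 on a setup error.
 */
static int sigchld_interrupts_read(enum sigchld_mode mode)
{
    struct sigaction sa, old;
    int fds[2], result = -1;
    pid_t exiter, writer;
    char c;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = (mode == MODE_DEFAULT) ? SIG_DFL : on_sigchld;
    sa.sa_flags = (mode == MODE_HANDLER_RESTART) ? SA_RESTART : 0;
    if (sigaction(SIGCHLD, &sa, &old) == -1)
        return -1;
    if (pipe(fds) == -1) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }

    exiter = fork();                 /* child 1: exits early -> SIGCHLD */
    if (exiter == 0) {
        sleep_ms(100);
        _exit(0);
    }
    writer = (exiter > 0) ? fork() : -1;
    if (writer == 0) {               /* child 2: unblocks the read later */
        sleep_ms(500);
        _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
    }

    if (writer > 0) {
        /* The parent keeps fds[1] open, so EOF cannot end this read. */
        result = (read(fds[0], &c, 1) == -1 && errno == EINTR);
        kill(writer, SIGKILL);
        reap(writer);
    }
    if (exiter > 0)
        reap(exiter);
    close(fds[0]);
    close(fds[1]);
    sigaction(SIGCHLD, &old, NULL);  /* restore previous disposition */
    return result;
}
```

Calling it once per mode (MODE_DEFAULT, MODE_HANDLER, MODE_HANDLER_RESTART) on each system under comparison shows whether only a handler installed without SA_RESTART makes the read return EINTR. The 100 ms and 500 ms delays are arbitrary and only need to be far enough apart to order the two events.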
From owner-freebsd-hackers@freebsd.org Thu Jul 7 14:26:13 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ACE4CB74F6D; Thu, 7 Jul 2016 14:26:13 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: from mail-yw0-x22b.google.com (mail-yw0-x22b.google.com [IPv6:2607:f8b0:4002:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 716B61809; Thu, 7 Jul 2016 14:26:13 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: by mail-yw0-x22b.google.com with SMTP id l125so15266370ywb.2; Thu, 07 Jul 2016 07:26:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=WNSjvTdBSDtMESYgVMuHn1mA6cAnDXoYPjRyL8goSd4=; b=GL+VREkdaiKLGsvPl8zpVaZaXmUaKFXxYpCn8E4fdIkr3JlL5zLb8TAR/pUxqTDZXH Notas4vUkj/dB295/9pjV/WERr/L0uvMbpybND3kYgsRA5sxokWwnp6Y7gy2qeHT32Ss DVajPHTjTO5wZ7mpZgBcbDlmO8M9mMK4KRFZXdJHghz8K+Pfsw37bz349qbif5Nuu0Do iaXeLmzrwjXQCHjeJiuDmTONWyphzxSp8bmcrAR5W4JLcD5sBbA7PJUBOgj9ce1ANSk1 cc4rFfPgfmHEUjoiF8HdgPeoC6n25CzTeFEWGq/TdEeQnmOilnIIC98bsSbbD7CpgLaR +nDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=WNSjvTdBSDtMESYgVMuHn1mA6cAnDXoYPjRyL8goSd4=; b=T7pxCx+NDTnizlaGUjVTOLysqIz+gCZwH2lxj285vu5dWy5E8j2jyvoiO9MRuEOzz5 7CZSV6i8xRUvvg4wq8Kc8Rw2Y8S7ehQ/WTnSI7skkvB5gWgjGySG2eLVna7RxjLEpz/V 5EpVl9JxPySytasUwZjSU3ikrjBCSq11Ixmx6VuaJaTBXC+dClXqzkQi5ygWYgjPZmXu pSmNmE2VDOO8LJjD5CKV6pl/OQR7zNaafIXUOF2vI64MSgUt/HIFLW2lVQob4s/Hse5H xx43hEp8yK1LdrNP8N+rE3VSP9ucqHX4sndMB7H5wJPAnUx0QRYXLl0DZe+KVYxYkIIc 7xYQ== X-Gm-Message-State: 
ALyK8tK0GxE+KvzNG6H4aHsewnYkA34HvMo2wofG60dSuY/bBsFIdfSTDrCE4Sn03hpKYBhMxJ9Gvg1H6OdozA== X-Received: by 10.37.205.130 with SMTP id d124mr408294ybf.181.1467901572542; Thu, 07 Jul 2016 07:26:12 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.212.66 with HTTP; Thu, 7 Jul 2016 07:26:10 -0700 (PDT) In-Reply-To: <20160707001218.GI38613@kib.kiev.ua> References: <20160706151822.GC38613@kib.kiev.ua> <20160706173758.GF38613@kib.kiev.ua> <20160707001218.GI38613@kib.kiev.ua> From: David Cross Date: Thu, 7 Jul 2016 10:26:10 -0400 Message-ID: Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) To: Konstantin Belousov Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org X-Mailman-Approved-At: Thu, 07 Jul 2016 15:42:28 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 14:26:13 -0000

The state was printed after the panic, yes. If I understand the idea of softupdates correctly, I don't think it's odd that no write of this buffer was even attempted: it has b_dep set, which means those blocks should be written first, right? Also, I was just able to reproduce this on 11.0-ALPHA6. I ran a fresh fsck on the filesystem first to ensure it was clean (I typically don't fsck between reproduction runs, since it takes so long; when I do need a 'clean' slate I just restore the snapshot, which is faster than fsck).
The panic from 11.0-ALPHA6 is:

root@bhyve103:~ # panic: softdep_deallocate_dependencies: dangling deps
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe011b3861b0
vpanic() at vpanic+0x182/frame 0xfffffe011b386230
panic() at panic+0x43/frame 0xfffffe011b386290
softdep_deallocate_dependencies() at softdep_deallocate_dependencies+0x71/frame 0xfffffe011b3862b0
brelse() at brelse+0x162/frame 0xfffffe011b386310
bufwrite() at bufwrite+0x206/frame 0xfffffe011b386360
ffs_write() at ffs_write+0x3ed/frame 0xfffffe011b386410
VOP_WRITE_APV() at VOP_WRITE_APV+0x16f/frame 0xfffffe011b386520
vnode_pager_generic_putpages() at vnode_pager_generic_putpages+0x2d5/frame 0xffffe011b3865f0
VOP_PUTPAGES_APV() at VOP_PUTPAGES_APV+0xda/frame 0xfffffe011b386620
vnode_pager_putpages() at vnode_pager_putpages+0x89/frame 0xfffffe011b386690
vm_pageout_flush() at vm_pageout_flush+0x12d/frame 0xfffffe011b386720
vm_object_page_collect_flush() at vm_object_page_collect_flush+0x23a/frame 0xffffe011b386820
vm_object_page_clean() at vm_object_page_clean+0x1be/frame 0xfffffe011b3868a0
vm_object_terminate() at vm_object_terminate+0xa5/frame 0xfffffe011b3868e0
vnode_destroy_vobject() at vnode_destroy_vobject+0x63/frame 0xfffffe011b386910
ufs_reclaim() at ufs_reclaim+0x1f/frame 0xfffffe011b386940
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xda/frame 0xfffffe011b386970
vgonel() at vgonel+0x204/frame 0xfffffe011b3869e0
vnlru_proc() at vnlru_proc+0x577/frame 0xfffffe011b386a70
fork_exit() at fork_exit+0x84/frame 0xfffffe011b386ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe011b386ab0

Pardon the machine name, I have a setup script for bhyve VMs, and I didn't tweak the name, just the install location:

root@bhyve103:~ # uname -a
FreeBSD bhyve103.priv.dcrosstech.com 11.0-ALPHA6 FreeBSD 11.0-ALPHA6 #0 r302303: Fri Jul 1 03:32:49 UTC 2016 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

On the 10.3 kernel I was also able to walk the mnt_nvnodes
list before the FS panic and I have the vnode * saved from before the vnlru attempted reclaim. print *((struct vnode *)0xfffff80002dc2760) $6 = {v_tag = 0xffffffff8072b891 "ufs", v_op = 0xffffffff80a13c40, v_data = 0xfffff8006a20b160, v_mount = 0xfffff800024e9cc0, v_nmntvnodes = { tqe_next = 0xfffff80002dc2588, tqe_prev = 0xfffff80002dc2958}, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0xfffffe0000932ef8}, v_cache_src = { lh_first = 0x0}, v_cache_dst = {tqh_first = 0xfffff8006a18ce00, tqh_last = 0xfffff8006a18ce20}, v_cache_dd = 0x0, v_lock = {lock_object = { lo_name = 0xffffffff8072b891 "ufs", lo_flags = 117112832, lo_data = 0, lo_witness = 0xfffffe0000607280}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = { lo_name = 0xffffffff8074a89f "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0xfffffe00005fd680}, mtx_lock = 4}, v_vnlock = 0xfffff80002dc27c8, v_actfreelist = { tqe_next = 0xfffff80002dc2938, tqe_prev = 0xfffff80002dc2648}, v_bufobj = { bo_lock = {lock_object = {lo_name = 0xffffffff80754d34 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0xfffffe0000605700}, rw_lock = 1}, bo_ops = 0xffffffff809e97c0, bo_object = 0xfffff80002c1b400, bo_synclist = { le_next = 0xfffff80002dc2a08, le_prev = 0xfffff80002dc2688}, bo_private = 0xfffff80002dc2760, __bo_vnode = 0xfffff80002dc2760, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80002dc2880}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = { tqh_first = 0xfffffe00f7ae8658, tqh_last = 0xfffffe00f7ae86a8}, bv_root = {pt_root = 18446741878841706297}, bv_cnt = 1}, bo_numoutput = 0, bo_flag = 1, bo_bsize = 16384}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff80002dc28e8}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 2, v_usecount = 0, v_iflag = 512, v_vflag = 0, 
v_writecount = 0, v_hash = 18236560, v_type = VREG} I think what is wanted is the buffer and their dependency lists.. I am not sure where those are under all of this.. bo_*? On Wed, Jul 6, 2016 at 8:12 PM, Konstantin Belousov wrote: > On Wed, Jul 06, 2016 at 02:21:20PM -0400, David Cross wrote: > > (kgdb) up 5 > > #5 0xffffffff804aafa1 in brelse (bp=0xfffffe00f77457d0) at buf.h:428 > > 428 (*bioops.io_deallocate)(bp); > > Current language: auto; currently minimal > > (kgdb) p/x *(struct buf *)0xfffffe00f77457d0 > > $1 = {b_bufobj = 0xfffff80002e88480, b_bcount = 0x4000, b_caller1 = 0x0, > > b_data = 0xfffffe00f857b000, b_error = 0x0, b_iocmd = 0x0, b_ioflags = > > 0x0, > > b_iooffset = 0x0, b_resid = 0x0, b_iodone = 0x0, b_blkno = 0x115d6400, > > b_offset = 0x0, b_bobufs = {tqe_next = 0x0, tqe_prev = > > 0xfffff80002e884d0}, > > b_vflags = 0x0, b_freelist = {tqe_next = 0xfffffe00f7745a28, > > tqe_prev = 0xffffffff80c2afc0}, b_qindex = 0x0, b_flags = 0x20402800, > > b_xflags = 0x2, b_lock = {lock_object = {lo_name = 0xffffffff8075030b, > > lo_flags = 0x6730000, lo_data = 0x0, lo_witness = > > 0xfffffe0000602f00}, > > lk_lock = 0xfffff800022e8000, lk_exslpfail = 0x0, lk_timo = 0x0, > > lk_pri = 0x60}, b_bufsize = 0x4000, b_runningbufspace = 0x0, > > b_kvabase = 0xfffffe00f857b000, b_kvaalloc = 0x0, b_kvasize = 0x4000, > > b_lblkno = 0x0, b_vp = 0xfffff80002e883b0, b_dirtyoff = 0x0, > > b_dirtyend = 0x0, b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0x0, > b_pager > > = { > > pg_reqpage = 0x0}, b_cluster = {cluster_head = {tqh_first = 0x0, > > tqh_last = 0x0}, cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}}, > > b_pages = {0xfffff800b99b30b0, 0xfffff800b99b3118, 0xfffff800b99b3180, > > 0xfffff800b99b31e8, 0x0 }, b_npages = 0x4, b_dep = > { > > lh_first = 0xfffff800023d8c00}, b_fsprivate1 = 0x0, b_fsprivate2 = > 0x0, > > b_fsprivate3 = 0x0, b_pin_count = 0x0} > > > > > > This is the freshly allocated buf that causes the panic; is this what is > > needed? 
I "know" which vnode will cause the panic on vnlru cleanup, but > I > > don't know how to walk the memory list without a 'hook'.. as in, i can > > setup the kernel in a state that I know will panic when the vnode is > > cleaned up, I can force a panic 'early' (kill -9 1), and then I could get > > that vnode.. if I could get the vnode list to walk. > > Was the state printed after the panic occured ? What is strange is that > buffer was not even tried for i/o, AFAIS. Apart from empty > b_error/b_iocmd, > the b_lblkno is zero, which means that the buffer was never allocated on > the disk. > > The b_blkno looks strangely high. Can you print *(bp->b_vp) ? If it is > UFS vnode, do p *(struct inode)(->v_data). I am esp. interested > in the vnode size. > > Can you reproduce the problem on HEAD ? > From owner-freebsd-hackers@freebsd.org Thu Jul 7 18:05:12 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E26AB75016; Thu, 7 Jul 2016 18:05:12 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7F42A128F; Thu, 7 Jul 2016 18:05:12 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467914710797328.50271387470593; Thu, 7 Jul 2016 11:05:10 -0700 (PDT) Date: Thu, 07 Jul 2016 11:05:10 -0700 From: Matthew Macy To: "Konstantin Belousov" Cc: "Don Lewis" , "" , "" , "" Message-ID: <155c688eecf.fe750982120278.6541123167784850321@nextbsd.org> In-Reply-To: <20160707140424.GM38613@kib.kiev.ua> References: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org> <201607070714.u677EqVx008159@gw.catspoiler.org> <20160707140424.GM38613@kib.kiev.ua> Subject: Re: 
difference in SIGCHLD behavior between Linux and FreeBSD breaks apt MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 18:05:12 -0000 ---- On Thu, 07 Jul 2016 07:04:24 -0700 Konstantin Belousov wrote ---- > On Thu, Jul 07, 2016 at 12:14:52AM -0700, Don Lewis wrote: > > On 6 Jul, Matthew Macy wrote: > > > > > > > > > > > > ---- On Wed, 06 Jul 2016 23:48:53 -0700 Andrey Chernov > > > wrote ---- > > > > On 07.07.2016 9:40, Matthew Macy wrote: > > > > > > > > > > > > > > > > > > > > ---- On Wed, 06 Jul 2016 23:28:40 -0700 Andrey Chernov > > > > > wrote ---- > > > > > > On 07.07.2016 7:52, K. Macy wrote: > > > > > > > On Wednesday, July 6, 2016, Don Lewis > > > > > > > wrote: > > > > > > > > > > > > > >> On 6 Jul, Matthew Macy wrote: > > > > > > >>> As a first step towards managing linux user space in a > > > > > > >>> chrooted > > > > > > >>> /compat/linux, initially for i915 testing with intel gpu > > > > > > >>> tools, later on to get widevine and steam to work I'm > > > > > > >>> trying to get apt to work. I've fixed a number of issues > > > > > > >>> to date in pseudofs/linprocfs but now I'm running in to > > > > > > >>> a bug caused by differences in SIGCHLD handling between > > > > > > >>> Linux and FreeBSD. The situation is that apt will spawn > > > > > > >>> dpkg and wait on a pipe read. On Linux when dpkg exits > > > > > > >>> the SIGCHLD to apt causes a short read on the pipe > > > > > > >>> which lets apt then continue. On FreeBSD a SIGCHLD is > > > > > > >>> silently ignored. I've even experimented with doing a > > > > > > >>> kill -20 to no effect. 
> > > > > > >>> > > > > > > >>> It would be easy enough to check sysvec against linux in > > > > > > >>> pipe_read and break out of the loop when it's awakened > > > > > > >>> from msleep (assuming there aren't deeper issues with > > > > > > >>> signal propagation for anything other than > > > > > > >>> SIGINT/SIGKILL) and then do a short read. However, I'm > > > > > > >>> assuming that anyone who has worked in this area > > > > > > >>> probably has a cleaner solution. > > > > > > >> > > > > > > >> It shoulds like SA_RESTART is set in sa_flags for SIGCHLD > > > > > > >> but shouldn't be in this case. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Good point. > > > > > > > > > > > > > > Thinking more about it, this seems like a bug in FreeBSD. > > > > > > > Not a valid behavioral difference. > > > > > > > > > > > > You better need consult with POSIX before fixing things toward > > > > > > any Linuxisms blindly in our native code. I don't have a > > > > > > time now to see, is it really a bug according to POSIX, but > > > > > > please read or just find all SIGCHLD there: > > > > > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html > > > > > > it explain SIGCHLD actions in deep details. > > > > > > And that one too: > > > > > > http://pubs.opengroup.org/onlinepubs/009695399/functions/sigaction.html > > > > > > > > > > > > > > > > > > > > I was pretty clear in my initial email that I'm only interested > > > > > in changing behavior for Linux programs. > > > > > > > > Of course, but in case it is FreeBSD bug, it should be fixed in our > > > > native code first before making any changes in Linuxator. > > > > > > > > > And I was asking for help with that, not a link to SUSv3 or POSIX. > > > > > > > > In case I was not helpful, sorry for that. Before you try to change > > > > something in Linuxator you need to be sure that FreeBSD does it > > > > right (or wrong, then fix FreeBSD native code first). 
I am just > > > > insisting on proper steps of fixing it. > > > > > > > > > > > > > I'm sorry for snapping . I misunderstood your intent. Using a SIGCHLD > > > to deliberately interrupt a pipe read is such a weird idiom. I'll test > > > fork vs clone on Linux and see how OS X responds to a SIGCHLD during a > > > pipe read. > > > > It really depends on how signal handling has been set up. From my > > understanding of the FreeBSD man pages and the Open Group documents, the > > default handling for SIGCHLD is to just ignore it, in which case it > > shouldn't interrupt the pipe read. If the process has set up a SIGCHLD > > signal handler, then what happens with the read should depend on whether > > or not SA_RESTART was passed to sigaction(). I would expect that Linux > > would be the same as FreeBSD and the Open Group specs. > > > > How does apt set up its handling of SIGCHLD? > > BSD traditional and allowed handling of the signals with SIG_IGN > disposition is to discard such signal at the time of generation. Then, > such signal cannot interrupt a syscall regardless of SA_RESTART. For > the interruption to work, some signal handler must be installed. > > AFAIR both SysV and Linux do not discard ignored signals, but process > them up to the delivery point. > > Sure the test demonstrating the difference is required to actually > diagnose and make conclusions. Unsurprisingly I may have misinterpreted the trace. John observes: Alternatively, if apt is creating a pipe() that it passes to dpkg() via fork() and apt only creates the read end opened and dpkg only keeps the write end up opened, then when dpkg exits, the pipe_read should return EOF when dpkg exits (that is normally the way pipes are used to detect child exit rather than EINTR from SIGCLD). The SIGCHLD may be a red herring as strace will report it even if it is ignored. What John describes is borne out by the traces. 
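The pattern John describes can be sketched in C as follows. This is only an illustration under stated assumptions, not apt's or dpkg's actual code; the function name `wait_for_child_via_pipe` is made up. The child holds the only write end of the pipe, so its exit makes the parent's read() return 0 (EOF) without any signal handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that owns the only write end of a pipe and block in
 * read() until it exits. Returns the last read() result: 0 means the
 * parent saw EOF, -1 means an error occurred.
 */
static ssize_t wait_for_child_via_pipe(void)
{
    int fds[2];
    char buf[64];
    ssize_t n;
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the write end, emit some output, exit. */
        close(fds[0]);
        if (write(fds[1], "working\n", 8) != 8)
            _exit(1);
        _exit(0);           /* the kernel closes fds[1] on exit */
    }

    /*
     * Parent: drop its own write end. If any copy of the write end
     * stays open anywhere, the read below never sees EOF.
     */
    close(fds[1]);
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        ;                   /* discard the child's output */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

With this arrangement the SIGCHLD disposition does not matter, which is consistent with the signal being a red herring in the traces.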
FreeBSD from pipe creation to dpkg exit and apt hang http://pastebin.com/TGRrMniD Linux from pipe creation to dpkg exit and apt continue http://pastebin.com/wPfd31Pf -M From owner-freebsd-hackers@freebsd.org Thu Jul 7 22:32:35 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9E8DCB82A69 for ; Thu, 7 Jul 2016 22:32:35 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from shell1.rawbw.com (shell1.rawbw.com [198.144.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8E7731CAF for ; Thu, 7 Jul 2016 22:32:35 +0000 (UTC) (envelope-from yuri@rawbw.com) Received: from yuri.doctorlan.com (c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190]) (authenticated bits=0) by shell1.rawbw.com (8.15.1/8.15.1) with ESMTPSA id u67MWTFY074892 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Thu, 7 Jul 2016 15:32:29 -0700 (PDT) (envelope-from yuri@rawbw.com) X-Authentication-Warning: shell1.rawbw.com: Host c-24-5-143-190.hsd1.ca.comcast.net [24.5.143.190] claimed to be yuri.doctorlan.com Subject: Re: Why kinfo_getvmmap is sometimes so expensive? 
References: <20160707001913.GJ38613@kib.kiev.ua> To: Freebsd hackers list From: Yuri Message-ID: <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com> Date: Thu, 7 Jul 2016 15:32:28 -0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.1.0 MIME-Version: 1.0 In-Reply-To: <20160707001913.GJ38613@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 22:32:35 -0000 On 07/06/2016 17:19, Konstantin Belousov wrote: > To calculate residency count for the process map entries, kernel has to > iterate over all pages. This operation was somewhat optimized in 10.3 > and HEAD, particularly for the large sparse mappings. But for large populated > mappings there is no other way than to check each page. > > You may confirm my hypothesis by setting sysctl > kern.proc_vmmap_skip_resident_count to 0 and see whether the CPU > consumption changed. Of course, you will not get the resident count > in the returned structure, after the knob is tweaked. When people ask why the malloc library doesn't unmap memory, its developers usually say that they call madvise(MADV_FREE) and that this is as good as unmapping. But this example shows that this isn't quite the case on FreeBSD, and unmapping is better.
Yuri From owner-freebsd-hackers@freebsd.org Thu Jul 7 23:20:15 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B7F10B8249A for ; Thu, 7 Jul 2016 23:20:15 +0000 (UTC) (envelope-from cedric.blancher@gmail.com) Received: from mail-pa0-x22a.google.com (mail-pa0-x22a.google.com [IPv6:2607:f8b0:400e:c03::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 87D6C17F6 for ; Thu, 7 Jul 2016 23:20:15 +0000 (UTC) (envelope-from cedric.blancher@gmail.com) Received: by mail-pa0-x22a.google.com with SMTP id uj8so9857139pab.3 for ; Thu, 07 Jul 2016 16:20:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=n9qqVM9YdRvf66WYZCPZ2tOqYFzahK2zysWrJZsTtKA=; b=Wi50pxXx4K0f4dbVKFX2mE3BfNIwCK105o/twiShCyDoA0BQNNNtDPT+x41BSVQBlM vxB2/PPP0W/BMOev0bIPHKYQHsh9zDmtEV+CAylI3SlKi56Qmzr12U2ZHjVyG93F1Cpm iItpmxtJ3k1CZaV6SBVnPGIYvpAfuz1LATAUUQy0e3XYrws/7gNwulrU7wKlpoY9kDdN JZcmxxDdlLNUn3Jm72uc1HFpDEjw4wtvNaXeGUwwZ4hL0ZeHC3qkXkO5UYC8+lVGXL3q IRacRtAQZuSxlp2+0yF38TFGURuujfRg7DSyzsONzufWWwVEQL/W0Cd1sFDbIOauMu+A vl8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=n9qqVM9YdRvf66WYZCPZ2tOqYFzahK2zysWrJZsTtKA=; b=M8uu+NAXGdO4iFIlzZopJFiR5KvyrNjpXUpdlq/Q2Cd8v0VG+0F+qzbINMkoiy79Gp AkeNsDxK1bo3x8GxKSsbfDUgfyaBdJpNpfirc20TLtsZ9SWQkamBMndYVtrcMZ7pQs6k vcOeAwDwT/pMq66EeWw2wn+KyUKr+XNvydikdEq8WpqxrXzOUSFXcRJIzGvOzzXuLBtF t/8j0olD+NhOeFhSVZu3VGFurwKw2FXiMNWVSSf1lWf4Z/cn1fZXKM1MxZ1CyLM3lalp 
l+Sp5MXaKIAc100VYtqyObkoeO8pdegZ1pjMP8ZMlI1ZidQxkucX2VVf5xoyekPUaXIR PAxw== X-Gm-Message-State: ALyK8tJl4VTdNj7xn/HyomnzYutGKWaaY0Ry3bErkmRM7IVXZ6WuYOZtm/ESpALj9NwmgMD5o/JIULyktR+BmQ== X-Received: by 10.66.76.10 with SMTP id g10mr4627821paw.110.1467933615034; Thu, 07 Jul 2016 16:20:15 -0700 (PDT) MIME-Version: 1.0 Received: by 10.66.173.8 with HTTP; Thu, 7 Jul 2016 16:20:14 -0700 (PDT) In-Reply-To: <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> References: <20160630140625.3b4aece3@splash.akips.com> <20160703123004.74a7385a@splash.akips.com> <155afb8148f.c6f5294d33485.2952538647262141073@nextbsd.org> <45865ae6-18c9-ce9a-4a1e-6b2a8e44a8b2@denninger.net> From: Cedric Blancher Date: Fri, 8 Jul 2016 01:20:14 +0200 Message-ID: Subject: Re: ZFS ARC and mmap/page cache coherency question To: Karl Denninger Cc: "freebsd-hackers@freebsd.org" , illumos-dev , "Garrett D'Amore" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 23:20:15 -0000 I think Garrett D'Amore had some ideas about the VM<---->ZFS communication and double/multicaching issues too. Ced On 3 July 2016 at 17:43, Karl Denninger wrote: > > On 7/3/2016 02:45, Matthew Macy wrote: >> >> Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.-M---- > > Wellllll.... > > I've done a fair bit of work here (see > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the > political issues are at least as bad as the coding ones. 
> > In short what Cedric says about the root of the issue is real. VM is > really well implemented for what it handles, but the root of the issue > is that while the UFS data cache is part of VM and thus it "knows" about > it, ZFS is not because it is a "bolt-on." UMA leads to further (severe) > complications for certain workloads. > > Finally the underlying ZFS dmu_tx sizing code is just plain wrong and in > fact this is one of the biggest issues as when the system runs into > trouble it can take a bad situation and make it a *lot* worse. There is > only one write-back cache maintained instead of one per zvol, and that's > flat-out broken. Being able to re-order async writes to disk (where > fsync() has not been called) and minimizing seek latency is excellent. > Sadly rotating media these days sabotages much of this due to opacity > introduced at the drive level (e.g. varying sector counts per track, > etc) but it can still help. But where things go dramatically wrong is > on a system where a large write-back cache is allocated relative to the > underlying zvol I/O performance (this occurs on moderately-large and > bigger RAM systems) with moderate numbers of modest-performance rotating > media; in this case it is entirely possible for a flush of the write > buffers to require upwards of a *minute* to complete, during which all > other writes block. If this happens during periods of high RAM demand > and you manage to trigger a page-out at the same time system performance > will go straight into the toilet. I have seen instances where simply > trying to edit a text file with vi (or a "select" against a database > table) will hang for upwards of a minute leading you to believe the > system has crashed, when in fact it has not. 
> > The interaction of VM with the above can lead to severe pathological > behavior because the VM system has no way to tell the ZFS subsystem to > pare back ARC (and at least as important, perhaps more-so -- unused but > allocated UMA) when memory pressure exists *before* it pages. ZFS tries > to detect memory pressure and do this itself but it winds up competing > with the VM system. This leads to demonstrably wrong behavior because > you never want to hold disk cache in preference to RSS; if you have a > block of data from the disk the best case is you avoid one I/O (to > re-read it); if you page you are *guaranteed* to take one I/O (to write > the paged-out RSS to disk) and *might* take two (if you then must read > it back in.) > > In short trading the avoidance of one *possible* I/O for a *guaranteed* > I/O and a second possible one is *always* a net lose. > > To "fix" all of this "correctly" (for all cases, instead of certain > cases) VM would have to "know" about ARC and its use of UMA, along with > being able to police both. ZFS also must have the dmu_tx writeback > cache sized per-zvol with its size chosen by the actual I/O performance > characteristics of the disks in the zvol itself. I've looked into doing > both and it's fairly complex, and what's worse is that it would > effectively "marry" VM and ZFS, removing the "bolt-on" aspect of > things. This then leads to a lot of maintenance work over time because > any time ZFS code changes (and it does, quite a bit) you then have to go > back through that process in order to become coherent with Illumos. > > The PR above resolved (completely) the issues I was having along with a > number of other people on 10.x and before (I've not yet rolled it > forward to 11.) but it's quite clearly a hack of sorts, in that it > detects and treats symptoms (e.g. dynamic TX cache size modification, > etc) rather than integrating VM and ZFS cache management. 
> > -- > Karl Denninger > karl@denninger.net > /The Market Ticker/ > /[S/MIME encrypted email preferred]/ -- Cedric Blancher Institute Pasteur From owner-freebsd-hackers@freebsd.org Fri Jul 8 07:52:41 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2CCF9B7539B; Fri, 8 Jul 2016 07:52:41 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F2FBC1C66; Fri, 8 Jul 2016 07:52:40 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1467964358050811.143266695883; Fri, 8 Jul 2016 00:52:38 -0700 (PDT) Date: Fri, 08 Jul 2016 00:52:38 -0700 From: Matthew Macy To: "Matthew Macy" Cc: "Konstantin Belousov" , "" , "Don Lewis" , "" , "" Message-ID: <155c97e7d70.126966a3c142756.8632532805949896728@nextbsd.org> In-Reply-To: <155c688eecf.fe750982120278.6541123167784850321@nextbsd.org> References: <155c427b1ea.e316552376378.990303254341485453@nextbsd.org> <201607070714.u677EqVx008159@gw.catspoiler.org> <20160707140424.GM38613@kib.kiev.ua> <155c688eecf.fe750982120278.6541123167784850321@nextbsd.org> Subject: Re: difference in SIGCHLD behavior between Linux and FreeBSD breaks apt MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-ZohoMail: Z_57973067 SPT_1 Z_57973066 SPT_1 SLF_D X-Zoho-Virus-Status: 2 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2016 07:52:41 -0000 > 
Unsurprisingly I may have misinterpreted the trace.
>
> John observes:
> Alternatively, if apt is creating a pipe() that it passes to dpkg via fork(), and apt
> only keeps the read end open and dpkg only keeps the write end open, then when
> dpkg exits, the pipe_read should return EOF (that is normally the way pipes
> are used to detect child exit rather than EINTR from SIGCLD).
>
> The SIGCHLD may be a red herring as strace will report it even if it is ignored. What John describes is borne out by the traces.
>
> FreeBSD from pipe creation to dpkg exit and apt hang
> http://pastebin.com/TGRrMniD
>
> Linux from pipe creation to dpkg exit and apt continue
> http://pastebin.com/wPfd31Pf

It turns out that this was footshooting. In my changes to linprocfs the /fd directory was holding additional references to the struct file pointers, which prevented apt from getting an EOF when dpkg exited. Thanks to all who commented.

FWIW, after fixing the previous issue and then fixing linux_mremap to be able to grow a mapping, apt works now:

root@planecrash:/home/mmacy # chroot /compat/linux/ apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://security.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Hit:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
Fetched 94.5 kB in 1s (56.3 kB/s)
Reading package lists... Done

I don't think this is all that useful until I update / implement the system calls needed to get steam / widevine / whatever working, but in case anyone cares this is all going on in the drm-next-4.6 branch alongside the graphics work.
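
For anyone following along, the pipe behaviour John describes can be sketched in userland roughly as below. This is only an illustration, not the linprocfs code: the function name parent_sees_eof() and its keep_extra_writer parameter are made up for this sketch, and the "stray reference" is simulated by the parent simply keeping its own copy of the write end open, whereas in the real bug the extra references were held inside the kernel. The read end is made non-blocking so the "no EOF" case is reported instead of hanging the way apt did.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 *
 * Returns 1 if the parent sees EOF on the pipe once the child has exited,
 * 0 if the pipe still has a writer (a blocking read would hang),
 * -1 on any unexpected error.
 *
 * keep_extra_writer != 0 simulates a stray reference to the write end.
 */
int
parent_sees_eof(int keep_extra_writer)
{
	int fds[2];
	char buf[16];
	pid_t pid;
	ssize_t n;
	int result;

	if (pipe(fds) == -1)
		return (-1);

	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child (stands in for dpkg): keeps only the write end, then exits. */
		close(fds[0]);
		_exit(0);
	}

	/* Parent (stands in for apt): normally drops its copy of the write end. */
	if (!keep_extra_writer)
		close(fds[1]);

	/* Wait until the child is gone, so its descriptors are closed. */
	if (waitpid(pid, NULL, 0) == -1)
		return (-1);

	/* Non-blocking read: report "still open" instead of hanging. */
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1)
		return (-1);

	n = read(fds[0], buf, sizeof(buf));
	if (n == 0)
		result = 1;	/* EOF: no writers left */
	else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
		result = 0;	/* a writer reference still exists */
	else
		result = -1;

	close(fds[0]);
	if (keep_extra_writer)
		close(fds[1]);
	return (result);
}
```

Calling parent_sees_eof(0) models the normal case where child exit shows up as EOF; parent_sees_eof(1) models the case where some other reference keeps the write side alive, so the reader never gets EOF even though the child is long gone.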
-M From owner-freebsd-hackers@freebsd.org Fri Jul 8 10:55:10 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B3DEB82A18 for ; Fri, 8 Jul 2016 10:55:10 +0000 (UTC) (envelope-from wojtek@puchar.net) Received: from puchar.net (puchar.net [194.1.144.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "puchar.net", Issuer "puchar.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 94FFD11BB for ; Fri, 8 Jul 2016 10:55:09 +0000 (UTC) (envelope-from wojtek@puchar.net) Received: Received: from 127.0.0.1 (localhost [127.0.0.1]) by puchar.net (8.15.2/8.14.9) with ESMTPS id u68Asxn9002790 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Fri, 8 Jul 2016 12:54:59 +0200 (CEST) (envelope-from wojtek@puchar.net) Received: from laptop.wojtek.intra (localhost [127.0.0.1]) by laptop.wojtek.intra (8.14.9/8.14.9) with ESMTP id u68At2kB000865 for ; Fri, 8 Jul 2016 12:55:02 +0200 (CEST) (envelope-from wojtek@puchar.net) Received: from localhost (wojtek@localhost) by laptop.wojtek.intra (8.14.9/8.14.9/Submit) with ESMTP id u68AsvCs000862 for ; Fri, 8 Jul 2016 12:54:57 +0200 (CEST) (envelope-from wojtek@puchar.net) X-Authentication-Warning: laptop.wojtek.intra: wojtek owned process doing -bs Date: Fri, 8 Jul 2016 12:54:57 +0200 (CEST) From: Wojciech Puchar X-X-Sender: wojtek@laptop.wojtek.intra To: freebsd-hackers@freebsd.org Subject: help with onboard LAN Message-ID: User-Agent: Alpine 2.20 (BSF 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (puchar.net [10.0.1.1]); Fri, 08 Jul 2016 12:54:59 +0200 (CEST) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2016 10:55:10 -0000

My Supermicro-rebranded server is specified as having two 1Gb/s ethernet ports onboard.

What it actually has is:

ix0@pci0:3:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller 10-Gigabit X540-AT2'
    class = network
    subclass = ethernet
ix1@pci0:3:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller 10-Gigabit X540-AT2'
    class = network
    subclass = ethernet

which is strange.

The card is autodetected with the ixgbe driver under FreeBSD 10, and I put

device miibus
device ixgbe

in my custom kernel.

ix0: port 0xe020-0xe03f mem 0xfbc00000-0xfbdfffff,0xfbe04000-0xfbe07fff irq 42 at device 0.0 on pci3
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: 0c:c4:7a:6e:7e:9e
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: port 0xe000-0xe01f mem 0xfba00000-0xfbbfffff,0xfbe00000-0xfbe03fff irq 45 at device 0.1 on pci3
ix1: Using MSIX interrupts with 9 vectors
ix1: Ethernet address: 0c:c4:7a:6e:7e:9f
ix1: PCI Express Bus: Speed 5.0GT/s Width x8

And it works.

But it seems I have an autonegotiation problem with a gigabit switch - it connects at 100Mb/s:

ix0: flags=8843 metric 0 mtu 1500
    options=8407bb
    ether 0c:c4:7a:6e:7e:9e
    inet 194.1.144.90 netmask 0xfffffff8 broadcast 194.1.144.95
    inet 194.1.144.91 netmask 0xfffffff8 broadcast 194.1.144.95
    media: Ethernet autoselect (100baseTX )
    status: active

I tried

ifconfig ix0 media 1000baseT

but it shows an error.

Any idea what I really have in my server and how to manually set it to 1Gb/s?

It seems like no PHY is detected.
From owner-freebsd-hackers@freebsd.org Fri Jul 8 12:03:03 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EE8D3B8250A for ; Fri, 8 Jul 2016 12:03:03 +0000 (UTC) (envelope-from rwmaillists@googlemail.com) Received: from mail-qk0-x244.google.com (mail-qk0-x244.google.com [IPv6:2607:f8b0:400d:c09::244]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A71E41AC1 for ; Fri, 8 Jul 2016 12:03:03 +0000 (UTC) (envelope-from rwmaillists@googlemail.com) Received: by mail-qk0-x244.google.com with SMTP id r68so8132632qka.3 for ; Fri, 08 Jul 2016 05:03:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=date:from:to:subject:message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=+8WPtPHos/Yoz950VJ4JznavMv6epPfdv3QnK2qd3pQ=; b=IIP3I39P3FTF6HCmQx9rhUfrB/vRVrXTMIEoEEKRFj4zQpfONI2VckolpdVJlhmLNv Yr09Qr/jYjmKoIqsldUkvsJhnR0GD8eIyKFBkWf3TxJnoV0ChZmoT/Znkth6Kd48tSOv jpgCR/S3rS23v4HWgxQFMF9yLoJr3jsZUHrRZo1BUx4333O+veDaemFVacALEEwWcUXb urcCo7UbgwN82VW3agOdza7bIMtdlwBD5DxPGlqcqLRkLWIKDf0FsxNdee5YfaE3mO47 Z8tIdeYxXKV+fQCH31v+WnR14UOFjy94fZ4ivLgPccFAxcNT+RdIQuCWFHKrpJUtlA2M Z4MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+8WPtPHos/Yoz950VJ4JznavMv6epPfdv3QnK2qd3pQ=; b=kxUWgP3/3GC6rUqVmoIEpk1W0gClKVs9VSbBT5JtDQMzPmbl83xNvCDOu/ankU+euc btmOIC6OVGzwqR6z8YRbKFlN7oclsd9BFx1JA1rWiE22wDIOBj0dvnRSJgWlcqGWGqsD svXH9yG71wupmwhAsUN2MMC1euB60jni80X4IgFE7aJJ7RnjVpaak+V+ATq29fsFJyMc fD9IODOLgGxqa4DugOkwsyk8MxrkEOn0j7HycEhEFPq51Hik3/grn+mQr6+Ht9Qn2s8S 
TdKjafHcO5eKxpLilRpiLJawHmkwxFQTwV79QZGxjssDG4NP06i1U0N1NlbBnqsLf0/u Kj+A== X-Gm-Message-State: ALyK8tL+QRlxMBZudMNzJQeZF0TN9McvcE7LOVVAK9+ehPBSg+gDIIuWiFGEicOGSl1Eww== X-Received: by 10.194.200.100 with SMTP id jr4mr5025403wjc.176.1467979382321; Fri, 08 Jul 2016 05:03:02 -0700 (PDT) Received: from gumby.homeunix.com ([81.171.97.59]) by smtp.gmail.com with ESMTPSA id x83sm2718546wmx.9.2016.07.08.05.03.00 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 08 Jul 2016 05:03:01 -0700 (PDT) Date: Fri, 8 Jul 2016 13:02:58 +0100 From: RW To: freebsd-hackers@freebsd.org Subject: Re: Why kinfo_getvmmap is sometimes so expensive? Message-ID: <20160708130258.7b772558@gumby.homeunix.com> In-Reply-To: <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com> References: <20160707001913.GJ38613@kib.kiev.ua> <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd10.2) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2016 12:03:04 -0000 On Thu, 7 Jul 2016 15:32:28 -0700 Yuri wrote: > On 07/06/2016 17:19, Konstantin Belousov wrote: > > To calculate residency count for the process map entries, kernel > > has to iterate over all pages. This operation was somewhat > > optimized in 10.3 and HEAD, particularly for the large sparce > > mappings. But for large populated mappings there is no other way > > then to check each page. > > > > You may confirm my hypothesis by setting sysctl > > kern.proc_vmmap_skip_resident_count to 0 and see whether the CPU > > consumption changed. Of course, you will not get the resident count > > in the returned structure, after the knob is tweaked. 
> > > When people raise the question of why malloc library doesn't unmap > the memory, developers there usually say that they call > madvise(MADV_FREE) and this is as good as unmap. It's better than unmapping because freed memory is commonly re-malloced shortly after it's freed. > But this example > shows that this isn't quite the case on the FreeBSD, and unmapping is > better. That doesn't mean it's better in general. From owner-freebsd-hackers@freebsd.org Fri Jul 8 16:20:01 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D0A1B83395 for ; Fri, 8 Jul 2016 16:20:01 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: from mail-it0-f47.google.com (mail-it0-f47.google.com [209.85.214.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1BA0B1226 for ; Fri, 8 Jul 2016 16:20:00 +0000 (UTC) (envelope-from cse.cem@gmail.com) Received: by mail-it0-f47.google.com with SMTP id h190so13109794ith.1 for ; Fri, 08 Jul 2016 09:20:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :from:date:message-id:subject:to:cc; bh=UNL5m55Wd9TBiws/ufa44O9Q7Of0TVuTF3jTudwKjdw=; b=UI5rRechDOR1LwHBwc4dnlbnD9oACJkcTSRldnt7JEmE1uaZkvZOXe4lzrSSvtDJ1v Wue9lNg+BBktFnnc1XiXfW0Gr3PuA/6zd+f7pGmHswOzRD2bGr8I4Ixcmt3BjF+PcMHX 1YfMp2gihmye5WGLN5xE7fN52omJQJ/JYPeHN9RTr2lFF8oA9R6IQvQGwwRbqn6bgJXV a+aA2SAIw+LGxog9kMwSwLGUtm7ogUdICJunz5yCE7UfSTcP4QKDxt6X+zAJc7cqIRFS zqLaUc9rPfK9qln3W9/Koubony42HFMKw24ZcQzuxHTCKBqCxahZselRh5QzkcqKs7/4 fVSQ== X-Gm-Message-State: ALyK8tKkZh12+cY+mX8SJOlARoUSv8lqStCK7WxOLf0daTsGIKgiUDBcL4l2frhuwKtqNA== X-Received: by 10.36.58.13 with SMTP id 
m13mr4076460itm.81.1467993744677; Fri, 08 Jul 2016 09:02:24 -0700 (PDT) Received: from mail-io0-f176.google.com (mail-io0-f176.google.com. [209.85.223.176]) by smtp.gmail.com with ESMTPSA id o139sm1490481ito.4.2016.07.08.09.02.24 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 08 Jul 2016 09:02:24 -0700 (PDT) Received: by mail-io0-f176.google.com with SMTP id s93so4922600ioi.3 for ; Fri, 08 Jul 2016 09:02:24 -0700 (PDT) X-Received: by 10.107.46.162 with SMTP id u34mr9443035iou.162.1467993744071; Fri, 08 Jul 2016 09:02:24 -0700 (PDT) MIME-Version: 1.0 Reply-To: cem@freebsd.org Received: by 10.36.206.2 with HTTP; Fri, 8 Jul 2016 09:02:23 -0700 (PDT) In-Reply-To: <20160708130258.7b772558@gumby.homeunix.com> References: <20160707001913.GJ38613@kib.kiev.ua> <6193bbf3-39cd-abaa-a5e4-0480c40dac55@rawbw.com> <20160708130258.7b772558@gumby.homeunix.com> From: Conrad Meyer Date: Fri, 8 Jul 2016 09:02:23 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: Why kinfo_getvmmap is sometimes so expensive? To: RW Cc: FreeBSD Hackers Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2016 16:20:01 -0000 On Fri, Jul 8, 2016 at 5:02 AM, RW via freebsd-hackers wrote: > On Thu, 7 Jul 2016 15:32:28 -0700 > Yuri wrote: > >> When people raise the question of why malloc library doesn't unmap >> the memory, developers there usually say that they call >> madvise(MADV_FREE) and this is as good as unmap. > > It's better than unmapping because freed memory is commonly re-malloced > shortly after it's freed. > >> But this example >> shows that this isn't quite the case on the FreeBSD, and unmapping is >> better. > > That doesn't mean it's better in general. 
Additionally, it would not be difficult to make "getProcessSizeBytes()" cheaper without changing malloc. Fetching the entire VM map from the kernel when you only care about an integer RSS count is obviously inefficient. Best, Conrad From owner-freebsd-hackers@freebsd.org Fri Jul 8 16:34:38 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4BA52B839E7 for ; Fri, 8 Jul 2016 16:34:38 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from mx1.scaleengine.net (mx1.scaleengine.net [209.51.186.6]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2FF211F3E for ; Fri, 8 Jul 2016 16:34:37 +0000 (UTC) (envelope-from allanjude@freebsd.org) Received: from [10.1.1.2] (unknown [10.1.1.2]) (Authenticated sender: allanjude.freebsd@scaleengine.com) by mx1.scaleengine.net (Postfix) with ESMTPSA id D28421622 for ; Fri, 8 Jul 2016 16:34:30 +0000 (UTC) Subject: Re: help with onboard LAN To: freebsd-hackers@freebsd.org References: From: Allan Jude Message-ID: <042e6e78-13cb-7d48-68b1-495a0a341129@freebsd.org> Date: Fri, 8 Jul 2016 12:34:30 -0400 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jul 2016 16:34:38 -0000 On 2016-07-08 06:54, Wojciech Puchar wrote: > my supermicro-rebranded server is specified as having 2 1Gb/s ethernet > ports onboard > > what actually is: > > ix0@pci0:3:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 
rev=0x01 > hdr=0x00 > vendor = 'Intel Corporation' > device = 'Ethernet Controller 10-Gigabit X540-AT2' > class = network > subclass = ethernet > ix1@pci0:3:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 > hdr=0x00 > vendor = 'Intel Corporation' > device = 'Ethernet Controller 10-Gigabit X540-AT2' > class = network > subclass = ethernet > > > > which is strange > > card is autodetected with ixgbe driver under FreeBSD 10 and i put > > device miibus > device ixgbe > > in my custom kernel. > > ix0: > port 0xe020-0xe03f mem 0xfbc00000-0xfbdfffff,0xfbe04000-0xfbe07fff irq > 42 at device 0.0 on pci3 > ix0: Using MSIX interrupts with 9 vectors > ix0: Ethernet address: 0c:c4:7a:6e:7e:9e > ix0: PCI Express Bus: Speed 5.0GT/s Width x8 > ix1: > port 0xe000-0xe01f mem 0xfba00000-0xfbbfffff,0xfbe00000-0xfbe03fff irq > 45 at device 0.1 on pci3 > ix1: Using MSIX interrupts with 9 vectors > ix1: Ethernet address: 0c:c4:7a:6e:7e:9f > ix1: PCI Express Bus: Speed 5.0GT/s Width x8 > > And it works. > > But seems i have autonegotiation problem with gigabit switch - it > connects at 100Mb/s > > ix0: flags=8843 metric 0 mtu 1500 > > options=8407bb > > ether 0c:c4:7a:6e:7e:9e > inet 194.1.144.90 netmask 0xfffffff8 broadcast 194.1.144.95 > inet 194.1.144.91 netmask 0xfffffff8 broadcast 194.1.144.95 > media: Ethernet autoselect (100baseTX ) > status: active > > > i tried > > ifconfig ix0 media 1000baseT > > but it shows error. > > > Any idea what i really have in my server and how to manually set it up > to 1Gb/s > > Seems like no phy is detected. > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" Are you sure they are 1 Gigabit ports? They look like 10 Gigabit ports. I have not had trouble getting any of my 10 Gigabit ports to link to a 1 Gigabit switch. 
Install and run 'dmidecode', and in the first page or two, get the model number of the supermicro motherboard. It will help shed light on the situation. Will look something like this: Base Board Information Manufacturer: Supermicro Product Name: X10DRi-LN4+ -- Allan Jude From owner-freebsd-hackers@freebsd.org Sat Jul 9 01:43:14 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A645FB83E1D; Sat, 9 Jul 2016 01:43:14 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: from mail-yw0-x22a.google.com (mail-yw0-x22a.google.com [IPv6:2607:f8b0:4002:c05::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 73B8C18B9; Sat, 9 Jul 2016 01:43:14 +0000 (UTC) (envelope-from dcrosstech@gmail.com) Received: by mail-yw0-x22a.google.com with SMTP id l125so50839438ywb.2; Fri, 08 Jul 2016 18:43:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:cc; bh=s9wFMRb5nSXLWOxXSbco6K4+d90kJ7McKN32ouFL5Cw=; b=Jre5cmm+10apMmStyyjiCkew3eGvH5AppI+CpYf74E0Ei2/x+WZJN2nR2L/hviMbCd 315YEYB0ufUlbEuCHxuEf1DWrmyKYhcBP5KpuoFw9G8dhdsYmowY6A7FPsKv1vep/PW3 P76DYlb4eCa9xzllyxrLb6YaO11jW69DlUmOWQ/xe1qjGWlLHLpZzFoqv0JgXhM3i9Rc jI/rT54BQQdglWE5Nz1Uhu1QjKe+maTV6zH0j4N8OnTKK+f7TwrFz0InVV0FZMjahC0S 85L3dxqYjuCeG2p6Cf6ao/Vu57Qo7YvoZKilE2sXYuDnnsn3OK3FJPA9bZWYAX5o6O+K h3iQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; bh=s9wFMRb5nSXLWOxXSbco6K4+d90kJ7McKN32ouFL5Cw=; b=On6E2EAJdOGwkgVsZLqZAs5wREV41ReW5Fqv6pINIHNRkv1APBiiaxgMx9r9xHTVrF qePcn2qPcLaSnxu0QPuyB2dpb/ZOupFX9ndLN81zXr5sYjjvKzlmdYsi8zXtCUsaSUaK 
NxtyCZ0HIeiW9K+P7Zq+5ASve9CzovfPch1IrcJNmBSS6PR3492JjzTV8YIc5qzcxm+X jyyOzh61tKhZ9V2X6XumPz/eG8kjO74wZQ9gJ4GsSLanOT0MHJ6RZjx4y8/kxnQSf0YN JT7UIOGwPwF6/j7YgU2yCiqKc8Bjn5lvkvSgjcsqikNdOCsGNrkYrClvfA2HwnN4/H65 Mfpw== X-Gm-Message-State: ALyK8tLjYGaFn4fg4A4RzLheSqw5T/aKTklr34TdDzp2cZjrkSWuVm6NlujkdTbmLsiaalVEAAhxcTCtWgHwag== X-Received: by 10.13.217.20 with SMTP id b20mr115656ywe.44.1468028592896; Fri, 08 Jul 2016 18:43:12 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.212.66 with HTTP; Fri, 8 Jul 2016 18:43:12 -0700 (PDT) From: David Cross Date: Fri, 8 Jul 2016 21:43:12 -0400 Message-ID: Subject: Re: Reproducable panic in FFS with softupdates and no journaling (10.3-RELEASE-pLATEST) FOUND IT, including reproduction steps To: Konstantin Belousov Cc: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org X-Mailman-Approved-At: Sat, 09 Jul 2016 01:52:10 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Jul 2016 01:43:14 -0000 Ok... I found it. All of the writes go through ffs_write (including VOP_RECLAIM, so my statement that VOP_RECLAIM couldn't handle things that vinvalbuf left behind is obviously incorrect). 
Sometimes it worked, sometimes it panicked. I started putting more debugging into it and I noticed the following. The problem file would balloc twice, as follows:

attempting to balloc inode 18237205
softdep_setup_allocdirect(18237205, 1, 72834400, 0, 8192, 0, 0xfffffe00f76a6d88)
balloc at 291337600, flags: 50000
attempting to balloc inode 18237205
softdep_setup_allocdirect(18237205, 0, 72834448, 0, 16384, 0, 0xfffffe00f7749970)
balloc at 291337792, flags: 7f040080
panic: softdep_deallocate_dependencies: dangling deps

Further reading of ffs_write, to figure out why it worked sometimes and not others, pointed me at the IO_SYNC flag: if it is passed in, ffs_write dispatches to bwrite, which gives the panic; otherwise it goes to bawrite, which does not.

However, the problem is in ufs_balloc, around line 778 (which I saw in the earlier newbuf dump): there is NO call to any write method for that buffer. If we compare this to the other calls to softdep_setup_allocdirect in that function (lines: 148, 264, 708, 828) we see that each of them has some call to bwrite, bdwrite, or bawrite following it (a number of the other calls do not make any direct calls to b*write either; I do not know nearly enough to say if those are correct or incorrect). I tried adding bwrite around those lines with a conditional on IO_SYNC and I only made it panic earlier. I just don't know the semantics of this well enough.

That being said, I was finally able to isolate a set of reproduction steps that anyone can run. As it stands it relies on a set of filesystem options that are no longer standard (but were, not that long ago), but I definitely believe they could be trivially modified to work on *any* UFS1/UFS2 filesystem... For that reason I am NOT including them here. I will reply individually with the exploit code and instructions to reproduce; if you want them, and you have an appropriate commit history or other credentials, I will forward them on.
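
To make the IO_SYNC observation above concrete, here is a tiny userland model of the branch as described in this message. It is a sketch only, not the kernel source: SKETCH_IO_SYNC, enum write_path and pick_write_path() are hypothetical names invented for illustration, the flag value is arbitrary, and the real ffs_write has more cases than the two mentioned here.

```c
/* Hypothetical stand-in for the kernel's IO_SYNC ioflag bit; value is illustrative. */
#define SKETCH_IO_SYNC	0x1

/* The two buffer-write routines named in the message. */
enum write_path {
	PATH_BWRITE,	/* synchronous write: the path reported to panic */
	PATH_BAWRITE	/* asynchronous write: the path reported not to panic */
};

/*
 * Model of the dispatch described above: a synchronous request goes
 * to bwrite, anything else to bawrite.
 */
enum write_path
pick_write_path(int ioflag)
{
	return ((ioflag & SKETCH_IO_SYNC) != 0 ? PATH_BWRITE : PATH_BAWRITE);
}
```

Under this model the same file can panic or not depending only on whether the caller asked for a synchronous write, which would match the "sometimes it worked" behaviour reported above.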
Thanks, and I eagerly look forward to the patch, or assisting where I can in development. From owner-freebsd-hackers@freebsd.org Sat Jul 9 05:14:37 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 46BBCB8467D for ; Sat, 9 Jul 2016 05:14:37 +0000 (UTC) (envelope-from wojtek@puchar.net) Received: from puchar.net (puchar.net [194.1.144.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "puchar.net", Issuer "puchar.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D054918A8; Sat, 9 Jul 2016 05:14:36 +0000 (UTC) (envelope-from wojtek@puchar.net) Received: Received: from 127.0.0.1 (localhost [127.0.0.1]) by puchar.net (8.15.2/8.14.9) with ESMTPS id u695EXWS070743 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 9 Jul 2016 07:14:34 +0200 (CEST) (envelope-from wojtek@puchar.net) Received: from laptop.wojtek.intra (localhost [127.0.0.1]) by laptop.wojtek.intra (8.14.9/8.14.9) with ESMTP id u695Eb2C008603; Sat, 9 Jul 2016 07:14:37 +0200 (CEST) (envelope-from wojtek@puchar.net) Received: from localhost (wojtek@localhost) by laptop.wojtek.intra (8.14.9/8.14.9/Submit) with ESMTP id u695EVhO008600; Sat, 9 Jul 2016 07:14:32 +0200 (CEST) (envelope-from wojtek@puchar.net) X-Authentication-Warning: laptop.wojtek.intra: wojtek owned process doing -bs Date: Sat, 9 Jul 2016 07:14:31 +0200 (CEST) From: Wojciech Puchar X-X-Sender: wojtek@laptop.wojtek.intra To: Allan Jude cc: freebsd-hackers@freebsd.org Subject: Re: help with onboard LAN - fixed In-Reply-To: <042e6e78-13cb-7d48-68b1-495a0a341129@freebsd.org> Message-ID: References: <042e6e78-13cb-7d48-68b1-495a0a341129@freebsd.org> User-Agent: Alpine 2.20 (BSF 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by 
milter-greylist-4.4.3 (puchar.net [10.0.1.1]); Sat, 09 Jul 2016 07:14:34 +0200 (CEST) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Jul 2016 05:14:37 -0000 by using driver from latest FreeBSD-10. Everything now works fine. thanks for help From owner-freebsd-hackers@freebsd.org Sat Jul 9 10:47:17 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 80BB3B831FE for ; Sat, 9 Jul 2016 10:47:17 +0000 (UTC) (envelope-from wojtek@puchar.net) Received: from puchar.net (puchar.net [194.1.144.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "puchar.net", Issuer "puchar.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 1596719B7 for ; Sat, 9 Jul 2016 10:47:16 +0000 (UTC) (envelope-from wojtek@puchar.net) Received: Received: from 127.0.0.1 (localhost [127.0.0.1]) by puchar.net (8.15.2/8.14.9) with ESMTPS id u69AlExk008206 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Sat, 9 Jul 2016 12:47:14 +0200 (CEST) (envelope-from wojtek@puchar.net) Received: from laptop.wojtek.intra (localhost [127.0.0.1]) by laptop.wojtek.intra (8.14.9/8.14.9) with ESMTP id u69AlI9w001211 for ; Sat, 9 Jul 2016 12:47:18 +0200 (CEST) (envelope-from wojtek@puchar.net) Received: from localhost (wojtek@localhost) by laptop.wojtek.intra (8.14.9/8.14.9/Submit) with ESMTP id u69AlD0K001208 for ; Sat, 9 Jul 2016 12:47:13 +0200 (CEST) (envelope-from wojtek@puchar.net) X-Authentication-Warning: laptop.wojtek.intra: wojtek owned process doing -bs Date: Sat, 9 Jul 2016 12:47:13 +0200 (CEST) From: Wojciech Puchar X-X-Sender: wojtek@laptop.wojtek.intra To: freebsd-hackers@freebsd.org Subject: 
apache&EnableSendfile on = 100% CPU Message-ID: User-Agent: Alpine 2.20 (BSF 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (puchar.net [10.0.1.1]); Sat, 09 Jul 2016 12:47:15 +0200 (CEST) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Jul 2016 10:47:17 -0000

Is it an Apache bug or possibly a kernel bug (FreeBSD 10)?

When I turn on the EnableSendfile setting in the Apache config, the process handling the HTTP connection uses 100% CPU no matter whether I transfer 1kB/s or 1GB/s.

Turning it off fixes the problem.

Where can I search for the source of the problem?

From owner-freebsd-hackers@freebsd.org Sat Jul 9 06:47:47 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88FA0B85B87 for ; Sat, 9 Jul 2016 06:47:47 +0000 (UTC) (envelope-from rupavath@juniper.net) Received: from NAM03-CO1-obe.outbound.protection.outlook.com (mail-co1nam03on0125.outbound.protection.outlook.com [104.47.40.125]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "Microsoft IT SSL SHA2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 487A619A0 for ; Sat, 9 Jul 2016 06:47:46 +0000 (UTC) (envelope-from rupavath@juniper.net) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=junipernetworks.onmicrosoft.com; s=selector1-juniper-net; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=nGKVTKXnpC07ePQLOdDN2ZKDLmgub5HJMNiLv/rOhBA=; 
b=GPovcPY+cJvZSXGpu6H4qBUOJ9AWvU7ej6xSyoKbOxhfP/N+8hLEJrVthnAuMS0ymRDMdj9x8TCNNouUZoGesJhPtHIgVbtigwZkZAE7V9gHMq/w6cdbQorNBjuMzR2xz1kjttN9YHCvHNg3Cd44554czVnzu8Xyh4QNH0uapaI= Received: from CY4PR05MB2824.namprd05.prod.outlook.com (10.169.182.146) by CY4PR05MB2823.namprd05.prod.outlook.com (10.169.182.145) with Microsoft SMTP Server (TLS) id 15.1.523.12; Sat, 9 Jul 2016 00:13:35 +0000 Received: from CY4PR05MB2824.namprd05.prod.outlook.com ([10.169.182.146]) by CY4PR05MB2824.namprd05.prod.outlook.com ([10.169.182.146]) with mapi id 15.01.0523.028; Sat, 9 Jul 2016 00:13:35 +0000 From: Sreekanth Rupavatharam To: "freebsd-hackers@freebsd.org" Subject: mbuf leak in kern_sendit? Thread-Topic: mbuf leak in kern_sendit? Thread-Index: AQHR2Xa75spuXAJjqkGxwSdNjN01Rg== Date: Sat, 9 Jul 2016 00:13:35 +0000 Message-ID: <1286BFDE-9238-4967-913F-26E0E28D0F74@juniper.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/f.17.0.160611 authentication-results: spf=none (sender IP is ) smtp.mailfrom=rupavath@juniper.net; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [2601:646:8200:65cc:1432:f7ab:4189:ef7] x-ms-office365-filtering-correlation-id: 145b30de-07d7-44c5-4326-08d3a78dde62 x-microsoft-exchange-diagnostics: 1; CY4PR05MB2823; 6:oUpc0Z/rwOeWd17lyTOHAY74K4dXuWeSZCyQr3YDiIpjk1Z1Bh+QXWOIoHcjFWDHIwIMAs5foq4Pm8/tRHv26iuhldzbkAN0c7+PCZZy4P98j9jgU9XLESEQh9aa5rVKVJx66vIxhGKzryQI/lodPyO+j8Ee6mOCSkc+YinUPiTMQvxB2VZoh1TQIXL+2iTVcJVA2/ClCtbuQoL8BWc0fFn65KrBIZaLtOmh5Y1SmLxl5HZZVKuPS9FlC4CrE+H2ceVUf17U4tsLR/dzkX1+xEoLPbgM/uYrn4Q87nsP5gCmYqwWIlfbIl2qmsqz9FKmgAlZJwiLIRiQtjoSTyS7Yw==; 5:WPegGxBcEKMAyV7glij6y8DOyNsDzLORoS+hs6BE1jDNAatllfQxPObh2Sh/gMHVZ/HokjVNxELOXihvRKqjr7H8qeE3T5qi4PGCDYKm991CZLGyIkT1ZoteW+HMHCPecy6KGEF5u3yAliLV5C+FxA==; 24:2mf0igjC4+5/BKefiv8pE2n1+Dj/BSVDdDrqDsfpXb3YD0g9aqRHOkVP2jSFx6nhcTguQERJDPS+wEgNYdeVBRenohXFrGZ9yBgE/XKooWA=; 
7:45iyb0CEGuRDxa6I3kjVeIdKRQBQbbWcW85aPKGgVloax8Ov8eaouUZipgQ2i5Gen2ym1+TYo9nXCDaa0SifB/OPoS1TK0YR7vCBePI9Kp8qWVXSn3Thi9yetNO1PexbrzsxK77A0SqwEwhQwoNOgNfxmTAwVFC7AuKNTdBcjqKXfetPAxRi/B/S6DHn21cumvT2qs0do7DHfHNggwNh6SX+xv5dIq3prt2f9iBzrjUtl0XXfpVJ0E3ZYrpFYZ4W x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:CY4PR05MB2823; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(601004)(2401047)(5005006)(8121501046)(3002001)(10201501046)(6055026); SRVR:CY4PR05MB2823; BCL:0; PCL:0; RULEID:; SRVR:CY4PR05MB2823; x-forefront-prvs: 0998671D02 x-forefront-antispam-report: SFV:NSPM; SFS:(10019020)(6009001)(7916002)(189002)(199003)(3280700002)(97736004)(83506001)(229853001)(450100001)(82746002)(110136002)(107886002)(101416001)(2351001)(7846002)(77096005)(33656002)(81166006)(189998001)(106356001)(81156014)(54356999)(105586002)(8936002)(2900100001)(106116001)(83716003)(87936001)(50986999)(99286002)(122556002)(68736007)(5002640100001)(6116002)(8676002)(4001350100001)(586003)(11100500001)(92566002)(2501003)(10400500002)(36756003)(86362001)(305945005)(7736002)(102836003)(3660700001)(5640700001)(2906002)(3826002)(104396002); DIR:OUT; SFP:1102; SCL:1; SRVR:CY4PR05MB2823; H:CY4PR05MB2824.namprd05.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; received-spf: None (protection.outlook.com: juniper.net does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="utf-8" Content-ID: Content-Transfer-Encoding: base64 MIME-Version: 1.0 X-OriginatorOrg: juniper.net X-MS-Exchange-CrossTenant-originalarrivaltime: 09 Jul 2016 00:13:35.1349 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: bea78b3c-4cdb-4130-854a-1d193232e5f4 X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR05MB2823 X-Mailman-Approved-At: Sat, 09 Jul 2016 11:02:06 +0000 X-BeenThere: freebsd-hackers@freebsd.org 
X-Mailman-Version: 2.1.22 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Jul 2016 06:47:47 -0000 SSBzZWUgaW4ga2Vybl9zZW5kaXQoKSBmdW5jdGlvbihzdGFibGUvMTApLCB0aGUgY29udHJvbCBt YnVmIGRvZXNu4oCZdCBnZXQgZnJlZWQgb24gZXJyb3IuIEUuZy4sIA0KOTE0IAkgICAgICAgIGlm IChtcC0+bXNnX25hbWUgIT0gTlVMTCkgew0KOTE1IAkgICAgICAgICAgICAgICAgZXJyb3IgPSBt YWNfc29ja2V0X2NoZWNrX2Nvbm5lY3QodGQtPnRkX3VjcmVkLCBzbywNCjkxNiAJICAgICAgICAg ICAgICAgICAgICBtcC0+bXNnX25hbWUpOw0KOTE3IAkgICAgICAgICAgICAgICAgaWYgKGVycm9y ICE9IDApDQo5MTggCSAgICAgICAgICAgICAgICAgICAgICAgIGdvdG8gYmFkOyDih5AgSGVyZQ0K OTE5IAkgICAgICAgIH0NCg0Kb3IgDQoNCjkzMyAgICAgICAgZm9yIChpID0gMDsgaSA8IG1wLT5t c2dfaW92bGVuOyBpKyssIGlvdisrKSB7DQo5MzQgCSAgICAgICAgICAgICAgICBpZiAoKGF1aW8u dWlvX3Jlc2lkICs9IGlvdi0+aW92X2xlbikgPCAwKSB7DQo5MzUgCSAgICAgICAgICAgICAgICAg ICAgICAgIGVycm9yID0gRUlOVkFMOw0KOTM2IAkgICAgICAgICAgICAgICAgICAgICAgICBnb3Rv IGJhZDsg4oeQIEhlcmUNCjkzNyAJICAgICAgICAgICAgICAgIH0NCjkzOCAJICAgICAgICB9DQoN Cg0KOTY1IAliYWQ6DQo5NjYgCSAgICAgICAgZmRyb3AoZnAsIHRkKTsNCjk2NyAJICAgICAgICBy ZXR1cm4gKGVycm9yKTsNCk5vIGZyZWUgb2YgY29udHJvbCBtYnVmIGhlcmUgZWl0aGVyLiANCg0K QWN0dWFsbHksIHRoZSBvbmx5IHBsYWNlIHdoZXJlIHRoZSBtYnVmIGdldHMgZnJlZWQgaXMgd2hl biBpdCBjYWxscyBwcnVfc29zZW5kIHdoZXJlIGl0IGdldHMgZnJlZWQgaW4gdGhlcmUuIEFtIEkg bWlzc2luZyBzb21ldGhpbmcgaGVyZT8gRS5nLiwgdHJhY2tpbmcgdGhlIGNhbGwgdHJhY2UgZnJv bSBzZW5kaXQNCnNlbmRpdCgpDQogICAgICAgc29ja2FyZ3MoKSAtPiBjb250cm9sIG1idWYgaXMg YWxsb2NhdGVkIGhlcmUNCiAgICAgICBrZXJuX3NlbmRpdCgpIC0+IGl04oCZcyBmcmVlZCBvbmx5 IG9uIHBydV9zb3NlbmQoKQ0KICAgICAgIGNvbnRyb2wgbm90IGZyZWVkIG9uIGVycm9yLiAgQW0g SSBtaXNzaW5nIHNvbWV0aGluZz8gDQoNCg0KDQoNClRoYW5rcywNCg0KLVNyZWVrYW50aA0KDQoN Cg==