Date: Wed, 5 Sep 2018 22:15:20 -0700
From: bob prohaska
To: Mark Millard
Cc: freebsd-arm@freebsd.org, bob prohaska
Subject: Re: RPI3 swap experiments (r338342 with vm.pageout_oom_seq="1024" and 6 GB swap)
Message-ID: <20180906051520.GB3482@www.zefox.net>

On Wed, Sep 05, 2018 at 07:05:11PM -0700, Mark Millard wrote:
> [I've omitted Kirk McKusick as my notes are largely off subject for
> what he asked about for testing specific to his changes.]
>
> On 2018-Sep-5, at 5:38 PM, bob prohaska wrote:
>
> > On Sat, Sep 01, 2018 at 04:02:33PM -0700, bob prohaska wrote:
> >
> > It looks as if using all six GB of swap doesn't cause any immediate problem,
> > at least so long as swap usage stays relatively low, say 1.5 GB. In a final
> > test, TRIM was turned on without catastrophe, though it had little to do
> > given that all the busy filesystems were on USB. The penalty was about one
> > hour extra (25 vs 24 hours) to run -j4 buildworld from a clean start.
>
> What UFS file systems with TRIM enabled were on some /dev/mmcsd0* ?

Everything _except_ /var, /tmp and /usr. Effectively, not much.

> Did you 1st use "fsck_ffs -E" on any of the file systems where
> trim would work?

No, I did not.

>
> If I gather right, the "clean start" was on USB where TRIM during the
> clean would not be available.
>

By "clean start" I meant running make cleandir twice and removing
/usr/obj/usr/src. That was done to make all of the -j4 buildworld
tests consistent.

> The extra swap space may have contributed to the extra time? Having
> more swap uses more kernel memory for keeping track of the swap
> if I understand right. That leaves less for other things. That could
> have consequences other than outright failure.
>

There were two buildworld tests run with 6 GB of swap, the first without
TRIM being turned on and the second with TRIM turned on. The second run
took an hour longer, with TRIM being on the only difference.

> Quoting "man 8 loader" related to kern.maxswzone :
>
> Note that swap metadata can be fragmented, which means that
> the system can run out of space before it reaches the
> theoretical limit.
> Therefore, care should be taken to not
> configure more swap than approximately half of the
> theoretical maximum.
>
> Running out of space for swap metadata can leave the system
> in an unrecoverable state.
>
> This wording suggests not allocating 6 GiBytes of swap when 3.5 GiBytes
> is approximately half the theoretical maximum --even if the system does
> still operate with 6 GiBytes.
>

It's understood that 6 GB of swap on a Pi3 isn't a good idea. It was
tried to see if something useful might be revealed.

> (Note: The man page's reference to "eight times the amount of physical memory"
> and such does not seem to apply to all platforms. And rpi2 V1.1 and an rpi3
> with the same amount of RAM get rather different recommended figures
> according to the messages generated.)
>
> > One chance observation caught my attention, however. I'd always thought
> > the VM system would favor fast swap devices over slow, but the gstat log
> > recorded this, visible at
> > http://www.zefox.net/~fbsd/rpi3/swaptests/r338342/3gbsd_3gbusb/trim_on/swapscript.log
> >
> > dT: 10.004s  w: 10.000s
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d  %busy Name
> >     3    175     91    673    4.0     84    701    4.0      0      0    0.0   24.4 mmcsd0
> >     4    173     88    693  106.6     86    723  176.5      0      0    0.0  103.4 da0
> >     1     58     30    224    4.5     28    220    4.1      0      0    0.0   14.5 mmcsd0s2b
> >     3    175     91    673    4.0     84    701    4.0      0      0    0.0   24.7 mmcsd0s2
> >     1     58     30    223    4.0     28    244    3.8      0      0    0.0   14.0 mmcsd0s2d
> >     1     59     31    227    3.7     28    237    4.3      0      0    0.0   14.9 mmcsd0s2e
> >     2     57     28    235  140.2     28    236  103.8      0      0    0.0  186.1 da0a
> >     0     56     28    224  178.4     28    222   35.9      0      0    0.0  131.5 da0b
> >     2     59     31    234    9.4     28    240   59.1      0      0    0.0   99.5 da0d
> >     0      0      0      0    0.0      0      3  15011      0      0    0.0  150.1 da0e
> >     0      1      0      0    0.0      1     22  13376      0      0    0.0  147.8 da0g
> >
> Are there any examples of "d/s kBps ms/d" being non-zero? If they are
> always zero then no TRIMing likely happened. That in turn would make
> TRIM an unlikely use of an extra hour.
>

Near as I can tell there are no non-zero values for d/s, which if it's
tied to TRIM is reasonable for all but microSD, which did have TRIM
enabled. Since microSD wasn't particularly busy, apart from swap, that
too is unsurprising.

> > Tue Sep 4 15:07:39 PDT 2018
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/da0b         1048576   236872   811704    23%
> > /dev/mmcsd0s2b    1048576   221568   827008    21%
> > /dev/da0d         1048576   218636   829940    21%
> > /dev/da0a         1048576   222028   826548    21%
> > /dev/mmcsd0s2d    1048576   221660   826916    21%
> > /dev/mmcsd0s2e    1048576   221392   827184    21%
> > Total             6291456  1342156  4949300    21%
>
> As I understand the normal use of multiple swap partitions
> is to split the load across channels that can operate
> independently in parallel. Having 3 such partitions on
> the same channel/device may only add overhead vs. one
> full-size partition per channel/device.
>

The multiple partitions on one device were a simple way to vary swap
amounts. I did expect that having swap on both microSD and USB would
lead to some performance gain, but it seems not so.

> I also do not know if mmcsd0 and da0 can have independent,
> parallel I/O activity in the rpi3 context.
>

That is a key point; I took it for granted that they _can_ have
independent, parallel I/O activity. If not, seemingly it makes better
sense, both for performance and cost, to use a single large microSD card
and skip USB devices entirely. That seems to be where the evidence is
leading me.

> > Sep 4 14:57:52 www sshd[41673]: error: Received disconnect from 103.207.39.197 port 64499:3: com.jcraft.jsch.JSchException: Auth cancel [preauth]
> > Sep 4 15:04:19 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2217840, size: 12288
>
> Note: my context is very different from yours and I get no console
> messages about I/O or waits during buildworld buildkernel or other
> such build/install tests.
>

That's likely the benefit of having 2 GB of RAM, I would think.
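
As a cross-check on the swapinfo snapshot quoted above, here is a minimal Python sketch that totals the "Used" column per physical device. It is not part of the original test scripts. The helper names physical_device and used_by_device are hypothetical, and the mapping from partition to disk assumes only the two name prefixes seen in this thread (da0 and mmcsd0).

```python
# Swap usage in 1K blocks, copied from the swapinfo snapshot of
# Tue Sep 4 15:07:39 PDT 2018 quoted above (the "Total" row is left out).
SWAPINFO = """\
/dev/da0b        1048576   236872   811704    23%
/dev/mmcsd0s2b   1048576   221568   827008    21%
/dev/da0d        1048576   218636   829940    21%
/dev/da0a        1048576   222028   826548    21%
/dev/mmcsd0s2d   1048576   221660   826916    21%
/dev/mmcsd0s2e   1048576   221392   827184    21%
"""


def physical_device(partition):
    """Hypothetical helper: map a swap partition to its disk by name prefix."""
    name = partition.rsplit("/", 1)[-1]
    for disk in ("mmcsd0", "da0"):  # assumption: only these two disks exist
        if name.startswith(disk):
            return disk
    return name


def used_by_device(text):
    """Sum the Used column (third field) for each physical device."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        disk = physical_device(fields[0])
        totals[disk] = totals.get(disk, 0) + int(fields[2])
    return totals


totals = used_by_device(SWAPINFO)
grand_total = sum(totals.values())
for disk, used in sorted(totals.items()):
    share = 100 * used / grand_total
    print(f"{disk}: {used} KB used, {share:.1f}% of swap in use")
```

For this one snapshot the totals come to 677536 KB on da0 and 664620 KB on mmcsd0, about 50.5% and 49.5% of the 1342156 KB in use. That is a single sample, so it shows only that usage was spread close to evenly at that moment, not how the pager chooses devices.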
> > The system has lots of fast swap available on microSD, but is seemingly choking
> > trying to use the slow swap on da0 _and_ run traffic to /usr and /var. Buildworld
> > doesn't run any faster with less swap, so I don't think the oversupply is the problem.
>
> If I understand right, your only 6 GiByte swap experiment was slower
> but you attributed all time variations to an (inactive? ever used?)
> TRIM enabled status. You might want to manipulate the two
> separately. For all I know something else may also have contributed.
>

The tests were 6 GB swap, TRIM off vs TRIM on. TRIM on took an extra
hour, everything else kept the same to the best of my ability. I did the
TRIM test sort of on a whim, just to see if things would go spectacularly
wrong. That they didn't is encouraging. It certainly isn't decisive.

> I've no clue if having so many swap partitions on the same channel/device
> has consequences that having only one per channel/device would avoid.
>
> > Is this expected behavior?
>
> As I understand the approximately even split across the in-use swap
> partitions is the normal way things are split. It is the placement
> of the partitions themselves that contributes to how effective that
> split is at improving the swap/paging I/O if I understand right.
>

The great difference in activity between da0 and mmcsd0 suggests they do
have a degree of independence. Whether that independence can be exploited
to improve swap throughput is the point I wanted to explore.

Thanks for reading!

bob prohaska
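
The difference between da0 and mmcsd0 can be put in numbers from the two whole-device rows of the gstat sample quoted earlier. What follows is a minimal Python sketch, not part of the original test scripts. The parse_row helper is hypothetical and assumes the 13-column layout shown in that sample (L(q), ops/s, r/s, kBps, ms/r, w/s, kBps, ms/w, d/s, kBps, ms/d, %busy, Name).

```python
# Whole-device rows copied from the gstat sample quoted earlier in this message.
# Columns: L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
GSTAT_ROWS = """\
3 175 91 673 4.0 84 701 4.0 0 0 0.0 24.4 mmcsd0
4 173 88 693 106.6 86 723 176.5 0 0 0.0 103.4 da0
"""


def parse_row(line):
    """Hypothetical helper: pick the fields of interest out of one gstat row."""
    f = line.split()
    return f[12], {
        "ops_s": float(f[1]),   # total operations per second
        "ms_r": float(f[4]),    # milliseconds per read
        "ms_w": float(f[7]),    # milliseconds per write
        "busy": float(f[11]),   # %busy as reported by gstat
    }


stats = dict(parse_row(line) for line in GSTAT_ROWS.splitlines() if line.strip())

# How much slower was each da0 transaction than an mmcsd0 transaction?
read_ratio = stats["da0"]["ms_r"] / stats["mmcsd0"]["ms_r"]
write_ratio = stats["da0"]["ms_w"] / stats["mmcsd0"]["ms_w"]

print(f"ops/s: mmcsd0 {stats['mmcsd0']['ops_s']:.0f}, da0 {stats['da0']['ops_s']:.0f}")
print(f"da0 read latency is {read_ratio:.1f} times that of mmcsd0")
print(f"da0 write latency is {write_ratio:.1f} times that of mmcsd0")
```

In that one 10-second sample the two devices handled nearly the same number of operations (175 vs 173 per second), while each da0 read took roughly 27 times as long and each write roughly 44 times as long as on mmcsd0. The da0 figures include the filesystem traffic on that disk, so the ratios describe the device under its whole load, not swap alone.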