From: Mark Millard
Date: Wed, 6 Dec 2017 21:00:11 -0800
Subject: Re: rpi2 hangup during poudriere build: lots of pfault wmseg status
In-Reply-To: <5014B6E6-68BA-4499-8728-EF80237F3269@nuxi.ca>
References: <05BEA04B-249B-4E7D-855A-46DA1A0DEA16@dsl-only.net> <36A8BDCC-4ECE-4187-8705-54A9E38E8AD5@dsl-only.net> <5014B6E6-68BA-4499-8728-EF80237F3269@nuxi.ca>
Cc: freebsd-arm@freebsd.org, freebsd-hackers@freebsd.org, freebsd-current@freebsd.org
To:
Laurent Cimon

> On 2017-Dec-6, at 5:47 PM, Laurent Cimon wrote:
>
>> On Dec 6, 2017, at 20:01, Mark Millard wrote:
>>
>> On 2017-Dec-6, at 1:54 PM, Laurent Cimon wrote:
>>
>>>> On Dec 6, 2017, at 00:57, Mark Millard wrote:
>>>>
>>>> I tried to build some ports on a rpi2
>>>> (via poudriere) but it hung up:
>>>> Ethernet and normal console use. (Note:
>>>> the root file system is on a USB SSD
>>>> and the swap partition is also on that
>>>> USB SSD.)
>>>>
>>>> But ~^b worked for getting to the db>
>>>> prompt on the console.
>>>>
>>>> From there a ps suggests that it got hung
>>>> up in pfault activity. (Possibly insufficient
>>>> RAM+swap-partition space?) But it is not
>>>> clear to me that it should end up hung up
>>>> vs. killing processes or other such.
>>>
>>> Hi,
>>>
>>> From what I know the raspberry pis use the same controller for ethernet and
>>> the USB hub on which you’re hosting an SSD. It seems like you make very heavy
>>> use of the USB ports, and all of the resources used by poudriere except for the
>>> CPU and the (very limited) memory that’s not in swap is attached to them. If you
>>> really didn’t have enough memory and swap, the linkers would’ve been stopped.
>>>
>>> I think it might just be a swap death. Poudriere compiles and fetches in parallel
>>> a lot, ethernet and disk I/O is slow because it’s very limited, so linking takes
>>> longer.
>>> You end up linking a few very big binaries at the same time, and they
>>> all fight for the memory, to get out of swap through page faults, but there
>>> are too many page faults, all too big, requesting more CPU time than is
>>> allowed to them.
>>>
>>> This would explain why you have 3 linkers waiting on a page fault out of the 4
>>> CPUs poudriere allows builds on, on top of the awk processes. It would also
>>> explain why you had easy access to the debugger: it was in memory already with
>>> the kernel.
>>>
>>> I’d advise you to disable parallel builds and see if it happens again,
>>> but it would make building much slower. Using makejobs would help if you
>>> can afford watching the build. Otherwise be patient, it should resolve itself
>>> eventually, but it will take a while and it will happen again.
>>
>> My post was more about how FreeBSD handled the
>> heavy-use context and less about getting the
>> builds to finish: it managed to get to a
>> state of no-progress for processes and a loss
>> of normal control as far as I could tell.
>>
>> I did a "c" to ddb and left it until just before
>> this note then did ~ ^B again. Things looked the
>> same. [I've finally rebooted the rpi2.]
>>
>> PARALLEL_JOBS=1 was already in use but
>> ALLOW_MAKE_JOBS=yes was also in use.
>> USE_TMPFS=no was already in use.
>>
>> While an ssh session was monitoring the
>> build, Ethernet was not in heavy use.
>> (No nfs mounts to its disks, for example.)
>>
>> I may try without ALLOW_MAKE_JOBS=yes and
>> with ALLOW_MAKE_JOBS_PACKAGES empty/undefined
>> to see if it can complete for such a context
>> without having the same sort of problem.
>>
>> Ultimately I can cross-build and install from
>> those materials when I really want updates. I
>> have the context for such. This was more about
>> seeing how well the rpi2 did for self-hosted.
>> Classically I've used a BPI-M3 with 2 GiBytes
>> of RAM and a proportionally bigger swap partition
>> instead (approximately).
>>
>>
>> FYI (rpi2 after rebooting):
>>
>> # swapinfo
>> Device               1K-blocks  Used    Avail Capacity
>> /dev/label/RPI2swap    1572860     0  1572860     0%
>>
>> # df -m
>> Filesystem           1M-blocks  Used  Avail Capacity  Mounted on
>> /dev/ufs/RPI2rootfs     195378 30791 148957    17%    /
>> devfs                        0     0      0   100%    /dev
>> /dev/label/RPI2Aboot        49    12     37    25%    /boot/msdos
>>
>>
>> An rpi3 (aarch64) with the same amount of RAM,
>> same type of USB SSD, etc., but well more swap
>> completed building basically the same set of
>> ports for the same poudriere settings just
>> fine.
>>
>> Interestingly for the default kern.maxswzone:
>> (Just to show the reported recommended maximum
>> figures for swap.)
>>
>> rpi2: . . . exceeds maximum recommended amount (411488 pages).
>> rpi3: . . . exceeds maximum recommended amount (925680 pages).
>>
>> (I was running with somewhat under those maximums for
>> the tests.)
>>
>> # swapinfo
>> Device               1K-blocks  Used    Avail Capacity
>> /dev/gpt/RPI3swap      3702784     0  3702784     0%
>>
>> # df -m
>> Filesystem           1M-blocks  Used  Avail Capacity  Mounted on
>> /dev/ufs/RPI3rootfs     195378 14937 164811     8%    /
>> devfs                        0     0      0   100%    /dev
>> /dev/label/RPI3Aboot        49     7     42    15%    /boot/efi
>>
>> If I restricted the rpi3 to somewhat under what the
>> rpi2 allows for swap, I do not know if it would also
>> hang up vs. not.
>>
>> If having more swap makes the difference, then it
>> would not seem to be being I/O-bound that would
>> explain the hangup.
>>
>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net
>
> There are a few factors that could have prevented this on your raspberry pi 3.
>
> It has a faster, 64 bit CPU instead of the raspberry pi 2’s 32 bit CPU and the
> RAM is twice as fast.
> These make it less likely for this to happen, because it
> makes both building and linking faster, which reduces the odds of linking 2
> binaries at once, let alone 3. There are many things that could have gone
> differently in the build that didn’t make it end up linking 3 big binaries at
> the same time to cause the same behaviour.
>
> What I think happened on your raspberry pi 2 is just likely bad luck that could
> also happen on your raspberry pi 3. The odds of 3 parallel builds needing so
> much ram to link at the exact same time are still very low, just less low on
> faster hardware.
>
> Keep in mind that this is still entirely theoretical, I don’t present it as an
> absolute explanation. It’s simply what I understand from this.
>
> I’d be curious seeing how a different operating system using a system similar to
> poudriere where builds are done on one CPU but in parallel would be handled on
> the rpi2. My understanding is that this is simply a mix of hardware limitation
> and conceptual flaws with the swap. And by flaws I mean, your operating system
> cannot save you when you try to do something that your hardware cannot possibly
> do.

For reference:

The rpi2 hung up during:

[08:00:15] [01] [00:00:00] Building devel/binutils | binutils-2.28,1

(Only one builder, no prior builds should matter.
All 4 cores allowed.)

On the rpi3 this was:

[08:13:38] [01] [00:00:00] Building devel/binutils | binutils-2.28,1
[10:17:12] [01] [02:03:34] Finished devel/binutils | binutils-2.28,1: Success

(Only one builder, no prior or following builds
should matter. All 4 cores allowed.)

Comparing a couple of examples that both completed:

rpi2:
[00:43:40] [01] [00:00:00] Building lang/perl5.24 | perl5-5.24.3
[01:38:37] [01] [00:54:57] Finished lang/perl5.24 | perl5-5.24.3: Success

vs.
rpi3:
[00:26:35] [01] [00:00:00] Building lang/perl5.24 | perl5-5.24.3
[00:56:14] [01] [00:29:39] Finished lang/perl5.24 | perl5-5.24.3: Success

rpi2:
[07:12:51] [01] [00:00:00] Building databases/sqlite3 | sqlite3-3.21.0_1
[07:59:04] [01] [00:46:13] Finished databases/sqlite3 | sqlite3-3.21.0_1: Success

vs.

rpi3:
[07:43:31] [01] [00:00:00] Building databases/sqlite3 | sqlite3-3.21.0_1
[08:13:35] [01] [00:30:04] Finished databases/sqlite3 | sqlite3-3.21.0_1: Success

The rpi2 lasting days longer than the rpi3 2hr
figure for devel/binutils is likely out of scale
for processor and RAM differences in speed. (The
USB-tied performance likely is not all that
different.)

===
Mark Millard
markmi at dsl-only.net
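
A rough check of the "out of scale" remark, as a minimal Python sketch.
It takes the elapsed times from the poudriere log lines quoted above,
computes the rpi2/rpi3 slowdown for the two ports that finished on both
boards, and scales the rpi3 devel/binutils time by the larger of the two
ratios. That binutils would slow down by about the same factor as perl5
or sqlite3 is only an assumption for illustration, not something that
was measured; the helper names are made up for this sketch.

```python
# Elapsed build times copied from the poudriere "Finished" log lines.
ELAPSED = {
    "lang/perl5.24": {"rpi2": "00:54:57", "rpi3": "00:29:39"},
    "databases/sqlite3": {"rpi2": "00:46:13", "rpi3": "00:30:04"},
}
RPI3_BINUTILS = "02:03:34"  # devel/binutils on the rpi3


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(seconds: float) -> str:
    """Convert a number of seconds back to HH:MM:SS (rounded)."""
    hours, rest = divmod(round(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# rpi2 time divided by rpi3 time for each port built on both boards.
ratios = {
    port: to_seconds(times["rpi2"]) / to_seconds(times["rpi3"])
    for port, times in ELAPSED.items()
}

# Assumption: binutils slows down no more than the worst observed ratio.
worst = max(ratios.values())
projected = to_seconds(RPI3_BINUTILS) * worst

for port, ratio in ratios.items():
    print(f"{port}: rpi2 took {ratio:.2f}x the rpi3 time")
print(f"projected rpi2 devel/binutils time: {to_hms(projected)}")
```

Under that assumption the rpi2 would be expected to need a few hours
for devel/binutils, not days, which is consistent with the hang not
being explained by processor and RAM speed alone.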