From: Warner Losh
Date: Tue, 14 Aug 2018 17:50:11 -0600
Subject: Re: RPI3 swap experiments (grace under pressure)
To: bob prohaska
Cc: Mark Millard, freebsd-arm, Mark Johnston

On Mon, Aug 13, 2018 at 7:42 PM, bob prohaska wrote:

> [Altered subject, philosophical question]
> On Mon, Aug 13, 2018 at 01:05:38PM -0700, Mark Millard wrote:
> >
> > Here there is architecture choice and goals/primary
> > contexts.
> > FreeBSD is never likely to primarily target
> > anything with a workload like buildworld buildkernel
> > on hardware like rpi3's and rpi2 V1.1's and
> > Pine64+ 2GB's and so on.
> >
>
> I understand that the RPi isn't a primary platform for FreeBSD.
> But, decent performance under overload seems like a universal
> problem that's always worth solving, whether for a computer or
> an office. The exact goals might vary, but coping with too much
> to do and not enough to do it with is humanity's oldest puzzle.
>
> Maybe I should ask what the goals of the OOMA process serve.
> I always thought an OS's goals were along the lines of:
> 1. maintain control
> 2. get the work done
> 3. remain responsive
>

Simplistically, one can view the VM system as a producer of dirty pages and a cleaner of dirty pages. These happen at different rates, but they are usually closely matched. We're normally able to launder enough pages to satisfy the VM system's need for new pages (since clean pages can just be thrown away without any loss of data). The problem happens when a build puts a large load on the creation side. This generates a lot of dirty pages, and we have to write them out quickly to keep up. When the backing store's write rate varies substantially over time, we run into problems: we're not able to clean enough pages to keep up with demand. The system does what it can to slow down demand, but at some point it just can't keep up and we trigger OOM.

I'm still firmly convinced that a combination of bugs is making the storage system less robust. The solution? Fix those bugs. Once you do that, however, you are still stuck with the fact that crappy hardware is crappy. Swapping to the ultra-low end is still going to suck. USB flash drives and SD cards are generally geared toward long stretches of sequential writes and random reads, since they are expected to go into cameras or be used as sneakernet.
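The producer/cleaner balance described above can be made concrete with a toy model. This is a hypothetical sketch, not FreeBSD's page daemon: every name and rate in it is assumed, and it leaves out the demand throttling the real VM system does. A build dirties pages at a steady rate, while the backing store launders them quickly most of the time but stalls for `stall_ticks` out of every 100 ticks (standing in for a card busy with internal garbage collection). A short stall is absorbed; a long one drains the free pool, which is the point where OOM would be triggered.

```c
#include <assert.h>

/* Hypothetical toy-model parameters, in pages per tick. */
#define FREE_PAGES_START 1000L
#define DIRTY_RATE         50L  /* pages the build dirties each tick      */
#define FAST_CLEAN_RATE    60L  /* pages laundered per tick, card healthy  */
#define SLOW_CLEAN_RATE     5L  /* pages laundered per tick, card stalled  */
#define PERIOD            100L  /* ticks in one fast/stall cycle           */

/* Write rate of the (hypothetical) backing store: fast, except for the
 * last stall_ticks ticks of every PERIOD. */
static long clean_rate(long tick, long stall_ticks)
{
    return (tick % PERIOD) < PERIOD - stall_ticks ? FAST_CLEAN_RATE
                                                  : SLOW_CLEAN_RATE;
}

/* Returns the tick at which free pages run out (where an OOM kill would
 * be triggered), or -1 if memory lasts for max_ticks. */
static long simulate(long stall_ticks, long max_ticks)
{
    assert(stall_ticks >= 0 && stall_ticks <= PERIOD);
    long free_pages = FREE_PAGES_START;
    long dirty = 0;

    for (long t = 0; t < max_ticks; t++) {
        /* Producer: new pages come out of the free pool and get dirtied. */
        free_pages -= DIRTY_RATE;
        dirty += DIRTY_RATE;

        /* Cleaner: launder what the backing store absorbs this tick;
         * clean pages can then be freed without losing data. */
        long cleaned = clean_rate(t, stall_ticks);
        if (cleaned > dirty)
            cleaned = dirty;
        dirty -= cleaned;
        free_pages += cleaned;

        if (free_pages <= 0)
            return t;
    }
    return -1;
}
```

With these made-up rates, `simulate(10, 1000)` returns -1 (the free pool recovers between stalls), while `simulate(30, 1000)` runs out of free pages at tick 92, partway through the first long stall.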
We might be able to avoid overloading the device so much via tweaks to the swap-out code (to reduce its rate more quickly when the GC on the card goes wonky). Tweaks there might also allow for some way to write bigger, contiguous blocks when swapping out, which would help avoid the read-modify-write behavior on 'small' writes that grinds the performance of some USB/SD flash devices into the ground. That would help this workload (and likely others). This is tricky because you'd want to do that as a single write, which has some tricky implications for the VM system. These can be dealt with, of course. And the code that pages it out will need a scatter-gather list for the DMA to work right, so we have to be careful not to exceed that list's limits.

There's some clustering in the page-out code, but the swapper looks like it could use some work... though I've not studied it closely enough to start on it. At Netflix we've seen some workloads that suggest improvements there would help us too, but I don't know if that's the same problem or a different, related one.

So, philosophically, I agree that the system shouldn't suck. Making it robust against suckage for extreme events that don't match the historic usage of BSD, though, is going to take some work.

Warner
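The idea of writing bigger, contiguous blocks when swapping out, bounded by the scatter-gather limit, can be sketched as simple arithmetic. This is a hypothetical model, not the FreeBSD swap pager: it assumes 4 KiB pages, a 128 KiB flash erase block, and a card that rewrites in full every erase block a write touches (real cards' translation layers vary and often buffer small writes). Adjacent swap slots are merged into one write of at most `max_segs` pages, one scatter-gather segment per page; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define PAGE_BYTES        4096L
#define ERASE_BLOCK_PAGES   32L  /* assumed 128 KiB erase block */

/* Length of the run of consecutive swap slots starting at index i,
 * capped at max_segs pages: each page is physically separate in RAM,
 * so each one costs a scatter-gather segment in the DMA. */
static size_t run_length(const long *slots, size_t n, size_t i, size_t max_segs)
{
    size_t run = 1;
    while (i + run < n && run < max_segs &&
           slots[i + run] == slots[i + run - 1] + 1)
        run++;
    return run;
}

/* Number of write I/Os needed to push out n dirty pages whose swap
 * slots are listed in ascending order. */
static size_t count_clustered_writes(const long *slots, size_t n, size_t max_segs)
{
    assert(max_segs > 0);
    size_t writes = 0;
    for (size_t i = 0; i < n; i += run_length(slots, n, i, max_segs))
        writes++;
    return writes;
}

/* Bytes the card actually programs under the crude read-modify-write
 * model: every erase block a write touches is rewritten in full. */
static long flash_bytes_programmed(const long *slots, size_t n, size_t max_segs)
{
    assert(max_segs > 0);
    long total = 0;
    size_t i = 0;
    while (i < n) {
        size_t run = run_length(slots, n, i, max_segs);
        long first_eb = slots[i] / ERASE_BLOCK_PAGES;
        long last_eb = slots[i + run - 1] / ERASE_BLOCK_PAGES;
        total += (last_eb - first_eb + 1) * ERASE_BLOCK_PAGES * PAGE_BYTES;
        i += run;
    }
    return total;
}
```

For 64 pages in consecutive slots, `max_segs = 32` yields two writes and 256 KiB programmed on flash, versus 64 single-page writes and 8 MiB programmed when nothing is clustered.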