From: Warner Losh
Date: Fri, 6 Apr 2018 09:01:49 -0600
Subject: Re: TRIM, iSCSI and %busy waves
To: Borja Marcos
Cc: "Eugene M. Zheganin", FreeBSD-STABLE Mailing List
List-Id: Production branch of FreeBSD source code

On Fri, Apr 6, 2018 at 1:58 AM, Borja Marcos wrote:

> On 5 Apr 2018, at 17:00, Warner Losh wrote:
> >
> > I'm working on trim shaping in -current right now. It's focused on NVMe,
> > but since I'm doing the bulk of it in cam_iosched.c, it will eventually
> > be available for ada and da. The notion is to measure how long the TRIMs
> > take, and only send them at 80% of that rate when there's other traffic
> > in the queue (so if trims are taking 100ms, send them no faster than 8/s).
> > While this will allow for better read/write traffic, it does slow the
> > TRIMs down, which slows down whatever they may be blocking in the upper
> > layers. Can't speak to ZFS much, but for UFS that's freeing of blocks,
> > so things like new block allocation may be delayed if we're almost out
> > of disk (which we have no signal for, so there's no way for the lower
> > layers to prioritize trims or not).
>
> Have you considered "hard" shaping, including discarding TRIMs when needed?
> Remember that a TRIM is not a write, which is subject to a contract with
> the application, but a better-if-you-do-it operation.

Well, yes and no. TRIM is there to improve the long-term performance of the
drives, because they'd otherwise get too fragmented and/or have an
unacceptably high write amplification. It's more than just a hint, but maybe,
in some cases, less than a write. "Better if you do it" does give some
leeway; how much depends on the application. If we were to implement a hard
limit on the latency of TRIMs, it would have to be user configurable. There's
also the strategy of returning some TRIMs right away, while letting only a
percentage through to the device.

If I go through with what you're calling hard shaping, I'd also look for ways
to allow the upper layers to tell me to hurry up. We have it in the buffer
daemon between all the users of bufs when there's a buf shortage, but no
similar signal from UFS down to the device to tell it that the results are
needed NOW vs. needed eventually. And the urgency of the need varies somewhat
over time: you could easily send down a boatload of TRIMs with no urgent need
for blocks, time passes, and then you have an urgent need for blocks. So you
can't mark the bio going down as urgent or not, since you might not have
another TRIM to send down. A new BIO type and/or a tweak to BIO_FLUSH might
suffice and be well defined for drivers that don't do weird things.
The notion of the upper layers being able to cancel a TRIM that's been queued
up was also floated, since a TRIM + WRITE in quick succession often gives no
different performance than just the bare WRITE. And I have no clue what ZFS
does wrt TRIMs.

So I've considered it, yes. But there are trickier corners to consider if it
were to be implemented, due to (a) the diversity of quality in the
marketplace and (b) the diversity of workloads FreeBSD is used for.

> Otherwise, as you say, you might be blocking other operations in the upper
> layers. I am assuming here that with many devices doing TRIMs is better
> than not doing them. And in case of queue congestion doing *some* TRIMs
> should be better than doing no TRIMs at all.
>
> Yep, not the first time I propose something of the sort, but my queue of
> suggestions to eventually discard TRIMs doesn't implement the same method ;)

I'm looking at all options, to be honest. I'm not sure what will work best in
the long term. I've observed that, at least with UFS, it's quite easy to
survive for hours without finishing the TRIMs, with millions of TRIMs in the
queue. All it affects are monitoring programs that freak out when you have
that many items in the queue for so long (thinking something must be wrong).
Of course, we control the monitoring programs, so that's easy to fix. (I
discovered this when I was doing 1 IOP for TRIMs while running early, buggy
versions of this, btw.) The backup causes UFS to wait on the blocks, if there
is a block shortage. Since most of the time there's not, this didn't cause
problems when I tweaked a parameter and drained the TRIMs 8 hours after tons
of files were deleted....

Warner