From: Warner Losh
Date: Fri, 5 Apr 2019 09:08:06 -0600
Subject: Re: NVME aborting outstanding i/o
To: "Patrick M. Hausen"
Cc: FreeBSD-STABLE Mailing List

On Fri, Apr 5, 2019 at 1:33 AM Patrick M. Hausen wrote:

> Hi all,
>
> > On 04.04.2019 at 17:11, Warner Losh wrote:
> > There's a request that was sent down to the drive. It took longer than
> > 30s to respond. One of them, at least, was a trim request.
> > […]
>
> Thanks for the explanation.
>
> This further explains why I was seeing a lot more of those and the system
> occasionally froze for a couple of seconds after I increased these:
>
> vfs.zfs.vdev.async_write_max_active: 10
> vfs.zfs.vdev.async_read_max_active: 3
> vfs.zfs.vdev.sync_write_max_active: 10
> vfs.zfs.vdev.sync_read_max_active: 10
>
> as recommended by Allan Jude reasoning that NVME devices could work on
> up to 64 requests in parallel. I have since reverted that change and I am
> running with the defaults.
>
> If I understand correctly, this:
>
> > hw.nvme.per_cpu_io_queues=0
>
> essentially limits the rate at which the system throws commands at the
> devices. Correct?

Yes. It de facto limits the number of commands the system can throw at an
NVMe drive. Some drives have trouble with multiple CPUs submitting things.
Others just have trouble with the volume of commands sometimes. This limits
both.

> So it’s not a real fix and there’s nothing fundamentally wrong with the
> per CPU queue or interrupt implementation. I will look into new firmware
> for my Intel devices and try tweaking the vfs.zfs.vdev.trim_max_active
> and related parameters.

Correct. It's a hack-around.

> Out of curiosity: what happens if I disable TRIM?
> My knowledge is rather superficial and I just filed that under „TRIM is
> absolutely essential lest performance will suffer severely and your
> devices die - plus bad karma, of course …“ ;-)

TRIMs help the drive optimize its garbage collection by giving it a larger
pool of free blocks to work with. This has the effect of reducing write
amplification. Write amp is the measure of the amount of extra work the
drive has to do for every user write it processes. Ideally, you want this
number to be 1.0. You'll never get to 1.0, but numbers less than 1.5 are
common, and most of the models drive makers use to rate the lifetime of
their NAND assume a write amp of about 2.

So, if you eliminate the TRIMs, you eliminate this optimization and write
amp will increase. This has two bad effects. First, wear and tear on the
NAND. Second, it takes resources away from the user.

In practice, however, the bad effects are quite limited if you don't have a
write intensive workload. Your drive is rated for so many drive writes per
day (or, equivalently, total data written over the life of the drive). This
will be on the spec sheet somewhere. I'd call a workload write intensive if
it is any sustained write load greater than about 1/10th the datasheet
write limit. If yours isn't, and you think TRIMs are causing issues, you
should disable them. The effects of not trimming are likely to be in the
noise on such systems, and the benefits of having things TRIMed will be
less.

At work, for a large video streaming company, we enable the TRIMs, even
though we're on the edge of the rule of thumb, since we're unsure how long
the machines really need to be in the field and don't want to risk it.
Except for the version of Samsung NVMe drives (PM963, no longer made) we
got a while ago... those we turn TRIM off on, because UFS' machine-gunning
down of TRIMs and nvd's blind pass-through of TRIMs took down the drive.
UFS now combines TRIMs, and we've moved to using nda since it also combines
TRIMs, so it wouldn't be so bad if we tried again today.

Drive makers optimize different things. Enterprise drives handle TRIMs a
lot better than consumer drives. Consumer drives are cheaper (in oh so many
ways), so some care is needed. Intel makes a wide range of drives, from the
super duper awesome (with prices to match) to the somewhat disappointing
(but incredibly cheap and good enough for a lot of applications). Not sure
where on this spectrum your drives fall.

tl;dr: Unless you are writing the snot out of those Intel drives, disabling
TRIM entirely will likely help avoid pushing so many commands that they
time out.

Warner
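
A minimal sketch of the arithmetic behind the two numbers discussed above:
write amplification, and the "about 1/10th of the datasheet write limit"
rule of thumb for what counts as write intensive. It assumes the usual
definition of write amp as NAND bytes written divided by host bytes
written, and it expresses both the sustained load and the rating in drive
writes per day. The function names and the sample figures are made up for
illustration; they do not come from any datasheet or tool.

```python
def write_amplification(nand_bytes_written: float, host_bytes_written: float) -> float:
    """Ratio of bytes the drive wrote to NAND to bytes the host asked it to write.

    1.0 is the ideal; anything above it is extra work done by the drive.
    """
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


def is_write_intensive(sustained_dwpd: float, rated_dwpd: float,
                       threshold: float = 0.1) -> bool:
    """Rule of thumb from the message: a sustained write load above roughly
    1/10th of the datasheet write limit counts as write intensive.

    Both arguments are in drive writes per day (DWPD). The 0.1 default is
    the "about 1/10th" figure, not a hard limit.
    """
    if rated_dwpd <= 0:
        raise ValueError("rated_dwpd must be positive")
    return sustained_dwpd > threshold * rated_dwpd


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print("write amp:", write_amplification(nand_bytes_written=150, host_bytes_written=100))
    print("0.05 DWPD on a 1.0 DWPD drive is write intensive:",
          is_write_intensive(sustained_dwpd=0.05, rated_dwpd=1.0))
    print("0.30 DWPD on a 1.0 DWPD drive is write intensive:",
          is_write_intensive(sustained_dwpd=0.30, rated_dwpd=1.0))
```

If the second function returns False for your workload and TRIMs seem to
be behind the timeouts, that is the case where turning TRIM off is likely
to cost little.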