From: Warner Losh
Date: Thu, 4 Apr 2019 09:11:27 -0600
Subject: Re: NVME aborting outstanding i/o
To: "Patrick M. Hausen"
Cc: FreeBSD-STABLE Mailing List
References: <818CF16A-D71C-47C0-8A1B-35C9D8F68F4E@punkt.de>

On Thu, Apr 4, 2019 at 2:39 AM Patrick M. Hausen wrote:

> Hi all,
>
> I’m currently doing some load tests/burn in for two new servers.
> These feature all NVME SSDs and run FreeNAS, i.e. FreeBSD 11.2-STABLE.
>
> pcib17: at device 3.2 numa-domain 1 on pci15
> pcib17: [GIANT-LOCKED]
> pci17: numa-domain 1 on pcib17
> nvme7: mem 0xeca10000-0xeca13fff at device 0.0 numa-domain 1 on pci17
>
> When putting some moderate i/o load on the system, the log fills with these
> messages:
>
> nvme7: aborting outstanding i/o
> nvme7: DATASET MANAGEMENT sqid:41 cid:91 nsid:1
> nvme7: ABORTED - BY REQUEST (00/07) sqid:41 cid:91 cdw0:0
>

OK. So unless you are suspending and resuming, or the drive is somehow failing, here's what's going on:

A request was sent down to the drive and took longer than 30s to respond, so the driver aborted it. At least one of the aborted requests was a trim (that's the DATASET MANAGEMENT command in your log). There are a number of possible reasons for this.

NAND sucks. It's a horrible steaming pile of... silicon. To make it useful, there's a layer of software called the FTL (flash translation layer). NAND is an append-only medium at the lowest level, so the FTL has to take requests and build a map of logical blocks to physical blocks, as well as manage the 'log structured device' in some way. The details of why are too long to get into here (see my BSDCan talk from a few years ago). But what is relevant is that many drives have really crappy FTLs, especially when it comes to TRIMs. They can't handle a lot of them, and when you send a lot down, like FreeBSD will often do with UFS or ZFS, you can trigger the drive into doing a bunch of garbage collection. This can cause the drive to delay > 30s before responding to commands. So sometimes you can avoid this by disabling trims.
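
As a rough sketch of what "disabling trims" could look like on a box like this one: the fragment below assumes the pools are ZFS and that the system uses the legacy FreeBSD ZFS code from the 11.x era, where TRIM is controlled by a boot-time tunable. Neither the tunable name nor the ZFS assumption comes from the original report, so check both against the installed system before relying on this.

```
# /boot/loader.conf fragment (assumption: legacy FreeBSD 11.x ZFS).
# Stops ZFS from issuing TRIM to its vdevs; takes effect at next boot.
vfs.zfs.trim.enabled="0"
```

This trades away whatever benefit TRIM gives the drive in exchange for not provoking the garbage-collection stalls described above, so it is best treated as a diagnostic step: if the aborts stop, the drive's TRIM handling is the likely culprit.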

Other times, you have a crappy FTL that crashes. This can cause a long timeout because FreeBSD has done something that, while in spec, is unexpected or not well tested. Here you can really only have FreeBSD do less work at once to avoid this issue, or you can upgrade the firmware.

> There has been some discussion of this on the iX Systems forum as well
> as various FreeBSD media and one person suggested setting:
>
> hw.nvme.per_cpu_io_queues=0
>
> This is where I need some help now. This is from the manpage for nvme(4):
>
> ----------
>      To force a single I/O queue pair shared by all CPUs, set the following
>      tunable value in loader.conf(5):
>
>            hw.nvme.per_cpu_io_queues=0
>
>      To assign more than one CPU per I/O queue pair, thereby reducing the
>      number of MSI-X vectors consumed by the device, set the following
>      tunable value in loader.conf(5):
>
>            hw.nvme.min_cpus_per_ioq=X
>
>      To force legacy interrupts for all nvme driver instances, set the
>      following tunable value in loader.conf(5):
>
>            hw.nvme.force_intx=1
>
>      Note that use of INTx implies disabling of per-CPU I/O queue pairs.
> ----------
>
> But:
>
> root@freenas01[~]# sysctl hw.nvme.per_cpu_io_queues
> sysctl: unknown oid 'hw.nvme.per_cpu_io_queues'
> root@freenas01[~]# sysctl hw.nvme.min_cpus_per_ioq
> sysctl: unknown oid 'hw.nvme.min_cpus_per_ioq'
> root@freenas01[~]# sysctl hw.nvme.force_intx
> sysctl: unknown oid 'hw.nvme.force_intx'
>
> Where do I go from here?
>

Did you add it to /boot/loader.conf? These are boot-time tunables; there's no sysctl for them.

Warner

> Thanks!
> Patrick
> --
> punkt.de GmbH Internet - Dienstleistungen - Beratung
> Kaiserallee 13a Tel.: 0721 9109-0 Fax: -100
> 76133 Karlsruhe info@punkt.de http://punkt.de
> AG Mannheim 108285 Gf: Juergen Egeling
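
For reference, a minimal sketch of the loader.conf route discussed in this message. It assumes the tunable name from the quoted nvme(4) excerpt matches the installed driver, and it uses the quoted name="value" form that loader.conf(5) documents.

```
# /boot/loader.conf fragment; read by the loader, so a reboot is required.
# Force a single I/O queue pair shared by all CPUs (per nvme(4)).
hw.nvme.per_cpu_io_queues="0"
```

Because the setting lives in the kernel environment and not in the sysctl tree, `sysctl` reports "unknown oid" whether or not it is set. After the reboot, `kenv hw.nvme.per_cpu_io_queues` should print the value if the loader picked it up.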