From: Bakul Shah
To: Maxim Sobolev
Cc: Warner Losh, Tomoaki AOKI, FreeBSD Current
Subject: Re: nvme timeout issues with hardware and bhyve vm's
Date: Thu, 7 Dec 2023 17:33:02 -0800
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

Thanks.

It may be worth checking the temperature periodically and warning the user in case it is too high (70°C+ or something). Even for devices that allow internal throttling, a user might wish to know whether the device needs a (better) heatsink.
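
A rough sketch of what such a periodic check might look like, in Python. The 70 C threshold is just the figure floated above. The function names, the 60-second polling interval, and the assumption that `nvmecontrol logpage -p 2 <dev>` prints a "Temperature: NNN K, ..." line are all illustrative and should be verified against nvmecontrol(8) on the target system.

```python
import re
import subprocess
import time

WARN_CELSIUS = 70.0  # threshold suggested in this thread, not a spec value


def parse_temp_celsius(health_text):
    """Extract the temperature (reported in kelvin) and convert to Celsius.

    Assumes a line of the form "Temperature: 345 K, ..."; returns None
    if no such line is present.
    """
    match = re.search(r"Temperature:\s*(\d+)\s*K", health_text)
    if match is None:
        return None
    return int(match.group(1)) - 273.15


def check(dev, health_text, warn=WARN_CELSIUS):
    """Return a warning string if dev is at or above warn, else None."""
    temp = parse_temp_celsius(health_text)
    if temp is None or temp < warn:
        return None
    return (f"{dev}: {temp:.1f} C is at or above {warn:.0f} C; "
            "it may need a (better) heatsink")


def read_health(dev):
    """Fetch the health log page text (assumed nvmecontrol invocation)."""
    result = subprocess.run(
        ["nvmecontrol", "logpage", "-p", "2", dev],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def monitor(devs, interval=60):
    """Poll each device forever, printing a warning when one runs hot."""
    while True:
        for dev in devs:
            warning = check(dev, read_health(dev))
            if warning:
                print(warning)
        time.sleep(interval)
```

Calling monitor(["nvme0"]) would poll once a minute. A real version would probably log through syslog and cope with a drive that has stopped responding, which is exactly the failure being discussed here.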

On Dec 7, 2023, at 5:02 PM, Maxim Sobolev <sobomax@freebsd.org> wrote:

> How quickly it heats up depends on lots of factors. Usually those devices burn some 3-7 watts per stick at 100% load, so maybe this would give you some idea. At least some of them support several toggleable performance modes, which use throttling internally to limit power consumption to a certain level (man nvmecontrol). It helped me recently to make a system stable, which otherwise would hang with timeouts after reaching 70-75C, until I got the chance to take it apart and attach heatsinks to the NVMe drives. Once the temperature dropped to <= 50C the drives became 100% stable.
>
> -Max
>
> On Thu, Dec 7, 2023, 4:07 PM Bakul Shah <bakul@iitbombay.org> wrote:
>> On Dec 7, 2023, at 3:59 PM, Warner Losh <imp@bsdimp.com> wrote:
>> >
>> > *Overheating caused hang of NVMe controller or PCI bridge on SSD, or
>> >
>> > Yes. Most drives' firmware resets when it overheats. There might be something
>> > that the PCI code can do when this happens to retrain the link, reprogram the
>> > config registers, etc.
>>
>> How quickly can the device heat up? Can it be queried frequently
>> enough to act before it overheats by throttling I/O?




