From: Patrick Kelsey
Date: Sun, 21 Jul 2019 16:32:04 -0400
Subject: Re: vmx0: watchdog timeout on queue 2, no interrupts on BSP
To: Andriy Gapon
Cc: freebsd-net@freebsd.org, FreeBSD Current

> On Jul 21, 2019, at 4:17 PM, Andriy Gapon wrote:
>
>> On 20/07/2019 20:08, Patrick Kelsey wrote:
>>
>> On Fri, Jul 19, 2019 at 10:07 AM Andriy Gapon wrote:
>>
>> Recently we experienced a strange problem.
>> We noticed a lot of these messages in the logs:
>> vmx0: watchdog timeout on queue 2
>> (always queue 2)
>> Also, we noticed that connections to some end points did not work at all
>> while others worked without problems. I assume that that was because
>> specific flows got assigned to that queue 2.
>>
>> Further investigation has shown that none of interrupts assigned to the
>> BSP has ever fired (since boot, of course). That included vmx0:rx2 and
>> vmx0:tx2. But also interrupts for other drivers as well.
>>
>> Trying to get more information I rebooted the system and the problem
>> disappeared.
>>
>> Has anyone seen anything like that?
>> Any thoughts on possible causes?
>> Any suggestions what to check if/when the problem reoccurs?
>>
>> Thanks!
>>
>> If you are running head at or after r347221 or stable/12 at or after
>> r349112, then this could be due to
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118 (see Comment 4
>> - short story is that an iflib change has broken the vmx driver).
>
> I am not sure if that bug could lead to all interrupts on the core
> getting disabled (for all drivers), and right at the boot time.

I am not sure either, but it’s the kind of bug that breaks the design of
the vmx driver in such a way that its state can get corrupted to the
point where the kernel can panic. I haven’t fully analyzed the potential
scope of memory corruption / hardware state corruption that can occur
(because the fix for the issue is already apparent), so I am freely
considering it to include elements beyond the device and driver itself.

If you are saying that zero vmx queue interrupts have occurred anywhere
in the system, then I would rule out any connection to this, as a
prerequisite for the corruption to occur is having at least one such
interrupt.

-Patrick