From: Marcel Flores
Date: Sun, 6 Dec 2020 16:59:45 -0800
To: mmel@freebsd.org
Cc: Mark Millard, freebsd-arm@freebsd.org
Subject: Re: ThunderX Panic after r368370
Message-Id: <7DFA7D8E-45A6-48B8-BB74-CC2EE29AF73C@brickporch.com>
In-Reply-To: <91654fc4-8734-d8a7-5309-0400f418438a@freebsd.org>
List-Id: "Porting FreeBSD to ARM processors."

> On Dec 6, 2020, at 3:51 AM, Michal Meloun wrote:
>
>
>
> On 06.12.2020 10:47, Mark Millard wrote:
>> On 2020-Dec-6, at 00:17, Michal Meloun wrote:
>>> On 06.12.2020 3:21, Marcel Flores wrote:
>>>> Hi All,
>>>> Looks like the ThunderX started panicking at boot after r368370:
>>>> https://reviews.freebsd.org/rS368370
>>>> From a verbose boot, it looks like it bails in gic0 redistributor setup(?):
>>>> gic0: CPU29 Re-Distributor woke up
>>>> gic0: CPU24 enabled CPU interface via system registers
>>>> gic0: CPU17 enabled CPU interface via system registers
>>>> gic0: CPU29 enabled CPU interface via system registers
>>>> done
>>>> Full Verbose boot:
>>>> https://gist.github.com/mesflores/f026122495c8494d041bce04d30b15bb
>>>> I'm not really familiar with the details of the commit, but happy to test
>>>> anything if anyone has any ideas.
>>>
>>>
>>> Hi Marcel
>>> are you able to get crashdump and do backtrace?
>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain
>>> and
>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>>> If not, I'll make some debug patch.
>>>
>>> It's weird, even though GIC is potentially affected by my patch, in this case the cpuid numbering was not changed.
>> (I've no access to a ThunderX. I just looked for my own curiosity.
>> Sorry if this is obvious and so is noise.)
>> When I looked at the code it appeared to be the last "->" in
>> the following that was dereferencing the nullptr value (via [x8]
>> in assembler notation):
>> static uint64_t
>> its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc)
>> {
>>         uint64_t target;
>>         uint8_t cmd_type;
>>         u_int size;
>>         cmd_type = desc->cmd_type;
>>         target = ITS_TARGET_NONE;
>>         switch (cmd_type) {
>>         case ITS_CMD_MOVI:      /* Move interrupt ID to another collection */
>>                 target = desc->cmd_desc_movi.col->col_target;
>> . . .
>> In other words: it appeared to me that the above desc->cmd_desc_movi.col
>> evaluated as 0 when used in what was reported.
> This is very probably right analysis. But problem is that cmd_desc_movi.col should not be NULL, is initialized in its_cmd_movi from sc->sc_its_cols which should be allocated in gicv3_its_attach().
>
>
> Marcel, can you, please also try this debug patch?
> https://github.com/strejda/freebsd/commit/a25ed736644b05672e3e813891af213c280daac3
> Unfortunately, I have only single socket board with GICv3, Honeycomb, but it still boots fine.
>
> Thanks,
> Michal

Debug patch output here (I also switched from GENERIC-NODEBUG to GENERIC):
https://gist.github.com/mesflores/27bd1cca45b04e5b938166c9f1f79a04

Having a little trouble getting the crashdump saved, but will update if I can sort it out.

-m