From: Mark Martinec
To: freebsd-stable@freebsd.org
Cc: re@freebsd.org
Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Date: Mon, 24 Jul 2017 19:45:01 +0200
Organization: Jozef Stefan Institute
Message-ID: <42cc3fffe99f5b7d5deb7d7bf8d071cd@ijs.si>

2017-07-24 18:25, Ken Merry wrote:
> It is possible that the change I MFCed today (r321207 in head, r321415
> in stable/11) is related, but Mark will have to boot his machine with
> the fix to see if it makes any difference.
>
> What happened in my case on one particular machine (not on most
> machines in our lab running the same code) was that mps_wait_command()
> / mpr_wait_command() would not wait the full 60 seconds for a write to
> the DPM table (Driver Persistent Mapping) table in the controller.
> So, it reported that there was a timeout.
> [...]
> Eliminating bogus timeouts will eliminate most all of the sources of
> those panics anyway.

I took r321415 from stable/11 and applied it to 11.1-RC3, and it makes
no difference to booting: it still hangs while attempting to attach da0,
with a spinning CPU (judging by fan speed).

Booting in safe mode, or with EARLY_AP_STARTUP disabled, avoids
the problem.

> There is a secondary bug that is still in the mps(4) / mpr(4) drivers
> when a timeout does happen - the error recovery code in the
> wait_command() routine reinitializes the controller, which clears out
> all the commands. When the wait_command() routine returns, the
> command passed in has been freed, but the caller doesn’t know that.
> So the caller (it happens in a number of places) dereferences a
> pointer to freed memory and the kernel panics.
>
> I’m planning to fix that bug, too, if slm@ doesn’t get to it first,
> I’ve just had other bugs to fix first.

No panics in my case, just hangs.

Mark
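
For anyone following the thread, the secondary bug Ken describes (the wait routine frees the command during recovery, and the caller then dereferences it) can be illustrated with a small user-space C sketch. This is not the mps(4) / mpr(4) code: every name here (toy_command, toy_controller, toy_reinit, toy_wait_command, toy_submit) is hypothetical, and passing the command by pointer-to-pointer so the wait routine can clear it is only one possible way to let the caller know, assumed here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real mps(4)/mpr(4) types. */
struct toy_command {
    int status;
};

struct toy_controller {
    struct toy_command *pending;   /* the single outstanding command */
    int reinit_count;              /* how many times recovery ran */
};

/* Models error recovery: reinitializing frees every outstanding command. */
static void toy_reinit(struct toy_controller *sc)
{
    free(sc->pending);
    sc->pending = NULL;
    sc->reinit_count++;
}

/*
 * Waits for *cmp. The `completes` flag simulates whether the hardware
 * answered in time. On timeout the controller is reinitialized, which
 * frees the command, so *cmp is set to NULL to tell the caller that the
 * pointer it passed in is no longer valid.
 */
static int toy_wait_command(struct toy_controller *sc,
                            struct toy_command **cmp, int completes)
{
    assert(sc != NULL && cmp != NULL && *cmp != NULL);
    if (completes) {
        (*cmp)->status = 0;
        return 0;
    }
    toy_reinit(sc);
    *cmp = NULL;
    return ETIMEDOUT;
}

/* Caller pattern: check the error and the pointer before touching cm. */
static int toy_submit(struct toy_controller *sc, int completes)
{
    struct toy_command *cm = malloc(sizeof(*cm));
    int error;

    if (cm == NULL)
        return ENOMEM;
    cm->status = -1;
    sc->pending = cm;

    error = toy_wait_command(sc, &cm, completes);
    if (error != 0 || cm == NULL)
        return error != 0 ? error : EIO;  /* already freed by recovery */

    error = cm->status;                   /* safe: cm is still ours */
    sc->pending = NULL;
    free(cm);
    return error;
}
```

Calling toy_submit(&sc, 0) on a zero-initialized toy_controller takes the timeout path and returns ETIMEDOUT without touching the freed command. Without the NULL check, the read of cm->status would be the use-after-free described above.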