From owner-freebsd-stable@freebsd.org Wed Jul 19 23:46:40 2017
Date: Thu, 20 Jul 2017 01:46:33 +0200
From: Mark Martinec
To: freebsd-stable@freebsd.org
Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Organization: Jozef Stefan Institute
References: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com>
Message-ID: <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>

More news on the matter.

As reported yesterday, the locally built kernel with options
INVARIANTS and DDB works fine and somehow avoids the trouble when
attaching the da (mps) disks on an LSI controller. So today I wanted
to get back to a reproducible hang, and sure enough, reverting to the
generic kernel as distributed brings back the hang.

So I tried rebuilding the kernel while experimenting with options
like DDB and INVARIANTS.

A locally built GENERIC kernel behaves the same as the original
kernel from the distribution (as installed by freebsd-update), so
no surprises there. It hangs trying to attach the first of the da
disks (after first successfully attaching all the ada disks).
Pressing alt ctrl esc fails to enter the debugger when the hang
occurs (possibly due to an unresponsive USB keyboard at that time),
even though debug.kdb.break_to_debugger was set to 1 at the loader
prompt. This kernel needs the loader's "Safe mode" to be able to boot.

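For anyone reproducing this, here is a minimal sketch of setting that
tunable at the loader prompt. It assumes the stock FreeBSD loader and
its "set" and "boot" commands. The lines starting with # are
annotations rather than something to type, and this is not a
transcript from the affected machine.

```
# Sketch only: typed at the loader prompt ("OK") before booting.
# The tunable name is the one mentioned above.
set debug.kdb.break_to_debugger=1
boot
```

The same setting could presumably be made persistent with a line such
as debug.kdb.break_to_debugger="1" in /boot/loader.conf, though that
only helps if the keyboard still responds when the hang occurs.
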
Next, a locally built kernel with DDB and INVARIANTS works well
(the remaining options come from an included GENERIC).

Now the funny part: a locally built kernel with just the DDB option
(and the rest included from GENERIC) *also* works well. Somehow the
DDB option makes a difference, even though the kernel debugger is
never activated.

To reiterate: at the time of a hang the CPU fan starts revving up
and the USB keyboard is unresponsive (scroll mode cannot be entered,
caps lock and num lock do not toggle their LED indicators, and
alt ctrl esc does not activate the kernel debugger). Loader
"Safe mode" avoids the problem (presumably by disabling SMP).

Meanwhile I have successfully upgraded two other similar hosts from
11.0 to 11.1-RC3, no surprises there (but they do not have the same
disk controller).

Not sure what to try next.

  Mark


2017-07-19 01:18, Mark Martinec wrote:
> 2017-07-18 01:24, Mark Johnston wrote:
>> Are you able to break into the debugger at this point? Try setting
>> debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at
>> the loader prompt, and hit the break key, or the key sequence
>> ~ ctrl-b once the hang occurs. At the debugger prompt, try
>> "bt" and "show allpcpu" to start.
>
> Thank you for a prompt and good suggestion! I spent an afternoon
> fiddling with the machine, with mixed results. Your suggestion to
> break into debugger did not work, there was no reaction to
> or to ~ ctrl-b.
>
> So I embarked on rebuilding the RC3 kernel with
>   options KDB
>   options DDB
>   options BREAK_TO_DEBUGGER
>   options ALT_BREAK_TO_DEBUGGER
>   options INVARIANTS
>   options INVARIANT_SUPPORT
>   options WITNESS
>   options WITNESS_SKIPSPIN
> but then I realized the key is mapped-to by: alt ctrl ,
> which now does break into debugger - but not so early where the
> holdup occurs.
>
> The WITNESS produced some LOR warnings, but that is probably ok.
> I came across a trace just before the problem area, but it flows
> by so fast on a vt console and only the last 40 or so lines
> remain on the screen (I have a photo), which do not look like
> revealing much. Unfortunately this machine does not have a serial
> interface.
>
> So in my last attempt I rebuilt a kernel with INVARIANTS but
> without WITNESS - and now I cannot reproduce the problem, with
> or without a "safe mode". What is interesting here that now
> the da0..da3 disks are attached first, and only then the ada
> disks - and even within the group of disks on the same
> controller their order has been shuffled - no idea what could
> have caused it - and it may have avoided the problem by doing so.
>
> Will play some more with this tomorrow...
>
>   Mark
>
>
>> On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote:
>>> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual
>>> freebsd-update upgrade method I ended up with a system which gets
>>> stuck while trying to attach the second set of disks. This happened
>>> already after the first phase of the upgrade procedure (installing
>>> and re-booting with a new kernel).
>>>
>>> The first set of disks (ada0 .. ada2) are attached successfully,
>>> also a cd0, but then when the first of the set of four (a regular
>>> spinning disk) on an LSI controller is to be attached, the boot
>>> procedure just gets stuck there:
>>>
>>>  kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>>>  kernel: ada1: Command Queueing enabled
>>>  kernel: ada1: 305245MB (625142448 512 byte sectors)
>>>  kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
>>>  kernel: ada2: ATA8-ACS SATA 3.x device
>>>  kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
>>>  kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>>>  kernel: ada2: Command Queueing enabled
>>>  kernel: ada2: 114473MB (234441648 512 byte sectors)
>>>  kernel: ada2: quirks=0x1<4K>
>>>  kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
>>>
>>> (stuck here, keyboard not responding, fans rising their pitch,
>>> presumably CPU is spinning)
> [...]
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to
> "freebsd-stable-unsubscribe@freebsd.org"