From: Mark Johnston
Date: Wed, 19 Jul 2017 17:03:25 -0700
To: Mark Martinec
Cc: freebsd-stable@freebsd.org
Subject: Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID: <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com>
References: <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>
In-Reply-To: <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>
List-Id: Production branch of FreeBSD source code

On Thu, Jul 20, 2017 at 01:46:33AM +0200, Mark Martinec wrote:
> More news on the matter. As reported yesterday the locally built
> kernel with options INVARIANTS and DDB works fine and somehow avoids
> the trouble at attaching the da (mps) disks on an LSI controller, so
> today I wanted to get back to a reproducible hang - and sure enough,
> reverting to the generic kernel as distributed brings back the hang.
>
> So I tried rebuilding the kernel while experimenting with options
> like DDB and INVARIANTS.
>
> A locally built GENERIC kernel behaves the same as the original
> kernel from the distribution (as installed by freebsd-upgrade),
> so no surprises there. It hangs trying to attach the first of the
> da disks (after first successfully attaching all the ada disks).
> The alt ctrl esc is unable to enter debugger when the hang occurs
> (possibly due to an unresponsive USB keyboard at that time),
> even though the debug.kdb.break_to_debugger was set to 1 at a
> loader prompt. It needs loader "Safe mode" to be able to boot.
>
> Next, a locally built kernel with DDB and INVARIANTS works well
> (the remaining options come from an included GENERIC).
>
> Now the funny part: a locally built kernel with just the DDB
> option (and the rest included from GENERIC) *also* works well.
> Somehow the DDB option makes a difference, even though kernel
> debugger is never activated.

One thing to try at this point would be to disable EARLY_AP_STARTUP in
the kernel config. That is, take a configuration with which you're able
to reproduce the hang during boot, and remove "options
EARLY_AP_STARTUP". This feature has a fairly large impact on the bootup
process and has had a few problems that manifested as hangs during boot.
There was at least one other case where an innocuous change to the
kernel configuration "fixed" the problem by introducing some
second-order effect (causing kernel threads to be scheduled in a
different order, for instance).

Regardless of whether the suggestion above makes a difference, it would
be helpful to see verbose dmesgs from both a clean boot and a boot that
hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
assertions that will cause the system to panic when the hang occurs,
making it easier to see what's going on.
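
As a rough sketch of the suggestion above, a config that includes
GENERIC and drops only this one option could look something like the
following. The config name NOEARLYAP and the amd64 path are assumptions
made for illustration; adjust both to the machine in question.

```
# Hypothetical file: /usr/src/sys/amd64/conf/NOEARLYAP
# (the name and the amd64 architecture are assumptions)
include   GENERIC             # start from the configuration that hangs
ident     NOEARLYAP           # label shown in uname -v
nooptions EARLY_AP_STARTUP    # remove only this option

# Build and install from /usr/src:
#   make buildkernel KERNCONF=NOEARLYAP
#   make installkernel KERNCONF=NOEARLYAP
#
# For the verbose dmesg, boot with "boot -v" at the loader prompt.
```

Keeping everything else inherited from GENERIC should make it easier to
attribute any change in behaviour to EARLY_AP_STARTUP alone, though as
noted above an unrelated change can also mask the hang.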

>
> To re-assert: at the time of a hang the CPU fan starts revving up,
> and the USB keyboard is unresponsive ( does not enter scroll
> mode, caps lock and num lock do not toggle their LED indicators,
> alt ctrl esc do not activate kernel debugger. Loader "Safe mode"
> avoids the problem (presumably by disabling SMP).
>
> Meanwhile I have successfully upgraded two other similar
> hosts from 11.0 to 11.1-RC3, no surprises there (but they do not
> have the same disk controller).
>
> Not sure what to try next.
>
> Mark