From owner-freebsd-stable@freebsd.org Mon Jul 17 23:01:23 2017 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 59C9DDA36E6; Mon, 17 Jul 2017 23:01:23 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from mail.ijs.si (mail.ijs.si [IPv6:2001:1470:ff80::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 065B67613F; Mon, 17 Jul 2017 23:01:23 +0000 (UTC) (envelope-from Mark.Martinec+freebsd@ijs.si) Received: from amavis-ori.ijs.si (localhost [IPv6:::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.ijs.si (Postfix) with ESMTPS id 3xBJgh48V6z9g; Tue, 18 Jul 2017 01:01:20 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ijs.si; h= user-agent:message-id:organization:subject:subject:from:from :date:date:content-transfer-encoding:content-type:content-type :mime-version:received:received:received:received; s=jakla4; t= 1500332476; x=1502924477; bh=ImLI30NzJkvUphL7G6XtDM8KFHbplOwj8aH K7UWXL0w=; b=KQEteV2qwTB2ethPcJggxvZE9LJ4dSOTTXC7MKq1gYP9H/Zxw/S uIbAY5ptlWlJbqsNIm7cDmZT7IDoLkLFX8SZu7pRFS0gTeg8v++umjZMmcW52Wwq zG4CvSDsg1oZOWAFcLxG6dSucMam7DNc3LcGamhufiyRoxrSmBinSsRY= X-Virus-Scanned: amavisd-new at ijs.si Received: from mail.ijs.si ([IPv6:::1]) by amavis-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10026) with LMTP id OUB8vt3EGW9M; Tue, 18 Jul 2017 01:01:16 +0200 (CEST) Received: from mildred.ijs.si (mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP id 3xBJgc31qNz9f; Tue, 18 Jul 2017 01:01:16 +0200 (CEST) Received: from nabiralnik.ijs.si (nabiralnik.ijs.si [IPv6:2001:1470:ff80::80:16]) by mildred.ijs.si (Postfix) with ESMTP id 3xBJgc2nJTz1Rj; Tue, 18 Jul 2017 01:01:16 +0200 (CEST) Received: from sleepy.ijs.si (2001:1470:ff80:e001::76) by nabiralnik.ijs.si with HTTP (HTTP/2.0 POST); Tue, 18 Jul 2017 01:01:16 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit Date: Tue, 18 Jul 2017 01:01:16 +0200 From: Mark Martinec To: freebsd-stable@freebsd.org Cc: freebsd-hackers@freebsd.org Subject: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching Organization: Jozef Stefan Institute Message-ID: X-Sender: Mark.Martinec+freebsd@ijs.si User-Agent: Roundcube Webmail/1.2.4 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Jul 2017 23:01:23 -0000 Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update upgrade method I ended up with a system which gets stuck while trying to attach the second set of disks. This happened already after the first phase of the upgrade procedure (installing and re-booting with a new kernel). The first set of disks (ada0 .. ada2) are attached successfully, also a cd0, but then when the first of the set of four (a regular spinning disk) on an LSI controller is to be attached, the boot procedure just gets stuck there: kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes) kernel: ada1: Command Queueing enabled kernel: ada1: 305245MB (625142448 512 byte sectors) kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0 kernel: ada2: ATA8-ACS SATA 3.x device kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8 kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes) kernel: ada2: Command Queueing enabled kernel: ada2: 114473MB (234441648 512 byte sectors) kernel: ada2: quirks=0x1<4K> kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0 (stuck here, keyboard not responding, fans rising their pitch, presumably CPU is spinning) (instead of the normal continuation like: kernel: da0: Fixed Direct Access SPC-4 SCSI device kernel: da0: Serial Number .... kernel: da0: 600.000MB/s transfers kernel: da0: Command Queueing enabled kernel: da0: 1907729MB (3907029168 512 byte sectors) ) The controller for da0 .. da3 is an LSI: kernel: mps0: port 0x4000-0x40ff mem 0xd1740000-0xd1743fff,0xd1300000-0xd133ffff irq 16 at device 0.0 on pci1 kernel: mps0: Firmware: 14.00.01.00, Driver: 21.02.00.00-fbsd kernel: mps0: IOCCapabilities: 185c [...] kernel: mps0: SAS Address for SATA device = a4a4843003d0cf79 kernel: mps0: SAS Address from SATA device = a4a4843003d0cf79 kernel: mps0: SAS Address for SATA device = d3d48904eddff0d5 kernel: mps0: SAS Address from SATA device = d3d48904eddff0d5 [...] kernel: mps0: SAS Address for SATA device = 2a021c07585c665b kernel: mps0: SAS Address from SATA device = 2a021c07585c665b kernel: mps0: SAS Address for SATA device = 2a021c0758637b7c kernel: mps0: SAS Address from SATA device = 2a021c0758637b7c This host in this configuration worked perfectly well with 11.0 and many older versions of the OS. After some frustration I found out that the system can boot fine if a boot loader option "Safe mode" is set. This way I successfully finished the upgrade procedure (installing world). Playing with loader options that the "Safe mode" turns on ( /boot/menu-commands.4th ) it seems that kern.smp.disabled=1 is the crucial option, although my attempts at ruling out remaining options of the "Safe mode" turned out inconclusive - perhaps there is some random/race involved. Anyway, in "Safe mode" the machine always boots normally and attaches all disks. This experience is much like described in: https://forums.freebsd.org/threads/56524/ where the poster ended up disabling SMP to be able to have a working host. It is also somewhat similar to: https://lists.freebsd.org/pipermail/freebsd-hackers/2017-July/051258.html where a FreeBSD 11.1 prerelease only boots on a single-CPU AWS host, but fails to boot on a 2-core CPU, with various symptoms, including: ( https://lists.freebsd.org/pipermail/freebsd-hackers/2017-July/051260.html ) Feeding entropy: . spin lock 0xffffffff80db45c0 (smp rendezvous) held by 0xfffff80004378560 (tid 100074) too long timeout stopping cpus panic: spin lock held too long Please advise, thanks Mark