From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 193758] New: either gptzfsboot or zfsloader hangs during boot after kernel and pool upgrade
Date: Thu, 18 Sep 2014 20:41:11 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193758

            Bug ID: 193758
           Summary: either gptzfsboot or zfsloader hangs during boot after
                    kernel and pool upgrade
           Product: Base System
           Version: 8.4-STABLE
          Hardware: amd64
                OS: Any
            Status: Needs Triage
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: mark@exonetric.com

On June 29, I updated an amd64 system with a GPT ZFS root to

FreeBSD 8.4-RELEASE-p13 #0 r268016: Sun Jun 29 12:58:11 UTC 2014

I rebooted and this worked without issue.

On Sep 9, 2014, I upgraded both pools to the latest pool version for
8.4-RELEASE, so the version property no longer applies.

$ zpool get version
NAME   PROPERTY  VALUE    SOURCE
pool0  version   -        default
pool1  version   -        default

$ zpool status
  pool: pool0
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool0       ONLINE       0     0     0
          da0p3     ONLINE       0     0     0

errors: No known data errors

  pool: pool1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          da1p1     ONLINE       0     0     0

errors: No known data errors

$ gpart show
=>       34  312477629  da0  GPT  (149G)
         34        128    1  freebsd-boot  (64k)
        162   67108864    2  freebsd-swap  (32G)
   67109026  245368637    3  freebsd-zfs  (117G)

=>       34  312477629  da1  GPT  (149G)
         34  312477629    1  freebsd-zfs  (149G)

I applied the recommended update to gptzfsboot and the pmbr as follows:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

On Sep 16, I rebooted the system and found that it no longer booted: it
froze right after the first "/" symbol, i.e. the "spinner" did not spin.
Even CTRL-ALT-DEL was insufficient to break it out of the frozen state; a
reset or power cycle was required.

We used a bootable 8.4 USB image to get access to the system after it
became clear there was no way to boot from the internal drives.
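
As a cross-check of the gpart show output above, the partition rows can be verified for gaps and overlaps, and the freebsd-boot partition size can be converted to bytes. This is only a sketch: the numbers are copied from this report, the sector size comes from the dmesg lines, and the function names are made up for illustration. It says nothing about where the boot actually hangs.

```python
# Values copied from the `gpart show` and dmesg output in this report.
SECTOR_BYTES = 512  # dmesg reports "512 byte sectors"

DISKS = {
    "da0": {
        "total_sectors": 312477696,   # from dmesg
        "usable": (34, 312477629),    # "=>" row: first usable sector, sector count
        "parts": [                    # (start, size, index, type)
            (34, 128, 1, "freebsd-boot"),
            (162, 67108864, 2, "freebsd-swap"),
            (67109026, 245368637, 3, "freebsd-zfs"),
        ],
    },
    "da1": {
        "total_sectors": 312477696,
        "usable": (34, 312477629),
        "parts": [
            (34, 312477629, 1, "freebsd-zfs"),
        ],
    },
}


def check_layout(disk):
    """Return a list of inconsistencies; an empty list means the rows add up."""
    problems = []
    first, count = disk["usable"]
    expected = first
    for start, size, index, _ptype in disk["parts"]:
        if start != expected:
            problems.append(
                f"partition {index} starts at {start}, expected {expected}")
        expected = start + size
    if expected != first + count:
        problems.append(
            f"partitions end at {expected}, usable area ends at {first + count}")
    if first + count > disk["total_sectors"]:
        problems.append("usable area extends past the end of the disk")
    return problems


def boot_partition_bytes(disk):
    """Size in bytes of the freebsd-boot partition, or None if there is none."""
    for _start, size, _index, ptype in disk["parts"]:
        if ptype == "freebsd-boot":
            return size * SECTOR_BYTES
    return None


if __name__ == "__main__":
    for name, disk in DISKS.items():
        print(name, check_layout(disk) or "layout consistent")
    print("da0 freebsd-boot bytes:", boot_partition_bytes(DISKS["da0"]))
```

Both layouts come out contiguous, and the boot partition on da0 is 128 sectors, matching the 64k that gpart prints. The size of /boot/gptzfsboot is not given in this report, so comparing it against that figure (for example with ls -l) is left as a separate check.
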
Using the fixit shell, we determined the pools were intact and undamaged
and reapplied the bootcode from the USB image as a speculative measure,
all to no effect.

The underlying block devices are 3ware RAID controller volumes as follows:

$ egrep da[01] /var/run/dmesg.boot
da0 at twa0 bus 0 scbus0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 100.000MB/s transfers
da0: 152577MB (312477696 512 byte sectors: 255H 63S/T 19450C)
da1 at twa0 bus 0 scbus0 target 1 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 152577MB (312477696 512 byte sectors: 255H 63S/T 19450C)

At this point, we've configured a USB stick to handle the boot phase, and
that is our workaround, but this seems like quite an extreme failure mode
for gptzfsboot that could probably do with some attention.

I've not yet attempted to replicate this on any other system. Here are the
CPU and RAM details for this one:

CPU: Intel(R) Xeon(R) CPU E5410 @ 2.33GHz (2327.51-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 6  Model = 17  Stepping = 6
  Features=0xbfebfbff
  Features2=0xce3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 17716740096 (16896 MB)
avail memory = 16535756800 (15769 MB)

It's a TYAN Tempest i5100X S5375 motherboard with the following dmidecode
details for the BIOS:

        Vendor: American Megatrends Inc.
        Version: 080014
        Release Date: 01/30/2008

So, where is the boot process likely to have hung if it was so early, and
what does it mean if CTRL-ALT-DEL is ineffective?

-- 
You are receiving this mail because:
You are the assignee for the bug.