Date:      Thu, 18 Sep 2014 20:41:11 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 193758] New: either gptzfsboot or zfsloader hangs during boot after kernel and pool upgrade
Message-ID:  <bug-193758-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193758

            Bug ID: 193758
           Summary: either gptzfsboot or zfsloader hangs during boot after
                    kernel and pool upgrade
           Product: Base System
           Version: 8.4-STABLE
          Hardware: amd64
                OS: Any
            Status: Needs Triage
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: mark@exonetric.com

On June 29, 2014, I updated an amd64 system with a GPT ZFS root to FreeBSD
8.4-RELEASE-p13 #0 r268016: Sun Jun 29 12:58:11 UTC 2014.

I rebooted and this worked without issue. On Sep 9, 2014, I upgraded both pools
to the latest version supported by 8.4-RELEASE, so the version property no
longer applies:

$ zpool get version 
NAME   PROPERTY  VALUE    SOURCE
pool0  version   -        default
pool1  version   -        default
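
A "-" in the VALUE column is what zpool reports once a pool has moved past
numbered versions to feature flags. As a minimal sketch, assuming output in the
format shown above, a small filter can list such pools; the function name
feature_flag_pools is hypothetical and not part of any FreeBSD tool:

```shell
#!/bin/sh
# Hypothetical helper: read `zpool get version` output on stdin and print
# the name of every pool whose version value is "-".
feature_flag_pools() {
    # Skip the header line; columns are NAME PROPERTY VALUE SOURCE.
    awk 'NR > 1 && $2 == "version" && $3 == "-" { print $1 }'
}

# Sample input copied from the output above.
feature_flag_pools <<'EOF'
NAME   PROPERTY  VALUE    SOURCE
pool0  version   -        default
pool1  version   -        default
EOF
```

On a live system the same filter could be fed from the real command, e.g.
zpool get version | feature_flag_pools.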

$ zpool status
  pool: pool0
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    pool0       ONLINE       0     0     0
      da0p3     ONLINE       0     0     0

errors: No known data errors

  pool: pool1
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    pool1       ONLINE       0     0     0
      da1p1     ONLINE       0     0     0

errors: No known data errors

$ gpart show
=>       34  312477629  da0  GPT  (149G)
         34        128    1  freebsd-boot  (64k)
        162   67108864    2  freebsd-swap  (32G)
   67109026  245368637    3  freebsd-zfs  (117G)

=>       34  312477629  da1  GPT  (149G)
         34  312477629    1  freebsd-zfs  (149G)
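
As a sanity check on the da0 layout, the partition sizes and offsets above can
be verified with plain shell arithmetic. This is only a sketch: the variable
names are illustrative, and the numbers are copied from the gpart show output.

```shell
#!/bin/sh
# Values copied from the gpart show output for da0 (all in 512-byte sectors).
first=34              # first usable sector
usable=312477629      # usable sectors reported on the "=>" line
p1=128                # freebsd-boot
p2=67108864           # freebsd-swap
p3=245368637          # freebsd-zfs

# The three partitions should account for every usable sector.
total=$((p1 + p2 + p3))
echo "partition sectors: $total (usable: $usable)"

# Each partition should start where the previous one ends.
echo "da0p2 starts at: $((first + p1))"
echo "da0p3 starts at: $((first + p1 + p2))"
```

The totals match and the partitions are contiguous, so nothing in the table
itself points to an overlapping or truncated partition.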


I applied the recommended update to gptzfsboot and the pmbr as follows:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
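
One thing that may be worth ruling out is whether the bootcode image still fits
in the 128-sector (64k) freebsd-boot partition. The sketch below is an
assumption-laden check, not a diagnosis: bootcode_fits is a hypothetical helper,
and the image size has to be supplied from the affected system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an image of IMAGE_BYTES fits in a partition
# of PART_SECTORS sectors (sector size defaults to 512 bytes).
bootcode_fits() {
    image_bytes=$1
    part_sectors=$2
    sector_bytes=${3:-512}
    part_bytes=$((part_sectors * sector_bytes))
    [ "$image_bytes" -le "$part_bytes" ]
}

# Capacity of da0p1: 128 sectors of 512 bytes, per the gpart show output.
echo "da0p1 capacity: $((128 * 512)) bytes"

# On the affected host one might run (not executed here):
#   bootcode_fits "$(stat -f %z /boot/gptzfsboot)" 128 && echo "fits"
```

If the image were larger than the partition, I would expect gpart to refuse
the write rather than truncate it, so this is a cross-check only.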

On Sep. 16, I rebooted the system and found that it no longer booted: it froze
right after the first "/" symbol, i.e. the "spinner" did not spin. Even
CTRL-ALT-DEL could not break it out of the frozen state; a reset or power cycle
was required.

We used a bootable 8.4 USB image to get access to the system after it became
clear there was no way to boot from the internal drives. Using the fixit shell,
we determined the pools were intact and undamaged, and reapplied the bootcode
from the USB image as a speculative measure, to no effect.

The underlying block devices are 3ware RAID controller volumes as follows:

$ egrep da[01] /var/run/dmesg.boot
da0 at twa0 bus 0 scbus0 target 0 lun 0
da0: <AMCC 9650SE-4LP DISK 3.08> Fixed Direct Access SCSI-5 device 
da0: 100.000MB/s transfers
da0: 152577MB (312477696 512 byte sectors: 255H 63S/T 19450C)
da1 at twa0 bus 0 scbus0 target 1 lun 0
da1: <AMCC 9650SE-4LP DISK 3.08> Fixed Direct Access SCSI-5 device 
da1: 100.000MB/s transfers
da1: 152577MB (312477696 512 byte sectors: 255H 63S/T 19450C)

At this point, we've configured a USB stick to handle the boot phase, and
that's the workaround for now, but this seems like quite an extreme failure
mode for gptzfsboot that could probably do with some attention.

I've not yet attempted to replicate this on any other system. Here's the
CPU/RAM for this one:

CPU: Intel(R) Xeon(R) CPU           E5410  @ 2.33GHz (2327.51-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 6  Model = 17  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 17716740096 (16896 MB)
avail memory = 16535756800 (15769 MB)

It's a TYAN Tempest i5100X S5375 motherboard with the following dmidecode
details for the BIOS:

        Vendor: American Megatrends Inc.
        Version: 080014 
        Release Date: 01/30/2008

So, where is the boot process likely to have hung, given that it was so early,
and what does it mean if CTRL-ALT-DEL is ineffective?

-- 
You are receiving this mail because:
You are the assignee for the bug.
