Date:      Mon, 9 Jan 2012 09:40:41 -0800
From:      Freddie Cash <fjwcash@gmail.com>
To:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Upgrade from 8.2-STABLE to 9.0-RELEASE wedges on SuperMicro H8DGiF-based system
Message-ID:  <CAOjFWZ6PbXCBoOinZRvXKmHDM8xWsYU657yPh5-i9TsmnFpdVg@mail.gmail.com>

Good morning,

Just wondering if anyone else has run into a similar issue.

We have a ZFS storage server that was running 8.2-STABLE (from around
the beginning of Dec 2011) without any issues.  Last Thursday it was
upgraded to 9.0-RELEASE, to consolidate all the ZFS and networking
fixes/updates and bring it up to version parity with our other ZFS
storage server running 9.0.  The "svn switch" of the source tree, the
buildworld, the buildkernel, the installkernel, the reboot with the
new kernel, the installworld, the reboot into the new world, the
mergemaster processes all completed successfully.  About half-way
through the "make delete-old" process, the box locked up.  No messages
on the console, no log entries of any kind, everything just stopped.
Had to do a power-cycle.  And then everything went to hell.  :(

On reboot, the loader complained about not being able to determine
which disk it was booting from (even though the new loader had already
booted at least once), and gave strange messages about
panic/free/something or other (didn't write that error down).

I was able to boot using a 9.0 install CD, drop to a loader prompt,
unload the kernel/modules from CD, load the kernel/modules from the
harddrive, set currdev to the harddrive, and boot.  But no matter what
I did (gpart bootcode using pmbr/gptboot from CD or from HD; copy
loader from CD; copy /boot from CD), I could not get the loader on the
HD to load the kernel.  It always gave the same error message:  can't
determine which disk we're booting from.

After trying for 24 hours to make it work, I just re-installed off the
9.0-RELEASE CD.

Now, this box (alphadrive) will freeze after running for between 3 and
10 hours.  Even when left completely idle, it will lock up after about
3 hours.  :(

I have another system (betadrive) that's almost identical hardware
(chassis, backplane, SATA controllers are different, everything else
is the same) that went from 8.2-STABLE to 9.0-RC2 to 9.0-RC3 to
9.0-RELEASE without any issues.  I've tried copying /boot/loader.conf,
/etc/make.conf, /etc/src.conf, /etc/sysctl.conf, /etc/rc.conf from
betadrive to alphadrive, without any change in the freezing behaviour.
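
For anyone wanting to repeat the comparison rather than copy files
blindly, here is a minimal sketch in Python.  It assumes each host's
files have been copied into a local directory tree first; the function
name and the directory layout are hypothetical, only the five file
names come from the list above.

```python
import difflib
from pathlib import Path

# The five files named in this message, relative to each host's root.
CONFIG_FILES = [
    "boot/loader.conf",
    "etc/make.conf",
    "etc/src.conf",
    "etc/sysctl.conf",
    "etc/rc.conf",
]


def diff_configs(root_a, root_b, files=CONFIG_FILES):
    """Return {relative_path: unified-diff lines} for files that differ.

    root_a and root_b are local directories holding a copy of each
    host's files (an assumption of this sketch).  A missing file is
    treated as empty, so it shows up as a difference if the other
    side has content.
    """
    differences = {}
    for rel in files:
        path_a = Path(root_a) / rel
        path_b = Path(root_b) / rel
        lines_a = path_a.read_text().splitlines() if path_a.exists() else []
        lines_b = path_b.read_text().splitlines() if path_b.exists() else []
        delta = list(
            difflib.unified_diff(
                lines_a, lines_b,
                fromfile=str(path_a), tofile=str(path_b),
                lineterm="",
            )
        )
        if delta:  # unified_diff yields nothing for identical input
            differences[rel] = delta
    return differences
```

Calling diff_configs("alphadrive-root", "betadrive-root") (both
directory names hypothetical) returns an empty dict when the five
files match, which is what I would expect after the copy.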

These are ZFS storage systems, with / (UFS) and swap on SSDs, and 16
or 24 SATA HDs in the pool (3x 5-disk raidz2 + spare on betadrive and
4x 6-disk raidz2 on alphadrive).  All of the ZFS settings are identical between the two
systems (pool name, pool properties, ZFS filesystems, ZFS properties
per filesystem).  Dedupe and compression (LZJB) are enabled on both
systems.
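
As a quick check of the disk counts in those two layouts, here is a
small worked example in Python.  The helper name is hypothetical; the
only outside fact it relies on is that raidz2 keeps two disks' worth
of parity per vdev.

```python
RAIDZ2_PARITY = 2  # raidz2 keeps two disks' worth of parity per vdev


def pool_disks(vdevs, disks_per_vdev, spares=0):
    """Count total, data, parity, and spare disks for a raidz2 pool."""
    return {
        "total": vdevs * disks_per_vdev + spares,
        "data": vdevs * (disks_per_vdev - RAIDZ2_PARITY),
        "parity": vdevs * RAIDZ2_PARITY,
        "spares": spares,
    }


# 3x 5-disk raidz2 + spare: the 16-disk pool
print(pool_disks(3, 5, spares=1))
# {'total': 16, 'data': 9, 'parity': 6, 'spares': 1}

# 4x 6-disk raidz2: the 24-disk pool
print(pool_disks(4, 6))
# {'total': 24, 'data': 16, 'parity': 8, 'spares': 0}
```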

When alphadrive locks up, there are no entries made in any log files;
there are no log entries on the console; there are no entries in the
BIOS event log; there are no entries in the IPMI event log; the
CPU/case temps are below 40C (emergency shutoff is 75C) as shown via
IPMI; RAM usage is under 20 GB (of 24 GB per box), and at the lowest
was under 2 GB (I run top on the console so I can see the stats, and
the time, when it locks up).  It just ... stops.
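
Since nothing reaches the logs, one way to pin down the lockup time
without watching top on the console is a heartbeat file.  The
following is only a sketch in Python: the function names, the file
path, and the 10-second interval are all assumptions, and it only
records the last moment the box could still complete a write.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp to path and force it to disk."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # make sure the line survives a power-cycle
    return stamp


def last_heartbeat(path):
    """Return the last timestamp written, or None if there is none."""
    try:
        with open(path) as fh:
            lines = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run_heartbeat(path, interval=10.0, beats=None):
    """Write a heartbeat every interval seconds; beats=None runs forever."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        if beats is None or count < beats:
            time.sleep(interval)
```

After a power-cycle, last_heartbeat() on the same file gives the time
of the last completed write, so the freeze happened within one
interval after that.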

The system will even lock up when running in single-user mode, with
only / mounted (ZFS not loaded, zpool not imported).

Hardware (alphadrive):
  Chenbro 5U rackmount chassis with 24 hot-swap drive bays
  SuperMicro H8DGi-F motherboard
  AMD Opteron 2218 CPU (8-cores at 2.0 GHz)
  24 GB DDR3-SDRAM
  3x SuperMicro AOC-USAS-L8i SATA controllers (multi-lane break-out cables)
  8x Seagate 7200.12 1.5 TB SATA harddrives
 16x WD RE4 1.0 TB SATA harddrives
  1x Kingston 60 GB SSD (for /, swap, L2ARC)

Hardware (betadrive):
  SuperMicro 4U rackmount chassis with 16 hot-swap drive bays
  SuperMicro H8DGi-F motherboard
  AMD Opteron 2218 CPU (8-cores at 2.0 GHz)
  24 GB DDR3-SDRAM
  2x SuperMicro AOC-USAS2-L8i SATA controllers (multi-lane cables)
 16x WD RE4 2.0 TB SATA harddrives
  1x Kingston 60 GB SSD (for /, swap, L2ARC)

betadrive runs perfectly with FreeBSD 9.0-RELEASE.
alphadrive locks up with FreeBSD 9.0-RELEASE.

We're currently investigating hardware firmware revisions to see if
anything else is different between the two systems.

Has anyone experienced anything similar?  Does anyone have any ideas on
what to look for?  Any suggestions on what to try next?

-- 
Freddie Cash
fjwcash@gmail.com


