Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 15 Jun 2012 08:04:46 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Charles Owens <cowens@greatbaysoftware.com>
Subject:   Re: mfi(4) IO performance regression, post 8.1
Message-ID:  <201206150804.46341.jhb@freebsd.org>
In-Reply-To: <4FDABA0B.5030702@greatbaysoftware.com>
References:  <4FDABA0B.5030702@greatbaysoftware.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
> Hello FreeBSD folk,
> 
> We're seeing what appears to be a storage performance regression as we 
> try to move from 8.1 (i386) to 8.3.   We looked at 8.2 also and it 
> appears that the regression happened between 8.1 and 8.2.
> 
> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.  
> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10 
> configuration, using UFS + geom_journal for filesystem.
> 
> Postgresql performance, as seen via pgbench, dropped by approx 20%.  
> This testing was done with our usual PAE-enabled kernels.  We then went 
> back to GENERIC kernels and did comparisons using "bonnie", results 
> below.  Following that is a kernel boot log.
> 
> Notably, we're seeing this regression only with our RAID mfi(4) based 
> systems.  Notably, from looking at FreeBSD source changelogs it appears 
> that the mfi(4) code has seen some changes since 8.1.

Between 8.1 and 8.2 mfi has not had any significant changes.  The only changes
made to sys/dev/mfi were to add a new constant:

> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi 
svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
Index: mfireg.h
===================================================================
--- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
+++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
@@ -975,7 +975,9 @@
        MFI_PD_STATE_OFFLINE = 0x10,
        MFI_PD_STATE_FAILED = 0x11,
        MFI_PD_STATE_REBUILD = 0x14,
-       MFI_PD_STATE_ONLINE = 0x18
+       MFI_PD_STATE_ONLINE = 0x18,
+       MFI_PD_STATE_COPYBACK = 0x20,
+       MFI_PD_STATE_SYSTEM = 0x40
 };
 
 union mfi_ld_ref {

The difference in write performance must be due to something else.  You 
mentioned you are using UFS + gjournal.  I think gjournal uses BIO_FLUSH, so I 
wonder if this is related:

------------------------------------------------------------------------
r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines

MFC 212160:

Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

 o In bioq_disksort(), an added bio could be inserted at the head of
   the queue, even when a barrier was present, if the sort key for
   the new entry was less than that of the last queued barrier bio.

 o The last_offset used to generate the sort key for newly queued bios
   did not stay at the position of the barrier until either the
   barrier was de-queued, or a new barrier (which updates last_offset)
   was queued.  When a barrier is in effect, we know that the disk
   will pass through the barrier position just before the
   "blocked bios" are released, so using the barrier's offset for
   last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
        o Update last_offset in bioq_insert_tail().

        o Only update last_offset in bioq_remove() if the removed bio is
          at the head of the queue (typically due to a call via
          bioq_takefirst()) and no barrier is active.

        o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
          set prev to the barrier and cur to it's next element.  Now that
          last_offset is kept at the barrier position, this change isn't
          strictly necessary, but since we have to take a decision branch
          anyway, it does avoid one, no-op, loop iteration in the while
          loop that immediately follows.

        o In bioq_disksort(), bypass the normal sort for bios with the
          BIO_ORDERED attribute and instead insert them into the queue
          with bioq_insert_tail().  bioq_insert_tail() not only gives
          the desired command order during insertion, but also provides
          barrier semantics so that commands disksorted in the future
          cannot pass the just enqueued transaction.

sys/sys/bio.h:
        Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
        Use an ordered command for SCSI/ATA-NCQ commands issued in
        response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
        Use an ordered tag when issuing a synchronize cache command.

        Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
        Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:   Spectra Logic Corporation
------------------------------------------------------------------------

Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
changed in geom_io.c in 8.2?  That would be effectively reverting this
portion of the diff:

Index: geom_io.c
===================================================================
--- geom_io.c   (.../8.1/sys/geom)      (revision 237134)
+++ geom_io.c   (.../8.2/sys/geom)      (revision 237134)
@@ -265,6 +265,7 @@
        g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
        bp = g_alloc_bio();
        bp->bio_cmd = BIO_FLUSH;
+       bp->bio_flags |= BIO_ORDERED;
        bp->bio_done = NULL;
        bp->bio_attribute = NULL;
        bp->bio_offset = cp->provider->mediasize;

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201206150804.46341.jhb>