Date:      Thu, 12 Jul 2012 23:47:28 -0400
From:      Steve McCoy <smccoy@greatbaysoftware.com>
To:        jhb@freebsd.org
Cc:        Charles Owens <cowens@greatbaysoftware.com>, freebsd-stable@freebsd.org
Subject:   Re: mfi(4) IO performance regression, post 8.1
Message-ID:  <4FFF9A50.40006@greatbaysoftware.com>
In-Reply-To: <4FFF34BA.9030002@greatbaysoftware.com>
References:  <4FDABA0B.5030702@greatbaysoftware.com> <201206150804.46341.jhb@freebsd.org> <4FE3DA14.9090506@greatbaysoftware.com> <4FFF301E.30603@greatbaysoftware.com> <4FFF34BA.9030002@greatbaysoftware.com>

On 7/12/12 4:34 PM, Steve McCoy wrote:
> On 7/12/12 4:14 PM, Charles Owens wrote:
>> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
>>>
>>> On 6/15/12 8:04 AM, John Baldwin wrote:
>>> > On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>>> >> Hello FreeBSD folk,
>>> >>
>>> >> We're seeing what appears to be a storage performance regression
>>> >> as we try to move from 8.1 (i386) to 8.3.  We looked at 8.2 also
>>> >> and it appears that the regression happened between 8.1 and 8.2.
>>> >>
>>> >> Our system is an Intel S5520UR server with 12 GB RAM and dual
>>> >> 4-core CPUs.  Storage is an LSI MegaSAS 1078 controller (mfi) in a
>>> >> RAID-10 configuration, using UFS + geom_journal for the filesystem.
>>> >>
>>> >> PostgreSQL performance, as seen via pgbench, dropped by
>>> >> approximately 20%.  This testing was done with our usual
>>> >> PAE-enabled kernels.  We then went back to GENERIC kernels and did
>>> >> comparisons using "bonnie"; results are below, followed by a
>>> >> kernel boot log.
>>> >>
>>> >> Notably, we're seeing this regression only with our mfi(4)-based
>>> >> RAID systems, and from looking at the FreeBSD source changelogs it
>>> >> appears that the mfi(4) code has seen some changes since 8.1.
>>> > Between 8.1 and 8.2 mfi has not had any significant changes.  The
>>> > only changes made to sys/dev/mfi were to add two new constants:
>>> >
>>> >> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi \
>>> >>     svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
>>> > Index: mfireg.h
>>> > ===================================================================
>>> > --- mfireg.h    (.../8.1/sys/dev/mfi)   (revision 237134)
>>> > +++ mfireg.h    (.../8.2/sys/dev/mfi)   (revision 237134)
>>> > @@ -975,7 +975,9 @@
>>> >          MFI_PD_STATE_OFFLINE = 0x10,
>>> >          MFI_PD_STATE_FAILED = 0x11,
>>> >          MFI_PD_STATE_REBUILD = 0x14,
>>> > -       MFI_PD_STATE_ONLINE = 0x18
>>> > +       MFI_PD_STATE_ONLINE = 0x18,
>>> > +       MFI_PD_STATE_COPYBACK = 0x20,
>>> > +       MFI_PD_STATE_SYSTEM = 0x40
>>> >   };
>>> >
>>> >   union mfi_ld_ref {
>>> >
>>> > The difference in write performance must be due to something else.
>>> > You mentioned you are using UFS + gjournal.  I think gjournal uses
>>> > BIO_FLUSH, so I wonder if this is related:
>>> >
>>> >
>>> > ------------------------------------------------------------------------
>>> > r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
>>> >
>>> > MFC 212160:
>>> >
>>> > Correct bioq_disksort so that bioq_insert_tail() offers barrier
>>> > semantics.  Add the BIO_ORDERED flag for struct bio and update bio
>>> > clients to use it.
>>> >
>>> > The barrier semantics of bioq_insert_tail() were broken in two ways:
>>> >
>>> >   o In bioq_disksort(), an added bio could be inserted at the head of
>>> >     the queue, even when a barrier was present, if the sort key for
>>> >     the new entry was less than that of the last queued barrier bio.
>>> >
>>> >   o The last_offset used to generate the sort key for newly queued
>>> >     bios did not stay at the position of the barrier until either
>>> >     the barrier was de-queued, or a new barrier (which updates
>>> >     last_offset) was queued.  When a barrier is in effect, we know
>>> >     that the disk will pass through the barrier position just before
>>> >     the "blocked bios" are released, so using the barrier's offset
>>> >     for last_offset is the optimal choice.
>>> >
>>> > sys/geom/sched/subr_disk.c:
>>> > sys/kern/subr_disk.c:
>>> >          o Update last_offset in bioq_insert_tail().
>>> >
>>> >          o Only update last_offset in bioq_remove() if the removed
>>> >            bio is at the head of the queue (typically due to a call
>>> >            via bioq_takefirst()) and no barrier is active.
>>> >
>>> >          o In bioq_disksort(), if we have a barrier (insert_point
>>> >            is non-NULL), set prev to the barrier and cur to its
>>> >            next element.  Now that last_offset is kept at the
>>> >            barrier position, this change isn't strictly necessary,
>>> >            but since we have to take a decision branch anyway, it
>>> >            does avoid one no-op loop iteration in the while loop
>>> >            that immediately follows.
>>> >
>>> >          o In bioq_disksort(), bypass the normal sort for bios with
>>> >            the BIO_ORDERED attribute and instead insert them into
>>> >            the queue with bioq_insert_tail().  bioq_insert_tail()
>>> >            not only gives the desired command order during
>>> >            insertion, but also provides barrier semantics so that
>>> >            commands disksorted in the future cannot pass the
>>> >            just-enqueued transaction.
>>> >
>>> > sys/sys/bio.h:
>>> >          Add BIO_ORDERED as bit 4 of the bio_flags field in
>>> >          struct bio.
>>> >
>>> > sys/cam/ata/ata_da.c:
>>> > sys/cam/scsi/scsi_da.c:
>>> >          Use an ordered command for SCSI/ATA-NCQ commands issued in
>>> >          response to bios with the BIO_ORDERED flag set.
>>> >
>>> > sys/cam/scsi/scsi_da.c:
>>> >          Use an ordered tag when issuing a synchronize cache command.
>>> >
>>> >          Wrap some lines to 80 columns.
>>> >
>>> > sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>> > sys/geom/geom_io.c
>>> >          Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>>> >
>>> > Sponsored by:   Spectra Logic Corporation
>>> >
>>> > ------------------------------------------------------------------------
>>> >
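(A quick aside for readers of the archive: the barrier path described in
that log boils down to a tail insert. Below is a minimal sketch of the
post-r212939 behavior, paraphrased from memory rather than quoted from
the stable/8 sys/kern/subr_disk.c, so treat it as illustrative only.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>

void
bioq_disksort(struct bio_queue_head *head, struct bio *bp)
{

	/*
	 * An ordered bio, e.g. a BIO_FLUSH that g_io_flush() now marks
	 * BIO_ORDERED, bypasses the elevator sort entirely:
	 * bioq_insert_tail() both queues it last and records it as the
	 * insertion barrier, so bios sorted in afterwards cannot be
	 * placed ahead of it.
	 */
	if (bp->bio_flags & BIO_ORDERED) {
		bioq_insert_tail(head, bp);
		return;
	}

	/*
	 * ... otherwise the usual one-way elevator insert, keyed on
	 * bio_offset relative to last_offset ...
	 */
}

If gjournal's flushes all became barriers here, writes queued behind
each flush can no longer be sorted ahead of it, which could plausibly
cost a busy array some write throughput.)
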
>>> > Can you try commenting out the 'bp->bio_flags |= BIO_ORDERED' line
>>> > added to geom_io.c in 8.2?  That would effectively revert this
>>> > portion of the diff:
>>> >
>>> > Index: geom_io.c
>>> > ===================================================================
>>> > --- geom_io.c   (.../8.1/sys/geom)      (revision 237134)
>>> > +++ geom_io.c   (.../8.2/sys/geom)      (revision 237134)
>>> > @@ -265,6 +265,7 @@
>>> >          g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
>>> >          bp = g_alloc_bio();
>>> >          bp->bio_cmd = BIO_FLUSH;
>>> > +       bp->bio_flags |= BIO_ORDERED;
>>> >          bp->bio_done = NULL;
>>> >          bp->bio_attribute = NULL;
>>> >          bp->bio_offset = cp->provider->mediasize;
>>> >
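(Another aside: what that one-line revert disables downstream is the
ordered-tag behavior named in the r212939 log, where the CAM disk
drivers turn BIO_ORDERED into an ordered SCSI queue tag. A rough sketch,
paraphrasing the log rather than the literal scsi_da.c change, with
da_tag_for_bio() being a made-up helper name:

#include <sys/param.h>
#include <sys/bio.h>
#include <cam/scsi/scsi_message.h>

/*
 * Illustrative only: a bio carrying BIO_ORDERED is issued with an
 * ordered queue tag, which the device may not reorder against earlier
 * simple-tagged commands; everything else keeps a simple tag.
 */
static uint8_t
da_tag_for_bio(struct bio *bp)
{

	return ((bp->bio_flags & BIO_ORDERED) ?
	    MSG_ORDERED_Q_TAG : MSG_SIMPLE_Q_TAG);
}

Commenting out the geom_io.c line keeps flushes from being treated as
barriers anywhere downstream, so if the new ordering were the cost,
performance should bounce back.)
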
>>>
>>> John, thanks for the suggestion.  I've built and tested a kernel with
>>> this change.  Result: no change (same performance as with
>>> 8.2-GENERIC).  Any thoughts on where to go next?
>>
>> Hmm.  That seemed the most plausible candidate when I looked at this.
>>
>> Do you use quotas (there is one change in UFS related to quotas)?
>>
>> There are 5 changes that involve sys/kern/vfs_bio.c in 8.2:
>> 209459, 212229, 212562, 212583, and 213890.
>>
>> Can you possibly test out kernels from stable/8 at those revisions on
>> an 8.1 world and see if you can narrow it down further?
>>
>> Barring that, can you do a binary search of kernels from stable/8
>> between 8.1 and 8.2 on an 8.1 world to see which commit caused the
>> change in write performance?
>>
>
> Hi John, I'm working with Charles to narrow this down.
>
> Looks like revision 212229 is the culprit, or at least something
> committed around the same time, if that change isn't itself what
> slowed things down. The change to sys/kern/vfs_bio.c modifies some
> synchronization in dev_strategy():

Actually, hold that thought. I had a hunch that I wasn't thorough
enough, so I decided to try 212228: the performance is the same as with
212229, so vfs_bio seems to be out of the picture. I'm going to binary
search between 209459 and 212229 and see what I find.


