Date: Thu, 21 Jun 2012 22:36:04 -0400
From: Charles Owens <cowens@greatbaysoftware.com>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: mfi(4) IO performance regression, post 8.1
Message-ID: <4FE3DA14.9090506@greatbaysoftware.com>
In-Reply-To: <201206150804.46341.jhb@freebsd.org>
References: <4FDABA0B.5030702@greatbaysoftware.com> <201206150804.46341.jhb@freebsd.org>
On 6/15/12 8:04 AM, John Baldwin wrote:
> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>> Hello FreeBSD folk,
>>
>> We're seeing what appears to be a storage performance regression as we
>> try to move from 8.1 (i386) to 8.3.  We looked at 8.2 also, and it
>> appears that the regression happened between 8.1 and 8.2.
>>
>> Our system is an Intel S5520UR server with 12 GB RAM and dual 4-core
>> CPUs.  Storage is an LSI MegaSAS 1078 controller (mfi) in a RAID-10
>> configuration, using UFS + geom_journal for the filesystem.
>>
>> PostgreSQL performance, as seen via pgbench, dropped by approximately
>> 20%.  This testing was done with our usual PAE-enabled kernels.  We
>> then went back to GENERIC kernels and did comparisons using "bonnie",
>> results below.  Following that is a kernel boot log.
>>
>> Notably, we're seeing this regression only with our mfi(4)-based RAID
>> systems, and from the FreeBSD source changelogs it appears that the
>> mfi(4) code has seen some changes since 8.1.
> Between 8.1 and 8.2 mfi has not had any significant changes.  The only
> changes made to sys/dev/mfi were to add a new constant:
>
>> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
> svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
> Index: mfireg.h
> ===================================================================
> --- mfireg.h	(.../8.1/sys/dev/mfi)	(revision 237134)
> +++ mfireg.h	(.../8.2/sys/dev/mfi)	(revision 237134)
> @@ -975,7 +975,9 @@
>  	MFI_PD_STATE_OFFLINE = 0x10,
>  	MFI_PD_STATE_FAILED = 0x11,
>  	MFI_PD_STATE_REBUILD = 0x14,
> -	MFI_PD_STATE_ONLINE = 0x18
> +	MFI_PD_STATE_ONLINE = 0x18,
> +	MFI_PD_STATE_COPYBACK = 0x20,
> +	MFI_PD_STATE_SYSTEM = 0x40
>  };
>
>  union mfi_ld_ref {
>
> The difference in write performance must be due to something else.  You
> mentioned you are using UFS + gjournal.  I think gjournal uses
> BIO_FLUSH, so I wonder if this is related:
>
> ------------------------------------------------------------------------
> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
>
> MFC 212160:
>
> Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
> Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
>
> The barrier semantics of bioq_insert_tail() were broken in two ways:
>
> o In bioq_disksort(), an added bio could be inserted at the head of
>   the queue, even when a barrier was present, if the sort key for
>   the new entry was less than that of the last queued barrier bio.
>
> o The last_offset used to generate the sort key for newly queued bios
>   did not stay at the position of the barrier until either the
>   barrier was de-queued, or a new barrier (which updates last_offset)
>   was queued.  When a barrier is in effect, we know that the disk
>   will pass through the barrier position just before the
>   "blocked bios" are released, so using the barrier's offset for
>   last_offset is the optimal choice.
>
> sys/geom/sched/subr_disk.c:
> sys/kern/subr_disk.c:
>   o Update last_offset in bioq_insert_tail().
>
>   o Only update last_offset in bioq_remove() if the removed bio is
>     at the head of the queue (typically due to a call via
>     bioq_takefirst()) and no barrier is active.
>
>   o In bioq_disksort(), if we have a barrier (insert_point is
>     non-NULL), set prev to the barrier and cur to its next element.
>     Now that last_offset is kept at the barrier position, this change
>     isn't strictly necessary, but since we have to take a decision
>     branch anyway, it does avoid one, no-op, loop iteration in the
>     while loop that immediately follows.
>
>   o In bioq_disksort(), bypass the normal sort for bios with the
>     BIO_ORDERED attribute and instead insert them into the queue
>     with bioq_insert_tail().  bioq_insert_tail() not only gives
>     the desired command order during insertion, but also provides
>     barrier semantics so that commands disksorted in the future
>     cannot pass the just enqueued transaction.
>
> sys/sys/bio.h:
>   Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
>
> sys/cam/ata/ata_da.c:
> sys/cam/scsi/scsi_da.c:
>   Use an ordered command for SCSI/ATA-NCQ commands issued in
>   response to bios with the BIO_ORDERED flag set.
>
> sys/cam/scsi/scsi_da.c:
>   Use an ordered tag when issuing a synchronize cache command.
>
>   Wrap some lines to 80 columns.
>
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c:
> sys/geom/geom_io.c:
>   Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>
> Sponsored by:	Spectra Logic Corporation
> ------------------------------------------------------------------------
>
> Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED'
> line changed in geom_io.c in 8.2?  That would effectively revert this
> portion of the diff:
>
> Index: geom_io.c
> ===================================================================
> --- geom_io.c	(.../8.1/sys/geom)	(revision 237134)
> +++ geom_io.c	(.../8.2/sys/geom)	(revision 237134)
> @@ -265,6 +265,7 @@
>  	g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
>  	bp = g_alloc_bio();
>  	bp->bio_cmd = BIO_FLUSH;
> +	bp->bio_flags |= BIO_ORDERED;
>  	bp->bio_done = NULL;
>  	bp->bio_attribute = NULL;
>  	bp->bio_offset = cp->provider->mediasize;

John... thanks for the suggestion.  I've built and tested a kernel with
this change made.  Result: no change (same performance as with
8.2-GENERIC).

Any thoughts as to where to go next?

Thank you,
Charles