Date: Fri, 13 Jul 2012 09:39:54 -0400 From: John Baldwin <jhb@freebsd.org> To: freebsd-stable@freebsd.org Cc: Charles Owens <cowens@greatbaysoftware.com>, Steve McCoy <smccoy@greatbaysoftware.com> Subject: Re: mfi(4) IO performance regression, post 8.1 Message-ID: <201207130939.54311.jhb@freebsd.org> In-Reply-To: <4FFF9A50.40006@greatbaysoftware.com> References: <4FDABA0B.5030702@greatbaysoftware.com> <4FFF34BA.9030002@greatbaysoftware.com> <4FFF9A50.40006@greatbaysoftware.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thursday, July 12, 2012 11:47:28 pm Steve McCoy wrote: > On 7/12/12 4:34 PM, Steve McCoy wrote: > > On 7/12/12 4:14 PM, Charles Owens wrote: > >> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote: > >>> > >>> On 6/15/12 8:04 AM, John Baldwin wrote: > >>> > On Friday, June 15, 2012 12:28:59 am Charles Owens wrote: > >>> >> Hello FreeBSD folk, > >>> >> > >>> >> We're seeing what appears to be a storage performance regression > >>> as we > >>> >> try to move from 8.1 (i386) to 8.3. We looked at 8.2 also and it > >>> >> appears that the regression happened between 8.1 and 8.2. > >>> >> > >>> >> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core > >>> CPUs. > >>> >> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10 > >>> >> configuration, using UFS + geom_journal for filesystem. > >>> >> > >>> >> Postgresql performance, as seen via pgbench, dropped by approx 20%. > >>> >> This testing was done with our usual PAE-enabled kernels. We then > >>> went > >>> >> back to GENERIC kernels and did comparisons using "bonnie", results > >>> >> below. Following that is a kernel boot log. > >>> >> > >>> >> Notably, we're seeing this regression only with our RAID mfi(4) ba= sed > >>> >> systems. Notably, from looking at FreeBSD source changelogs it > >>> appears > >>> >> that the mfi(4) code has seen some changes since 8.1. > >>> > Between 8.1 and 8.2 mfi has not had any significant changes. The > >>> only changes > >>> > made to sys/dev/mfi were to add a new constant: > >>> > > >>> >> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi > >>> > svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi > >>> > Index: mfireg.h > >>> > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >>> > --- mfireg.h (.../8.1/sys/dev/mfi) (revision 237134) > >>> > +++ mfireg.h (.../8.2/sys/dev/mfi) (revision 237134) > >>> > @@ -975,7 +975,9 @@ > >>> > MFI_PD_STATE_OFFLINE =3D 0x10, > >>> > MFI_PD_STATE_FAILED =3D 0x11, > >>> > MFI_PD_STATE_REBUILD =3D 0x14, > >>> > - MFI_PD_STATE_ONLINE =3D 0x18 > >>> > + MFI_PD_STATE_ONLINE =3D 0x18, > >>> > + MFI_PD_STATE_COPYBACK =3D 0x20, > >>> > + MFI_PD_STATE_SYSTEM =3D 0x40 > >>> > }; > >>> > > >>> > union mfi_ld_ref { > >>> > > >>> > The difference in write performance must be due to something else. > >>> You > >>> > mentioned you are using UFS + gjournal. I think gjournal uses > >>> BIO_FLUSH, so I > >>> > wonder if this is related: > >>> > > >>> > > >>> ---------------------------------------------------------------------= =2D-- > >>> > r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 > >>> lines > >>> > > >>> > MFC 212160: > >>> > > >>> > Correct bioq_disksort so that bioq_insert_tail() offers barrier > >>> semantic. > >>> > Add the BIO_ORDERED flag for struct bio and update bio clients to > >>> use it. > >>> > > >>> > The barrier semantics of bioq_insert_tail() were broken in two ways: > >>> > > >>> > o In bioq_disksort(), an added bio could be inserted at the head = of > >>> > the queue, even when a barrier was present, if the sort key for > >>> > the new entry was less than that of the last queued barrier bio. > >>> > > >>> > o The last_offset used to generate the sort key for newly queued > >>> bios > >>> > did not stay at the position of the barrier until either the > >>> > barrier was de-queued, or a new barrier (which updates > >>> last_offset) > >>> > was queued. When a barrier is in effect, we know that the disk > >>> > will pass through the barrier position just before the > >>> > "blocked bios" are released, so using the barrier's offset for > >>> > last_offset is the optimal choice. > >>> > > >>> > sys/geom/sched/subr_disk.c: > >>> > sys/kern/subr_disk.c: > >>> > o Update last_offset in bioq_insert_tail(). > >>> > > >>> > o Only update last_offset in bioq_remove() if the removed > >>> bio is > >>> > at the head of the queue (typically due to a call via > >>> > bioq_takefirst()) and no barrier is active. > >>> > > >>> > o In bioq_disksort(), if we have a barrier (insert_point is > >>> non-NULL), > >>> > set prev to the barrier and cur to it's next element. > >>> Now that > >>> > last_offset is kept at the barrier position, this change > >>> isn't > >>> > strictly necessary, but since we have to take a decision > >>> branch > >>> > anyway, it does avoid one, no-op, loop iteration in the > >>> while > >>> > loop that immediately follows. > >>> > > >>> > o In bioq_disksort(), bypass the normal sort for bios with > >>> the > >>> > BIO_ORDERED attribute and instead insert them into the > >>> queue > >>> > with bioq_insert_tail(). bioq_insert_tail() not only gi= ves > >>> > the desired command order during insertion, but also > >>> provides > >>> > barrier semantics so that commands disksorted in the fut= ure > >>> > cannot pass the just enqueued transaction. > >>> > > >>> > sys/sys/bio.h: > >>> > Add BIO_ORDERED as bit 4 of the bio_flags field in struct > >>> bio. > >>> > > >>> > sys/cam/ata/ata_da.c: > >>> > sys/cam/scsi/scsi_da.c > >>> > Use an ordered command for SCSI/ATA-NCQ commands issued in > >>> > response to bios with the BIO_ORDERED flag set. > >>> > > >>> > sys/cam/scsi/scsi_da.c > >>> > Use an ordered tag when issuing a synchronize cache comman= d. > >>> > > >>> > Wrap some lines to 80 columns. > >>> > > >>> > sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c > >>> > sys/geom/geom_io.c > >>> > Mark bios with the BIO_FLUSH command as BIO_ORDERED. > >>> > > >>> > Sponsored by: Spectra Logic Corporation > >>> > > >>> ---------------------------------------------------------------------= =2D-- > >>> > > >>> > Can you try perhaps commenting out the 'bp->bio_flags |=3D > >>> BIO_ORDERED' line > >>> > changed in geom_io.c in 8.2? That would be effectively reverting t= his > >>> > portion of the diff: > >>> > > >>> > Index: geom_io.c > >>> > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > >>> > --- geom_io.c (.../8.1/sys/geom) (revision 237134) > >>> > +++ geom_io.c (.../8.2/sys/geom) (revision 237134) > >>> > @@ -265,6 +265,7 @@ > >>> > g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name); > >>> > bp =3D g_alloc_bio(); > >>> > bp->bio_cmd =3D BIO_FLUSH; > >>> > + bp->bio_flags |=3D BIO_ORDERED; > >>> > bp->bio_done =3D NULL; > >>> > bp->bio_attribute =3D NULL; > >>> > bp->bio_offset =3D cp->provider->mediasize; > >>> > > >>> > >>> John... thanks for the suggestion. I've built and tested a kernel wi= th > >>> this change made. Result: no change (same performance as with > >>> 8.2-GENERIC). Any thoughts as to where to go next? > >> > >> Hmm. That seemed the most plausible candidate when I looked at this. > >> > >> Do you use quotas (there is one change in UFS related to quotas)? > >> > >> There are 5 changes that involve sys/kern/vfs_bio.c in 8.2: > >> 209459, 212229, 212562, 212583, and 213890. > >> > >> Can you possibly test out kernels from stable/8 at those revisions on = an > >> 8.1 > >> world and see if you can narrow it down futher? > >> > >> Barring that, can you do a binary search of kernels from stable/8 > >> between 8.1 > >> and 8.2 on an 8.1 world to see which commit caused the change in write > >> performance? > >> > > > > Hi John, I'm working with Charles to narrow this down. > > > > Looks like revision 212229 is the culprit, or at least around the same > > time to it, if this change isn't what slowed things down. The change to > > sys/kern/vfs_bio.c modifies some synchronization in dev_strategy(): > > >=20 > Actually, hold that thought. I had a hunch that I wasn't thorough=20 > enough, so I decided to try 212228 =E2=80=94 the performance is the same = as with=20 > 212229, so vfs_bio seems to be out of the picture. I'm going to binary=20 > search between 209459 and 212229, and see what I find. Ok. Please let me know what you find. Thanks! =2D-=20 John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201207130939.54311.jhb>