From owner-freebsd-stable@FreeBSD.ORG Fri Jun 15 17:18:23 2012 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (unknown [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86487106566C for ; Fri, 15 Jun 2012 17:18:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by mx1.freebsd.org (Postfix) with ESMTP id 5010C8FC17 for ; Fri, 15 Jun 2012 17:18:23 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 76CACB911; Fri, 15 Jun 2012 13:18:22 -0400 (EDT) From: John Baldwin To: freebsd-stable@freebsd.org Date: Fri, 15 Jun 2012 08:04:46 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; ) References: <4FDABA0B.5030702@greatbaysoftware.com> In-Reply-To: <4FDABA0B.5030702@greatbaysoftware.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201206150804.46341.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 15 Jun 2012 13:18:22 -0400 (EDT) Cc: Charles Owens Subject: Re: mfi(4) IO performance regression, post 8.1 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Jun 2012 17:18:23 -0000 On Friday, June 15, 2012 12:28:59 am Charles Owens wrote: > Hello FreeBSD folk, > > We're seeing what appears to be a storage performance regression as we > try to move from 8.1 (i386) to 8.3. We looked at 8.2 also and it > appears that the regression happened between 8.1 and 8.2. > > Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs. > Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10 > configuration, using UFS + geom_journal for filesystem. > > Postgresql performance, as seen via pgbench, dropped by approx 20%. > This testing was done with our usual PAE-enabled kernels. We then went > back to GENERIC kernels and did comparisons using "bonnie", results > below. Following that is a kernel boot log. > > Notably, we're seeing this regression only with our RAID mfi(4) based > systems. Notably, from looking at FreeBSD source changelogs it appears > that the mfi(4) code has seen some changes since 8.1. Between 8.1 and 8.2 mfi has not had any significant changes. The only changes made to sys/dev/mfi were to add a new constant: > svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi Index: mfireg.h =================================================================== --- mfireg.h (.../8.1/sys/dev/mfi) (revision 237134) +++ mfireg.h (.../8.2/sys/dev/mfi) (revision 237134) @@ -975,7 +975,9 @@ MFI_PD_STATE_OFFLINE = 0x10, MFI_PD_STATE_FAILED = 0x11, MFI_PD_STATE_REBUILD = 0x14, - MFI_PD_STATE_ONLINE = 0x18 + MFI_PD_STATE_ONLINE = 0x18, + MFI_PD_STATE_COPYBACK = 0x20, + MFI_PD_STATE_SYSTEM = 0x40 }; union mfi_ld_ref { The difference in write performance must be due to something else. You mentioned you are using UFS + gjournal. I think gjournal uses BIO_FLUSH, so I wonder if this is related: ------------------------------------------------------------------------ r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines MFC 212160: Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic. Add the BIO_ORDERED flag for struct bio and update bio clients to use it. The barrier semantics of bioq_insert_tail() were broken in two ways: o In bioq_disksort(), an added bio could be inserted at the head of the queue, even when a barrier was present, if the sort key for the new entry was less than that of the last queued barrier bio. o The last_offset used to generate the sort key for newly queued bios did not stay at the position of the barrier until either the barrier was de-queued, or a new barrier (which updates last_offset) was queued. When a barrier is in effect, we know that the disk will pass through the barrier position just before the "blocked bios" are released, so using the barrier's offset for last_offset is the optimal choice. sys/geom/sched/subr_disk.c: sys/kern/subr_disk.c: o Update last_offset in bioq_insert_tail(). o Only update last_offset in bioq_remove() if the removed bio is at the head of the queue (typically due to a call via bioq_takefirst()) and no barrier is active. o In bioq_disksort(), if we have a barrier (insert_point is non-NULL), set prev to the barrier and cur to it's next element. Now that last_offset is kept at the barrier position, this change isn't strictly necessary, but since we have to take a decision branch anyway, it does avoid one, no-op, loop iteration in the while loop that immediately follows. o In bioq_disksort(), bypass the normal sort for bios with the BIO_ORDERED attribute and instead insert them into the queue with bioq_insert_tail(). bioq_insert_tail() not only gives the desired command order during insertion, but also provides barrier semantics so that commands disksorted in the future cannot pass the just enqueued transaction. sys/sys/bio.h: Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio. sys/cam/ata/ata_da.c: sys/cam/scsi/scsi_da.c Use an ordered command for SCSI/ATA-NCQ commands issued in response to bios with the BIO_ORDERED flag set. sys/cam/scsi/scsi_da.c Use an ordered tag when issuing a synchronize cache command. Wrap some lines to 80 columns. sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c sys/geom/geom_io.c Mark bios with the BIO_FLUSH command as BIO_ORDERED. Sponsored by: Spectra Logic Corporation ------------------------------------------------------------------------ Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line changed in geom_io.c in 8.2? That would be effectively reverting this portion of the diff: Index: geom_io.c =================================================================== --- geom_io.c (.../8.1/sys/geom) (revision 237134) +++ geom_io.c (.../8.2/sys/geom) (revision 237134) @@ -265,6 +265,7 @@ g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name); bp = g_alloc_bio(); bp->bio_cmd = BIO_FLUSH; + bp->bio_flags |= BIO_ORDERED; bp->bio_done = NULL; bp->bio_attribute = NULL; bp->bio_offset = cp->provider->mediasize; -- John Baldwin