From: Charles Owens
Organization: Great Bay Software
Date: Tue, 10 Jul 2012 00:00:24 -0400
To: John Baldwin
Cc: freebsd-stable@freebsd.org
Subject: Re: ? IO performance regression, post 8.1

Charles Owens
Great Bay Software, Inc.
v: 603.617.4844
m: 603.866.0860

On 6/22/12 10:22 AM, John Baldwin wrote:
> On Thursday, June 21, 2012 10:36:04 pm Charles Owens wrote:
>> On 6/15/12 8:04 AM, John Baldwin wrote:
>>> On Friday, June 15, 2012 12:28:59 am Charles Owens wrote:
>>>> Hello FreeBSD folk,
>>>>
>>>> We're seeing what appears to be a storage performance regression as we
>>>> try to move from 8.1 (i386) to 8.3. We looked at 8.2 also and it
>>>> appears that the regression happened between 8.1 and 8.2.
>>>>
>>>> Our system is an Intel S5520UR Server with 12 GB RAM, dual 4-core CPUs.
>>>> Storage is a LSI MegaSAS 1078 controller (mfi) in a RAID-10
>>>> configuration, using UFS + geom_journal for filesystem.
>>>>
>>>> Postgresql performance, as seen via pgbench, dropped by approx 20%.
>>>> This testing was done with our usual PAE-enabled kernels. We then went
>>>> back to GENERIC kernels and did comparisons using "bonnie", results
>>>> below. Following that is a kernel boot log.
>>>>
>>>> Notably, we're seeing this regression only with our RAID mfi(4) based
>>>> systems. Notably, from looking at FreeBSD source changelogs it appears
>>>> that the mfi(4) code has seen some changes since 8.1.
>>> Between 8.1 and 8.2 mfi has not had any significant changes.
>>> The only changes made to sys/dev/mfi were to add a new constant:
>>>
>>>> svn diff svn+ssh://svn.freebsd.org/base/releng/8.1/sys/dev/mfi
>>> svn+ssh://svn.freebsd.org/base/releng/8.2/sys/dev/mfi
>>> Index: mfireg.h
>>> ===================================================================
>>> --- mfireg.h  (.../8.1/sys/dev/mfi)  (revision 237134)
>>> +++ mfireg.h  (.../8.2/sys/dev/mfi)  (revision 237134)
>>> @@ -975,7 +975,9 @@
>>>          MFI_PD_STATE_OFFLINE = 0x10,
>>>          MFI_PD_STATE_FAILED = 0x11,
>>>          MFI_PD_STATE_REBUILD = 0x14,
>>> -        MFI_PD_STATE_ONLINE = 0x18
>>> +        MFI_PD_STATE_ONLINE = 0x18,
>>> +        MFI_PD_STATE_COPYBACK = 0x20,
>>> +        MFI_PD_STATE_SYSTEM = 0x40
>>>  };
>>>
>>>  union mfi_ld_ref {
>>>
>>> The difference in write performance must be due to something else. You
>>> mentioned you are using UFS + gjournal. I think gjournal uses BIO_FLUSH, so I
>>> wonder if this is related:
>>>
>>> ------------------------------------------------------------------------
>>> r212939 | gibbs | 2010-09-20 19:39:00 -0400 (Mon, 20 Sep 2010) | 61 lines
>>>
>>> MFC 212160:
>>>
>>> Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
>>> Add the BIO_ORDERED flag for struct bio and update bio clients to use it.
>>>
>>> The barrier semantics of bioq_insert_tail() were broken in two ways:
>>>
>>> o In bioq_disksort(), an added bio could be inserted at the head of
>>>   the queue, even when a barrier was present, if the sort key for
>>>   the new entry was less than that of the last queued barrier bio.
>>>
>>> o The last_offset used to generate the sort key for newly queued bios
>>>   did not stay at the position of the barrier until either the
>>>   barrier was de-queued, or a new barrier (which updates last_offset)
>>>   was queued. When a barrier is in effect, we know that the disk
>>>   will pass through the barrier position just before the
>>>   "blocked bios" are released, so using the barrier's offset for
>>>   last_offset is the optimal choice.
>>>
>>> sys/geom/sched/subr_disk.c:
>>> sys/kern/subr_disk.c:
>>> o Update last_offset in bioq_insert_tail().
>>>
>>> o Only update last_offset in bioq_remove() if the removed bio is
>>>   at the head of the queue (typically due to a call via
>>>   bioq_takefirst()) and no barrier is active.
>>>
>>> o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
>>>   set prev to the barrier and cur to its next element. Now that
>>>   last_offset is kept at the barrier position, this change isn't
>>>   strictly necessary, but since we have to take a decision branch
>>>   anyway, it does avoid one, no-op, loop iteration in the while
>>>   loop that immediately follows.
>>>
>>> o In bioq_disksort(), bypass the normal sort for bios with the
>>>   BIO_ORDERED attribute and instead insert them into the queue
>>>   with bioq_insert_tail(). bioq_insert_tail() not only gives
>>>   the desired command order during insertion, but also provides
>>>   barrier semantics so that commands disksorted in the future
>>>   cannot pass the just enqueued transaction.
>>>
>>> sys/sys/bio.h:
>>> Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.
>>>
>>> sys/cam/ata/ata_da.c:
>>> sys/cam/scsi/scsi_da.c
>>> Use an ordered command for SCSI/ATA-NCQ commands issued in
>>> response to bios with the BIO_ORDERED flag set.
>>>
>>> sys/cam/scsi/scsi_da.c
>>> Use an ordered tag when issuing a synchronize cache command.
>>>
>>> Wrap some lines to 80 columns.
>>>
>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>> sys/geom/geom_io.c
>>> Mark bios with the BIO_FLUSH command as BIO_ORDERED.
>>>
>>> Sponsored by: Spectra Logic Corporation
>>> ------------------------------------------------------------------------
>>>
>>> Can you try perhaps commenting out the 'bp->bio_flags |= BIO_ORDERED' line
>>> changed in geom_io.c in 8.2?
>>> That would be effectively reverting this portion of the diff:
>>>
>>> Index: geom_io.c
>>> ===================================================================
>>> --- geom_io.c  (.../8.1/sys/geom)  (revision 237134)
>>> +++ geom_io.c  (.../8.2/sys/geom)  (revision 237134)
>>> @@ -265,6 +265,7 @@
>>>          g_trace(G_T_BIO, "bio_flush(%s)", cp->provider->name);
>>>          bp = g_alloc_bio();
>>>          bp->bio_cmd = BIO_FLUSH;
>>> +        bp->bio_flags |= BIO_ORDERED;
>>>          bp->bio_done = NULL;
>>>          bp->bio_attribute = NULL;
>>>          bp->bio_offset = cp->provider->mediasize;
>>>
>> John... thanks for the suggestion. I've built and tested a kernel with
>> this change made. Result: no change (same performance as with
>> 8.2-GENERIC). Any thoughts as to where to go next?
> Hmm. That seemed the most plausible candidate when I looked at this.
>
> Do you use quotas (there is one change in UFS related to quotas)?
>
> There are 5 changes that involve sys/kern/vfs_bio.c in 8.2:
> 209459, 212229, 212562, 212583, and 213890.
>
> Can you possibly test out kernels from stable/8 at those revisions on an 8.1
> world and see if you can narrow it down further?
>
> Barring that, can you do a binary search of kernels from stable/8 between 8.1
> and 8.2 on an 8.1 world to see which commit caused the change in write
> performance?
>

I've been sidetracked for a bit, and am now starting to work through the
revisions you've suggested. Ahead of that, I reinstalled the system with
filesystems configured *without* geom_journal and repeated the tests with
both 8.1 and 8.2 kernels. Finding: no regression. In fact, 8.2 performs as
well as or better than 8.1 on most measures (block writes in particular are
faster), as we might hope for in general. Bonnie output is below.

Not sure if this really helps us narrow things down, but geom_journal is
certainly part of the story. I'll have more results in the next day or so.
                 -------------Sequential Output-------------  ------Sequential Input------  ----Random----
                 --Per Char---  ----Block----  ---Rewrite---  --Per Char---  ----Block----  ----Seeks-----
Kernel       MB    K/sec  %CPU    K/sec  %CPU    K/sec  %CPU    K/sec  %CPU    K/sec  %CPU      /sec  %CPU
8.1 GENERIC 100   196050  99.9   344414  43.4   535679  43.2   152666  99.7  3233854 100.0  251145.9 215.2
8.2 GENERIC 100   190687  97.8   436428  43.0   537990  42.9   155766  98.3  4268089 100.0  231347.6 240.6