Date:      Wed, 17 Jul 2013 17:46:27 -0400
From:      Mark Johnston <markj@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, smh@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: syncer causing latency spikes
Message-ID:  <20130717214627.GC8289@charmander>
In-Reply-To: <201307171615.35484.jhb@freebsd.org>
References:  <20130717180720.GA8289@charmander> <20130717191852.GS5991@kib.kiev.ua> <201307171615.35484.jhb@freebsd.org>

On Wed, Jul 17, 2013 at 04:15:35PM -0400, John Baldwin wrote:
> On Wednesday, July 17, 2013 3:18:52 pm Konstantin Belousov wrote:
> > On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston wrote:
> > > During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
> > > happening up to 8 or 10 times a second. When this happens, a bunch of
> > > postgres threads become blocked in vn_write() waiting for the vnode lock
> > > to become free. It looks like the write-clustering code is limited to
> > > using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
> > > to anything greater than 256.
> > Syncer is probably just a victim of profiling.  Had postgres called
> > fsync(2), you would then blame the fsync code for the pauses.
> > 
> > Just add a tunable to allow the user to manually tune nswbuf,
> > regardless of the buffer cache sizing.  And yes, the default nswbuf
> > maximum should probably be bumped to something like 1024, at least on
> > 64-bit architectures, which are not starved for kernel memory.
> 
> Also, if you are seeing I/O stalls with mfi(4), then you might need a
> firmware update for your mfi(4) controller.  cc'ing smh@ who knows more about 
> that particular issue (IIRC).

I tried upgrading the firmware to the latest available image (I believe
it was from March), but that didn't help. I wouldn't call my problem a
stall in the sense of commands timing out (which I've seen before).
Rather, we manage to generate a backlog large enough that the driver
and controller take at least several seconds to clear it, and all I/O
is stalled in the kernel for that time.
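
To make the sizing discussed above concrete, here is a rough user-space
sketch of how a tunable override could interact with the nswbuf cap. It
is an illustration only, not the kernel code: the nbuf / 4 scaling, the
floor of 16, and the function and parameter names (compute_nswbuf,
cluster_pbuf_limit, tunable) are assumptions made for the example. Only
the cap of 256 and the (nswbuf / 2) clustering limit come from this
thread.

```c
/* Assumed floor; not stated in the thread. */
#define NSWBUF_MIN          16
/* Cap on nswbuf described in the thread. */
#define NSWBUF_DEFAULT_MAX  256

/*
 * Hypothetical helper: derive nswbuf from the number of buffer cache
 * buffers (nbuf).  A positive `tunable` stands in for a user-set
 * override that bypasses the default cap; 0 means "not set".
 */
int
compute_nswbuf(int nbuf, int tunable)
{
	int n;

	if (tunable > 0)
		return (tunable < NSWBUF_MIN ? NSWBUF_MIN : tunable);

	n = nbuf / 4;			/* assumed scaling factor */
	if (n > NSWBUF_DEFAULT_MAX)
		n = NSWBUF_DEFAULT_MAX;
	if (n < NSWBUF_MIN)
		n = NSWBUF_MIN;
	return (n);
}

/* The write-clustering code may use at most half of the pbufs. */
int
cluster_pbuf_limit(int nswbuf)
{
	return (nswbuf / 2);
}
```

With no override, a large buffer cache still ends up with nswbuf = 256
and so 128 pbufs for clustering. Setting the hypothetical tunable to
1024 would raise that limit to 512.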
