Date: Thu, 3 Sep 2015 13:17:44 -0600
From: Warner Losh <imp@bsdimp.com>
To: Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject: Re: svn commit: r287405 - head/sys/geom
Message-ID: <CANCZdfp=8AV7Uu3HM65k+NuRxr-Uah+e9-NBV2Z9QOhKTfLCKA@mail.gmail.com>
In-Reply-To: <20150903120606.GB75381@brick.home>
References: <201509021729.t82HTULW036119@repo.freebsd.org> <20150903120606.GB75381@brick.home>
On Thu, Sep 3, 2015 at 6:06 AM, Edward Tomasz Napierała <trasz@freebsd.org> wrote:
> On 0902T1729, Warner Losh wrote:
> > Author: imp
> > Date: Wed Sep  2 17:29:30 2015
> > New Revision: 287405
> > URL: https://svnweb.freebsd.org/changeset/base/287405
> >
> > Log:
> >   After the introduction of direct dispatch, the pacing code in g_down()
> >   broke in two ways. One, the pacing variable was accessed in multiple
> >   threads in an unsafe way. Two, since large numbers of I/Os could come
> >   down from the buf layer at one time, large numbers of allocation
> >   failures could happen all at once, resulting in a huge pace value that
> >   would limit I/Os to 10 IOPS for minutes (or even hours) at a
> >   time. While a real solution to these problems requires substantial
> >   work (to go to a no-allocation-after-the-first model, or to have some
> >   way to wait for more memory with some kind of reserve for pager and
> >   swapper requests), it is relatively easy to make this simplistic
> >   pacing less pathological.
>
> Shouldn't we emit some warning the first time this happens, to aid
> in debugging strange IO performance degradation?  Something like...

No. We shouldn't. We should fix the damn underlying problem instead of
polishing this turd of a rate limiter. It is really horrible by design,
and my patches make it less horrible only by degree.

I know it is tempting, but this is no different from dropping a packet,
or the many other places where we return ENOMEM and retry (or not)
based on local policy. Well, apart from the horrible policy we
implement....

> > @@ -688,7 +699,7 @@ g_io_deliver(struct bio *bp, int error)
> >  		bp->bio_driver2 = NULL;
> >  		bp->bio_pflags = 0;
> >  		g_io_request(bp, cp);
>
> 		if (warned_about_pace == 0) {
> 			printf("WARNING: GEOM io allocation failed; "
> 			    "expect reduced IO performance\n");
> 			warned_about_pace = 1;
> 		}

We have no good facility to pace these messages out, and failures are
typically very transient. The old code would turn a shortage that
lasted < 1s into reduced I/O performance for minutes or hours (yes,
hours). The new code will typically see shortages of millisecond to
tens-of-milliseconds duration, so this message would just be noise. We
don't whine when we drop packets in the network layer; we just provide
a count of dropped packets. The printf itself also causes a performance
degradation, and there's adequate statistical reporting in uma to see
whether this is happening.

Anyway, I will spend exactly 0 more energy on polishing the POS pace
turd. I'll spend somewhat more time and energy on exploring real fixes
to this issue.

Warner

> > -		pace++;
> > +		pace = 1;
> >  		return;
> >  	}
> >
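For readers reconstructing the change from the quoted fragments: below
is a minimal sketch of the pacing interaction the commit log describes.
The variable name and the producer-side one-liner come from the quoted
diff; the shape of the g_down() consumer loop and the pause interval
are illustrative assumptions chosen to match the "10 IOPS" figure in
the log, not the committed code.

	/* Shared between g_io_deliver() (producer) and the g_down thread. */
	static volatile u_int pace;

	/* Producer side, in g_io_deliver(): on an allocation failure the
	 * bio is requeued and g_down is asked to slow down. */
		bp->bio_driver2 = NULL;
		bp->bio_pflags = 0;
		g_io_request(bp, cp);
		pace = 1;	/* was pace++, which let one burst of failures
				 * accumulate thousands of pacing ticks */
		return;

	/* Consumer side, in the g_down loop: with pace = 1, at most one
	 * brief pause follows a burst of failures; with pace++, the old
	 * code paused once per accumulated tick, i.e. ~10 I/Os per second
	 * until the count drained. */
		if (pace != 0) {
			pause("g_down", hz / 10);	/* interval illustrative */
			pace = 0;
		}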
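If visibility into how often this fires is wanted without a printf, the
dropped-packet analogy above points at a read-only counter rather than
a console message. A sketch, with the caveat that the variable and the
sysctl below are hypothetical and not part of the commit:

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/sysctl.h>

	SYSCTL_DECL(_kern_geom);

	/* Hypothetical event counter, in the spirit of the network
	 * stack's dropped-packet counters. */
	static u_long g_pace_events;
	SYSCTL_ULONG(_kern_geom, OID_AUTO, pace_events, CTLFLAG_RD,
	    &g_pace_events, 0,
	    "GEOM allocation-failure pacing events");

	/* In g_io_deliver(), next to the pace = 1 assignment: */
		g_pace_events++;

That keeps the diagnosis path in statistics, like the uma reporting
mentioned above, instead of on the console.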