Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 19 Jan 2016 18:03:33 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Raghavendra Gowdappa <rgowdapp@redhat.com>
Cc:        Jeff Darcy <jdarcy@redhat.com>, Raghavendra G <raghavendra@gluster.com>,  freebsd-fs <freebsd-fs@freebsd.org>, Hubbard Jordan <jkh@ixsystems.com>,  Xavier Hernandez <xhernandez@datalab.es>,  Gluster Devel <gluster-devel@gluster.org>
Subject:   Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage
Message-ID:  <2123305838.166743619.1453244613417.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <7769801.11211464.1453183279015.JavaMail.zimbra@redhat.com>
References:  <571237035.145690509.1451437960464.JavaMail.zimbra@uoguelph.ca> <568F6D07.6070500@datalab.es> <CADRNtgRM17Eg3Z=LWifVNo=ai72dMiEVRKS3RwNfQ-dK7Pspew@mail.gmail.com> <1924941590.6473225.1452248249994.JavaMail.zimbra@redhat.com> <981529129.154244852.1452304799182.JavaMail.zimbra@uoguelph.ca> <1256214214.7158114.1452310490692.JavaMail.zimbra@redhat.com> <1045057902.165261325.1453156629344.JavaMail.zimbra@uoguelph.ca> <7769801.11211464.1453183279015.JavaMail.zimbra@redhat.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Raghavendra Gowdappa wrote:
> 
> 
> ----- Original Message -----
> > From: "Rick Macklem" <rmacklem@uoguelph.ca>
> > To: "Raghavendra Gowdappa" <rgowdapp@redhat.com>
> > Cc: "Jeff Darcy" <jdarcy@redhat.com>, "Raghavendra G"
> > <raghavendra@gluster.com>, "freebsd-fs"
> > <freebsd-fs@freebsd.org>, "Hubbard Jordan" <jkh@ixsystems.com>, "Xavier
> > Hernandez" <xhernandez@datalab.es>, "Gluster
> > Devel" <gluster-devel@gluster.org>
> > Sent: Tuesday, January 19, 2016 4:07:09 AM
> > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of
> > CPU usage
> > 
> > Raghavendra Gowdappa wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Rick Macklem" <rmacklem@uoguelph.ca>
> > > > To: "Jeff Darcy" <jdarcy@redhat.com>
> > > > Cc: "Raghavendra G" <raghavendra@gluster.com>, "freebsd-fs"
> > > > <freebsd-fs@freebsd.org>, "Hubbard Jordan"
> > > > <jkh@ixsystems.com>, "Xavier Hernandez" <xhernandez@datalab.es>,
> > > > "Gluster
> > > > Devel" <gluster-devel@gluster.org>
> > > > Sent: Saturday, January 9, 2016 7:29:59 AM
> > > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot
> > > > of
> > > > CPU usage
> > > > 
> > > > Jeff Darcy wrote:
> > > > > > > I don't know anything about gluster's poll implementation so I
> > > > > > > may
> > > > > > > be totally wrong, but would it be possible to use an eventfd (or
> > > > > > > a
> > > > > > > pipe if eventfd is not supported) to signal the need to add more
> > > > > > > file descriptors to the poll call ?
> > > > > > >
> > > > > > >
> > > > > > > The poll call should listen on this new fd. When we need to
> > > > > > > change
> > > > > > > the fd list, we should simply write to the eventfd or pipe from
> > > > > > > another thread.  This will cause the poll call to return and we
> > > > > > > will
> > > > > > > be able to change the fd list without having a short timeout nor
> > > > > > > having to decide on any trade-off.
> > > > > > 
> > > > > >
> > > > > > Thats a nice idea. Based on my understanding of why timeouts are
> > > > > > being
> > > > > > used, this approach can work.
> > > > > 
> > > > > The own-thread code which preceded the current poll implementation
> > > > > did
> > > > > something similar, using a pipe fd to be woken up for new *outgoing*
> > > > > messages.  That code still exists, and might provide some insight
> > > > > into
> > > > > how to do this for the current poll code.
> > > > I took a look at event-poll.c and found something interesting...
> > > > - A pipe called "breaker" is already set up by event_pool_new_poll()
> > > > and
> > > >   closed by event_pool_destroy_poll(), however it never gets used for
> > > >   anything.
> > > 
> > > I did a check on history, but couldn't find any information on why it was
> > > removed. Can you send this patch to http://review.gluster.org ? We can
> > > review and merge the patch over there. If you are not aware, development
> > > work flow can be found at:
> > > 
> > > http://www.gluster.org/community/documentation/index.php/Developers
> > > 
> > Actually, the patch turned out to be a flop. Sometimes a fuse mount would
> > end
> > up with an empty file system with the patch. (I don't know why it was
> > broken,
> > but maybe the original author tan into issues as well?)
> 
> +static void
> +event_pool_changed (struct event_pool *event_pool)
> +{
> +
> +        /* Write a byte into the breaker pipe to wake up poll(). */
> +        if (event_pool->breaker[1] >= 0)
> +                write(event_pool->breaker[1], "X", 1);
> +}
> 
> breaker is set to non-blocking on both read and write ends. So, probably
> write might be failing sometimes with EAGAIN/EBUSY and thereby preventing
> the socket from being registered. Probably that might be the reason?
> 
>         if (event_pool->breaker[1] >= 0) {
>                  do {
>                         ret = write(event_pool->breaker[1], "X", 1);
>                  } while (ret != 1);
>         }
> 
> Also similar logic might be required while flushing out junk from read end
> too.
> 
Just fyi, I did have a loop on the read() code. Past experience is that a
write() only fails when the pipe is full and, as someone else noted, that
should have triggered the poll().

I did have log message stuff in the code I was testing, but I didn't look
at it closely once I saw it was broken. I just went back to what worked.
I may test this some more someday, but until then, the code in 3.7.6 works ok.

Thanks for your comments, rick

> > 
> > Anyhow, I am now using the 3.7.6 event-poll.c code except that I have
> > increased
> > the timeout from 1msec->10msec. (Going from 1->5->10 didn't seem to cause a
> > problem, but I got slower test runs when I increased to 20msec, so I've
> > settled on
> > 10mses. This does reduce the CPU usage when the GlusterFS file systems
> > aren't
> > active.)
> > I will submit this one line change to your workflow if it continues to test
> > ok.
> > 
> > Thanks for everyone's input, rick
> > 
> > > > 
> > > > So, I added a few lines of code that writes a byte to it whenever the
> > > > list
> > > > of
> > > > file descriptors is changed and read when poll() returns, if its
> > > > revents
> > > > is
> > > > set.
> > > > I also changed the timeout to -1 (infinity) and it seems to work for a
> > > > trivial
> > > > test.
> > > > --> Btw, I also noticed the "changed" variable gets set to 1 on a
> > > > change,
> > > > but
> > > >     never reset to 0. I didn't change this, since it looks "racey".
> > > >     (ie.
> > > >     I
> > > >     think you could easily get a race between a thread that clears it
> > > >     and
> > > >     one
> > > >     that adds a new fd.)
> > > > 
> > > > A slightly safer version of the patch would set a long (100msec ??)
> > > > timeout
> > > > instead
> > > > of -1.
> > > > 
> > > > Anyhow, I've attached the patch in case anyone would like to try it and
> > > > will
> > > > create a bug report for this after I've had more time to test it.
> > > > (I only use a couple of laptops, so my testing will be minimal.)
> > > > 
> > > > Thanks for all the help, rick
> > > > 
> > > > > _______________________________________________
> > > > > freebsd-fs@freebsd.org mailing list
> > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> > > > > 
> > > > 
> > > 
> > 
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2123305838.166743619.1453244613417.JavaMail.zimbra>