Date:      Fri, 8 Jan 2016 10:12:32 +0530
From:      Raghavendra G <raghavendra@gluster.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Hubbard Jordan <jkh@ixsystems.com>, freebsd-fs <freebsd-fs@freebsd.org>,  Gluster Devel <gluster-devel@gluster.org>
Subject:   Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage
Message-ID:  <CADRNtgStOg8UZfxNt-SzvvPf7d1J7CC_gi49ww3BbixU0Ey-rg@mail.gmail.com>
In-Reply-To: <1083933309.146084334.1451517977647.JavaMail.zimbra@uoguelph.ca>
References:  <571237035.145690509.1451437960464.JavaMail.zimbra@uoguelph.ca> <20151230103152.GS13942@ndevos-x240.usersys.redhat.com> <2D8C2729-D556-479B-B4E2-66E1BB222F41@ixsystems.com> <1083933309.146084334.1451517977647.JavaMail.zimbra@uoguelph.ca>

Sorry for the delayed reply; I had missed this mail. Please find my
comments inline.

On Thu, Dec 31, 2015 at 4:56 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Jordan Hubbard wrote:
> >
> > > On Dec 30, 2015, at 2:31 AM, Niels de Vos <ndevos@redhat.com> wrote:
> > >
> > >> I'm guessing that Linux uses the event-epoll stuff instead of
> event-poll,
> > >> so it wouldn't exhibit this. Is that correct?
> > >
> > > Well, both. most (if not all) Linux builds will use event-poll. But,
> > > that calls epoll_wait() with a timeout of 1 millisecond as well.
> > >
> > >> Thanks for any information on this, rick
> > >> ps: I am tempted to just crank the timeout of 1msec up to 10 or
> 20msec.
> > >
> > > Yes, that is probably what I would do too. And have both poll functions
> > > use the same timeout, have it defined in libglusterfs/src/event.h. We
> > > could make it a configurable option too, but I do not think it is very
> > > useful to have.
> >
> > I guess this begs the question - what’s the actual purpose of polling
> for an
> > event with a 1 millisecond timeout?  If it was some sort of heartbeat
> check,
> > one might imagine that would be better served by a timer with nothing
> close
> > to 1 millisecond as an interval (that would be one seriously aggressive
> > heartbeat) and if filesystem events are incoming that glusterfs needs to
> > respond to, why timeout at all?
> >
> If I understand the code (I probably don't) the timeout allows the loop
> to call a function that may add new fd's to be polled. (If I'm right,
> the new ones might not get serviced.)
>

Yes, that's correct. Since poll takes the fds to be polled as an array
argument, the only point at which we can add or remove fds is when we make
the poll syscall. To make adding/removing fds more responsive, poll is made
to time out "frequently enough". The trade-off we are considering here is
between:

1. Number of calls to poll
           vs
2. Responsiveness of adding/removing an fd to/from polling.

For clients, the list of fds that are polled does not change much.
However, for bricks/servers this list can change frequently as clients
connect and disconnect.
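
A minimal sketch of the poll side of this trade-off, not the actual
GlusterFS event-poll code. The names fd_registry, registry_add, poll_once
and POLL_TIMEOUT_MS are hypothetical, and the 10 ms value is only an
assumption taken from the range suggested earlier in this thread. The fd
array is rebuilt on every call, so an fd registered while poll is blocked
is only seen on the next call:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 16
#define POLL_TIMEOUT_MS 10 /* assumed value; the thread discusses 1 vs 10-20 ms */

/* Hypothetical registry of fds that the event loop should watch. */
struct fd_registry {
    int fds[MAX_FDS];
    int count;
};

/* Register an fd. It is only picked up by the next poll_once() call. */
static int registry_add(struct fd_registry *r, int fd)
{
    if (r->count >= MAX_FDS)
        return -1;
    r->fds[r->count++] = fd;
    return 0;
}

/*
 * One loop iteration: snapshot the registry into a pollfd array and poll.
 * Returns the number of ready fds, 0 on timeout, -1 on error.
 */
static int poll_once(const struct fd_registry *r, int timeout_ms)
{
    struct pollfd pfds[MAX_FDS];

    assert(r->count >= 0 && r->count <= MAX_FDS);
    for (int i = 0; i < r->count; i++) {
        pfds[i].fd = r->fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    return poll(pfds, (nfds_t)r->count, timeout_ms);
}
```

With a larger timeout the loop makes fewer poll calls, but a newly
registered fd can wait up to that long before it is first serviced.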

Since epoll provides a way to add new fds for polling while an epoll_wait
is in progress (unlike poll), the timeout of epoll_wait is infinite. Also
note that on systems where both epoll and poll are available, epoll is
preferred over poll.
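
For contrast, a minimal Linux-only sketch of the epoll side, again not
the GlusterFS event-epoll code. The helper names watch_fd and wait_one
are hypothetical. The interest set lives in the kernel and is changed
with epoll_ctl, independently of the wait call, which is why an infinite
timeout (-1) is safe. The sketch stays in one thread for brevity; in a
real server the epoll_ctl call would typically come from another thread
while epoll_wait is blocked:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Add fd to the kernel-held interest set of epfd, watching for input. */
static int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = {0};

    assert(fd >= 0);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * Block with no timeout until one registered fd is ready.
 * Returns that fd, or -1 on error (including EINTR).
 */
static int wait_one(int epfd)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, -1);

    return n == 1 ? ev.data.fd : -1;
}
```

Note that epoll_wait takes no fd list at all, so there is nothing to
rebuild between calls.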


> I'll post once I've tried a longer timeout and if it seems ok, I will
> put it in the Redhat bugs database (as mentioned in the last post).
> In its current form, it's fine for testing.
>
> > I also have a broader question to go with the specific one:  We (at
> > iXsystems) were attempting to engage with some of the Red Hat folks back
> > when the FreeBSD port was first done, in the hope of getting it more
> > “officially supported” for FreeBSD and perhaps even donating some more
> > serious stress-testing and integration work for it, but when those Red
> Hat
> > folks moved on we lost continuity and the effort stalled.  Who at Red Hat
> > would / could we work with in getting this back on track?  We’d like to
> > integrate glusterfs with FreeNAS 10, and in fact have already done so but
> > it’s still early days and we’re not even really sure what we have yet.
> >
> Just fyi..sofar, working with FreeBSD11/head and the port of 3.7.6 (the
> port tarball
> is in FreeBSD PR#194409), the only GlusterFS problem I've encountered is
> the above one. I'm not sure why this isn't in /usr/ports, but that would be
> nice as it might get more people trying it. (I'm a src comitter, but not a
> ports one.)
>
> However, I have several patches for the FreeBSD fuse interface and for
> a mount_glusterfs mount to work ok you need a couple of them.
> 1 - When an open decides to do DIRECT_IO after the file has done buffer
>     cache I/O the buffer cache needs to be invalidated so you don't get
>     stale cached data.
> 2 - For a WRONLY write, you need to force DIRECT_IO (or do a read/write
> open).
>     If you don't do this, the buffer cache code will get stuck when trying
>     to read a block in before writing a partial block. (I think this is
>     what FreeBSD PR#194293 is caused by.)
>
> Because I won't be able to do svn until April, these patches won't make it
> into head for a while, but they will both be in PR#194293 within hours.
>
> The others add features like extended attributes, advisory byte range
> locking
> and the changes needed to export the fuse/glusterfs mount via the FreeBSD
> kernel nfsd. If anyone wants/needs these patches, email and I can send you
> them.
>
> A bit off your topic, but until you have the fixes for FreeBSD fuse, you
> probably can't do a lot of serious testing.
> (I don't know, but I'd guess that FreeNAS has about the same fuse module
>  code as FreeBSD's head, since it hasn't been changed much in head
> recently.)
>
> Thanks everyone for your help with this, rick
>
> > Thanks,
> >
> > - Jordan
> >
> >
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Raghavendra G


