Date: Fri, 8 Jan 2016 10:12:32 +0530
From: Raghavendra G <raghavendra@gluster.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Hubbard Jordan <jkh@ixsystems.com>, freebsd-fs <freebsd-fs@freebsd.org>, Gluster Devel <gluster-devel@gluster.org>
Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage
Message-ID: <CADRNtgStOg8UZfxNt-SzvvPf7d1J7CC_gi49ww3BbixU0Ey-rg@mail.gmail.com>
In-Reply-To: <1083933309.146084334.1451517977647.JavaMail.zimbra@uoguelph.ca>
References: <571237035.145690509.1451437960464.JavaMail.zimbra@uoguelph.ca> <20151230103152.GS13942@ndevos-x240.usersys.redhat.com> <2D8C2729-D556-479B-B4E2-66E1BB222F41@ixsystems.com> <1083933309.146084334.1451517977647.JavaMail.zimbra@uoguelph.ca>
Sorry for the delayed reply; I had missed this mail. Please find my comments inlined.

On Thu, Dec 31, 2015 at 4:56 AM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Jordan Hubbard wrote:
> >
> > > On Dec 30, 2015, at 2:31 AM, Niels de Vos <ndevos@redhat.com> wrote:
> > >
> > >> I'm guessing that Linux uses the event-epoll stuff instead of event-poll,
> > >> so it wouldn't exhibit this. Is that correct?
> > >
> > > Well, both. Most (if not all) Linux builds will use event-epoll. But,
> > > that calls epoll_wait() with a timeout of 1 millisecond as well.
> > >
> > >> Thanks for any information on this, rick
> > >> ps: I am tempted to just crank the timeout of 1msec up to 10 or 20msec.
> > >
> > > Yes, that is probably what I would do too. And have both poll functions
> > > use the same timeout, have it defined in libglusterfs/src/event.h. We
> > > could make it a configurable option too, but I do not think it is very
> > > useful to have.
> >
> > I guess this begs the question - what’s the actual purpose of polling for an
> > event with a 1 millisecond timeout? If it was some sort of heartbeat check,
> > one might imagine that would be better served by a timer with nothing close
> > to 1 millisecond as an interval (that would be one seriously aggressive
> > heartbeat), and if filesystem events are incoming that glusterfs needs to
> > respond to, why timeout at all?
> >
> If I understand the code (I probably don't), the timeout allows the loop
> to call a function that may add new fd's to be polled. (If I'm right,
> the new ones might not get serviced.)
>

Yes, that's correct. Since poll takes the fds to be polled as an array
argument, the only place where we can add or remove fds is at the time we
call the poll syscall. To make adding/removing fds more responsive, poll
times out "frequently enough". The trade-off we are considering here is
between:
1. the number of calls to poll, vs.
2. the responsiveness of adding/removing an fd from polling.

For clients, the list of fds being polled does not change much. However,
for bricks/servers this list can vary frequently as clients connect and
disconnect. Since epoll (unlike poll) provides a way to add new fds for
polling while an epoll_wait is in progress, the timeout of epoll_wait is
infinite. Also note that on systems where both epoll and poll are
available, epoll is preferred over poll.
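To make the trade-off concrete, here is a rough sketch of the two loop
shapes. This is not the actual GlusterFS event code (the names
fd_registry, poll_loop and epoll_loop, and the fixed-size arrays, are made
up for illustration); it only shows why a poll()-based loop needs a short
timeout while an epoll-based loop can block indefinitely:

```c
/*
 * Illustrative sketch only -- not the GlusterFS event-poll/event-epoll code.
 */
#include <poll.h>
#include <sys/epoll.h>      /* Linux-only; the poll() half is portable */
#include <pthread.h>

#define POLL_TIMEOUT_MS 1   /* the 1 ms value being discussed in this thread */

/* Hypothetical shared registry of fds, protected by a lock. */
struct fd_registry {
    pthread_mutex_t lock;
    struct pollfd   fds[1024];
    int             count;
};

/*
 * poll()-based loop: the fd array is handed to the kernel on every call,
 * so a registration made by another thread is only noticed on the *next*
 * call.  A short timeout bounds that latency, at the cost of waking up
 * every POLL_TIMEOUT_MS even when nothing is ready.
 */
void poll_loop(struct fd_registry *reg)
{
    for (;;) {
        struct pollfd snapshot[1024];
        int n;

        /* take a snapshot of the current fd list */
        pthread_mutex_lock(&reg->lock);
        n = reg->count;
        for (int i = 0; i < n; i++)
            snapshot[i] = reg->fds[i];
        pthread_mutex_unlock(&reg->lock);

        int ready = poll(snapshot, n, POLL_TIMEOUT_MS);
        if (ready > 0) {
            /* dispatch handlers for the ready fds ... */
        }
        /* on timeout, simply loop and re-read the (possibly changed) list */
    }
}

/*
 * epoll-based loop: other threads may call
 * epoll_ctl(epfd, EPOLL_CTL_ADD, newfd, &ev) at any time, even while this
 * thread is blocked in epoll_wait(), so the timeout can be -1 (infinite).
 */
void epoll_loop(int epfd)
{
    struct epoll_event events[64];

    for (;;) {
        int ready = epoll_wait(epfd, events, 64, -1 /* no timeout needed */);
        for (int i = 0; i < ready; i++) {
            /* dispatch handler for events[i] ... */
        }
    }
}
```

In the poll() variant, raising POLL_TIMEOUT_MS from 1 to 10 or 20 ms cuts
the idle wakeup rate by roughly that factor, at the cost of up to that much
extra delay before a newly registered fd is first handed to poll().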
> I'll post once I've tried a longer timeout and, if it seems ok, I will
> put it in the Red Hat bugs database (as mentioned in the last post).
> In its current form, it's fine for testing.
>
> > I also have a broader question to go with the specific one: We (at
> > iXsystems) were attempting to engage with some of the Red Hat folks back
> > when the FreeBSD port was first done, in the hope of getting it more
> > “officially supported” for FreeBSD and perhaps even donating some more
> > serious stress-testing and integration work for it, but when those Red Hat
> > folks moved on we lost continuity and the effort stalled. Who at Red Hat
> > would / could we work with in getting this back on track? We’d like to
> > integrate glusterfs with FreeNAS 10, and in fact have already done so, but
> > it’s still early days and we’re not even really sure what we have yet.
> >
> Just fyi: so far, working with FreeBSD 11/head and the port of 3.7.6 (the
> port tarball is in FreeBSD PR#194409), the only GlusterFS problem I've
> encountered is the above one. I'm not sure why this isn't in /usr/ports,
> but that would be nice as it might get more people trying it. (I'm a src
> committer, but not a ports one.)
>
> However, I have several patches for the FreeBSD fuse interface, and for
> a mount_glusterfs mount to work ok you need a couple of them:
> 1 - When an open decides to do DIRECT_IO after the file has done buffer
>     cache I/O, the buffer cache needs to be invalidated so you don't get
>     stale cached data.
> 2 - For a WRONLY write, you need to force DIRECT_IO (or do a read/write
>     open). If you don't do this, the buffer cache code will get stuck when
>     trying to read a block in before writing a partial block. (I think this
>     is what FreeBSD PR#194293 is caused by.)
>
> Because I won't be able to do svn until April, these patches won't make it
> into head for a while, but they will both be in PR#194293 within hours.
>
> The others add features like extended attributes, advisory byte range
> locking, and the changes needed to export the fuse/glusterfs mount via the
> FreeBSD kernel nfsd. If anyone wants/needs these patches, email me and I
> can send you them.
>
> A bit off your topic, but until you have the fixes for FreeBSD fuse, you
> probably can't do a lot of serious testing.
> (I don't know, but I'd guess that FreeNAS has about the same fuse module
> code as FreeBSD's head, since it hasn't been changed much in head
> recently.)
>
> Thanks everyone for your help with this, rick
>
> > Thanks,
> >
> > - Jordan
> >
> >
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>

-- 
Raghavendra G
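Regarding Rick's second fuse item above (forcing DIRECT_IO for O_WRONLY
opens): the pattern that trips the buffer cache is a write-only descriptor
issuing a write that covers only part of a block, which makes the kernel
try to read the rest of the block in first. A minimal sketch of that
triggering I/O pattern follows; the mount point and file name are
hypothetical, and the file is assumed to already exist with data past the
write offset:

```c
/*
 * Sketch of the I/O pattern described above: an O_WRONLY open followed by
 * a small write at a non-block-aligned offset.  Completing such a write
 * through the buffer cache requires reading the rest of the block first,
 * which an unpatched fuse mount may not manage through a write-only handle.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/glusterfs/testfile";   /* hypothetical mount */

    int fd = open(path, O_WRONLY);                  /* write-only, no O_DIRECT */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Seek into the middle of an existing block and write a few bytes,
     * so the write covers only part of a buffer-cache block. */
    if (lseek(fd, 100, SEEK_SET) < 0) {
        perror("lseek");
        return 1;
    }
    if (write(fd, "xyz", 3) != 3)
        perror("write");

    close(fd);
    return 0;
}
```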