From: Raghavendra G
To: Rick Macklem
Cc: Hubbard Jordan, freebsd-fs, Gluster Devel
Date: Fri, 8 Jan 2016 10:12:32 +0530
Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage

Sorry for the delayed reply; I had missed this mail. Please find my comments inline.

On Thu, Dec 31, 2015 at 4:56 AM, Rick Macklem wrote:

> Jordan Hubbard wrote:
> >
> > > On Dec 30, 2015, at 2:31 AM, Niels de Vos wrote:
> > >
> > >> I'm guessing that Linux uses the event-epoll stuff instead of
> > >> event-poll, so it wouldn't exhibit this. Is that correct?
> > >
> > > Well, both. Most (if not all) Linux builds will use event-poll. But,
> > > that calls epoll_wait() with a timeout of 1 millisecond as well.
> > >
> > >> Thanks for any information on this, rick
> > >> ps: I am tempted to just crank the timeout of 1msec up to 10 or
> > >> 20msec.
> > >
> > > Yes, that is probably what I would do too. And have both poll functions
> > > use the same timeout, have it defined in libglusterfs/src/event.h. We
> > > could make it a configurable option too, but I do not think it is very
> > > useful to have.
> >
> > I guess this begs the question - what's the actual purpose of polling for
> > an event with a 1 millisecond timeout? If it was some sort of heartbeat
> > check, one might imagine that would be better served by a timer with
> > nothing close to 1 millisecond as an interval (that would be one seriously
> > aggressive heartbeat) and if filesystem events are incoming that glusterfs
> > needs to respond to, why timeout at all?
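To put numbers on the 1 ms versus 10-20 ms timeouts discussed above: when nothing is ready, each poll() or epoll_wait() call returns only after its timeout expires, so an idle event thread wakes up at most 1000/timeout_ms times per second. The C sketch below is illustrative only; EVENT_POLL_TIMEOUT_MSEC is a hypothetical name for the single shared constant Niels suggests defining in libglusterfs/src/event.h, not an existing GlusterFS macro.

```c
#include <assert.h>

/* Hypothetical name for one shared timeout (Niels suggests defining it in
 * libglusterfs/src/event.h); 10 ms is one of the values Rick proposes. */
#define EVENT_POLL_TIMEOUT_MSEC 10

/* Upper bound on wakeups per second for an idle poll()/epoll_wait() loop,
 * i.e. one where every call returns only because its timeout expired.
 * A timeout of -1 means "wait indefinitely" in both APIs, so an idle loop
 * never wakes up; this sketch treats any negative value that way. */
static long idle_wakeups_per_sec(long timeout_msec)
{
    assert(timeout_msec != 0); /* 0 = return immediately: a busy loop */
    if (timeout_msec < 0)
        return 0;
    return 1000 / timeout_msec;
}
```

With these numbers, moving from 1 ms to 10 or 20 ms cuts the idle wakeups of each polling thread from 1000 per second to 100 or 50, a factor of 10 or 20.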
> >
> If I understand the code (I probably don't) the timeout allows the loop
> to call a function that may add new fd's to be polled. (If I'm right,
> the new ones might not get serviced.)
>

Yes, that's correct. Since poll takes the fds to be polled as an array argument, the only point at which we can add or remove fds is when we make the poll syscall. To make adding/removing fds more responsive, poll times out "frequently enough". The trade-off we are weighing is between:

1. the number of calls to poll, and
2. how quickly a new fd is added to (or removed from) the polled set.

For clients, the list of polled fds does not change much. For bricks/servers, however, it can change frequently as clients connect and disconnect.

Since epoll (unlike poll) allows new fds to be added while an epoll_wait is in progress, the epoll_wait timeout is infinite. Also note that on systems where both epoll and poll are available, epoll is preferred over poll.

> I'll post once I've tried a longer timeout and if it seems ok, I will
> put it in the Redhat bugs database (as mentioned in the last post).
>

In its current form, it's fine for testing.

>
> > I also have a broader question to go with the specific one: We (at
> > iXsystems) were attempting to engage with some of the Red Hat folks back
> > when the FreeBSD port was first done, in the hope of getting it more
> > "officially supported" for FreeBSD and perhaps even donating some more
> > serious stress-testing and integration work for it, but when those Red Hat
> > folks moved on we lost continuity and the effort stalled. Who at Red Hat
> > would / could we work with in getting this back on track? We'd like to
> > integrate glusterfs with FreeNAS 10, and in fact have already done so but
> > it's still early days and we're not even really sure what we have yet.
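Raghavendra's explanation of the poll() timeout can be illustrated with a minimal C sketch. It assumes a hypothetical registry (poll_registry, registry_add() and poll_once() are illustrative names, not GlusterFS's event-poll code) and an assumed 10 ms timeout. Because poll() receives its fds as an array, an fd queued while poll() is blocked only becomes visible when the loop rebuilds that array on its next iteration; the bounded timeout guarantees that even an idle loop gets there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_FDS 64
/* Assumed value in the 10-20 ms range discussed in this thread. */
#define POLL_TIMEOUT_MSEC 10

/* Hypothetical registry, not GlusterFS code: poll() only sees the fds
 * copied into its pollfd array at the moment it is called. */
struct poll_registry {
    int active[MAX_FDS];  /* fds handed to poll() on each call */
    size_t n_active;
    int pending[MAX_FDS]; /* fds queued since the last poll() call */
    size_t n_pending;
};

/* Queue an fd for polling; it is picked up by the next poll_once(). */
static int registry_add(struct poll_registry *r, int fd)
{
    if (r->n_active + r->n_pending >= MAX_FDS)
        return -1;
    r->pending[r->n_pending++] = fd;
    return 0;
}

/* One loop iteration: merge queued fds, rebuild the pollfd array (out must
 * hold MAX_FDS entries), then poll with a bounded timeout so that an idle
 * loop still comes back around to merge new fds. Returns poll()'s result. */
static int poll_once(struct poll_registry *r, struct pollfd *out, size_t *n_out)
{
    assert(r->n_active + r->n_pending <= MAX_FDS);
    for (size_t i = 0; i < r->n_pending; i++)
        r->active[r->n_active++] = r->pending[i];
    r->n_pending = 0;

    for (size_t i = 0; i < r->n_active; i++) {
        out[i].fd = r->active[i];
        out[i].events = POLLIN;
        out[i].revents = 0;
    }
    *n_out = r->n_active;
    return poll(out, (nfds_t)r->n_active, POLL_TIMEOUT_MSEC);
}

/* Usage: an fd queued after one poll_once() call is polled on the next.
 * Single-threaded for determinism; in a server, another thread would queue
 * the fd while poll() is blocked. Returns 1 if the behaviour matches. */
static int demo_late_registration(void)
{
    int a[2], b[2];
    if (pipe(a) != 0)
        return 0;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return 0;
    }

    struct poll_registry reg = {0};
    struct pollfd fds[MAX_FDS];
    size_t n = 0;
    int ok = 1;

    ok &= registry_add(&reg, a[0]) == 0;
    ok &= poll_once(&reg, fds, &n) == 0 && n == 1; /* idle: times out */
    ok &= registry_add(&reg, b[0]) == 0;           /* queued between calls */
    ok &= write(b[1], "x", 1) == 1;
    ok &= poll_once(&reg, fds, &n) == 1 && n == 2; /* b merged and readable */
    ok &= (fds[1].revents & POLLIN) != 0;

    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return ok;
}
```

With epoll, as noted above, the new fd would instead be registered through epoll_ctl() while epoll_wait() is blocked, so no timeout is needed for this purpose.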
> >
> Just fyi... so far, working with FreeBSD11/head and the port of 3.7.6 (the
> port tarball is in FreeBSD PR#194409), the only GlusterFS problem I've
> encountered is the above one. I'm not sure why this isn't in /usr/ports,
> but that would be nice as it might get more people trying it. (I'm a src
> committer, but not a ports one.)
>
> However, I have several patches for the FreeBSD fuse interface, and for
> a mount_glusterfs mount to work ok you need a couple of them.
> 1 - When an open decides to do DIRECT_IO after the file has done buffer
>     cache I/O, the buffer cache needs to be invalidated so you don't get
>     stale cached data.
> 2 - For a WRONLY write, you need to force DIRECT_IO (or do a read/write
>     open). If you don't do this, the buffer cache code will get stuck when
>     trying to read a block in before writing a partial block. (I think this
>     is what FreeBSD PR#194293 is caused by.)
>
> Because I won't be able to do svn until April, these patches won't make it
> into head for a while, but they will both be in PR#194293 within hours.
>
> The others add features like extended attributes, advisory byte range
> locking and the changes needed to export the fuse/glusterfs mount via the
> FreeBSD kernel nfsd. If anyone wants/needs these patches, email and I can
> send you them.
>
> A bit off your topic, but until you have the fixes for FreeBSD fuse, you
> probably can't do a lot of serious testing.
> (I don't know, but I'd guess that FreeNAS has about the same fuse module
> code as FreeBSD's head, since it hasn't been changed much in head
> recently.)
>
> Thanks everyone for your help with this, rick
>
> > Thanks,
> >
> > - Jordan
> >

-- 
Raghavendra G
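Rick's second fuse fix rests on a read-before-write constraint that can be shown in userland. The C sketch below is only an analogy, assuming a hypothetical 512-byte block and hypothetical helpers (cached_partial_write(), direct_write()); it is not FreeBSD's fuse or buffer-cache code. A cached partial-block write must first read the rest of the block, which a write-only descriptor cannot do (pread() fails with EBADF), whereas a direct write sends only the requested bytes. In the kernel, the symptom Rick describes is the buffer cache getting stuck rather than a clean error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512 /* hypothetical cache block size */

/* Buffer-cache analogy: read the whole block, patch it, write it back.
 * Returns 0 on success or -errno. */
static int cached_partial_write(int fd, off_t off, const char *data, size_t len)
{
    char block[BLOCK_SIZE];
    off_t base = off - off % BLOCK_SIZE;
    size_t start = (size_t)(off - base);
    assert(start + len <= BLOCK_SIZE); /* write must stay inside one block */

    memset(block, 0, sizeof block);
    ssize_t got = pread(fd, block, BLOCK_SIZE, base); /* the "read in" step */
    if (got < 0)
        return -errno;
    memcpy(block + start, data, len);
    size_t end = start + len;
    size_t wlen = (size_t)got > end ? (size_t)got : end;
    return pwrite(fd, block, wlen, base) < 0 ? -errno : 0;
}

/* DIRECT_IO analogy: write only the requested bytes; no read is needed. */
static int direct_write(int fd, off_t off, const char *data, size_t len)
{
    return pwrite(fd, data, len, off) < 0 ? -errno : 0;
}

/* Usage: returns 1 if a write-only fd fails the cached path with EBADF but
 * succeeds on the direct path, and a read/write fd succeeds when cached. */
static int demo_wronly_partial_write(void)
{
    char path[] = "/tmp/rmw-demo-XXXXXX";
    int tmp = mkstemp(path);
    if (tmp < 0)
        return 0;
    close(tmp);

    int ok = 1;
    int wfd = open(path, O_WRONLY);
    ok &= wfd >= 0;
    ok &= cached_partial_write(wfd, 10, "hi", 2) == -EBADF;
    ok &= direct_write(wfd, 10, "hi", 2) == 0;
    close(wfd);

    int rwfd = open(path, O_RDWR);
    ok &= rwfd >= 0;
    ok &= cached_partial_write(rwfd, 20, "yo", 2) == 0;
    close(rwfd);

    unlink(path);
    return ok;
}
```

Both remedies in Rick's point 2 map onto this analogy: forcing DIRECT_IO corresponds to direct_write(), and a read/write open makes the read-in step possible.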