From: Raghavendra G
To: Xavier Hernandez
Cc: Rick Macklem, freebsd-fs, Hubbard Jordan, Gluster Devel
Date: Fri, 8 Jan 2016 13:52:09 +0530
Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage

On Fri, Jan 8, 2016 at 1:32 PM, Xavier Hernandez wrote:

> On 08/01/16 05:42, Raghavendra G wrote:
>
>> Sorry for the delayed reply. Had missed out this mail. Please find my
>> comments inlined.
>>
>> On Thu, Dec 31, 2015 at 4:56 AM, Rick Macklem wrote:
>>
>>     Jordan Hubbard wrote:
>>     >
>>     > > On Dec 30, 2015, at 2:31 AM, Niels de Vos wrote:
>>     > >
>>     > >> I'm guessing that Linux uses the event-epoll stuff instead of
>>     > >> event-poll, so it wouldn't exhibit this. Is that correct?
>>     > >
>>     > > Well, both. most (if not all) Linux builds will use event-poll.
>>     > > But, that calls epoll_wait() with a timeout of 1 millisecond as
>>     > > well.
>>     > >
>>     > >> Thanks for any information on this, rick
>>     > >> ps: I am tempted to just crank the timeout of 1msec up to 10 or
>>     > >> 20msec.
>>     > >
>>     > > Yes, that is probably what I would do too. And have both poll
>>     > > functions use the same timeout, have it defined in
>>     > > libglusterfs/src/event.h. We could make it a configurable option
>>     > > too, but I do not think it is very useful to have.
>>     >
>>     > I guess this begs the question - what’s the actual purpose of
>>     > polling for an event with a 1 millisecond timeout?
>>     > If it was some sort of heartbeat check, one might imagine that
>>     > would be better served by a timer with nothing close to 1
>>     > millisecond as an interval (that would be one seriously aggressive
>>     > heartbeat) and if filesystem events are incoming that glusterfs
>>     > needs to respond to, why timeout at all?
>>     >
>>     If I understand the code (I probably don't) the timeout allows the
>>     loop to call a function that may add new fd's to be polled. (If I'm
>>     right, the new ones might not get serviced.)
>>
>> Yes, that's correct. Since in poll we pass the fds to be polled in an
>> array as an argument, the only place where we can add/remove fds to be
>> polled is at the time we call the poll syscall. To make adding/removing
>> fds from polling more responsive, poll times out "frequently enough".
>> The trade-off we are considering here is between:
>>
>> 1. Number of calls to poll
>> vs
>> 2. Responsiveness of adding/removing a new fd from polling.
>>
>> For clients, there is not much change in the list of fds that are
>> polled. However, for bricks/server this list can vary frequently as new
>> clients are connected/disconnected.
>>
>> Since epoll provides a way to add new fds for polling while an
>> epoll_wait is in progress (unlike poll), the timeout of epoll_wait is
>> infinite. Also note that on systems where both epoll and poll are
>> available, epoll is preferred over poll.
>
> I don't know anything about gluster's poll implementation so I may be
> totally wrong, but would it be possible to use an eventfd (or a pipe if
> eventfd is not supported) to signal the need to add more file descriptors
> to the poll call?
>
> The poll call should listen on this new fd. When we need to change the fd
> list, we should simply write to the eventfd or pipe from another thread.
> This will cause the poll call to return and we will be able to change the
> fd list without having a short timeout nor having to decide on any
> trade-off.

That's a nice idea. Based on my understanding of why timeouts are being
used, this approach can work.

> Just an idea...
>
> Xavi
>
>>
>>     I'll post once I've tried a longer timeout and if it seems ok, I will
>>     put it in the Redhat bugs database (as mentioned in the last post).
>>     In its current form, it's fine for testing.
>>
>>     > I also have a broader question to go with the specific one: We (at
>>     > iXsystems) were attempting to engage with some of the Red Hat folks
>>     > back when the FreeBSD port was first done, in the hope of getting it
>>     > more “officially supported” for FreeBSD and perhaps even donating
>>     > some more serious stress-testing and integration work for it, but
>>     > when those Red Hat folks moved on we lost continuity and the effort
>>     > stalled. Who at Red Hat would / could we work with in getting this
>>     > back on track? We’d like to integrate glusterfs with FreeNAS 10, and
>>     > in fact have already done so but it’s still early days and we’re not
>>     > even really sure what we have yet.
>>     >
>>     Just fyi.. so far, working with FreeBSD11/head and the port of 3.7.6
>>     (the port tarball is in FreeBSD PR#194409), the only GlusterFS problem
>>     I've encountered is the above one. I'm not sure why this isn't in
>>     /usr/ports, but that would be nice as it might get more people trying
>>     it. (I'm a src committer, but not a ports one.)
>>
>>     However, I have several patches for the FreeBSD fuse interface and for
>>     a mount_glusterfs mount to work ok you need a couple of them.
>>     1 - When an open decides to do DIRECT_IO after the file has done
>>         buffer cache I/O the buffer cache needs to be invalidated so you
>>         don't get stale cached data.
>>     2 - For a WRONLY write, you need to force DIRECT_IO (or do a
>>         read/write open). If you don't do this, the buffer cache code will
>>         get stuck when trying to read a block in before writing a partial
>>         block.
>>         (I think this is what FreeBSD PR#194293 is caused by.)
>>
>>     Because I won't be able to do svn until April, these patches won't
>>     make it into head for a while, but they will both be in PR#194293
>>     within hours.
>>
>>     The others add features like extended attributes, advisory byte range
>>     locking and the changes needed to export the fuse/glusterfs mount via
>>     the FreeBSD kernel nfsd. If anyone wants/needs these patches, email
>>     and I can send you them.
>>
>>     A bit off your topic, but until you have the fixes for FreeBSD fuse,
>>     you probably can't do a lot of serious testing.
>>     (I don't know, but I'd guess that FreeNAS has about the same fuse
>>     module code as FreeBSD's head, since it hasn't been changed much in
>>     head recently.)
>>
>>     Thanks everyone for your help with this, rick
>>
>>     > Thanks,
>>     >
>>     > - Jordan
>>
>>     _______________________________________________
>>     Gluster-devel mailing list
>>     Gluster-devel@gluster.org
>>     http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>> --
>> Raghavendra G

--
Raghavendra G
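
The wake-up idea Xavi describes in the thread (poll on an extra fd, and write to it from another thread when the fd list must change) might look roughly like the sketch below. It is not taken from the GlusterFS source: `struct poller`, `poller_init`, `poller_request_add`, `poller_wait_once` and `MAX_FDS` are all hypothetical names, and it assumes a plain pipe rather than eventfd so that it stays within POSIX. The fd to add is sent through the pipe itself, so no separate lock is needed for the request.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 64 /* hypothetical fixed capacity, for illustration only */

/* Hypothetical poller state; none of these names come from GlusterFS. */
struct poller {
    int wake_rd;                /* read end of the wake-up pipe, polled in slot 0 */
    int wake_wr;                /* write end, used to request an fd-list change */
    struct pollfd fds[MAX_FDS];
    nfds_t nfds;
};

int poller_init(struct poller *p)
{
    int pipefd[2];
    int flags;

    if (pipe(pipefd) != 0)
        return -1;
    /* Make the read end non-blocking so draining stops on an empty pipe. */
    flags = fcntl(pipefd[0], F_GETFL);
    if (flags < 0 || fcntl(pipefd[0], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    p->wake_rd = pipefd[0];
    p->wake_wr = pipefd[1];
    p->fds[0].fd = p->wake_rd;
    p->fds[0].events = POLLIN;
    p->fds[0].revents = 0;
    p->nfds = 1;
    return 0;
}

/* May be called from another thread: a write of sizeof(int) bytes is well
 * below PIPE_BUF, so POSIX guarantees it is not interleaved with others. */
int poller_request_add(struct poller *p, int fd)
{
    ssize_t n = write(p->wake_wr, &fd, sizeof(fd));
    return n == (ssize_t)sizeof(fd) ? 0 : -1;
}

/* One loop iteration: block in poll() with no timeout, then apply any
 * queued additions. Returns the number of fds added, or -1 on error. */
int poller_wait_once(struct poller *p)
{
    int added = 0;
    int fd;

    if (poll(p->fds, p->nfds, -1) < 0)
        return errno == EINTR ? 0 : -1;

    if (p->fds[0].revents & POLLIN) {
        /* Drain the wake-up pipe; each message is one fd to start polling. */
        while (read(p->wake_rd, &fd, sizeof(fd)) == (ssize_t)sizeof(fd)) {
            assert(p->nfds < MAX_FDS);
            p->fds[p->nfds].fd = fd;
            p->fds[p->nfds].events = POLLIN;
            p->fds[p->nfds].revents = 0;
            p->nfds++;
            added++;
        }
    }
    /* A real event loop would dispatch handlers for fds[1..nfds-1] here. */
    return added;
}
```

With this shape the poll timeout can be infinite, since a pending request makes the pipe readable and poll() returns at once. Removing an fd would need a slightly richer message (for example an add/remove tag next to the fd), which this sketch leaves out.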