From owner-freebsd-fs@freebsd.org Tue Jan 19 11:53:33 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4AAE6A87333 for ; Tue, 19 Jan 2016 11:53:33 +0000 (UTC) (envelope-from rgowdapp@redhat.com) Received: from mx4-phx2.redhat.com (mx4-phx2.redhat.com [209.132.183.25]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx1.redhat.com", Issuer "DigiCert SHA2 Extended Validation Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 30572154E for ; Tue, 19 Jan 2016 11:53:33 +0000 (UTC) (envelope-from rgowdapp@redhat.com) Received: from zmail13.collab.prod.int.phx2.redhat.com (zmail13.collab.prod.int.phx2.redhat.com [10.5.83.15]) by mx4-phx2.redhat.com (8.13.8/8.13.8) with ESMTP id u0JBrUQc011680; Tue, 19 Jan 2016 06:53:30 -0500 Date: Tue, 19 Jan 2016 06:53:29 -0500 (EST) From: Raghavendra Gowdappa To: Martin Simmons Cc: rmacklem@uoguelph.ca, jdarcy@redhat.com, raghavendra@gluster.com, gluster-devel@gluster.org, freebsd-fs@freebsd.org, xhernandez@datalab.es, jkh@ixsystems.com Message-ID: <1210026335.11409907.1453204409914.JavaMail.zimbra@redhat.com> In-Reply-To: <201601191005.u0JA57Nl004520@higson.cam.lispworks.com> References: <571237035.145690509.1451437960464.JavaMail.zimbra@uoguelph.ca> <1924941590.6473225.1452248249994.JavaMail.zimbra@redhat.com> <981529129.154244852.1452304799182.JavaMail.zimbra@uoguelph.ca> <1256214214.7158114.1452310490692.JavaMail.zimbra@redhat.com> <1045057902.165261325.1453156629344.JavaMail.zimbra@uoguelph.ca> <7769801.11211464.1453183279015.JavaMail.zimbra@redhat.com> <201601191005.u0JA57Nl004520@higson.cam.lispworks.com> Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_11409905_1746344817.1453204409912" X-Originating-IP: [10.70.1.92] X-Mailer: Zimbra 8.0.6_GA_5922 (ZimbraWebClient - GC25 (Linux)/8.0.6_GA_5922) Thread-Topic: FreeBSD port of GlusterFS racks up a lot of CPU usage Thread-Index: OZOLDNW4oGfXyYSZl51nSYlZYSJouA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Jan 2016 11:53:33 -0000 ------=_Part_11409905_1746344817.1453204409912 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit ----- Original Message ----- > From: "Martin Simmons" > To: "Raghavendra Gowdappa" > Cc: rmacklem@uoguelph.ca, jdarcy@redhat.com, raghavendra@gluster.com, gluster-devel@gluster.org, > freebsd-fs@freebsd.org, xhernandez@datalab.es, jkh@ixsystems.com > Sent: Tuesday, January 19, 2016 3:35:07 PM > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage > > >>>>> On Tue, 19 Jan 2016 01:01:19 -0500 (EST), Raghavendra Gowdappa said: > > > > ----- Original Message ----- > > > From: "Rick Macklem" > > > To: "Raghavendra Gowdappa" > > > Cc: "Jeff Darcy" , "Raghavendra G" > > > , "freebsd-fs" > > > , "Hubbard Jordan" , "Xavier > > > Hernandez" , "Gluster > > > Devel" > > > Sent: Tuesday, January 19, 2016 4:07:09 AM > > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of > > > CPU usage > > > > > > Raghavendra Gowdappa wrote: > > > > > > > > > > > > ----- Original Message ----- > > > > > From: "Rick Macklem" > > > > > To: "Jeff Darcy" > > > > > Cc: "Raghavendra G" , "freebsd-fs" > > > > > , "Hubbard Jordan" > > > > > , "Xavier Hernandez" , > > > > > "Gluster > > > > > Devel" > > > > > Sent: Saturday, January 9, 2016 7:29:59 AM > > > > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot > > > > > of > > > > > CPU usage > > > > > > > > > > Jeff Darcy wrote: > > > > > > > > I don't know anything about gluster's poll implementation so I > > > > > > > > may > > > > > > > > be totally wrong, but would it be possible to use an eventfd > > > > > > > > (or a > > > > > > > > pipe if eventfd is not supported) to signal the need to add > > > > > > > > more > > > > > > > > file descriptors to the poll call ? > > > > > > > > > > > > > > > > > > > > > > > > The poll call should listen on this new fd. When we need to > > > > > > > > change > > > > > > > > the fd list, we should simply write to the eventfd or pipe from > > > > > > > > another thread. This will cause the poll call to return and we > > > > > > > > will > > > > > > > > be able to change the fd list without having a short timeout > > > > > > > > nor > > > > > > > > having to decide on any trade-off. > > > > > > > > > > > > > > > > > > > > > Thats a nice idea. Based on my understanding of why timeouts are > > > > > > > being > > > > > > > used, this approach can work. > > > > > > > > > > > > The own-thread code which preceded the current poll implementation > > > > > > did > > > > > > something similar, using a pipe fd to be woken up for new > > > > > > *outgoing* > > > > > > messages. That code still exists, and might provide some insight > > > > > > into > > > > > > how to do this for the current poll code. > > > > > I took a look at event-poll.c and found something interesting... > > > > > - A pipe called "breaker" is already set up by event_pool_new_poll() > > > > > and > > > > > closed by event_pool_destroy_poll(), however it never gets used for > > > > > anything. > > > > > > > > I did a check on history, but couldn't find any information on why it > > > > was > > > > removed. Can you send this patch to http://review.gluster.org ? We can > > > > review and merge the patch over there. If you are not aware, > > > > development > > > > work flow can be found at: > > > > > > > > http://www.gluster.org/community/documentation/index.php/Developers > > > > > > > Actually, the patch turned out to be a flop. Sometimes a fuse mount would > > > end > > > up with an empty file system with the patch. (I don't know why it was > > > broken, > > > but maybe the original author tan into issues as well?) > > > > +static void > > +event_pool_changed (struct event_pool *event_pool) > > +{ > > + > > + /* Write a byte into the breaker pipe to wake up poll(). */ > > + if (event_pool->breaker[1] >= 0) > > + write(event_pool->breaker[1], "X", 1); > > +} > > > > breaker is set to non-blocking on both read and write ends. So, probably > > write might be failing sometimes with EAGAIN/EBUSY and thereby preventing > > the socket from being registered. Probably that might be the reason? > > In what situations does write return EAGAIN/EBUSY for a pipe? If it is only > when the pipe's internal buffer is full, then there is still unread data and > the reading end will wake up anyway, so there should be no need to write > more. That's a valid point. I was kind of thinking out loud since I couldn't figure out what was wrong in earlier approach. I was particularly interested in whether write can fail because of any error. Since we were not checking the return value, a write failure can result in a socket not being registered. Though I couldn't think of any particular reason for failure, felt that its a loose end that can be tied up. On a different note, I found a bug in current poll implementation which could've resulted in high cpu usage. I've attached the diff, which wouldn't do an array copy (of registered events to ufd) every "timeout" seconds and which I hope which will bring down cpu usage. The issue was event_pool->changed was never reset and hence every timeout seconds, we would interpret that event_pool has changed even though it has not. > > > > if (event_pool->breaker[1] >= 0) { > > do { > > ret = write(event_pool->breaker[1], "X", 1); > > } while (ret != 1); > > } > > This will busy-wait, which doesn't look good. > > > > Also similar logic might be required while flushing out junk from read end > > too. > > Failure to read everything should be benign because it will just cause poll > to > return again immediately next time. > > > > > > > > Anyhow, I am now using the 3.7.6 event-poll.c code except that I have > > > increased > > > the timeout from 1msec->10msec. (Going from 1->5->10 didn't seem to cause > > > a > > > problem, but I got slower test runs when I increased to 20msec, so I've > > > settled on > > > 10mses. This does reduce the CPU usage when the GlusterFS file systems > > > aren't > > > active.) > > > I will submit this one line change to your workflow if it continues to > > > test > > > ok. > > > > > > Thanks for everyone's input, rick > > > > > > > > > > > > > So, I added a few lines of code that writes a byte to it whenever the > > > > > list > > > > > of > > > > > file descriptors is changed and read when poll() returns, if its > > > > > revents > > > > > is > > > > > set. > > > > > I also changed the timeout to -1 (infinity) and it seems to work for > > > > > a > > > > > trivial > > > > > test. > > > > > --> Btw, I also noticed the "changed" variable gets set to 1 on a > > > > > change, > > > > > but > > > > > never reset to 0. I didn't change this, since it looks "racey". > > > > > (ie. > > > > > I > > > > > think you could easily get a race between a thread that clears it > > > > > and > > > > > one > > > > > that adds a new fd.) > > > > > > > > > > A slightly safer version of the patch would set a long (100msec ??) > > > > > timeout > > > > > instead > > > > > of -1. > > > > > > > > > > Anyhow, I've attached the patch in case anyone would like to try it > > > > > and > > > > > will > > > > > create a bug report for this after I've had more time to test it. > > > > > (I only use a couple of laptops, so my testing will be minimal.) > > > > > > > > > > Thanks for all the help, rick > > > > > > > > > > > _______________________________________________ > > > > > > freebsd-fs@freebsd.org mailing list > > > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > > > > To unsubscribe, send any mail to > > > > > > "freebsd-fs-unsubscribe@freebsd.org" > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > ------=_Part_11409905_1746344817.1453204409912 Content-Type: text/x-patch; name=poll.diff Content-Disposition: attachment; filename=poll.diff Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL2xpYmdsdXN0ZXJmcy9zcmMvZXZlbnQtcG9sbC5jIGIvbGliZ2x1c3RlcmZz L3NyYy9ldmVudC1wb2xsLmMKaW5kZXggMDA4ZGExMC4uMzc0ZDNjZSAxMDA2NDQKLS0tIGEvbGli Z2x1c3RlcmZzL3NyYy9ldmVudC1wb2xsLmMKKysrIGIvbGliZ2x1c3RlcmZzL3NyYy9ldmVudC1w b2xsLmMKQEAgLTQzMiw2ICs0MzIsOCBAQCBldmVudF9kaXNwYXRjaF9wb2xsX3Jlc2l6ZSAoc3Ry dWN0IGV2ZW50X3Bvb2wgKmV2ZW50X3Bvb2wsCiAgICAgICAgICAgICAgICAgICAgICAgICB1ZmRz W2ldLnJldmVudHMgPSAwOwogICAgICAgICAgICAgICAgIH0KIAorICAgICAgICAgICAgICAgIGV2 ZW50X3Bvb2wtPmNoYW5nZWQgPSAwOworCiAgICAgICAgICAgICAgICAgc2l6ZSA9IGk7CiAgICAg ICAgIH0KIHVubG9jazoK ------=_Part_11409905_1746344817.1453204409912--