From owner-freebsd-fs@freebsd.org Wed Jan 20 04:42:41 2016
Date: Tue, 19 Jan 2016 23:42:36 -0500 (EST)
From: Rick Macklem
To: Raghavendra Gowdappa
Cc: Martin Simmons, jdarcy@redhat.com, raghavendra@gluster.com,
    gluster-devel@gluster.org, freebsd-fs@freebsd.org,
    xhernandez@datalab.es, jkh@ixsystems.com
Message-ID: <456488053.167003584.1453264956923.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <1210026335.11409907.1453204409914.JavaMail.zimbra@redhat.com>
Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage
Raghavendra Gowdappa wrote:
>
> ----- Original Message -----
> > From: "Martin Simmons"
> > To: "Raghavendra Gowdappa"
> > Cc: rmacklem@uoguelph.ca, jdarcy@redhat.com, raghavendra@gluster.com,
> >     gluster-devel@gluster.org, freebsd-fs@freebsd.org,
> >     xhernandez@datalab.es, jkh@ixsystems.com
> > Sent: Tuesday, January 19, 2016 3:35:07 PM
> > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of
> >     CPU usage
> >
> > >>>>> On Tue, 19 Jan 2016 01:01:19 -0500 (EST), Raghavendra Gowdappa said:
> > >
> > > ----- Original Message -----
> > > > From: "Rick Macklem"
> > > > To: "Raghavendra Gowdappa"
> > > > Cc: "Jeff Darcy", "Raghavendra G", "freebsd-fs", "Hubbard Jordan",
> > > >     "Xavier Hernandez", "Gluster Devel"
> > > > Sent: Tuesday, January 19, 2016 4:07:09 AM
> > > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot
> > > >     of CPU usage
> > > >
> > > > Raghavendra Gowdappa wrote:
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "Rick Macklem"
> > > > > > To: "Jeff Darcy"
> > > > > > Cc: "Raghavendra G", "freebsd-fs", "Hubbard Jordan",
> > > > > >     "Xavier Hernandez", "Gluster Devel"
> > > > > > Sent: Saturday, January 9, 2016 7:29:59 AM
> > > > > > Subject: Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a
> > > > > >     lot of CPU usage
> > > > > >
> > > > > > Jeff Darcy wrote:
> > > > > > > > > I don't know anything about gluster's poll implementation so
> > > > > > > > > I may be totally wrong, but would it be possible to use an
> > > > > > > > > eventfd (or a pipe if eventfd is not supported) to signal
> > > > > > > > > the need to add more file descriptors to the poll call?
> > > > > > > > >
> > > > > > > > > The poll call should listen on this new fd. When we need to
> > > > > > > > > change the fd list, we should simply write to the eventfd or
> > > > > > > > > pipe from another thread. This will cause the poll call to
> > > > > > > > > return and we will be able to change the fd list without
> > > > > > > > > having a short timeout nor having to decide on any trade-off.
> > > > > > > >
> > > > > > > > That's a nice idea. Based on my understanding of why timeouts
> > > > > > > > are being used, this approach can work.
> > > > > > >
> > > > > > > The own-thread code which preceded the current poll
> > > > > > > implementation did something similar, using a pipe fd to be
> > > > > > > woken up for new *outgoing* messages. That code still exists,
> > > > > > > and might provide some insight into how to do this for the
> > > > > > > current poll code.
> > > > > >
> > > > > > I took a look at event-poll.c and found something interesting...
> > > > > > - A pipe called "breaker" is already set up by
> > > > > >   event_pool_new_poll() and closed by event_pool_destroy_poll(),
> > > > > >   however it never gets used for anything.
> > > > >
> > > > > I did a check on history, but couldn't find any information on why
> > > > > it was removed. Can you send this patch to http://review.gluster.org ?
> > > > > We can review and merge the patch over there. If you are not aware,
> > > > > the development work flow can be found at:
> > > > >
> > > > > http://www.gluster.org/community/documentation/index.php/Developers
> > > >
> > > > Actually, the patch turned out to be a flop. Sometimes a fuse mount
> > > > would end up with an empty file system with the patch. (I don't know
> > > > why it was broken, but maybe the original author ran into issues as
> > > > well?)
> > >
> > > +static void
> > > +event_pool_changed (struct event_pool *event_pool)
> > > +{
> > > +
> > > +        /* Write a byte into the breaker pipe to wake up poll(). */
> > > +        if (event_pool->breaker[1] >= 0)
> > > +                write(event_pool->breaker[1], "X", 1);
> > > +}
> > >
> > > breaker is set to non-blocking on both read and write ends. So,
> > > probably write might be failing sometimes with EAGAIN/EBUSY and
> > > thereby preventing the socket from being registered. Might that be
> > > the reason?
> >
> > In what situations does write return EAGAIN/EBUSY for a pipe? If it is
> > only when the pipe's internal buffer is full, then there is still unread
> > data and the reading end will wake up anyway, so there should be no need
> > to write more.
>
> That's a valid point. I was kind of thinking out loud, since I couldn't
> figure out what was wrong in the earlier approach. I was particularly
> interested in whether write can fail because of any error. Since we were
> not checking the return value, a write failure could result in a socket
> not being registered. Though I couldn't think of any particular reason
> for failure, I felt that it's a loose end that could be tied up.
>
> On a different note, I found a bug in the current poll implementation
> which could've resulted in high cpu usage. I've attached the diff, which
> avoids an array copy (of registered events to ufd) every "timeout"
> seconds, and which I hope will bring down cpu usage. The issue was that
> event_pool->changed was never reset, and hence every timeout seconds we
> would interpret that the event_pool had changed even though it had not.
>
> > > if (event_pool->breaker[1] >= 0) {
> > >         do {
> > >                 ret = write(event_pool->breaker[1], "X", 1);
> > >         } while (ret != 1);
> > > }
> >
> > This will busy-wait, which doesn't look good.
>
> Sorry. I should have looked before I replied.
In FreeBSD, a pipe write() will fail with errno EAGAIN if the pipe is full.
(For that case, I don't think a retry would be necessary.) It does look like
there is a very slight possibility that the write() could fail with errno
EINTR when a signal is posted to the thread/process. (I have no idea if this
will ever happen in GlusterFS, but looping while the write() fails with
errno EINTR would probably be more correct than no loop.)

When I looked at this, I found that I had broken the read() on the pipe when
I added some logging code (not in the patch I posted) for the case where the
read() failed with EAGAIN (ie. no data to be read). This probably explains
why the patch failed. I will test a fixed-up patch someday and, if it seems
to work fine for a fair amount of testing, I will submit it for review. As
noted before, it isn't a serious problem, so I don't think there is any rush
on this.

Sorry for the confusion I caused, rick

> > > Also, similar logic might be required while flushing out junk from
> > > the read end too.
> >
> > Failure to read everything should be benign, because it will just cause
> > poll to return again immediately next time.
>
> > > > Anyhow, I am now using the 3.7.6 event-poll.c code except that I
> > > > have increased the timeout from 1msec->10msec. (Going from 1->5->10
> > > > didn't seem to cause a problem, but I got slower test runs when I
> > > > increased to 20msec, so I've settled on 10msec. This does reduce the
> > > > CPU usage when the GlusterFS file systems aren't active.)
> > > > I will submit this one line change to your workflow if it continues
> > > > to test ok.
> > > >
> > > > Thanks for everyone's input, rick
> > > >
> > > > > > So, I added a few lines of code that writes a byte to it
> > > > > > whenever the list of file descriptors is changed, and reads it
> > > > > > when poll() returns, if its revents is set.
> > > > > > I also changed the timeout to -1 (infinity) and it seems to work
> > > > > > for a trivial test.
> > > > > > --> Btw, I also noticed the "changed" variable gets set to 1 on
> > > > > >     a change, but never reset to 0. I didn't change this, since
> > > > > >     it looks "racey". (ie. I think you could easily get a race
> > > > > >     between a thread that clears it and one that adds a new fd.)
> > > > > >
> > > > > > A slightly safer version of the patch would set a long
> > > > > > (100msec ??) timeout instead of -1.
> > > > > >
> > > > > > Anyhow, I've attached the patch in case anyone would like to try
> > > > > > it and will create a bug report for this after I've had more
> > > > > > time to test it. (I only use a couple of laptops, so my testing
> > > > > > will be minimal.)
> > > > > >
> > > > > > Thanks for all the help, rick
> > > > > >
> > > > > > > _______________________________________________
> > > > > > > freebsd-fs@freebsd.org mailing list
> > > > > > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > > > > > To unsubscribe, send any mail to
> > > > > > > "freebsd-fs-unsubscribe@freebsd.org"