From owner-freebsd-java@FreeBSD.ORG Mon Jun 16 14:45:16 2014
Date: Mon, 16 Jun 2014 16:45:04 +0200
From: "Ronald Klop"
To: freebsd-java@freebsd.org
Subject: Re: JVM BUG(s) - Hadoop's threads hanging
List-Id: Porting Java to FreeBSD

On Mon, 16 Jun 2014 14:44:10 +0200, Dmitry Sivachenko wrote:

> Hello!
>
> I set up hadoop-1.2.1 (from ports) on FreeBSD-10/stable with OpenJDK 1.7.0_60.
>
> At first glance it is doing well, except for one annoying thing: after
> executing some tasks, the tasktracker process starts to eat CPU when idle.
> Sometimes it is 10-20% (numbers from top(1) output), sometimes it is 100-150%.
>
> In the tasktracker's log I see numerous records like this:
>
> 2014-06-09 13:08:29,858 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - injecting delay 59 times
> 2014-06-09 13:08:29,859 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - recreating selector 59 times, canceled keys 944 times
> 2014-06-09 13:09:29,862 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - injecting delay 58 times
> 2014-06-09 13:09:29,862 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - recreating selector 58 times, canceled keys 928 times
> 2014-06-09 13:10:29,901 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - injecting delay 58 times
> 2014-06-09 13:10:29,901 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@abdcc1c JVM BUG(s) - recreating selector 58 times, canceled keys 928 times
> <...>
>
> The more jobs I run, the more java threads consume CPU after all tasks have finished.
> After several job executions, top(1) output looks like this (split by thread, all with the same PID):
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 79045 hadoop     47    0  1948M   867M uwait   2  20:49 37.50% java{java}
> 79045 hadoop     31    0  1948M   867M uwait  31   1:45 19.29% java{java}
> 79045 hadoop     33    0  1948M   867M uwait  21   2:51 19.19% java{java}
> 79045 hadoop     30    0  1948M   867M uwait  17   2:51 18.65% java{java}
> 79045 hadoop     30    0  1948M   867M uwait  11   1:52 18.36% java{java}
> 79045 hadoop     30    0  1948M   867M uwait  22   1:45 18.36% java{java}
> 79045 hadoop     31    0  1948M   867M uwait  29   2:50 18.26% java{java}
> 79045 hadoop     31    0  1948M   867M uwait   6   1:57 18.16% java{java}
> 79045 hadoop     31    0  1948M   867M uwait  13   4:55 17.97% java{java}
> 79045 hadoop     31    0  1948M   867M uwait  26   3:39 17.77% java{java}
> 79045 hadoop     33    0  1948M   867M uwait   8   1:21 17.48% java{java}
> 79045 hadoop     30    0  1948M   867M uwait   1   3:32 16.70% java{java}
> 79045 hadoop     32    0  1948M   867M uwait  24   3:12 16.70% java{java}
> 79045 hadoop     26    0  1948M   867M uwait   4   1:27 10.35% java{java}
> 72417 root       20    0 19828K  3252K CPU21  21   0:00  0.29% top
>   836 root       20    0 36104K  1952K select 14   6:51  0.00% snmpd
> 79045 hadoop     20    0  1948M   867M uwait  20   6:51  0.00% java{java}
> 79045 hadoop     20    0  1948M   867M uwait  27   3:45  0.00% java{java}
> 79045 hadoop     20    0  1948M   867M uwait  30   2:37  0.00% java{java}
> 79045 hadoop     20    0  1948M   867M uwait  15   0:54  0.00% java{java}
> 79045 hadoop     20    0  1948M   867M uwait   2   0:48  0.00% java{java}
> 79045 hadoop     20    0  1948M   867M uwait  14   0:48  0.00% java{java}
> 79045 hadoop     20    0  1948M   867M uwait   2   0:48  0.00% java{java}
> <....>
>
> This is on an absolutely idle cluster; not a single task is running.
> I am attaching truss(1) output for that java process.
>
> I asked about this on user@hadoop.apache.org without any luck.
>
> Any clues? (I can provide any additional information if needed.)

Hi,

From your information it is hard to say much about it. The bug can be in FreeBSD, in OpenJDK (either the Oracle part or the BSD port part), in Hadoop, or in your own code running on top of Hadoop.

My first idea would be to eliminate some of the possibilities:
- Run a Linux machine with the same versions of the software.
- Try FreeBSD 9-stable.
- Try an older version of OpenJDK on FreeBSD.
- Try a very simple 'Hello world' style application on Hadoop which mimics the thread usage (a rough sketch follows after the sign-off).

Did you ever run your Hadoop application on FreeBSD before without this symptom? If so, what are the differences between then and now?

Another possibly valuable piece of information is what those threads are doing. With jstack you can very easily get a dump of the stacks of all threads in the JVM (an example invocation also follows after the sign-off).

Regards,
Ronald.
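
P.S. About the 'Hello world' style job: something like the sketch below would exercise job submission and a few map tasks without doing any real work. This is an untested sketch against the org.apache.hadoop.mapreduce API shipped with hadoop-1.2.1; the class name, the 10 ms sleep and the input/output paths are made up for illustration.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HelloJob {

    // Map-only "hello world": read lines, sleep briefly so the task stays
    // alive for a moment, emit nothing.
    public static class SleepyMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Thread.sleep(10); // pretend to do a little work
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "hello");      // Job(conf, name) constructor of the 1.x API
        job.setJarByClass(HelloJob.class);
        job.setMapperClass(SleepyMapper.class);
        job.setNumReduceTasks(0);              // map-only, no reducers needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running it a number of times against a small input directory (for example: hadoop jar hello.jar HelloJob /tmp/hello-in /tmp/hello-out-1, with hypothetical jar and directory names and a fresh output directory each run) and watching whether the tasktracker threads start spinning afterwards would tell whether your own job code plays any role.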
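
P.P.S. About the jstack step: assuming the tasktracker is still PID 79045 from your top(1) output and the jstack that comes with the same OpenJDK is in your PATH, something like

  jstack -l 79045 > tasktracker-threads.txt

dumps every thread with its stack and lock information. If jstack cannot attach, sending the process SIGQUIT (kill -QUIT 79045) makes HotSpot print the same thread dump on its stdout, which for a daemon started via hadoop-daemon.sh usually ends up in the .out file in the Hadoop log directory. Two or three dumps taken a few seconds apart should show whether the busy threads are the org.mortbay (Jetty) selector threads from your log or something else.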